Drop Copy at Spot Trading LLC – a year later
Last year, I discussed what Drop Copy is, some of the challenges we had building it, why and how we use it at Spot Trading, and some of the unresolved issues and challenges we experienced. In this post, I will explain how we overcame some of those issues, the new features we added, and our plans to continue improving our Drop Copy system.
Real time initiative
In the past, Drop Copy Service committed the Drop Copies to the database. Trade Diff Engine would go to the database and retrieve all the drop copy trades at timed intervals, and do the same with our internal trades, which were also retrieved from a database at timed intervals. These two sources were then compared, and any outlying trades were considered “diffs.” That worked fine initially, but then we saw that we were experiencing slowness whenever our number of fills was high – usually at market open and close. The reason for this was that, during the course of the trading day, the database tables got very large; therefore, matching consumed a lot of time, and Trade Diff Engine didn’t refresh as fast as it should have. While I thought this had been fixed last September, the Trade Diff Engine matching logic was still based off the database tables even though we were publishing the messages from both Drop Copy and Position Services.
Now, we publish Drop Copy updates as soon as we receive them to the Trade Diff Engine and match them up in real-time with the executions that are being published real-time from our Position Service, removing any communications with the database. This is a huge improvement over the previous logic; everything is way more responsive, even just before the market close (when things are usually busy). This design change reduced our latency from minutes down to seconds, making our system as real-time as possible.
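To make the idea concrete, here is a minimal Python sketch of in-memory, message-by-message matching. It is an illustration only, not our production code: the class name, the dictionary fields (order_id, qty, price) and the choice of matching key are all assumptions made for the example.

```python
from collections import defaultdict


class RealTimeMatcher:
    """Pairs drop copy fills with internal fills as each message arrives."""

    def __init__(self):
        # Matching key -> number of fills still waiting for a partner.
        self.unmatched_drop = defaultdict(int)
        self.unmatched_internal = defaultdict(int)

    @staticmethod
    def _key(fill):
        # Hypothetical matching key; a real system may compare more fields.
        return (fill["order_id"], fill["qty"], fill["price"])

    def on_drop_copy(self, fill):
        self._match(fill, self.unmatched_internal, self.unmatched_drop)

    def on_internal(self, fill):
        self._match(fill, self.unmatched_drop, self.unmatched_internal)

    def _match(self, fill, other_side, own_side):
        key = self._key(fill)
        if other_side[key] > 0:
            other_side[key] -= 1   # partner already arrived: pair them off
        else:
            own_side[key] += 1     # no partner yet: hold as a potential diff

    def diffs(self):
        return {
            "exchange_only": [k for k, n in self.unmatched_drop.items() if n > 0],
            "internal_only": [k for k, n in self.unmatched_internal.items() if n > 0],
        }
```

Because each message is paired the moment it arrives, the work per fill stays roughly constant during the day instead of growing with the size of a database table.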
Exchanges use a different protocol from FIX
For exchanges that use FIX Protocol to communicate information about our trades, we use a third party tool called OnixS. This provides out-of-the-box functionality which: manages the logins to the exchanges, establishes heartbeating with the exchange, handles the exchange message parsing logic, recovers gracefully if we need to reconnect intraday to an exchange for any reason, and provides all the trade related data from the exchanges in a format that we can use.
However, not all exchanges send their Drop Copy information using FIX. BOX and PHLX, for example, use a proprietary binary protocol, and since OnixS does not support Binary at this time, we had to develop from scratch all the functionality mentioned above that OnixS provides. Unlike FIX, which uses many standardized fields across the industry, Binary does not have such standardization. Thus we developed generic session handlers that are customizable by exchange; for example, the parsers for BOX and PHLX are different because they use different fields, so we had to customize our development to handle these differences. A benefit of these efforts is if we ever had to connect to another exchange with binary drop copy, it should be relatively easy because all we would need to do would be to parse all the different fields and then put it in a format that Drop Copy understands. Then Drop Copy would commit it to the database and publish it out to the Trade Diff Engine.
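As a rough illustration of "generic handler, exchange-specific parser", the Python sketch below maps two different binary layouts onto one common record. The field layouts are invented for the example and do not reflect the real BOX or PHLX message formats; the function and field names are likewise hypothetical.

```python
import struct

# Hypothetical fixed-width layouts. The real BOX and PHLX formats differ
# from these; only the idea of "one table entry per exchange" matters here.
LAYOUTS = {
    "BOX": (struct.Struct(">IQ8sIq"),
            ("seq", "order_id", "symbol", "qty", "price_ticks")),
    "PHLX": (struct.Struct(">I8sQqI"),
             ("seq", "symbol", "order_id", "price_ticks", "qty")),
}


def parse_fill(exchange, payload):
    """Decode one binary fill into the common record the rest of the system uses."""
    layout, names = LAYOUTS[exchange]
    record = dict(zip(names, layout.unpack(payload)))
    # Symbols are assumed to be space- or NUL-padded ASCII.
    record["symbol"] = record["symbol"].rstrip(b" \x00").decode("ascii")
    record["exchange"] = exchange
    return record
```

With this shape, supporting another binary exchange would mostly mean adding one more entry to the layout table.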
In order for any exchange to know that we are ready and able to receive drop copies, we first have to establish a TCP/IP socket connection between Drop Copy Service and the exchange. This is another function that OnixS provides out-of-the-box for FIX, so for BOX and PHLX we had to write our own TCP handler that opens the socket, keeps it open, and reestablishes the connection if heartbeats go missing.
After the connection to the BOX or PHLX exchange is made, we log in with a password. Once we have successfully logged in, the exchange begins heartbeating with us. Without heartbeats, the exchange will not send us any messages back. We built our own heartbeat handler that sends out heartbeats to the exchanges at 30 second intervals. If we detect that heartbeats are missing, Drop Copy Service automatically reissues the login and tries to restart the heartbeat sequence.
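The timing logic of such a heartbeat handler can be sketched in a few lines of Python. This is a simplified illustration, not our implementation: the class and method names are hypothetical, and the rule that two missed intervals trigger a re-login is an assumption made for the example. The caller passes in the current time (for instance from time.monotonic()) so the logic is easy to test.

```python
class HeartbeatMonitor:
    """Decides when to send a heartbeat and when to log in again."""

    def __init__(self, interval=30.0, missed_allowed=2):
        self.interval = interval
        # Assumption: silence longer than this means the session is dead.
        self.timeout = interval * missed_allowed
        self.last_sent = None
        self.last_received = None

    def on_login(self, now):
        self.last_sent = now
        self.last_received = now

    def on_heartbeat_received(self, now):
        self.last_received = now

    def next_action(self, now):
        # Not logged in yet, or the exchange has gone quiet for too long.
        if self.last_received is None or now - self.last_received > self.timeout:
            return "relogin"
        # Time to send our own heartbeat.
        if now - self.last_sent >= self.interval:
            self.last_sent = now
            return "send_heartbeat"
        return "wait"
```

A session loop would call next_action on every tick and act on the returned string, calling on_login again after a successful re-login.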
Once the exchange and Drop Copy Service are heartbeating with each other, we try to synchronize the sequence numbers. The first heartbeat we send tells the exchange the sequence ID that we have, then the exchange sends a heartbeat back letting us know the sequence ID it has. If both sequence IDs match, then we are logged in and ready for the day. If the sequence IDs do not match, and our sequence ID is lower than that of the exchange, then the exchange issues a replay to us of any messages that we’ve missed since the time we were down. If our sequence ID is higher than that of the exchange, the login will fail and we would need to manually update the sequence ID to match that of the exchange.
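The three outcomes described above boil down to a small decision function. The Python sketch below is illustrative only; the function name and return values are hypothetical, and it assumes the sequence ID means "last message received" rather than "next message expected".

```python
def reconcile_sequence(ours, theirs):
    """Compare our sequence ID with the exchange's and decide what happens next.

    Returns a (status, replay_range) pair, where replay_range covers the
    sequence IDs the exchange is expected to replay to us.
    """
    if ours == theirs:
        return ("ready", range(0))
    if ours < theirs:
        # We missed messages ours+1 .. theirs while we were down.
        return ("replay", range(ours + 1, theirs + 1))
    # We claim to have seen more than the exchange has sent: login fails
    # and someone has to correct our stored sequence ID by hand.
    return ("login_failed", range(0))
```

For example, reconcile_sequence(100, 103) reports a replay of messages 101 through 103.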
PHLX replays the entire day
Once we are logged in, if we were to subsequently disconnect from an exchange (for example, because of a network outage or a Drop Copy Service crash), we’d first need to reestablish that socket connection. Once the connection is reestablished, similar to the first login of the day, we would send to the exchange the highest numbered sequence ID plus the login information (we store these sequence IDs in a file in our database). Additionally, we would make a request to them to send any messages that we’ve missed from that point. However, unlike other exchanges, PHLX doesn’t simply replay the gap of messages that was missed; instead, it replays all messages from the beginning of the day, so for PHLX we had to implement logic to filter out all of the messages in the replay that we had already successfully received. This filtering logic is written in C++. We keep the sequence IDs in the cache of the Drop Copy Service, so when PHLX sends all the messages for the day, we do a comparison, import into our system the messages we don’t have, and drop the ones we had already received earlier.
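The production filter is written in C++, as noted above; the Python sketch below only illustrates the same idea of checking a full-day replay against a cache of sequence IDs. The class, function and field names ("seq") are hypothetical.

```python
class ReplayFilter:
    """Remembers which sequence IDs were already processed today."""

    def __init__(self, seen=()):
        self.seen = set(seen)

    def accept(self, seq_id):
        """Return True only the first time a sequence ID is offered."""
        if seq_id in self.seen:
            return False
        self.seen.add(seq_id)
        return True


def new_messages(replay, replay_filter):
    """Keep only the replayed messages we have not imported yet."""
    return [msg for msg in replay if replay_filter.accept(msg["seq"])]
```

If messages 1 to 3 were received before the disconnect and the exchange replays 1 to 5, only 4 and 5 pass through the filter.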
Crashing/Not connecting initially
We had a few issues with OnixS crashing after we had connected to some of the Drop Copy sessions in the morning. We upgraded to the latest version of OnixS (version 18.104.22.168) and that did the trick – it no longer crashes.
ISE hexadecimal to decimal
The exchange order ID that we receive from the ISE is in hexadecimal, yet our system uses decimal numbers. So we use some C++ functions to convert the exchange order ID from a hexadecimal number into a decimal number. This conversion makes the matching possible, and all we have to do is compare the two order IDs, and if they match, then compare the drop copy messages against the messages in our system.
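Our conversion is done in C++, but the operation itself is small enough to show in a one-line Python equivalent. The function name is hypothetical, and the sample values are made up rather than real ISE order IDs.

```python
def exchange_order_id_to_decimal(hex_id):
    """Convert a hexadecimal order ID string into an integer for matching."""
    # int(..., 16) accepts upper or lower case digits and an optional 0x prefix.
    return int(hex_id.strip(), 16)
```

For instance, the string "1A2B" becomes 6699, which can then be compared directly with the decimal order ID held in our own system.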
ISE and ISE Gemini use the same drop copy session
When the Gemini exchange came online, our choice was to either get a separate line for this exchange, or to share a session between the ISE and Gemini. Since we had an existing adapter for the ISE, we called the ISE and they said that we could funnel all the drop copy messages for both exchanges (ISE and Gemini) together on the same drop copy session. This kept it simple for us, because we could use the existing ISE adapter, and only had to modify it slightly because Gemini uses a few fields differently. The stream that comes through Drop Copy Service is the same. The disadvantage of our current setup is that if this line goes down, we will lose our drop copy connection to both the ISE and Gemini exchanges. The advantages were that it was easy to modify our current adapter to accommodate Gemini, and since each drop copy session has a cost, it’s cheaper for us each month to operate.
Clearing needs to see all our electronic trades
We need to ensure that our clearing firms get drops of every trade we make electronically. From a business standpoint, this is important so that the clearing firms can help manage our risk; from an operational standpoint, this matters so that the Drop Copy Diffs are accurate. If clearing doesn’t get drop copies of our trades, we will have Drop Copy clearing diffs. We actually experienced this situation: one of our clearing firms was not receiving drop copies for one of our user IDs, so we chronically had Drop Copy clearing diffs. It took us a lot of time and effort to figure out why our exchange diffs were all ‘clean,’ but we still had clearing diffs.
There are still two remaining issues with the Drop Copy Clearing diffs. The first type is due to the fact that we are getting duplicate messages on the drop copy line from one of our clearing firms. The duplicate message (the last message in the log below, timestamped 15:01:39) comes in around 3 minutes later than the actual trade. These duplicate messages cause diffs to appear in our Diff GUI because we did the trade once, but our system thinks clearing knows of an extra trade. So we have to manually ‘ignore’ the trade from the diff GUI. We are still working with the clearing firms to see why we are getting two notifications of the same trade.
ARCA-DropCopy IN 20150817-14:58:08.895744756 8=FIX.4.2 9=310 35=8 129=AAAA 128=BBBB 34=2047 49=ARCA 56=CCCC 52=20150817-14:58:08 55=DDDD 37=3377716900468059 11=224 17=3377716900399886 20=0 39=2 150=2 54=2 38=20 40=2 44=4.2 59=0 31=4.2 32=20 151=0 14=20 6=4.200000 60=20150817-14:58:08 58=Fill 9730=R 77=C 439=501 167=OPT 30=PO 1=47TY 201=0 200=201508 202=60 205=21 10=212
Clearing-Drop Copy IN 20150817-14:58:08.915308344 8=FIX.4.2 9=380 35=8 49=EEEE 56=CCCC 57=u782701 34=37605357 52=20150817-14:58:08 37=OSB00197 11=25475 76=WAVE 17=186113ER 20=0 150=2 39=2 1=47TY1209 55=DDDD 167=OPT 200=201508 205=21 201=0 202=60.0000 231=100.000000 106=23 54=2 38=20 44=4.200000 32=20 31=4.200000 30=P 151=0 14=20 6=4.2000 75=20150817 60=20150817-14:58:08 58=Fill 439=GVX 311=FFFF 8028=3377716900399886 8029=Doneaway 8046=D73 10=113
IN 20150817-15:01:39.423796571 8=FIX.4.2 9=360 35=EEEE 49=REDI 56=CCCC 57=u736607 129=AAAA 34=37605654 52=20150817-15:01:39 37=OHK00491 11=25969 17=190473ER 20=0 150=2 39=2 1=47TY 55=DDDD 167=OPT 200=201508 205=21 201=0 202=60.0000 231=100.000000 106=23 54=2 38=20 44=4.200000 32=20 31=4.200000 30=P 151=0 14=20 6=4.2000 75=20150817 60=20150817-14:58:08 439=GNI 8023=N 311=FFFF 8028=2OW1690 8029=RTC 8046=PU 10=096
The second type is with VIX futures. These show up as Drop Copy Clearing Diffs because the trade hits our system, but the drop copies from clearing are delayed (there is no delay on the drop copy from the exchange). This is a known issue with reporting these types of fills. The drop copy messages get batched up and then sent over; once this happens, the VIX futures diffs get matched and are no longer diffs. There isn’t much we can do with these, other than to alert if they are still a diff after a certain period of time.
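One way to express "alert only if it is still a diff after a certain period" is a simple age check, sketched below in Python. The function name, the dictionary of first-seen timestamps and the grace period are all hypothetical; we have not settled on actual values.

```python
def stale_diffs(open_diffs, now, grace_seconds):
    """Return the trade IDs that have been unmatched for longer than the grace period.

    open_diffs maps a trade ID to the time (in seconds) the diff first appeared.
    """
    return [
        trade_id
        for trade_id, first_seen in open_diffs.items()
        if now - first_seen > grace_seconds
    ]
```

Diffs that clear once the batched drop copies arrive would never reach the alerting step; only the ones that outlive the grace period would.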
Functionality to disable automated trading systems
Our automated trading systems are able to make hundreds of executions per minute, so it’s imperative that our trading and risk systems reflect accurately, and in a timely manner, each and every trade we make. Our Trading and Risk Management teams depend on having an accurate picture of our risk. If something goes awry, and we are slow to act, our risk values could be off and lead to a large financial loss. An excessive amount of Drop Copy diffs will let us know something is wrong with either our internal system or that of an exchange, and we realized we could use this information as a trigger to disable our automated trading systems. As our CIO has said, “We’d rather not be trading when we want to be trading, as opposed to trading when we don’t want to be trading.”
To this end, on the Trade Diff Engine home page, next to each of the exchanges, there is now a ‘Shutdown’ button. These buttons are grayed out initially and cannot be clicked while gray. If the number of diffs (of the type where the exchange knows the fills, but we don’t have them in our database) exceeds a configurable threshold – currently 20 – the button for that exchange becomes active and turns red (it also sends a PagerDuty alert). Clicking on the red button generates a pop-up window asking you to confirm that you do indeed want to shut down the algorithm that is generating those orders. If ‘yes’ is clicked, a message is sent from Drop Copy Service to the algorithm to shut down or disable it. If ‘no’ is clicked, the pop-up goes away and the button stays red. If the number of diffs subsequently falls below the diff threshold for that exchange, the button reverts to the grayed out, ‘inactive’ state. This functionality was written in Python with a web front end.
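The button behavior described above is essentially a small state machine. The Python sketch below is a simplified illustration rather than our actual code: the class name and the notify callback (standing in for the paging call) are hypothetical, and it assumes "exceeds" means strictly greater than the threshold.

```python
class ShutdownButton:
    """Tracks whether the per-exchange 'Shutdown' button is clickable."""

    def __init__(self, exchange, threshold=20, notify=print):
        self.exchange = exchange
        self.threshold = threshold
        self.notify = notify      # hypothetical stand-in for the pager alert
        self.active = False       # False = grayed out, True = red

    def update(self, diff_count):
        """Re-evaluate the button after the diff count changes."""
        was_active = self.active
        self.active = diff_count > self.threshold
        if self.active and not was_active:
            # Page only on the transition from gray to red.
            self.notify(f"{self.exchange}: {diff_count} diffs exceed {self.threshold}")
        return self.active

    def click(self, confirmed):
        """Return True if a shutdown message should go to the algorithm."""
        return self.active and confirmed
```

A click on a gray button, or a ‘no’ in the confirmation pop-up, returns False and leaves the button state untouched.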
Fortunately, we have not yet had an instance during trading hours where any button turned red. However, if one or more were to turn red, our Trade Support team would take notice, disable our automated systems from trading on that exchange, and notify the Trading team.
Automate the disable feature
As stated in the section above, if the number of diffs exceeds the threshold for an exchange, the ‘shutdown’ button will turn red, after which our Support team needs to notice it’s red, and then after a few mouse clicks, manually disable trading. However, this isn’t optimal, as it’s not as fast as it should be. Because we make hundreds of trades per minute, we need to remove the human component from the disabling process, so that the algorithms get shut down immediately after a diff threshold has been breached.
Add a PagerDuty alert
We need a PagerDuty alert to be sent, both to provide visibility that a threshold breach took place and as a trigger to validate that the algorithms have indeed been disabled.
Ensure all messages have been received by Trade Diff Engine
There are a few things we can do to make the Drop Copy and Trade Diff Engine system a little more robust. We should send Spot-generated sequence IDs with each drop copy message to Trade Diff Engine so that Trade Diff Engine can detect any gaps in the sequence. Implementing this would not only confirm that the system is working correctly, but would also help it recover on its own from network glitches. For example, if Drop Copy Service sent messages with sequence IDs 5, 6 and 7, but Trade Diff Engine received only 6 and 7, Trade Diff Engine should be able to detect the gap, realize 5 was missing, and send a request to Drop Copy Service to resend that message. A way to do this would be to create a cache in Drop Copy Service that pairs each sequence number with the associated drop copy insert ID. Once a replay request was received, Drop Copy would use the insert ID to go back to the database, retrieve all the information associated with it, parse it into a drop copy proto, and play it back into the Trade Diff Engine.
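Since this feature is not built yet, the following Python sketch is only one possible shape for the receiving side of the gap detection. The class and method names are hypothetical, and it assumes sequence IDs increase by one per message.

```python
class GapDetector:
    """Tracks incoming sequence IDs and reports which ones to ask for again."""

    def __init__(self, last_seen=0):
        self.next_expected = last_seen + 1
        self.missing = set()

    def on_message(self, seq):
        """Return the list of sequence IDs that should be re-requested."""
        if seq in self.missing:
            self.missing.discard(seq)   # a resend filled an earlier gap
            return []
        if seq < self.next_expected:
            return []                   # duplicate of something already seen
        gap = list(range(self.next_expected, seq))
        self.missing.update(gap)
        self.next_expected = seq + 1
        return gap
```

In the example from the text, a detector that last saw 4 and then receives 6 returns [5], which is the resend request to send back to Drop Copy Service.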
Drop Copy needs to be ‘out of band’
It’s important to ensure that the drop copy system is as ‘out of band’ as possible; ideally, it would be totally independent from our Production environment. We have had instances where we lost a network component and both the Production fills and the Drop Copy fills from an options exchange were disconnected, the result being that there were no diffs for that exchange in our Trade Diff Engine GUI. Fortunately, our Drop Copy session with Clearing was on an independent path with independent hardware, so there were diffs between what Clearing knew and what we knew, so we were able to determine that something was wrong. So we need to review our architecture to ensure each of our drop copy sessions is independent from the Production system.
Remove ‘fat finger’ risk
Currently the ‘ignore’ button (which tells the Trade Diff Engine to not display those diffs) is in very close proximity to the ‘import’ button on the GUI. Clicking the wrong button would cause issues and add those trades to our database. It hasn’t happened yet, but currently it is very easy to add unwanted duplicates instead of ignoring the diffs.
Migrate to ‘open source’
We would like to open source our Drop Copy System so that other firms and the financial industry might benefit from it. We would need to export our code to GitHub. Unless others wanted to use OnixS, our binary adapter would need to be modified to integrate with the exchanges that use FIX protocol. In addition, some of the code is currently very specific to Spot, such as translating from symbol to Spot instrument ID, saving to our very specific database tables, etc. We would need to make that more generic and/or create interfaces that other firms could write to, which would allow them to leverage our code base.
Thanks to Jens Kraus for his expert explanation of the latest improvements we’ve made this year.