Reasonable engineering meets unreasonable demands

The high-frequency trading (HFT) industry is a very competitive one: receive some kind of information from the exchange, decide what to do, take some kind of action on the exchange. Many decisions are obvious, so lots of people try to do the same thing and the “fastest” one wins. This kind of scenario leads to interesting engineering trade-offs, namely that being the fastest some of the time is more valuable than being correct but second-best all of the time. In that sense it’s a bit like Formula 1: it’s more important to occasionally win races than it is to finish every race in the middle of the pack.

Here are some examples of where this plays out in HFT.

Pretty much every network protocol guards against corruption by attaching a checksum to its messages. There’s one for IP (the IP header checksum), another for TCP (the TCP checksum) and another for Ethernet (the frame check sequence). Historically we’ve gone out of our way to make sure that if a message gets delivered, the information within it hasn’t been damaged in transit. However, to validate a checksum you need to wait until all of the relevant bits of information (i.e., the entire message from the perspective of the protocol) have been received. HFT can’t wait that long. We know that we’re occasionally going to get big packets, and some really important information might be right at the front. Making a decision 1,000 bits earlier on a 10 Gbit/s network link is a 100ns saving. That doesn’t sound like a lot, but modern HFT stacks respond in under 500ns. And if the exchange hand-offs are 1 Gbit/s, it’s a 1us saving. Of course, there’s a chance that the network frame or packet has been corrupted and the system will respond inappropriately, but there’s a trade-off we can calculate: the likelihood and cost of getting it wrong against the value of being first, versus the value of getting it right every time at the cost of never being first.
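
The arithmetic and the trade-off above can be sketched in a few lines of Python. The delay function just restates the figures in the text. The payoff model, its parameter names and every number fed into it are assumptions made up for illustration, not measurements or anyone's actual pricing of the risk.

```python
def serialization_delay_ns(bits: int, link_gbps: float) -> float:
    """Time for `bits` to pass a point on a link of `link_gbps` Gbit/s, in ns."""
    # 1 Gbit/s is one bit per nanosecond, so bits / (Gbit/s) gives nanoseconds.
    return bits / link_gbps


def expected_value(p_first: float, p_corrupt: float,
                   win_value: float, error_cost: float) -> float:
    """Expected payoff per opportunity under a deliberately simple model."""
    # We win only when the frame was intact and we were first; we pay
    # error_cost whenever we acted on a corrupted frame.
    return (1 - p_corrupt) * p_first * win_value - p_corrupt * error_cost


# Figures from the text: deciding 1,000 bits earlier.
print(serialization_delay_ns(1_000, 10))  # 10 Gbit/s link
print(serialization_delay_ns(1_000, 1))   # 1 Gbit/s hand-off

# Illustrative inputs only (hypothetical probabilities and payoffs).
act_early = expected_value(p_first=0.5, p_corrupt=1e-6,
                           win_value=100.0, error_cost=10_000.0)
# Waiting for the checksum: never corrupted, but (in this model) never first.
wait_for_checksum = expected_value(p_first=0.0, p_corrupt=0.0,
                                   win_value=100.0, error_cost=10_000.0)
print(act_early > wait_for_checksum)
```

With corruption this rare, the early path comes out ahead even when a mistake costs a hundred times what a win is worth. Raise `p_corrupt` or `error_cost` far enough and the balance flips.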

Similar shenanigans play out in the opposite direction. TCP was beautifully defined to provide guaranteed, in-order delivery of packets. It’s a good engineering approach, allowing streams of data to take multiple paths between sender and receiver, and allowing transparent and efficient recovery of dropped packets. But HFT isn’t reasonable. HFTs know that the most frequent and valuable messages to be sent are going to be orders to buy or sell. With most exchanges, the message format is fairly standard: only the price, volume and the particular contract (stock, future, etc.) change, and the rest is a bunch of header information about client numbers and the like. Aha! Chop out the constant part of the message and send that before there’s even a reason to send an order (there will probably be a reason soon enough), then send a smaller packet with just price, volume and contract when needed. Instead of sending the full message at the critical time, it will be good enough to just send the minimum details. Being shorter, it will get there sooner. (Generally, exchanges are reasonable systems and will wait for checksums.) So we build a well-engineered system (TCP) to abstract message reordering, and exploit it with unreasonable demands. Exchanges are aware of this behaviour, but they’re outnumbered. It’s not that exchanges have hostile adversaries (HFT firms aren’t out to break exchanges), but the firms are largely indifferent to their effect on the exchange. As long as they don’t outright break the exchange, or violate the published rules, HFT firms will use any trick they can think of to outdo each other. By and large these tricks aren’t bad for the market, but they must be a headache for the exchange and they raise the entry bar for new participants.
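
To make the pre-sent prefix trick concrete, here is a minimal sketch in Python. The wire layout, the field names and the `ORDR` tag are all invented for illustration, and no real exchange protocol is implied. It only shows that the constant part can be built ahead of time and that the part left for the critical moment is much shorter than the whole message.

```python
import struct

# Hypothetical wire layout, invented for this example (network byte order):
#   constant part: 4-byte tag, client id, 16-byte account, message type
#   variable part: contract id, price in ticks, volume
CONSTANT_PART = struct.Struct("!4sI16sH")
VARIABLE_PART = struct.Struct("!IqI")


def build_constant_part(client_id: int, account: bytes) -> bytes:
    """Bytes that are the same for every order; can be sent ahead of time."""
    return CONSTANT_PART.pack(b"ORDR", client_id, account, 1)


def build_variable_part(contract_id: int, price_ticks: int, volume: int) -> bytes:
    """Bytes that are only known once there is a reason to trade."""
    return VARIABLE_PART.pack(contract_id, price_ticks, volume)


# Built (and, on a real connection, already written to the stream) in advance.
prefix = build_constant_part(client_id=42, account=b"ACCT-0001")

# Built and sent at the critical moment.
tail = build_variable_part(contract_id=7, price_ticks=101_250, volume=10)

# The receiver sees one byte stream, so prefix + tail reads as a single message.
full_message = prefix + tail
print(len(prefix), len(tail), len(full_message))
```

In this made-up layout only 16 of the 42 bytes have to go out when it matters. The receiver reassembles the stream and cannot tell that the first 26 bytes arrived earlier.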

HFT engineering has much in common with information security. HFT firms are, within a set of rules, trying to work out how a “black box” (the exchange) operates, and to find ways to exploit its quirks. Most exchanges have some sort of quirk; it’s infeasible to build a complex system without them. Older tricks include:

  • being the first (or last) person to connect to the exchange (be the first file descriptor in the select() or epoll() call);
  • order a bunch of connections to the exchange to see which ones are faster (shorter cable; infrastructure not shared with other participants; variation in infrastructure), then cancel the slower ones;
  • if the exchange offers redundant multicast feeds for market data, arbitrate between a bunch of them to always get the fastest; even better if it’s TCP and there’s a round-robin sender underneath it;
  • look at how the exchange leaks timing information (an increasing delay in data suggests that the exchange is busy, probably because something is happening);
  • keep your connection to the exchange busy (cache effects; priority queue ordering);
  • an order can go down one of multiple sessions with different speeds, so send a copy of the order on each one.
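
As one example from the list above, arbitrating between redundant feeds can be sketched as "forward the first copy of each message, drop the rest". This sketch assumes every packet carries a monotonically increasing sequence number, which the list above does not state. The `FeedArbiter` class and the sample arrivals are hypothetical.

```python
from typing import Optional


class FeedArbiter:
    """Forwards the first copy of each sequence number and drops later copies."""

    def __init__(self, first_seq: int = 1) -> None:
        self.next_seq = first_seq

    def on_packet(self, seq: int, payload: bytes) -> Optional[bytes]:
        if seq < self.next_seq:
            return None  # already delivered by whichever feed was faster
        # A gap (seq > next_seq) is simply passed through here; real gap
        # recovery is out of scope for this sketch.
        self.next_seq = seq + 1
        return payload


arbiter = FeedArbiter()

# (feed, sequence number, payload) in arrival order; made-up sample data.
arrivals = [
    ("A", 1, b"bid"), ("B", 1, b"bid"),
    ("B", 2, b"ask"), ("A", 2, b"ask"),
    ("A", 3, b"trade"),
]

delivered = [
    (feed, seq)
    for feed, seq, payload in arrivals
    if arbiter.on_packet(seq, payload) is not None
]
print(delivered)
```

Whichever feed delivers a given sequence number first wins for that message, so the consumer always sees the faster of the two without having to know in advance which one that is.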

The downsides to this environment include:

  • there’s an incentive to sacrifice some safety for speed, increasing the likelihood that something goes wrong;
  • lots of “noise” gets generated through the exchange making it harder to pick out true signals;
  • impact on exchange performance, thereby increasing infrastructure costs for exchanges to maintain a constant level of responsiveness;
  • wasteful economic activity (these are smart people, surely they could be doing something more useful).

Stepping outside of the HFT realm, where else might we find similar ‘unreasonable’ engineering? The opening F1 example is surely one, as is the information security case. The competitive or adversarial nature of a domain seems to lend itself to this kind of thinking, but surely there’s a competitive angle to most businesses. Should we therefore expect to find unreasonable thinking wherever the engineering provides the competitive dimension of a business?

Something to think more about.
