Improving the Infrastructure of a Financial Exchange System in the Cloud

Journal for High Schoolers, 2020


Prabhanjan Nayak, Matei Budiu, Munir Bshara, Ahmad Ghalayini


A typical financial exchange system is composed of three high-level entities: a matching engine, gateways, and traders. Traders communicate directly with the nearest gateway and submit their orders. Each gateway performs some preprocessing on the incoming traders’ orders and relays them to the matching engine, where they are matched with other orders or placed in an order queue. Such financial exchange systems need to provide both trading fairness and low latencies. Trading fairness requires that (1) orders submitted from different gateways are processed in a globally FIFO manner at the matching engine, and (2) gateways release market data consistently, with no single gateway releasing it ahead of the others. We refer to (1) as upstream fairness and (2) as downstream fairness.

Fairness is a hard problem in cloud networks because they are not fully predictable, unlike highly bespoke networks currently used by financial exchanges. As a result, latencies between the gateways and the matching engine usually have a large spread. We aim to address fairness in the cloud through (1) having accurate, synchronized clocks throughout the cloud network, and (2) buffers – both in the order upstream and market data downstream paths – in order to maintain a globally FIFO processing order. More specifically, in the order’s upstream path, a resequencing buffer is used at the matching engine to sort orders based on their gateway timestamps before the orders are processed. In the downstream path, hold-and-release buffers at the gateways ensure that all new trade information is sent out at the same time. We add short fixed artificial delays at the buffers to balance out the spread of the network latencies.

We aim to analyze the performance of CloudEx using existing experimental data as well as gather new data by conducting new experiments. Based on our analysis, we aim to improve the infrastructure design of the system.


High-frequency trading (HFT) is a method of trading stocks through computer algorithms that execute a large number of orders in a small amount of time, usually fractions of a second. In this kind of trading, speed and delay are critical factors. The order upstream path carries orders from market participants through gateways to the matching engine; it takes each order from the buyer or seller to the limit order book, which is the list of outstanding orders for a given security. The market data downstream path carries market trade data from the matching engine through the gateways to the market participants.

Currently, financial exchange companies like NASDAQ and NYSE run their matching engines on servers inside data centers. This operating model results in scalability problems and fairness issues. The server capacity these companies must have available at all times is dictated by peak trading hours; outside these times, resources are wasted. Fairness issues arise because traders tend to operate close to the servers, a practice known as colocation, which reduces the time it takes for HFT traders’ computers to receive market data. CloudEx is a system designed to solve the “unfairness” caused by uneven latencies in high-frequency trading by operating in the cloud. Moving the exchange to the cloud solves the scalability problem because resources are managed dynamically: capacity can be adapted as the need arises. To guarantee fairness to the traders, CloudEx employs a clock-synchronization system in the cloud to generate accurate timestamps, which establishes the timing guarantees required for fair trading – guarantees that are typically hard to fulfill in the cloud.

The current exchange model can be “unfair” since gateways with lower latencies to matching engines (due to either physical distance or network quality) can have a significant advantage in HFT. CloudEx addresses this in the cloud by using the clock-synchronization algorithm to create a “time perimeter” at the edge of the exchange, and by adding two buffers to the normal trade exchange process: the resequencing buffer and the hold-and-release (H/R) buffer. A delay parameter is an artificial latency that evens out the latency differences between traders. This delay is added at the resequencing buffer for orders and at the hold-and-release buffers for market data.

CloudEx’s order upstream path (refer to Figure 1) carries orders from market participants through the gateways and the resequencing buffer to the matching engine. The resequencing buffer’s function is to eliminate the “unfairness” in the trading process: each order waits in the buffer until the current time exceeds the order’s gateway timestamp plus a predefined delay, giving orders time to be re-ordered based on when they arrived at their gateways. An order that finishes being processed before another order timestamped earlier is considered out-of-sequence and is detrimental to fairness. To create a fair and rapid network, both out-of-sequence orders and the predefined delay (which adds to total latency) should be minimized.
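The waiting rule above can be sketched as a timestamp-ordered priority queue. This is a minimal illustration with hypothetical names and microsecond timestamps, not code from CloudEx itself:

```python
import heapq

def resequence(orders, delay_us, now_us):
    """Release, in gateway-timestamp order, every buffered order whose
    gateway timestamp plus the delay parameter has already elapsed.
    `orders` holds (gateway_timestamp_us, order_id) tuples."""
    buffer = list(orders)
    heapq.heapify(buffer)  # min-heap keyed on gateway timestamp
    released = []
    while buffer and buffer[0][0] + delay_us <= now_us:
        released.append(heapq.heappop(buffer))
    return released, buffer

# Orders reached the buffer out of gateway-timestamp order:
released, waiting = resequence([(120, "b"), (100, "a"), (400, "c")],
                               delay_us=250, now_us=500)
# "a" (ts 100) and "b" (ts 120) come out in timestamp order;
# "c" (ts 400) keeps waiting until time 650.
```

A larger delay parameter keeps orders in the heap longer, which is exactly the latency-versus-fairness trade-off the experiments measure.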

CloudEx’s market data downstream path (refer to Figure 1) carries market trade data from the matching engine through the hold-and-release buffer to the market participants. Similar in purpose to the resequencing buffer, the hold-and-release buffer ensures “fair” access to market data for all traders: all new trade information waits in the buffer until a predefined time, at which point it is released by all the gateways simultaneously. Waiting too long in the hold-and-release buffer is suboptimal for a speedy network, while waiting too little results in new information reaching some gateways after it was supposed to be released, which is unfair.
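As a rough sketch (hypothetical names; CloudEx’s actual implementation may differ), the hold-and-release rule only needs the synchronized clocks to compare arrival times against a common release instant:

```python
def schedule_release(engine_ts_us, delay_us):
    """All gateways release a piece of market data at the same
    wall-clock instant: the matching-engine timestamp plus a
    fixed delay. Relies on synchronized gateway clocks."""
    return engine_ts_us + delay_us

def is_late(arrival_at_gateway_us, release_ts_us):
    """Data that reaches a gateway after its scheduled release
    instant counts as late market data (downstream unfairness)."""
    return arrival_at_gateway_us > release_ts_us

release = schedule_release(1_000, delay_us=500)
assert release == 1_500
assert not is_late(1_400, release)  # held until 1_500, released on time
assert is_late(1_600, release)      # arrived after the release instant
```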

Figure 1 – The System Design of CloudEx

Understanding Unfairness in CloudEx

Experiment Setup


To improve the infrastructure of a financial exchange system in the cloud, we analyzed experiments with the following setup:

  • 24 Traders
  • 8 Gateways
  • 1 Matching Engine Virtual Machine

These experiments were conducted on Google Cloud. Each trader was a program that sent 2,000 random orders, and the matching engine allowed only one outstanding order per trader at a time. Orders were processed one after the other, with no delay between successive orders.

Measured Values

The variable we altered in the experiment was the delay parameter, applied at the resequencing buffer for orders and at the hold-and-release buffers for market data. The delay parameter was varied in increments of 250 μs, from 0 μs to 1000 μs. The control group for this experiment, denoted FIFO, had no resequencing at all in the matching engine. The upstream data we used for each order was:

  • Gateway Num: A unique ID number for the gateway
  • Gateway Timestamp: A timestamp added to an order as it arrives at a gateway
  • Enqueue Timestamp: A timestamp added to an order as it enters the resequencing buffer
  • Dequeue Timestamp: A timestamp added to an order as it leaves the resequencing buffer

The data for each piece of market data we used from the downstream analysis in the experiments was:

  • Timestamps for each of the 8 gateways dictating when they received market data and when they were scheduled to release it


Both the upstream and downstream data were affected by the one parameter which was modified throughout the experiments – the delay parameter.


The data output by the experiment was formatted as two CSV (comma-separated values) files, one for upstream data and one for downstream data. These CSV files were parsed, simplified into readable rows, and converted to a data frame in Python. The upstream data had a row for each order processed – 2,000 orders per trader for 24 traders, or 48,000 rows total – and 28 columns. The downstream data had a row for each new piece of market data; each experiment’s downstream data had 17 columns and a differing number of rows. This data was used to see, through several metrics, how the delay parameter affected performance.
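As an illustration of the parsing step, here is a minimal sketch using Python’s standard csv module; the column names below are hypothetical stand-ins for a few of the 28 upstream columns:

```python
import csv
import io

# A tiny stand-in for the upstream CSV (timestamps in microseconds).
upstream_csv = """gateway_num,gateway_ts_us,enqueue_ts_us,dequeue_ts_us
7,100,130,350
2,120,180,370
"""

rows = list(csv.DictReader(io.StringIO(upstream_csv)))

# Per-order resequencing latency = dequeue timestamp - enqueue timestamp.
latencies = [int(r["dequeue_ts_us"]) - int(r["enqueue_ts_us"]) for r in rows]
avg_reseq_latency_us = sum(latencies) / len(latencies)
```

In the actual analysis the rows were loaded into a data frame, but the metric computations reduce to column arithmetic like the above.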



Orders are out-of-sequence when processed in a different order than that in which they arrived at the gateways. The average resequencing latency is the average time orders spend in the resequencing buffer. Optimally, both out-of-sequence orders and average resequencing latency should be low.
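One simple way to count out-of-sequence orders (a sketch of the metric, not CloudEx’s own counting code) is to scan the processing order and flag any order whose gateway timestamp is earlier than one already processed:

```python
def percent_out_of_sequence(processed_gateway_ts):
    """Given gateway timestamps listed in the order the matching
    engine processed them, count orders preceded by a later
    timestamp and return the out-of-sequence percentage."""
    out_of_sequence = 0
    latest_seen = float("-inf")
    for ts in processed_gateway_ts:
        if ts < latest_seen:
            out_of_sequence += 1
        latest_seen = max(latest_seen, ts)
    return 100.0 * out_of_sequence / len(processed_gateway_ts)

percent_out_of_sequence([100, 110, 120, 130])  # perfectly FIFO: 0.0
percent_out_of_sequence([100, 120, 110, 130])  # 110 jumped the queue: 25.0
```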


Late market data is market data released from a gateway later than it was scheduled to be released. Market data is unfair if it is not released by all gateways at the same time. The average latency is the average time market data spends in the hold-and-release buffer. Optimally, both the amount of late market data and the average latency should be low.



Figure 2 – Mean Latency Value with no Delay
Figure 3 – Mean Latency Value with Delay

To analyze upstream fairness, we looked at two specific latencies: the time taken for an order to reach the resequencing buffer (the one-way delay) and the time the order spends being resequenced in the buffer (the resequencing delay). We analyzed the mean latencies across gateways and across experiments with different delay parameters. The experiment shown in Figure 2, denoted FIFO, has no added delay parameter, giving an unfair advantage to gateways with a lower one-way delay (in blue): their orders are processed faster. For instance, orders from Gateway 7 are processed fastest because both of Gateway 7’s latencies are low – the overall time for orders to transit from Gateway 7 to the matching engine is shorter than for Gateway 2. This scenario results in an “unfair” exchange, as Gateway 7’s orders are processed faster. The sum of the latencies (the green bars) is uneven across gateways in the FIFO case, which is an indication of “unfair” trade conditions.

However, with a delay parameter of 500 microseconds, the two latencies have an inverse relationship (refer to Figure 3). This occurs because orders arriving later wait less in the buffer than those arriving earlier, so that all orders are processed in the correct sequence. The added delay parameter ensures that every order waits at least a certain amount of time between its arrival at a gateway and its processing in the matching engine. (Generally speaking, the two latencies tend toward this inverse relationship more strongly with larger delay parameters.)

Figure 4 – Average resequencing latency and percentage out-of-sequence orders for various delay parameters

The two metrics we used to judge upstream fairness were the percentage of out-of-sequence orders and the average resequencing latency. Figure 4 shows these two metrics for several delay parameters. There is an inverse relationship between them: the longer orders wait at the buffer, the better the buffer can guarantee that orders are processed in the correct order. Optimally, both should be minimal. Increasing the delay parameter past 250 μs yields diminishing decreases in out-of-sequence orders, so in this case an optimal delay parameter would likely lie between 250 μs and 500 μs.


Figure 5 – Percent Late Data with Delay
Figure 6 – Avg. Latency with Delay

To analyze downstream fairness, we looked at late market data: the percentage of data released at a gateway later than it was scheduled to be released. Without the hold-and-release buffer, trades from the matching engine would arrive at the gateways at different times, depending on the transit time between each stage of the exchange infrastructure. Figure 5 shows this metric across different experiments. As the delay parameter increases, the percentage of late data decreases; around the 500-microsecond delay point, the percentage of late market data is minuscule, because the delay parameter exceeds all of the gateway latencies. In contrast, too much added delay increases the average latency. Once again, the two metrics have an inverse relationship: data arriving later waits less in the hold-and-release buffer so that all trades are released at the same time. The added delay parameter ensures that all market data waits until a certain time before being released to the traders. The optimal delay parameter should minimize both the average latency and the percentage of late data; to ensure “fairness” in the exchange, the percentage of late market data should be close to 0.

Processing and Visualizing CloudEx Data


The CloudEx data used here was logged from a competition with 13 teams. The competitors were interns from the Stanford STEM to SHTEM Program who volunteered to participate. Each team was given one trader client. There was also a market-maker trader, who conducted large volumes of trades to regulate stock prices. The competition exchange VM stored the competition data in BigTable, providing a complete timestamped history of all orders and trades and information about them. Using this BigTable data, we could create graphs of share prices, trader net worths, and breakdowns of trader net worth over time. The data can also be used in real time to create interactive dashboards with live data streams and to assess different trading algorithms.

A live demo of the dashboard can be found in the respective folder for this article.


The following are examples of the graphs that the live dashboard we created can display.

This graph displays a candlestick chart of a stock’s price over time, with minimum, maximum, opening, and closing prices for every interval. It can be updated live while running a simulation.

This graph shows the net worth of traders over time in a simulation. Each line is a different trader’s balance.

This graph shows the breakdown of one trader’s net worth over time. Each line represents how much money the trader has invested in a certain stock at that time, and there is also a line representing cash.

This graph is similar to the previous one, showing the breakdown of another trader’s net worth over time. Here we can see that the trader sold lots of AF shares around time 21:54, resulting in a large loss.

Analyzing these graphs of the competition as a whole, we can make several observations. For most stocks, the market maker makes the share prices follow sine curves, with some anomalies. At some points (notably, around 21:52 for stocks AF and AJ), the share prices drop drastically due to low limit orders. This led some traders to end up with a lower final net worth.

Graphs from this live dashboard could be used to understand trading behaviors and to compare different trading algorithms. The dashboard could also be deployed in real-time with a live data stream.


Conclusion

We were able to analyze the fairness problem in the cloud exchange. The latencies between the gateways and the matching engine usually have a large spread. We addressed upstream fairness through the analysis of out-of-sequence orders and the total resequencing buffer latency. In the downstream path, the added delay parameter at the hold-and-release buffer reduced the percentage of late market data without compromising the total latency. The optimal delay parameter was between 250 and 500 microseconds, since it minimized both the upstream and downstream latencies. The visualization of the data proved effective at representing the overall performance of the traders.

Future Work

In the future, we would like to run our own experiments with different delay parameters and varying numbers of gateways. Additionally, we started working on an algorithm to simulate an experiment with a different delay than the source data, and we aim to finish it. Moreover, we would like to see how much these latencies can be reduced without compromising the fairness of the infrastructure. We would also like to implement the dashboard with live data to analyze trading algorithms in real time, and to experiment with deploying gateways in different regions of the world.


Acknowledgments

We would like to acknowledge Ahmad Ghalayini and the CloudEx team for supporting our research into the infrastructure of the exchange. They were instrumental in the early stages of understanding the CloudEx system design, and they provided guidance and resources throughout our work.



Appendix: Extra Graphs

Upstream Latency Histograms (for one gateway)

Downstream Latency Histograms

Trader Profits Over Time

This metric is calculated when a trader sells shares, and is the difference between the price at which those shares were sold and the price at which they were bought.
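As a sketch of how such realized profit can be computed (the trade format and the FIFO lot-matching convention are our assumptions, not specified in the article):

```python
from collections import deque

def realized_profit(trades):
    """Realized profit for one trader and one stock. Each trade is
    ('buy' | 'sell', quantity, price); sells are matched against the
    oldest open buy lots first (FIFO lot matching, an assumption)."""
    open_lots = deque()  # each lot is [remaining_quantity, buy_price]
    profit = 0.0
    for side, qty, price in trades:
        if side == "buy":
            open_lots.append([qty, price])
        else:  # sell: consume the oldest lots first
            while qty > 0:
                lot = open_lots[0]
                matched = min(qty, lot[0])
                profit += matched * (price - lot[1])
                lot[0] -= matched
                qty -= matched
                if lot[0] == 0:
                    open_lots.popleft()
    return profit

# Buy 10 @ 5, buy 5 @ 6, then sell 12 @ 7:
realized_profit([("buy", 10, 5.0), ("buy", 5, 6.0), ("sell", 12, 7.0)])
# 10*(7-5) + 2*(7-6) = 22.0
```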

Net Worth of Traders Over Time

Broken down for each trader
