Comparing Actual and Simulated HFT Traders’ Behavior for Agent Design

Recently, financial markets have shown significant risks and levels of volatility. Understanding the sources of these risks requires simulation models capable of adequately representing the real mechanisms of markets. In this paper, we compared data on the high-frequency-trader market-making (HFT-MM) strategy from both the real financial market and our simulation. Regarding the former, we extracted trader clusters and identified one cluster whose statistical indexes indicated HFT-MM features. We then analyzed the difference between these traders’ orders and the market price. In our simulation, we built an artificial market model with a continuous double auction system, stylized trader agents, and HFT-MM trader agents based on prior research. As an experiment, we compared the distribution of the order placements of HFT-MM traders in the real and simulated financial data. We found that the order placement distributions near the market or best price in the real data and the simulations were similar. However, the orders far from the market or best price differed significantly, with the real data exhibiting a wider range of orders. This indicates that, in order to build more realistic simulations of financial markets, integrating fine-grained data is essential.


Introduction
The current financial market involves considerable uncertainty due to the structural complexity of finance and the globalised world. The financial crisis is one example. The crisis began with subprime mortgages. However, the failures in subprime mortgages spread widely and also affected stock markets. More recently, crashes known as flash crashes have occurred in the financial market. The most famous flash crash was the Flash Crash, in which stock indices, such as the S&P and Dow Jones Industrial Average, rapidly fell on May , . Usually, markets rapidly return to normal patterns after flash crashes. However, such crashes can sometimes cause significant financial disturbances with implications for markets. Moreover, even if market prices return to normal after flash crashes, these disturbances should be avoided. Since these incidents can lead to financial crises or disruptions, they must be predicted and prevented.
One promising approach is a real-data based approach. This approach is useful for analyzing past incidents but cannot predict future incidents that have never occurred before.
The other promising approach is a numerical-simulation based approach. In particular, agent-based simulation is useful for this kind of research (Edmonds & Bruce ). Agent-based simulations aim to imitate the real world by building an imaginary world populated with agents on computers. Simulations are beneficial because they enable the exploration of hypothetical situations or the prediction of phenomena under certain conditions, such as a new regulation. In financial markets especially, many scholars have argued for the importance of agent-based simulation (Farmer & Foley ; Battiston et al. ). For example, Lux & Marchesi ( ) showed that interaction between agents in financial market simulations is necessary to replicate stylized facts in financial markets. Moreover, some simulation models were built on empirical research or mathematically validated (Avellaneda & Stoikov ; Nagumo et al. ). These models can help us to interpret empirical findings of complex systems.
However, these two approaches, i.e., the real-data based approach and simulation, are distinct and not sufficiently integrated. Some simulation studies have tried to use real data for agent-based simulation. Sajjad et al. ( ) built a demographic movement simulation by using real data. Similarly, Nonaka et al. ( ) built an evacuation simulation and used data measured in real evacuation drills. However, in these simulations, real data are used just for setting initial conditions and/or fitting parameters in agent models. In another study targeting financial markets, Braun-Munzinger et al. ( ) built a multi-agent simulation for a bond market. In this study too, real data from a bond market were used only for fitting model parameters.
Thus, simulations should make fuller use of real data. If we can do so, simulations can be used with more confidence, and data will offer more than the recorded past. Specifically, financial markets have massive amounts of real data known as tick data. On the simulation side, computerized virtual markets called artificial markets are available as financial market simulations.
In this paper, the real data and simulation results in a financial market are compared as a first step toward an integrated approach. We aimed to find what is missing in the current financial market simulation. In this comparison, we focused on the high-frequency-trader market-making (HFT-MM) strategy. One reason is that HFT agents are influential in the real financial market (Hosaka ). The other reason is that the HFT-MM strategy has been well modeled in simulations from previous studies (Avellaneda & Stoikov ).
In this paper, first, we quantified the empirical distribution of relative order frequencies of HFT-MM in the real data. Then, we tested whether a simulation model can reproduce the same pattern. As a result, we showed that the simulation model partially reproduced the pattern found in the real data. However, there are still differences, which we discuss.
This paper is organized as follows. We first explain the HFT-MM strategy and then describe previous research. Next, we present the data-mining approach used in the behavioral analysis of HFT-MM in the Tokyo Stock Exchange (TSE). We then present the model and simulation results for HFT-MM in the artificial market we built, and compare the results from our data-mining and simulation analyses. Finally, we discuss the results and conclude.

What is the HFT-MM Strategy?
In the HFT-MM strategy, traders place limit orders very frequently, on the millisecond time scale, which is made possible through ongoing improvements in information and communication technologies. An MM strategy is specific to limit orders and makes profits by placing buy and sell orders on the order books simultaneously.
Figure shows an example of order books in a continuous double auction. Both the sell and buy order books contain some orders. Every trader can place sell or buy orders at any price. (In practice, some restrictions on the price range apply.) In addition, there are two types of orders. One is a "limit order," or an order with a price. The other is a "market order," or an order without any price, which means that the trader wants to buy or sell shares at any price. Market orders are always executed immediately against the best-priced opposite orders. Usually, limit orders are placed and appear on either the buy or sell order book, hence the term "making" orders. Market orders, conversely, do not appear in the order books and make opposite orders disappear, hence the term "taking" orders.
An MM strategy typically uses only making orders, i.e., limit orders, because it aims to realize the profit between the best sell and buy prices and must wait for better conditions by placing orders at a price near the best price. In the case of Figure , the MM strategy places sell orders at $ and buy orders at $ . After both orders have been executed, the traders receive $ in profit, i.e., the same as the spread.

However, the MM strategy is always exposed to the risk of price changes. If MM traders hold shares during the MM operation and the price drops dramatically, the value of those shares also drops, and the loss can exceed the profit the traders make via the MM strategy because the spread, i.e., the profit obtained by the MM strategy, is typically minuscule compared with the price changes. Thus, as a hedge against risk, the MM strategy must pay close attention to price changes.
Adopting the HFT-MM strategy is one risk hedge option. If the MM strategy is applied on a smaller time scale, the risk of price changes will be less since the price changes in short time periods, i.e., milliseconds, are likely to be very small.

Figure : $ is the best sell price in the sell order book (left) and $ is the best buy price in the buy order book (right). Under these conditions, the spread is 1002 − 999 = 3. Simply, the MM strategy can make profits by placing sell orders at $ and buy orders at $ .
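The spread arithmetic and the inventory risk described in this section can be illustrated with a minimal Python sketch. The function names and the price-drop scenario are hypothetical and chosen only for illustration; only the values 1002 and 999 come from the spread computation in the text.

```python
def spread(best_sell: float, best_buy: float) -> float:
    """Spread between the best sell and best buy prices."""
    return best_sell - best_buy


def round_trip_profit(sell_price: float, buy_price: float, shares: int = 1) -> float:
    """Profit of an MM round trip once both limit orders have been executed."""
    return (sell_price - buy_price) * shares


def inventory_loss(shares_held: int, price_before: float, price_after: float) -> float:
    """Mark-to-market loss on held shares when the price drops (hypothetical scenario)."""
    return shares_held * (price_before - price_after)


# Values from the spread computation in the text: 1002 - 999 = 3.
print(spread(1002, 999))
# Ten round trips earn ten spreads ...
print(round_trip_profit(1002, 999, shares=10))
# ... but holding ten shares through an assumed 10-dollar drop loses more.
print(inventory_loss(10, 1000, 990))
```

The last two lines show why the spread alone does not protect an MM trader: a price move of a few ticks on held inventory can outweigh many executed round trips.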

Previous Work
Here, we review previous studies on financial market data analysis, the HFT-MM strategy, and multi-agent financial market simulations.

Previous work in financial market data analysis (empirical studies)
In Japan, the Japan Exchange Group provides tick data, which include all the order data of the stock exchanges in Japan (Japan Exchange Group ). These order data, called "Flex Full" data, are detailed and serve several purposes. For example, Miyazaki et al. ( ) proposed the use of a Gaussian mixture model and Flex Full data for the detection of illegal orders and trades in the financial market. Tashiro & Izumi ( ) proposed a short-term price prediction model using neural networks. In their work, the authors processed Flex Full order data on a millisecond time scale with a recurrent neural network known as long short-term memory (Hochreiter & Schmidhuber ). Further, the authors recently extended this method (Tashiro et al. ) using a convolutional neural network (Krizhevsky et al. ). Nanex ( ) mined and reported some distinguishing ordering patterns from order data. Cont ( ) used statistical analysis to obtain stylized facts of the real financial market, such as volatility clustering.

Previous work on the HFT-MM strategy (empirical studies)
For HFT analysis, the Flex Full data are insufficient because they contain no information on who placed the orders. Thus, some studies have used detailed order data that include anonymized trading server information, which are called "order-book reproduction data." Hosaka ( ) performed HFT analysis with such data and showed that many HFTs in the TSE market place making orders, thus providing market liquidity. Uno et al. ( ) proposed an analysis method that applies a clustering algorithm to the traders in these data and demonstrated that the orders of traders who employ a distinctive strategy such as HFT-MM can be identified with this method.

Previous work on the HFT-MM strategy (empirical & simulation studies)
Other HFT studies include some MM modeling strategies that are solved with equations. Avellaneda & Stoikov ( ) built and demonstrated the performance of an equation model for an MM strategy and its simulation results. The authors modeled the MM strategy, solved the equation model as an optimization problem, and derived equations for calculating the optimized prices of orders.

Previous work in multi-agent financial market simulation (simulation studies)
One promising approach for financial market simulation is multi-agent simulation, in which agents are constructed by imitating traders in the real financial market. This approach is called artificial market simulation. Mizuta ( ) has demonstrated that a multi-agent simulation for the financial market can contribute to the implementation of rules and regulations of actual financial markets. Torii et al. ( ) used this approach to reveal how the flow of a price shock is transferred to other stocks. Their study was based on (Chiarella & Iori ), which presented stylized trader models including only fundamental, chartist, and noise factors. Mizuta et al. ( ) tested the effect of tick size, i.e., the price unit for orders, which led to a discussion of tick-size devaluation in the TSE. Hirano et al. ( ) assessed the effect of the regulation of the capital adequacy ratio (CAR), such as the Basel regulatory framework, and observed the risk of market price shock and depression because of the CAR regulation. As a platform for artificial market simulation, Torii et al. ( ) proposed the platform "Plham". In this study, we partially used the updated "PlhamJ" platform (Torii et al. ).
These previous works utilizing actual data focused only on either finding empirical features or fitting the parameters of simulation models in financial markets. Thus, these studies did not compare the real financial market with simulated financial markets. We argue that this comparative experiment is essential for building an adequate artificial market simulation. Moreover, in our opinion, simulations need to use real data beyond the parameter fitting of models. Thus, in this paper, we show a more advanced use of data for model building in financial market simulations.

Data Mining for Analysis of the Behavior of the HFT-MM Strategy in the Tokyo Stock Exchange
Here we explain our data-mining approach and present the analysis results of the behavior of the HFT-MM strategy. In our study, we used order-book reproduction data. However, the data are nontrivial in nature; therefore, further detail is provided. We used a clustering method based on (Uno et al. ) and clustered the traders identified in the order-book reproduction data. Subsequently, we selected an HFT-MM cluster and analyzed the ordering behavior by plotting the divergence of the order prices from the market price.

Data used in the data mining
For our data mining, we used the order-book reproduction data provided by the Japan Exchange Group, which contain more detailed order data than the publicly available Flex Full data. Specifically, these data include anonymous trading server information. The servers are called "virtual servers (VS)" and are connected to the order processing system in the TSE. Every order is sent to the system via a VS, and traders (or rather, stockbrokers) access this virtual device for ordering. Thus, this logical server is a type of gateway to the ordering system, and the order-book reproduction data contain the anonymous but traceable information of this VS. However, every VS has its own limit on orders per second, and traders use multiple VS for redundancy. To identify which orders are from which traders, we merged some VS in the following manner.

Table : Virtual server (VS) merging example. L , L , and L are continuous records for the same order. So, VS- and VS- are identified as being used by the same trader. VS- and VS- are also the same.
Table shows an example from the order-book reproduction data. Notations L to L indicate the line numbers for reference. L , L , and L are continuous records from the same order, order-. Thus, these records have come from the same trader. This means that VS- and VS- are used by the same trader. L and L also come from the same trader, so VS- and VS- are also used by the same trader. This type of data preprocessing, which we call "VS merging," is necessary to analyze the data correctly because, without VS merging, we cannot follow the continuous orders of each trader.
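The VS merging idea described above can be sketched with a union-find (disjoint-set) structure: whenever records of the same order arrive through two different VS, those VS are placed in the same group. This is only one plausible way to implement the preprocessing; the record layout `(order_id, vs_id)` and the identifiers below are hypothetical and do not reflect the actual format of the order-book reproduction data.

```python
def merge_virtual_servers(records):
    """Group VS ids that handled records of the same order.

    records: iterable of (order_id, vs_id) pairs (hypothetical layout).
    Returns a dict mapping each vs_id to a representative vs_id of its group.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as representative for reproducibility.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_vs_of_order = {}
    for order_id, vs_id in records:
        find(vs_id)  # register the VS even if it is never merged
        if order_id in first_vs_of_order:
            union(first_vs_of_order[order_id], vs_id)
        else:
            first_vs_of_order[order_id] = vs_id

    return {vs_id: find(vs_id) for vs_id in list(parent)}


# Hypothetical records: order-1 passes through VS-A and VS-B,
# order-3 through VS-C and VS-D, and VS-E is never shared.
records = [
    ("order-1", "VS-A"), ("order-1", "VS-B"), ("order-2", "VS-C"),
    ("order-1", "VS-A"), ("order-3", "VS-C"), ("order-3", "VS-D"),
    ("order-4", "VS-E"),
]
groups = merge_virtual_servers(records)
print(groups)
```

Each resulting group can then be treated as one trader in the clustering step.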
For our analysis, we used data from all the business days in August . In this period, million order records are contained. Before performing VS merging, there were a total of , VS; after VS merging, there were , VS. ( VS were not merged.) With this merged data, we performed clustering and some analysis.

Clustering & order distribution analysis method
Our data analysis has two parts, the first of which is clustering (Uno et al. ). After identifying an HFT-MM traders cluster, we analyzed the ordering price distribution. By analyzing the order placement distribution, we can roughly analyze the HFT-MM behavior because HFT-MM usually employs a recognizable trading strategy, whereby orders are placed only near the best price. In the following, we provide details regarding clustering and our order distribution analysis.
To perform clustering, we employed some indexes based on (Uno et al. ). The indexes for each trader on each business day are as follows:
• Actions (new orders, order changes, and order cancels) per ticker (one stock kind).
• Absolute ratio of inventory. Specifically, we employed the median over all tickers the trader trades. (Vol. means volume.)
• Ratio of canceled share volume to total ordered share volume.
First, we filtered only HFT traders by (ActionPerTicker) ≥ 100. Next, we normalized and clustered the indexes into clusters, using Ward's method with the Euclidean distance. These settings are the same as those in (Uno et al. ). Finally, we identified a cluster whose indexes indicated HFT-MM features, i.e., high-frequency ordering, high cancel ratios, low inventory, and using many VS. Using traders identified in the HFT-MM cluster, we plotted the difference between the order and market prices.
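As an illustration of the index computation and the HFT filter, the following sketch computes the three indexes for one trader on one day and applies the threshold of 100 actions per ticker. The exact formulas are not recoverable from this text, so the definitions below are assumptions: in particular, the inventory ratio is taken here as |buy volume − sell volume| / (buy volume + sell volume) per ticker, and all field names are hypothetical. The Ward clustering itself is omitted; the normalized indexes would be passed to a hierarchical clustering routine.

```python
from statistics import mean, median, pstdev

HFT_ACTION_THRESHOLD = 100  # from the text: (ActionPerTicker) >= 100


def trader_indexes(per_ticker):
    """Compute the three clustering indexes for one trader on one day.

    per_ticker: dict ticker -> dict with hypothetical keys
      new, change, cancel, buy_vol, sell_vol, ordered_vol, canceled_vol.
    """
    rows = list(per_ticker.values())
    actions = sum(r["new"] + r["change"] + r["cancel"] for r in rows)
    # Assumed definition of the absolute inventory ratio per ticker.
    ratios = [
        abs(r["buy_vol"] - r["sell_vol"]) / (r["buy_vol"] + r["sell_vol"])
        for r in rows
        if r["buy_vol"] + r["sell_vol"] > 0
    ]
    ordered = sum(r["ordered_vol"] for r in rows)
    canceled = sum(r["canceled_vol"] for r in rows)
    return {
        "action_per_ticker": actions / len(rows),
        "inventory_ratio": median(ratios) if ratios else 0.0,
        "cancel_ratio": canceled / ordered if ordered else 0.0,
    }


def zscore(values):
    """Standardize one index across traders before clustering."""
    m, s = mean(values), pstdev(values)
    return [0.0 if s == 0 else (v - m) / s for v in values]


example = {
    "A": dict(new=80, change=20, cancel=60, buy_vol=500, sell_vol=500,
              ordered_vol=1000, canceled_vol=700),
    "B": dict(new=50, change=10, cancel=40, buy_vol=300, sell_vol=100,
              ordered_vol=600, canceled_vol=500),
}
idx = trader_indexes(example)
is_hft = idx["action_per_ticker"] >= HFT_ACTION_THRESHOLD
print(idx, is_hft)
```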
In the plot, we used tick size (the minimum price unit for orders) as the price unit; the number of ticks indicated the distance between the traders' orders and the market price. For example, in Figure , tick size is $ . Thus, if the market price was $ , sell orders at $ would be converted to ticks because $ sell orders are ticks better than $ . Conversely, buy orders with $ would be converted to −1 tick because for buyers, $ buy orders are tick higher (worse) than $ .
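The tick conversion described above can be sketched as follows. The sign convention follows the text: a sell order priced above the market price gets a positive tick count, whereas a buy order priced above the market price gets a negative one. The function name and the numerical values in the example are hypothetical.

```python
def ticks_from_market(order_price, market_price, tick_size, side):
    """Signed distance of an order from the market price, in ticks.

    Positive values mean the order is on the trader's favorable side
    (a higher sell price or a lower buy price than the market price).
    """
    diff = round((order_price - market_price) / tick_size)
    if side == "sell":
        return diff
    if side == "buy":
        return -diff
    raise ValueError("side must be 'sell' or 'buy'")


# Hypothetical example with a tick size of 1 and a market price of 1000.
print(ticks_from_market(1002, 1000, 1, "sell"))
print(ticks_from_market(1001, 1000, 1, "buy"))
print(ticks_from_market(998, 1000, 1, "buy"))
```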

Results of clustering & order distribution analysis
Table shows the clustering results. Among these clusters, we identified and selected one we assumed to be an HFT-MM cluster. Significant features in HFT-MM are a high cancel ratio and low inventory ratio. Clusters , , , , and have high cancel ratios; clusters and have low inventory ratios. Thus, as an HFT-MM cluster, we selected cluster .

Next, we plotted the order placement distribution. Figure shows our order price distribution analysis. This plot combines all the data from business days and all the ticker data from traders in the HFT-MM cluster. The results clearly show that many orders from HFT-MM traders (traders who were considered to be HFT-MM traders based on clustering) are placed near the market price. This allows for two conclusions. First, there are traders with HFT-MM features because the strategy of placing many orders near the best or market price is typical of HFT-MM. Second, the clustering analysis correctly selected HFT-MM traders. During clustering, we selected a cluster after checking only key indexes. However, the results indicate that the selected cluster was actually an HFT-MM cluster.
For a more accurate analysis, the analysis should be based not on ticks from the market price but on ticks from the best price. However, the latter approach cannot currently be implemented owing to data characteristics and technical problems. That is, the order-book reproduction data do not include information on the best prices, so we would have to match them against other order book databases. This process requires substantial computational resources.

Multi-Agent Simulation for HFT-MM Strategy in Artificial Market
In this section, we present our model for simulating artificial markets and its results. The importance of using numerical simulations, as mentioned in the Introduction, is that we can conduct tests in a virtual market and engage in discussions that are not possible with traditional data analysis. In this paper, our main focus was to reveal the gap between real data and simulation analyses. Modeling HFT-MM in simulations is an essential task because the share of HFT in the financial market is increasing significantly. Thus, in simulating an artificial market, we cannot ignore this type of HFT agent.

Figure : Model outline. In our simulation, we considered one market with a continuous double auction system and two types of trader agents, i.e., HFT-MM and stylized trader agents. HFT-MM trader agents can place orders in the order books at every step and obtain information in real time. Stylized trader agents can only place their orders every steps and obtain information with a -step delay.

Model outline
Figure shows an outline of our simulation model. In our simulation, to simplify the model, we assumed there to be only one market with a continuous double auction system, as explained in section . and Figure . We set the market start price to and its fundamental price movement according to geometric Brownian motion with a standard deviation of .
We obtained this setting empirically by investigating real intra-day price changes, which are roughly about %, and considering the number of orders per step in simulations. There are two types of trader agents in our simulation model: HFT-MM and stylized. The most significant difference between HFT-MM and stylized traders is the access speed to markets, i.e., ordering speed and speed of accessing market information. In the real market, HFT-MM traders place their algorithmic ordering servers at locations provided by stock exchanges to reduce information and ordering latency. Thus, we built our model with latency and assumed that stylized trader agents have a -step information delay and can place their orders only every steps. Other details regarding each agent type are provided in the following subsections.
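A fundamental price following geometric Brownian motion can be generated as in the sketch below. This is a minimal illustration, not the PlhamJ implementation: the zero-drift form, the function name, and the parameter values in the example are all assumptions made for this sketch.

```python
import math
import random


def simulate_fundamental_price(p0, sigma, n_steps, seed=None):
    """Discrete-time geometric Brownian motion with zero drift (assumed).

    Each step multiplies the price by exp(-sigma^2 / 2 + sigma * z) with
    z ~ N(0, 1), so the expected price stays at the previous price.
    """
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        z = rng.gauss(0.0, 1.0)
        prices.append(prices[-1] * math.exp(-0.5 * sigma ** 2 + sigma * z))
    return prices


# Hypothetical parameters chosen only to show the call.
path = simulate_fundamental_price(p0=400.0, sigma=0.001, n_steps=1000, seed=42)
print(path[0], min(path), max(path))
```

Because the price is updated multiplicatively, it stays strictly positive, which is the main reason geometric rather than arithmetic Brownian motion is used for prices.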

Stylized trader agent model
For the stylized traders, we developed a trader model based on that reported in (Torii et al. ). At time t, stylized trader agent i decides its order price by the following equations, which consider fundamental, chartist, and noise factors.
First, we calculate the fundamental, chartist, and noise factors.
• Fundamental factor: where $\tau^*_i$ is agent $i$'s mean-reversion-time constant, $p^*_t$ is the fundamental price at time $t$, which is given by the geometric Brownian motion mentioned above, and $p_t$ is the price at time $t$. Stylized trader agents have an information time delay, so they always refer to the information from steps before.
• Chartist factor: where $\tau_i$ is agent $i$'s time window size and $r_t$ is the logarithmic return at time $t$.
• Noise factor: which means that $N^i_t$ obeys a normal distribution with zero mean and variance $\sigma^2$.

Then, we calculate the weighted average of these three factors: where $w^i_F$, $w^i_C$, and $w^i_N$ are agent $i$'s weights for each factor.

In the next step, agent $i$'s expected price is calculated using the following equation: . Then, using a fixed margin of $k_i \in [0, 1]$, the actual order prices are determined using the following rules:
• If $p^i_t > p_t$, agent $i$ places a buy order at the price
Here, $p^{\mathrm{buy}}_t$ is the best buy price, and $p^{\mathrm{sell}}_t$ is the best sell price.
However, with only this routine, which is based on that in (Torii et al. ), stylized agents would keep adding orders step by step without limit. Thus, we implemented a simple cancel routine, whereby every steps, when stylized traders can take action, stylized traders cancel all orders and make new orders according to the routine above.
The following are the parameters we employed for this type of trader: 100,200]. Other than weights, we mainly determined these parameters based on the work of (Torii et al. ), and Ex(λ) indicates an exponential distribution with the expected value of λ.
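The stylized-trader pricing routine of this subsection can be sketched as follows. The displayed equations are not available in this text, so the functional forms below are assumptions following the common fundamental/chartist/noise formulation in the style of Chiarella & Iori: a log-price mean-reversion term, an average of recent log returns, and Gaussian noise, combined by a weighted average and turned into an expected price. The adjustment by the best buy and sell prices is omitted because its exact form cannot be recovered here. All function names and example values are hypothetical.

```python
import math
import random


def expected_price(p_t, p_fund, past_log_returns, tau_star, tau,
                   sigma, w_f, w_c, w_n, rng):
    """Expected price of a stylized trader (assumed functional forms)."""
    fundamental = math.log(p_fund / p_t) / tau_star   # mean reversion
    chartist = sum(past_log_returns[-tau:]) / tau     # recent trend
    noise = rng.gauss(0.0, sigma)                     # N(0, sigma^2)
    r_hat = (w_f * fundamental + w_c * chartist + w_n * noise) / (w_f + w_c + w_n)
    return p_t * math.exp(r_hat * tau)


def order_decision(p_hat, p_t, k):
    """Buy below the expected price if it exceeds the current price, else sell above it.

    The margin rule (1 - k) / (1 + k) is an assumption for this sketch.
    """
    if p_hat > p_t:
        return ("buy", p_hat * (1.0 - k))
    return ("sell", p_hat * (1.0 + k))


# Hypothetical example: a purely fundamental trader sees an undervalued stock.
rng = random.Random(0)
p_hat = expected_price(p_t=100.0, p_fund=110.0, past_log_returns=[],
                       tau_star=10, tau=5, sigma=0.0,
                       w_f=1.0, w_c=0.0, w_n=0.0, rng=rng)
side, price = order_decision(p_hat, 100.0, k=0.1)
print(p_hat, side, price)
```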

HFT-MM trader agent model
The HFT-MM trader agents in this study are based on (Avellaneda & Stoikov ). Using a pricing calculation, we constructed the HFT-MM traders' model.

First, we explain our calculation for pricing. At time $t$, agent $i$ calculates the sell and buy prices by the following rules:
1. Calculate agent $i$'s mid-price: where $\gamma_i$ is agent $i$'s risk hedge level, $\sigma_i$ is agent $i$'s observed standard deviation in the last $\tau_i$ steps, $\tau_i$ is agent $i$'s time window size, as defined previously, $T_i$ is the time until their strategy is optimized, and $q^i_t$ is agent $i$'s inventory. $T_i$ is the parameter in the optimization process for deriving this equation, which we set to 1.
2. Calculate agent $i$'s price interval between sell and buy orders: where $k$ is a parameter for the order arrival time, which depends on Ex($k$). In this simulation, we employed $k = 1.5$, which is the same as that used in (Avellaneda & Stoikov ).
3. Calculate agent $i$'s buy and sell prices:
4. Then, agent $i$ places a buy order at the following price: where $p^{\mathrm{buy}}_t$ is the best buy price, and agent $i$ places a sell order at the following price: where $p^{\mathrm{sell}}_t$ is the best sell price.
In addition to price calculation, we implemented an order canceling routine. First, agents cancel orders whose prices are worse than the current $p^{\mathrm{buy}}_t$ and $p^{\mathrm{sell}}_t$ values. Second, we set the order limits of both sell and buy orders to ten; if there are ten orders on the sell order book or ten orders on the buy order book, one order on the sell order book or buy order book is canceled. Moreover, all orders are subject to expiration. We set the expiration steps to be the same as the time window size of the agents. Thus, after this number of expiration steps has passed since the order was placed, the orders are automatically canceled.
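The pricing and canceling routines of the HFT-MM agent can be sketched as below. Since the displayed equations are missing from this text, the sketch uses the standard closed-form expressions of Avellaneda & Stoikov (reservation price and optimal spread) as an assumption; the model in this paper may adapt them, for example by combining the quotes with the best buy and sell prices. The cancel routine is likewise one interpretation of the description above: the choice of the oldest order as the one canceled at the limit, the order dictionary layout, and all names are hypothetical.

```python
import math


def reservation_price(mid, inventory, gamma, sigma, horizon):
    """Inventory-adjusted mid-price (standard Avellaneda-Stoikov form)."""
    return mid - inventory * gamma * sigma ** 2 * horizon


def optimal_spread(gamma, sigma, horizon, k):
    """Total interval between sell and buy quotes (standard form)."""
    return gamma * sigma ** 2 * horizon + (2.0 / gamma) * math.log(1.0 + gamma / k)


def quotes(mid, inventory, gamma, sigma, horizon=1.0, k=1.5):
    """Return (buy_price, sell_price) placed symmetrically around the reservation price."""
    r = reservation_price(mid, inventory, gamma, sigma, horizon)
    half = optimal_spread(gamma, sigma, horizon, k) / 2.0
    return r - half, r + half


def orders_to_cancel(open_orders, best_buy, best_sell, now, expiry_steps, max_per_side=10):
    """Select orders to cancel: worse than best price, expired, or over the limit."""
    cancel, kept = [], {"buy": [], "sell": []}
    for o in open_orders:
        worse = (o["side"] == "buy" and o["price"] < best_buy) or \
                (o["side"] == "sell" and o["price"] > best_sell)
        expired = now - o["placed_at"] >= expiry_steps
        (cancel if worse or expired else kept[o["side"]]).append(o)
    for side_orders in kept.values():
        side_orders.sort(key=lambda o: o["placed_at"])
        while len(side_orders) >= max_per_side:
            cancel.append(side_orders.pop(0))  # assumed: cancel the oldest
    return cancel


buy, sell = quotes(mid=400.0, inventory=5, gamma=0.1, sigma=2.0)
print(buy, sell)
```

With a positive inventory, both quotes shift downward, which makes the sell order more likely to execute and thus pushes the inventory back toward zero. A larger `sigma` widens the interval, which is the mechanism by which volatile periods move quotes away from the best price.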

Simulation settings
In our simulation, there are , stylized trader agents and HFT-MM trader agents. Each stylized trader agent has the opportunity to place orders every steps. These opportunities do not arrive simultaneously, i.e., agent 1 has opportunities in the first step of every steps, and agent 2 has opportunities in the second step of every steps. This means that, in every step, only stylized traders and HFT-MM traders are activated. The market mechanism is the continuous double auction system, as described in section . and Figure . We implemented this mechanism in our simulation using the platform known as "PlhamJ" (Torii et al. ).
When the simulation starts, all stylized trader agents have 50 shares (at the beginning, $400 for each) and $30,000, and all HFT-MM trader agents have $ , . At the beginning of the simulation, the share price is set to $400, so 50 shares + $30,000 = 50 × $400 + $30,000 = $50,000. These settings were not determined empirically; they were chosen simply so that the simulation works correctly, because short selling is allowed. Moreover, these settings are the same as in (Torii et al. ).
In the simulation, there are steps before the artificial market opening, steps to market stabilization, and , steps in the test.
We ran the simulation times. We also plotted the order placements.

Simulation results and brief discussion
Figure shows the distribution of order price placements for the simulations. This plot indicates how many ticks away from the best price the orders of the HFT-MM trader agents were. Many orders were placed near the best price because HFT-MM trader agents adopt strategies to do so. However, this distribution has a rather long tail. To identify the reason for the long tail, we performed another analysis.
Figure shows the autocorrelation function (ACF) for the absolute logarithmic returns in one simulation run. This result indicates the presence of volatility clustering, in which periods of high volatility and periods of low volatility each tend to persist. Volatility clustering is listed as a stylized fact in (Cont ) and (Dacorogna et al. ).
Because of this volatility clustering, there are highly volatile periods in which the increased σ value in Equation causes HFT-MM traders to place orders far from the best prices.
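The volatility-clustering check described above can be sketched by computing the sample autocorrelation of absolute logarithmic returns. This is a generic estimator written for illustration, not the code used for the figure; the function names and the toy price series are hypothetical.

```python
import math


def log_returns(prices):
    """Logarithmic returns of a price series."""
    return [math.log(b / a) for a, b in zip(prices, prices[1:])]


def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (0 <= lag < len(series))."""
    n = len(series)
    m = sum(series) / n
    var = sum((x - m) ** 2 for x in series)
    if var == 0:
        return 0.0
    cov = sum((series[t] - m) * (series[t + lag] - m) for t in range(n - lag))
    return cov / var


def abs_return_acf(prices, max_lag):
    """ACF of absolute log returns for lags 1..max_lag."""
    abs_r = [abs(r) for r in log_returns(prices)]
    return [autocorrelation(abs_r, lag) for lag in range(1, max_lag + 1)]


# Toy series: a calm period followed by a turbulent period.
calm = [100.0 + 0.01 * (-1) ** i for i in range(50)]
wild = [100.0 + 2.0 * (-1) ** i for i in range(50)]
acf = abs_return_acf(calm + wild, max_lag=5)
print(acf)
```

Positive values that decay slowly with the lag indicate that large absolute returns tend to follow large ones, which is the signature of volatility clustering.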

Comparison of Data-based and Simulation Analyses
Earlier, we showed the order placement plot of HFT-MM traders based on real financial market data. We then introduced our simulation model and presented a plot of order placements by HFT-MM trader agents. In this section, we combine the results and compare them.

Order price distribution comparison
First, we generated a plot comparing the data mining and simulation results. To compare these plots, i.e., Figures and , we normalized the order frequency by dividing by the maximum order frequency for each data type. The results show that these order frequencies were highest when the tick was , so we divided the order frequency by the frequency when the tick was to obtain the relative frequency. Then, we generated the plot shown in Figure . Figure shows a comparison of the relative order frequency of the actual data from the real financial market, described in the data-mining section, and the simulation data, described in the simulation section. For the actual data from the real financial market, the horizontal axis is the number of ticks away from the market price. For the simulation data, however, the horizontal axis is the number of ticks away from the best price. This difference arises because we cannot obtain the best price from the actual data owing to technical issues. Employing the market price in a simulation for this statistical analysis is inappropriate because the order book is thin and the spread is large owing to the limited number of agents used. However, in the real financial market, HFT-MM traders tend to trade stocks that have high liquidity, so the market and best prices are not considerably different. Although we must address this problem, this difference is not significant.
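The normalization used for this comparison amounts to dividing each frequency by the peak frequency, as in the following sketch. The function name and the counts are hypothetical.

```python
def relative_frequency(freq_by_tick):
    """Divide every order count by the maximum count so that the peak equals 1."""
    peak = max(freq_by_tick.values())
    return {tick: count / peak for tick, count in freq_by_tick.items()}


# Hypothetical order counts indexed by the number of ticks away.
actual = relative_frequency({-1: 5, 0: 20, 1: 10})
print(actual)
```

After this step, two distributions with very different total order counts, such as a month of exchange data and a set of simulation runs, can be drawn on the same vertical scale.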
Figure reveals that the order placement distribution by HFT-MM traders in the real market has significant tails on the right side. In the plot, when the number of ticks away is less than , the two plots are similar; however, when it is more than ticks, the plots completely differ, and around ticks away, the actual data have one local maximum. This phenomenon suggests that real HFT-MM traders combine the usual HFT-MM trade strategy in (Avellaneda & Stoikov ) and another strategy in which orders are placed a little further from the best or market price. Before discussing this issue, we subjected these two distributions to statistical analysis.

Figure : Comparison of the relative order frequencies of HFT-MM based on actual data from the real financial market and the simulation data. We normalized both data sets by dividing them by the frequency when the tick was . In the plot, the horizontal axis is the number of ticks away from the market price (actual data) or best price (simulation).

Comparison by calculating entropy
To investigate the order distribution difference, we calculated the degree of entropy and performed t-tests.
First, we defined the order distribution range of the ticks. We set $x$ as the number of ticks away from the market price (actual data) or the best price (simulation data). Then, we set the range to $x \in [x_{\min}, x_{\max}]$.
Next, we calculated the ordering probabilities in the range:
$$P(x) = \frac{f(x)}{\sum_{i=x_{\min}}^{x_{\max}} f(i)},$$
where $f(i)$ is the frequency of orders $i$ ticks away and $x$, $i$ take only integers.
Subsequently, we calculated the entropy:
$$E_{x_{\min},x_{\max}} = -\sum_{x=x_{\min}}^{x_{\max}} P(x) \log P(x).$$
As an experiment, we calculated $E_{-5,15}$ and $E_{-2,5}$ for the actual data for business days and the simulation data for simulations. Then, we subjected these pairs of data to a t-test.
Here, we used entropy as one index representing the ordering distribution. If the distribution is flat, i.e., the probability of each ordering action is almost the same, the entropy becomes high. On the other hand, if the distribution has a significant peak, the entropy becomes comparatively low. Both the actual and simulated ordering distributions have one significant peak. Thus, here, higher entropy indicates a fat-tailed distribution. Moreover, the t-test between the entropies of the actual and simulated ordering distributions can determine whether there is a significant difference between the two distributions.

The t-test showed a significant difference for $E_{-5,15}$, but not for $E_{-2,5}$. This reveals that the two ordering distributions for the actual and simulation data near the best or market price are not different, whereas the two ordering distributions over a wide range are significantly different. Specifically, the entropy of the actual data over a wide range was greater than that of the simulation data. This is consistent with the order placement plot from the actual data having a longer tail than that of the simulation data. Thus, in this section, the significant difference was verified statistically.
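The entropy comparison of this section can be sketched as follows. The entropy is computed from order counts restricted to a tick range, and two samples of entropies (one value per business day and one per simulation run) are compared with a t statistic. The text does not specify the variant of the t-test or the base of the logarithm, so the natural logarithm and Welch's unequal-variance statistic are assumptions of this sketch; function names and data are hypothetical.

```python
import math
from statistics import mean, variance


def range_entropy(freq_by_tick, x_min, x_max):
    """Shannon entropy of the order distribution restricted to [x_min, x_max]."""
    counts = [freq_by_tick.get(x, 0) for x in range(x_min, x_max + 1)]
    total = sum(counts)
    if total == 0:
        raise ValueError("no orders in the given tick range")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (assumed test variant)."""
    se2 = variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b)
    return (mean(sample_a) - mean(sample_b)) / math.sqrt(se2)


# Hypothetical distributions: a sharp peak versus a peak with a long tail.
peaked = {0: 90, 1: 10}
fat_tailed = {0: 50, 1: 10, 5: 10, 10: 15, 15: 15}
print(range_entropy(peaked, -5, 15), range_entropy(fat_tailed, -5, 15))
print(range_entropy(peaked, -2, 5), range_entropy(fat_tailed, -2, 5))
```

Restricting the range to a narrow window around the peak removes the tail from the calculation, which is why the narrow-range and wide-range entropies can lead to different test outcomes.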

Discussion and Conclusions
First, we discuss data mining. In our proposed method, we employed clustering and distribution analysis. Prior to clustering, we selected only high-frequency traders. Then, we performed a clustering analysis and selected one cluster, which we assumed to be HFT-MM traders based on some indexes. The results plotted in Figure clearly show an order frequency peak near the market price. This means that our clustering method worked correctly because this peak is assumed to be from the HFT-MM strategy. In addition, we carefully checked that this extraction of HFT-MM worked correctly for our purposes. As future work, in order to apply our method to other types of traders, we should improve the clustering method. There is a possibility that the same types of traders are placed in multiple clusters or that some HFT-MM traders are also placed in the other clusters we extracted. We should improve our clustering method by employing or integrating other methods, such as clustering or classification with a neural network or k-nearest neighbors.
Some considerations remain regarding our model and the results of our simulations. Our simulation is promising since it reproduced the order placement peak of the HFT-MM traders. Further, it reproduced some stylized facts, such as volatility clustering. However, it must be said that validating this type of simulation is difficult. For instance, in our simulation, we employed only two types of trader agents, whereas there are many trader types in real financial markets. Although we cannot create all agent types, we could consider employing additional ones. If we added agent types, our model could approximate the real characteristics of markets in more detail. Moreover, this could also help improve the ordering actions of the other agents, adding more realism to the mechanics of our simulated market.
Our comparison of the actual and simulation data yielded an interesting insight: our simulation has high fidelity near the best price. However, far from the best price, there is a significant difference between the actual data and our simulation. This difference may exist because real HFT-MM traders combine other strategies. One possible strategy is placing and keeping orders away from the best price so that they are executed sooner when the price changes dramatically. The financial market has a time-priority rule, which means that, under otherwise identical conditions, the order placed earlier is executed first. Thus, real HFT-MM traders presumably place some orders far from the best price. However, by simply modeling the artificial market without data, we may overlook these important factors. In fact, the previous work (Avellaneda & Stoikov) overlooked this feature.
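The time-priority argument above can be illustrated with a small sketch of the buy side of a limit order book. This is not the matching engine of our artificial market; the class `BidQueue`, its methods, and the trader labels are hypothetical names introduced only for illustration. Resting bids are ranked by price first and by arrival order second, so a bid placed early at a price far from the best is executed before a later bid at the same price when a large sell order sweeps the book.

```python
import heapq
from itertools import count


class BidQueue:
    """Buy side of a limit order book with price-time priority (illustrative only)."""

    def __init__(self):
        self._heap = []
        self._arrival = count()  # increasing stamp recording arrival order

    def place(self, trader, price, qty):
        # Higher price ranks first (price is negated), then earlier arrival.
        heapq.heappush(self._heap, (-price, next(self._arrival), trader, qty))

    def match_sell(self, limit_price, qty):
        """Fill an incoming sell order against bids at or above limit_price.

        Returns the fills as (trader, price, quantity) in execution order.
        """
        fills = []
        while qty > 0 and self._heap and -self._heap[0][0] >= limit_price:
            neg_price, stamp, trader, resting = heapq.heappop(self._heap)
            traded = min(qty, resting)
            fills.append((trader, -neg_price, traded))
            qty -= traded
            if resting > traded:
                # A partially filled bid keeps its original arrival stamp.
                heapq.heappush(
                    self._heap, (neg_price, stamp, trader, resting - traded)
                )
        return fills


book = BidQueue()
book.place("early_deep", 95, 1)  # placed first, far below the best bid
book.place("best", 100, 1)       # best bid
book.place("late_deep", 95, 1)   # same price as early_deep, placed later

# A sharp price drop: a sell order for two units sweeps down to 95.
fills = book.match_sell(limit_price=95, qty=2)
# fills == [("best", 100, 1), ("early_deep", 95, 1)]; late_deep is not executed.
```

In this example, the early order at 95 is executed and the later order at the same price is not, which is the advantage that keeping orders far from the best price would provide.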
Therefore, in order to build a more dependable artificial market, it is essential to refer to real financial market data. Empirical data should be used extensively even when the real data are not merged directly into the simulation model. Conversely, if we build a model by focusing too heavily on real data, the model could become overfitted to past data, and the simulation could yield irrelevant insights.
To take advantage of the benefits of simulation, we should develop a technique that can merge simulation and real data. One possible solution is extracting principal components from real data and combining these with simulations. The current problem is that models made by human model makers can overlook key features of markets by imposing theories and biases. If models could be designed in a more systematic, data-driven way, these problems could be reduced. For instance, building a model by exploiting machine learning or deep learning on data could be an interesting option. In this respect, there are plenty of data in financial markets, such as tick data, that could be used to exploit the power of machine learning or deep learning to inform model building.
Moreover, we should also focus on other types of traders. This paper focused only on HFT-MM traders, whereas there are many types of traders and factors in the real financial market. The strategy of HFT-MM traders has been carefully modeled and is easily identified in real financial market data. However, dealing with traders or factors other than HFT-MM in the real financial market may prove more challenging.
In conclusion, as next steps, we will try to build a method for incorporating real data into the simulation model without overfitting to past records. Moreover, we will also extend the method presented in this paper and apply it to other types of traders.