Supporting Text

Literature Review

We will begin by reviewing what we call the standard literature, which includes purely empirical studies as well as theoretical studies predicated on rationality models. We will then review the literature on random process models of the continuous double auction, which is more closely related to the model we test here.

Standard Literature. The market microstructure literature on spread, volatility, and market impact in financial markets is extensive, both theoretically and empirically. Theoretical analyses traditionally use the underlying paradigm of rational agents. Models of spread, starting with Demsetz, Tinic, Stoll, Amihud and Mendelson, and Ho and Stoll (1-5), have examined the determinants of spreads as the outcome of the rational utility-maximization problem faced by market makers. Other models provide insight into the utility-maximizing response of agents to various measures of market conditions, such as volatility; for example, Lo et al. (6) investigate a simple model in which the log stock price follows a Brownian motion diffusion process. Provided agents prefer a lower expected execution time, their model predicts a positive relationship between volatility and limit order placement. Copeland and Galai; Glosten and Milgrom; Easley and O'Hara; Glosten; Foucault; and Easley, O'Hara, and Saar (7-12) examine asymmetric-information effects on order placement. Andersen (13) modifies the Glosten and Milgrom (12) model from a stochastic volatility and information flow perspective. Other models of trading in limit order markets include Cohen et al.; Angel; Harris; Chakravarty and Holden; Seppi; Rock; Parlour and Seppi; Parlour; Foucault, Kadan, and Kandel; and Domowitz and Wang (14-23).

Empirical research is equally rich. Roll (25) and Choi et al. (24) estimate spreads from transaction prices. Glosten and Harris; Hasbrouck; and Madhavan, Richardson, and Roomans (26-28) model spreads as vector autoregressions (VARs) using various trade variables as independent variables. Roll; George, Kaul, and Nimalendran; Huang and Stoll; and Jang and Venkatesh (25, 29-31) focus on estimating components of the spread.

Empirical research in volatility was initiated by statistical descriptions of the volatility process [see Engle (32), Bollerslev (33), and Bollerslev et al. (34) for a survey] but has grown increasingly ambitious, with multivariate structural models of the interaction between volatility and other economic variables. Positive correlation of daily volume and volatility has been documented by Clark, Epps and Epps, and Tauchen and Pitts (35-37). Volume has been entered into autoregressive conditional heteroskedastic (ARCH) specifications by Lamoureux and Lastrapes (38). Other empirical investigations of volatility determinants and consequences include Gallant et al., Andersen, Blume et al., Reiss and Werner, Fleming et al., and Hasbrouck and Saar (13, 39-42). A comprehensive study of the joint distribution of returns and volume is given by Gallant, Rossi, and Tauchen (39, 43).

Random Process Models of Continuous Double Auction. There are two independent lines of prior work, one in the financial economics literature and the other in the physics literature. The models in the economics literature are directed toward econometrics and treat the order process as static. In contrast, the models in the physics literature are mostly conceptual toy models, but they allow the order process to react to changes in prices and are thus fully dynamic. Our model bridges this gap. This is explained in more detail below.

The first model of this type we are aware of in the economics literature was due to Mendelson (44), who modeled random order placement with periodic clearing. Cohen et al. (45) developed a model of a continuous auction, modeling limit orders, market orders, and order cancellation as Poisson processes. However, they allowed limit orders only at two fixed prices, buy orders at the best bid and sell orders at the best ask. This assumption allowed them to use standard results from queuing theory to compute properties such as the expected number of stored limit orders, the expected time to execution, and the relative probability of execution vs. cancellation. Domowitz and Wang (17) extended this to multiple price levels by assuming arbitrary order placement and cancellation processes (which can take on any value at each price level). They assume these processes are fixed in time and do not respond to changes in the best bid or ask. This allows them to derive the distribution of the spread, transaction prices, and waiting times for execution. This model was tested by Bollerslev et al. (46) on 3 weeks of data for the Deutschmark/U.S. dollar exchange rate. They showed that the model does a good job of predicting the distribution of the spread. However, because the prices are pinned, the model does not make a prediction about price diffusion, and this also creates errors in the predictions of the spread and stored supply and demand.

Models in the physics literature, which appear to have been developed independently, differ in that they address price dynamics. That is, they incorporate feedback between order placement and price formation, allowing the order placement process to change in response to changes in prices. These models have mainly been conceptual toy models designed to understand the anomalous diffusion properties of prices (a property that all of these models fail to reproduce, as explained later). This line of work began with a paper by Bak et al. (47), which was developed further by Eliezer and Kogan (48) and by Tang and Tian (49). They assume that limit orders are placed at a fixed distance from the midpoint and that the limit prices of these orders are then randomly shuffled until they result in transactions. It is the random shuffling that causes price diffusion. This assumption, which we feel is unrealistic, was made to take advantage of the analogy to a standard reaction-diffusion model in the physics literature. Maslov (50) introduced an alternative model that was solved analytically in the mean-field limit by Slanina (51). Each order is randomly chosen to be either a buy or a sell with equal probability, and either a limit order or a market order with equal probability. If a limit order, it is randomly placed within a fixed distance of the current price. Both the Bak et al. (47) model and that of Maslov (50) result in anomalous price diffusion, in the sense that the Hurst exponent is H = 1/4 (in contrast to standard diffusion, which has H = 1/2, or real prices, which tend to have H > 1/2). In addition, the Maslov model unrealistically requires equal probabilities for limit and market order placement; otherwise, the inventory of stored limit orders either goes to zero or grows without bound. A model adding a Poisson order cancellation process was proposed by Challet and Stinchcombe (52) and independently by Daniels et al. (53). Challet and Stinchcombe showed that this results in H = 1/4 for short times but asymptotically gives H = 1/2. The Challet and Stinchcombe model, which posits an arbitrary unspecified function for the relative position of limit order placement, is quite similar to that of Domowitz and Wang (17) but allows for the possibility of order placement responding to price movement.

The model we test here was introduced by Daniels et al. (53). Like the other physics models, it treats the feedback between order placement and price movement. It has the advantage that it is defined in terms of five scalar parameters, so it is parsimonious and can easily be tested against real data. Its simplicity enables a dimensional analysis, which gives approximate predictions about many of the properties of the model. Perhaps most important is the use to which the model is put: with the exception of ref. 47, work in the physics literature has focused almost entirely on the anomalous diffusion of prices. Although interesting and important for refining risk calculations, from a practical point of view this is a second-order effect. In contrast, the model studied here focuses on first-order effects of primary interest to market participants, such as the bid-ask spread, volatility, depth profile, price impact, and the probability and time to fill an order. It demonstrates that dimensional analysis is a useful tool in an economic setting, and the analyses in Daniels et al. (53) and Smith et al. (54) develop mean-field theories to understand many relevant market properties. Many of the important properties of the model can be stated as simple scaling relations in terms of the five parameters.

Subsequent to ref. 52, Bouchaud et al. (55) demonstrated that one can derive a simple equation for the depth profile by assuming that prices execute a random walk and introducing an additional free parameter. In this paper, we show how to do this from first principles without introducing a free parameter. Chiarella and Iori (56) have numerically studied fundamentalists and technical traders placing limit orders; a presentation about this work by Giulia Iori partly inspired this model.

Dimensional Analysis

Dimensional analysis can be used to simplify the study of this model and to make some approximate predictions about several of its properties. For a good reference on dimensional analysis, see Barenblatt (57).

There are three fundamental dimensional quantities in this model: shares, price, and time. There are five parameters: the market order rate μ, the limit order rate density α, the cancellation rate δ, the typical order size σ, and the tick size dp. Because three of the parameters can be used to fix the units of shares, price, and time, once the dimensional constraints between the parameters are taken into account, only two independent degrees of freedom remain. It turns out that the order flow rates μ, α, and δ are more important than the discreteness parameters σ and dp, in the sense that the properties of the model are much more sensitive to variations in the order flow rates than to variations in σ or dp. It is therefore natural to construct nondimensional units based on the order flow parameters alone. There are unique combinations of the three order flow rates with units of shares, price, and time, which give characteristic scales that are unique up to a constant: the characteristic number of shares Nc = μ/δ, the characteristic price interval pc = μ/α, and the characteristic timescale tc = 1/δ.

These characteristic scales can be used to define nondimensional coordinates based on the order flow rates. These are p̂ = p/pc for price, N̂ = N/Nc for shares, and t̂ = t/tc for time. The use of nondimensional coordinates has the great advantage of reducing the number of degrees of freedom from five to two: instead of five independent parameters, we have only two, which makes the model easier to understand.

The two irreducible degrees of freedom are naturally discussed in terms of nondimensional versions of the discreteness parameters. A nondimensional scale parameter based on order size is constructed by dividing the typical order size σ (with dimensions of shares) by the characteristic number of shares Nc. This gives the nondimensional parameter ε ≡ σ/Nc = δσ/μ, which characterizes the granularity of the order flow. A nondimensional scale parameter based on tick size is constructed by dividing the tick size dp by the characteristic price, i.e., dp/pc = α dp/μ. The usefulness of this is that the properties of the model depend only on the two nondimensional parameters ε and dp/pc; any variation of the parameters μ, α, and δ that keeps these two nondimensional parameters constant gives exactly the same market properties. One interesting result that emerges from analysis of the model is that the effect of the granularity parameter ε is generally much more important than that of the tick size dp/pc. For a more detailed discussion, see ref. 53.
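As a concrete illustration, the minimal sketch below computes the characteristic scales and the two nondimensional parameters directly from the formulas above; it is ours, not the paper's code, and all numerical values are hypothetical, chosen only to show the units at work.

```python
# A minimal sketch of the dimensional reduction described above.
# All numerical parameter values are hypothetical.

def characteristic_scales(mu, alpha, delta):
    """Characteristic scales built from the order flow rates:
    mu    -- market order rate (shares per unit event time)
    alpha -- limit order rate density (shares per price per unit event time)
    delta -- cancellation rate (inverse event time)."""
    N_c = mu / delta   # characteristic number of shares
    p_c = mu / alpha   # characteristic price interval
    t_c = 1.0 / delta  # characteristic timescale
    return N_c, p_c, t_c

def nondimensional_parameters(mu, alpha, delta, sigma, dp):
    """The two irreducible degrees of freedom: the granularity
    epsilon = delta*sigma/mu and the nondimensional tick size alpha*dp/mu."""
    return delta * sigma / mu, alpha * dp / mu

# Hypothetical order flow parameters for a single stock.
mu, alpha, delta, sigma, dp = 0.1, 0.5, 0.02, 2.0, 0.0005
print(characteristic_scales(mu, alpha, delta))                 # (5.0, 0.2, 50.0)
print(nondimensional_parameters(mu, alpha, delta, sigma, dp))  # (0.4, 0.0025)
```

Any rescaling of μ, α, and δ that leaves the two printed numbers unchanged leaves all market properties of the model unchanged.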

Although we have investigated numerically the effect of varying the tick size in ref. 53, for the purposes of comparing to data, here we simply take the limit dp → 0, which provides a reasonable approximation.

The London Stock Exchange (LSE) Data Set

The LSE is composed of two parts: the electronic open limit order book and the upstairs market, which employs a bilateral method of exchange and is used to facilitate large block trades. During the time period of our data set, 40-50% of total volume was routed through the electronic order book and the rest through the upstairs market. It is believed that the limit order book is the dominant price formation mechanism of the LSE: ≈75% of upstairs trades happen between the current best prices in the order book (58). Our analysis involves only the data from the electronic order book. We chose to study this data set because we have a complete record of every action taken by every participating institution, allowing us to measure order flows and cancellations and to estimate all of the necessary parameters of our model.

We used data from the time period August 1, 1998 to April 30, 2000, which includes a total of 434 trading days and roughly six million events. We chose 11 stocks, each with more than 300,000 events in total and never fewer than 80 events on any given day. Some statistics about the order flow for each stock are given in Table 1.

The trading day of the LSE starts at 0750 with a roughly 10-min-long opening auction period (during the later part of the data set, the auction end time varies randomly by 30 sec). During this time, orders accumulate without transactions; then a clearing price for the opening auction is calculated, and all opening transactions take place at this price. After the opening at 0800, the market runs continuously, with orders matched according to price and time priority, until the market closes at 1630. In the earlier part of the data set, until September 22, 1999, the market opened at 0900. During the period we study, there were some minor modifications of the opening auction mechanism, but because we discard the opening auction data anyway, this is not relevant.

Some stocks in our sample [Vodafone (VOD), for example] had stock splits and tick size changes during the period of our sample. We take splits into account by transforming share quantities and prices to presplit values. In any case, because all measured quantities are in logarithmic units of the form log(p1) - log(p2), the absolute price scale drops out. Our theory predicts that the tick size should change some of the quantities of interest, such as the bid-ask spread, but the predicted changes are small enough in comparison with the effect of other parameters that we simply ignore them (and base our predictions on the limit where the tick size is zero). Because granularity is much more important than tick size, this seems to be a good approximation.

Opening Auction, Real Order Types, and Time. Because the model does not take the opening auction into account, we simply neglect orders leading up to the opening auction and base all our measurements on the remaining part of the trading day, when the auction is continuous.

To treat simply and in a unified manner the diverse types of orders traders can submit in a real market (for example, crossing limit orders, market orders with limiting price, fill-or-kill, and execute-and-eliminate), we use redefinitions based on whether an order results in an immediate transaction, in which case we call it an effective market order, or whether it leaves a limit order sitting in the book, in which case we call it an effective limit order. Marketable limit orders (also called crossing limit orders) are limit orders that cross the opposing best price and so result in at least a partial transaction. The portion of the order that results in an immediate transaction is counted as an effective market order, whereas the nontransacted part (if any) is counted as an effective limit order. Orders that do not result in a transaction and do not leave a limit order in the book, for example failed fill-or-kill orders, are ignored altogether; these have no effect on prices and in any case make up only a very small fraction of the order flow, typically < 1%. Note that we drop the term effective, so that, e.g., market order means effective market order. A sketch of this classification is given below.
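The sketch below is our own rendering of the classification, with hypothetical field names; failed fill-or-kill orders, which neither transact nor rest in the book, are assumed to be dropped before this step, and the depth available at the opposing quote is passed in as a single number for simplicity.

```python
def classify_order(side, price, size, best_bid, best_ask, opposing_depth):
    """Split an incoming order into (effective market shares,
    effective limit shares). An order that crosses the opposing best
    price transacts immediately up to the available depth; any
    remainder rests in the book as an effective limit order."""
    crosses = (side == "buy" and price >= best_ask) or \
              (side == "sell" and price <= best_bid)
    if not crosses:
        return 0, size                    # plain effective limit order
    executed = min(size, opposing_depth)  # immediate transaction
    return executed, size - executed      # marketable limit order: split

# A buy limit order priced through the ask: part market, part limit.
print(classify_order("buy", 100.2, 500, 100.0, 100.1, 300))  # (300, 200)
```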

A limit order can be removed from the book for many reasons, e.g., because the agent changes her mind, because a time specified when the order was placed has been reached, or because of the institutionally mandated 30-day limit on order duration. We lump all of these together and simply refer to them as cancellations.

Our measure of time is based on the number of events; i.e., the time elapsed during a given period is just the total number of events, including effective market order placements, effective limit order placements, and cancellations. We call this event time. Price intervals are computed as differences in the logarithm of prices, which is consistent with the model, in which all prices are logarithmic to ensure that they remain positive.

Measurement of Model Parameters

We test the predictions of the model against real data cross-sectionally on 11 stocks. The parameters of the model are stated in terms of order-arrival rates, cancellation rate, and order size, and our data set allows us to compute the average values of these parameters for each stock. As we explain here, these average rates are calculated as means of daily values weighted by the daily number of events. An alternative would have been to calculate the mean values of the parameters over the entire 2-yr period for each stock. Although this works well for the parameters μ and σ, it does not work as well for α and δ, as explained below.

Measuring μ and σ. The parameter μ_t, which characterizes the average market order arrival rate on day t, is just the ratio of the number of shares of effective market orders (for both buy and sell orders) to the number of events during the trading day. Thus for μ, it makes no difference whether we measure it across the whole period or take a weighted average of daily values. This is also true for the average order size σ_t. One complication in measuring σ is that the model assumes the average size for limit orders and market orders is the same, whereas for real data this is not strictly true. Nonetheless, as seen in Table 1, although the limit order size tends to be a bit larger than the market order size, it is still a fairly good approximation to take them to be the same. For the purposes of this analysis, we use the limit order size to measure order size, based on theoretical arguments that this is more important than market order size. In any case, this does not make a significant difference in the results.
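A minimal sketch of the daily estimates and of the event-weighted averaging used throughout this section, assuming simple per-day tallies (the variable names are ours):

```python
def daily_mu(market_order_shares, n_events):
    """mu_t: total shares of effective market orders (buys and sells)
    divided by the number of events on day t."""
    return sum(market_order_shares) / n_events

def daily_sigma(limit_order_sizes):
    """sigma_t: average effective limit order size on day t."""
    return sum(limit_order_sizes) / len(limit_order_sizes)

def event_weighted_mean(daily_values, daily_event_counts):
    """Full-sample parameter: mean of daily values weighted by the
    daily number of events."""
    total = sum(v * n for v, n in zip(daily_values, daily_event_counts))
    return total / sum(daily_event_counts)
```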

Measuring α and δ. Measuring the cancellation rate δ_t and the limit order rate density α_t is more complicated, owing to the highly simplified assumptions of the model. In contrast to our assumption of a constant density for placement of limit orders across the entire logarithmic price axis, real limit order placement is highly concentrated near the best prices: roughly 2/3 of all orders are placed either at the best price or inside the spread, and outside the spread the density of limit order placement falls off as a power law as a function of the distance from the best prices (55, 59). In addition, we have assumed a constant cancellation rate, whereas in reality orders placed near the best prices tend to be canceled much faster than orders placed far from the best prices. We cope with these problems by introducing an auxiliary assumption: order placement is taken to be constant inside an interval and zero outside it. This is described in more detail below.

To estimate the limit order rate density for day t, α_t, we make an empirical estimate of the distribution of the relative price of effective limit order placement on each day. For buy orders, we define the relative price as Δ = m - p, where p is the logarithm of the limit price and m is the logarithm of the midquote price; similarly, for sell orders, Δ = p - m. We then somewhat arbitrarily choose Q_t^lower as the 2nd percentile of the density of Δ for the limit orders arriving on day t, and Q_t^upper as the 60th percentile. Assuming constant density within this range, we calculate α_t as α_t = L/(Q_t^upper - Q_t^lower), where L is the total number of shares of effective limit orders within the price interval (Q_t^lower, Q_t^upper) on day t. The choice of Q_t^upper is a compromise: we include as much data as possible for statistical stability, but not so much as to include orders that are unlikely ever to be executed and therefore unlikely to have any effect on prices.
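A sketch of the daily estimate of α_t follows; the names are ours, and the division by the number of events, which puts α_t on the same per-event footing as μ_t, is our reading of the procedure rather than an explicit statement in the text.

```python
import numpy as np

def daily_alpha(rel_prices, sizes, n_events, lo=2, hi=60):
    """alpha_t: shares of effective limit orders placed between the
    2nd and 60th percentiles of the relative price Delta, divided by
    the width of that interval and normalized per event."""
    rel_prices, sizes = np.asarray(rel_prices), np.asarray(sizes)
    q_lo, q_hi = np.percentile(rel_prices, [lo, hi])
    inside = (rel_prices >= q_lo) & (rel_prices <= q_hi)
    L = sizes[inside].sum()  # total shares inside (Q_lower, Q_upper)
    return L / (q_hi - q_lo) / n_events, (q_lo, q_hi)
```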

Similarly, to cope with the fact that in reality the average cancellation rate δ decreases with the relative price Δ (55), whereas in the model δ is assumed constant, we base our estimate of δ only on canceled limit orders within the same relative price boundaries (Q_t^lower, Q_t^upper) defined above. We do this to be consistent in our choice of which orders are assumed to contribute significantly to price formation (orders closer to the best prices contribute more than orders farther away). We then measure δ_t, the cancellation rate on day t, as the inverse of the average lifetime of a canceled limit order in the above price range. Lifetime is measured as the number of events occurring between the introduction of the order and its subsequent cancellation.
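A matching sketch for δ_t, with lifetimes counted in events between placement and cancellation and restricted to orders canceled inside the interval (again, names are ours):

```python
import numpy as np

def daily_delta(lifetimes, rel_prices, q_lo, q_hi):
    """delta_t: inverse of the mean lifetime (in events) of limit
    orders canceled inside (q_lo, q_hi) on day t."""
    lifetimes, rel_prices = np.asarray(lifetimes), np.asarray(rel_prices)
    inside = (rel_prices >= q_lo) & (rel_prices <= q_hi)
    return 1.0 / lifetimes[inside].mean()
```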

The parameter Q_t^upper is referred to as W in the main text. In other subsequent studies (to be reported elsewhere), we are able to set Q_t^lower = 0 and to compute Δ relative to the opposite best price rather than the midprice, with negligible differences in the results. The difference is that in the later studies we have a cleaner data set; in the present data set there are some points that are clearly outliers, and it was convenient to introduce a lower cutoff for outlier removal. Thus, we do not feel that Q_t^lower is an important parameter for this analysis, and we have not discussed it in the text (where we have limited space).

This procedure makes it better to average daily parameters rather than to compute average parameters from ratios of values for the whole period, because the width of the interval over which orders are placed varies significantly in time. Moment-by-moment order book reconstruction makes it clear that the properties of the market tend to be relatively stationary during each day, changing more dramatically overnight, so order flows on different days can be rather dissimilar. Because of this nonstationarity of the order flow, the calculation of the parameters δ and α would perform poorly if we attempted to use an average price interval over the whole period: on some days we might count only a small fraction of the order flow, excluding many orders that were important for price formation, whereas on other days we would include almost all orders, many of which were not very relevant for price formation. This problem makes it natural to use daily averages of parameters.

This introduces the concern that daily variations in W might be an important predictive variable, above and beyond their effect in changing α (which is consistent with the model). There is a tendency for the value of W on a given day to track the spread, due to regularities in order placement, and therefore to automatically have some correlation with the spread. We have done several studies, to be reported in future work, testing the importance of this effect. These show that, although daily variations in W do give additional predictability for the spread, other aspects of the model are substantially responsible for these results.

Measuring the Price Diffusion Rate. The measurement of the price diffusion rate requires some discussion. We measure intraday price diffusion by computing the midpoint price variance V(τ) = Var{m(t + τ) - m(t)} for different timescales τ. The averaging over t includes all events that change the midpoint price. The plot of V(τ) against τ is called a diffusion curve; for an independently identically distributed (IID) random walk, it is a straight line with slope D, the diffusion coefficient.

In our case, the computation of D is as follows. For each day, we compute the diffusion curve; in this way, we avoid overnight price changes that would bias our estimate. To the daily diffusion curve we then fit a straight line V(τ) = D_t·τ using least squares weighted by the square root of the number of observations for each value of V(τ). In fitting a straight line, we are assuming IID midpoint price movement, which is relatively well borne out in the data; for an example, see Fig. 5. Averaging the daily diffusion rates, weighted by the number of events in each day, we obtain the full-sample estimate of the stock's diffusion rate. One must bear in mind that the price diffusion rate has substantial correlations from day to day, as illustrated in Fig. 6. A sketch of the daily computation is given below.
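The sketch assumes the input is the series of log midpoint prices after each midpoint-changing event within one day, so overnight jumps never enter; the weighted straight-line fit through the origin follows the description above.

```python
import numpy as np

def daily_diffusion(midprices, max_tau=50):
    """Fit V(tau) = D*tau through the origin to the daily diffusion
    curve, weighting each point by the square root of the number of
    observations available at that tau."""
    m = np.asarray(midprices)
    taus = np.arange(1, min(max_tau, len(m) - 1) + 1)
    V = np.array([np.var(m[tau:] - m[:-tau]) for tau in taus])
    w = np.sqrt(len(m) - taus)  # observations contributing to each tau
    # Weighted least squares for a line through the origin:
    # minimize sum w*(V - D*tau)^2  =>  D = sum(w*tau*V)/sum(w*tau^2).
    return np.sum(w * taus * V) / np.sum(w * taus**2)
```

The full-sample diffusion rate is then the event-weighted mean of the daily values, as in the earlier sketch.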

Estimating the Errors for the Regressions

The error bars presented in the text are based on a bootstrapping method. It may at first seem that the proper approach would be simply to use White's heteroskedasticity-consistent estimators; however, we are driven to the bootstrapping method for two reasons.

First, within each stock, the daily values of the dependent variables display slowly decaying positive autocorrelation functions. Averaging the daily values to get an estimate of the stock-specific average may seem to remedy the autocorrelation problem. However, the autocorrelation is very persistent, almost to the scale of the length of our data set, and the variables may indeed have long memory. [It has recently been shown that order sign, order volume, and liquidity, as reflected by volume at the best price, are long-memory processes (60, 61).] This makes us wary of using standard statistics.

A second reason for using a bootstrapping method for inference is that, in addition to possibly having long memory, the daily values of the variables are cross-correlated across stocks. (High volatility in one stock on a particular day is likely to be associated with high volatility in other stocks.) These two reasons lead us to believe that using standard or White's estimators would underestimate the regression errors.

The method we use is inspired by the variance plot method described in Beran (62). We divide the sample into blocks, apply the regression to each block, and then study the scaling of the deviation in the results, extrapolating to the full sample. We divide the N daily data points for each stock into m disjoint blocks, each containing n adjacent days, so that n ≈ N/m. We use the same partition for each stock, so that corresponding blocks for each stock are contemporaneous. We perform an independent regression on each of the m blocks and calculate the mean M_m and standard deviation s_m of the m slope parameters A_i and intercept parameters B_i, i = 1, . . . , m. We then vary m and study the scaling, as shown in Figs. 7 and 8. A sketch of the procedure follows.
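The block-regression scaling and the extrapolation can be sketched as follows, using ordinary least squares within each block (names are ours, and the fit of log standard deviation against log n matches the scaling plots described below):

```python
import numpy as np

def block_scaling(x, y, block_counts):
    """For each m, split the N daily points into m disjoint blocks of
    n ~ N/m adjacent days, regress y on x within each block, and record
    the mean and standard deviation of the m slopes and intercepts."""
    x, y = np.asarray(x), np.asarray(y)
    rows = []
    for m in block_counts:
        n = len(x) // m
        fits = [np.polyfit(x[i*n:(i+1)*n], y[i*n:(i+1)*n], 1)
                for i in range(m)]          # (slope, intercept) pairs
        slopes, intercepts = np.transpose(fits)
        rows.append((m, n, slopes.mean(), slopes.std(),
                     intercepts.mean(), intercepts.std()))
    return rows

def extrapolate_error(ns, stds, N):
    """Fit log(std) = c + gamma*log(n) and extrapolate to the full
    sample n = N, as in the variance plot method."""
    gamma, c = np.polyfit(np.log(ns), np.log(stds), 1)
    return np.exp(c + gamma * np.log(N)), gamma
```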

Fig. 7 a and b illustrate this procedure for the spread, and Fig. 8 a and b illustrate it for the price diffusion rate. Similarly, Figs. 7 c and d and 8 c and d show the mean and standard deviation of the intercept and slope as a function of the number of bins. As expected, the standard deviations of the estimates decrease as n increases. The logarithm of the standard deviation of the intercept and slope as a function of log n is shown in Figs. 7 e and f and 8 e and f. For IID normally distributed data, we expect a line with slope γ = -1/2; instead, we observe γ > -1/2 (for example, for the spread γ ≈ -0.19). |γ| < 1/2 is an indication that this is a long-memory process.

This method can be used to extrapolate the error for m = 1, i.e., the full sample. This is illustrated in Figs. 7 e and f and 8 e and f. The inaccuracy in these error bars is evident in the unevenness of the scaling; this is particularly true for the price diffusion rate. To get a feeling for the accuracy of the error bars, we estimate the standard deviation of the scaling regression assuming standard errors and repeat the extrapolation for the one-standard-deviation positive and negative deviations of the regression lines, as shown in Figs. 7 e and f and 8 e and f. The results are summarized in Table 2.

One of the effects that is evident in Figs. 7 c and d and 8 c and d is that the slope coefficients tend to decrease as m increases. We believe this is due to the autocorrelation bias discussed in Section V.

Market Impact

Relation of Market Impact to Supply and Demand Schedules. The market impact function is closely related to the more familiar notions of supply and demand. We have chosen to measure average market impact rather than average relative supply and demand in this paper for reasons of convenience. Measuring the average relative supply and demand requires reconstructing the limit order book at each instant, which is both time-consuming and error-prone. The average market impact function, in contrast, can be measured from a time series of orders and best bid and ask prices.

At any instant in time, the stored queue of sell limit orders reveals the quantity available for sale at each price, thus showing the supply, and the stored buy orders similarly show the revealed demand. The price shift caused by a market order of a given size depends on the stored supply or demand through a moment expansion (54). Thus, the collapse of the market impact function reflects a corresponding property of supply and demand. Normally, one would assume that supply and demand are functions of human production and desire; the results presented here suggest that on a short timescale, in financial markets their form is dictated by the dynamical interaction of order accumulation, removal by market orders and cancellation, and price diffusion.

Alternative Market Impact Collapse Plots. We have demonstrated a good collapse of market impact by using nondimensional units. However, in deciding what "good" means, one should compare this to the best alternatives available. We compare to three such alternatives. Fig. 9 Upper Left shows the collapse when using nondimensional units derived from the model (repeated from the main text). Fig. 9 Upper Right shows the average market impact when we instead normalize the order size by its sample mean. Order size is measured in units of shares and market impact is in log price difference. Fig. 9 Lower Left attempts to take into account daily variations of trading volume, normalizing the order size by the average order size for that stock on that day. In Fig. 9 Lower Right, we use trade price to normalize the order sizes that are now in monetary units (British pounds). We see that none of the alternative rescalings comes close to the collapse we obtain when using nondimensional units; because of the much greater dispersion, the error bars in each case are much larger.

Error Analysis for Market Impact. Assigning error bars to the average market impact is difficult, because the absolute price changes |Δp| have a slowly decaying positive autocorrelation function. This may be a long-memory process, although this is not as obvious as for other properties of the market, such as volume and sign of orders (60, 61). The signed price changes Δp have an autocorrelation function that decays rapidly to zero, but to compute market impact, we sort the values into bins, and all of the values in a bin have the same sign. One might have supposed that, because the points entering a given bin are not sequential in time, the correlation would be sufficiently low that this would not be a problem. However, the autocorrelation is sufficiently strong that its effect is still significant, particularly for smaller market impacts, and must be taken into account.

To cope with this, we assign error bars to each bin using the variance plot method described, for example, in Beran (62). This is a more straightforward version of the method discussed in Section V. The sample of size N = 434 is divided into m subsamples of n points adjacent in time. We compute the mean for each subsample, vary n, and compute the standard deviation of the means across the m = N/n subsamples. We then make use of theorem 2.2 from Beran (62), which states that the error in the mean of n samples of a long-memory process is σn^(-γ), where γ is a positive coefficient related to the Hurst exponent and σ is the standard deviation. By plotting the standard deviation of the m subsample means as a function of n, we estimate γ and extrapolate to n = sample length to get an estimate of the error in the full-sample mean. An example of an error scaling plot for one of the bins of the market impact is given in Fig. 10, and a sketch of the computation is given below.
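A minimal sketch of the per-bin error estimate; the fitted slope of log standard deviation against log n is negative, corresponding to -γ in Beran's formulation (names are ours):

```python
import numpy as np

def long_memory_error(samples, block_sizes):
    """Standard error of the mean of a possibly long-memory series:
    compute subsample means for several block sizes n, fit
    std(means) ~ s * n**(-gamma), and extrapolate to the full length."""
    x = np.asarray(samples)
    ns, stds = [], []
    for n in block_sizes:
        m = len(x) // n
        means = x[:m * n].reshape(m, n).mean(axis=1)
        ns.append(n)
        stds.append(means.std())
    slope, c = np.polyfit(np.log(ns), np.log(stds), 1)  # slope = -gamma
    return np.exp(c + slope * np.log(len(x)))           # error at n = N
```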

A central question about Fig. 9 is whether the data for different stocks collapse onto a single curve or whether there are statistically significant idiosyncratic variations from stock to stock. From the results presented in Fig. 9, this is not completely clear. Most of the stocks collapse onto the curve for the pooled data (or the pooled data set with themselves removed). A few show deviations that appear statistically significant, at least if we assume that the mean values of the bins for different order size levels are independent. However, they are most definitely not independent, and this nonindependence is difficult to model. In any case, the variations are always fairly small, not much larger than the error bars. Thus the collapse gives at least a good approximate understanding of the market impact, even if there are some small idiosyncratic variations it does not capture.

Market Impact in Log-Log Coordinates. If we fit a function of the form f(ω) = Kω^β to the market impact curve, we get β = 0.26 ± 0.02 for buy orders and β = 0.23 ± 0.02 for sell orders, as shown in Fig. 11. The functional form of the market impact we observe here is not in agreement with a recent theory by Gabaix et al. (63), which predicts β = 0.5. Although the error bars given are standard errors and are certainly too optimistic, it is nonetheless quite clear that the data are inconsistent with β = 1/2. This relates to an interesting debate: the theory for average market impact put forth by Gabaix et al. (63) follows traditional thinking in economics and postulates that agents optimize their behavior to maximize profits, whereas the theory we test here assumes that they behave randomly and that the form of the average market impact function is dictated by the statistical mechanics of price formation.
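The exponent β is obtained by a straight-line fit in log-log coordinates; a minimal sketch (ours, not the paper's code):

```python
import numpy as np

def power_law_fit(omega, impact):
    """Fit f(omega) = K * omega**beta by least squares on
    log(impact) = log(K) + beta*log(omega); returns (K, beta)."""
    beta, logK = np.polyfit(np.log(omega), np.log(impact), 1)
    return np.exp(logK), beta
```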

Extending the Model

In the interest of full disclosure, and as a stimulus for future work, in this section, we detail the ways in which the current model does not accurately match the data and sketch possible improvements. This model was intended to describe a few average statistical properties of the market, some of which it describes surprisingly well. However, there are several aspects that it does not describe well. Fixing the problems with this model requires a more sophisticated model of order flow, including a more realistic model of price dependence in order placement and cancellations (55, 59), long-memory properties (60, 61), and the relationship of the different components of the order flow to each other. This is a much harder problem and is likely to require a more complicated model. Members of our group are actively working on this problem. Although this will certainly have many advantages over the current model, it will also have the disadvantage of introducing more free parameters and thereby complicating the scaling laws (and making the possibility for analytic results more remote).

One of the major ways in which this model is not realistic concerns price diffusion. Real price increments are roughly white, i.e., roughly uncorrelated. One might naively think that under IID Poisson order flow, price increments should also be IID. However, due to the coupling of boundary conditions for the buy market order/sell limit order process to those of the sell market order/buy limit order process, this is not the case. Because supply and demand tend to build (i.e., the depth of standing limit orders increases) as one moves away from the center of the book, price reversals are more common than price changes in the same direction. As a result, the price increments generated by this model are more anticorrelated than those of real price series. This has an interesting consequence: if we add the assumption of market efficiency and assume that real price increments must be white, it implies that real order flow should be positively autocorrelated to compensate for the anticorrelations induced by the continuous double auction. This has indeed subsequently been observed to be the case (60, 61).

One of the side effects of this anticorrelation of prices is that it implies there exist arbitrage opportunities that can be taken advantage of by an intelligent agent. A separate study of these arbitrage opportunities makes clear they are not risk-free in the sense usually used in economics. That is, taking advantage of them necessarily involves taking risks, and they do not permit arbitrarily large profits; returns decrease with the size of investment and eventually go to zero. Exploring the nature of these arbitrage opportunities and the effect that exploiting them has on prices is one of the directions in which this model can be improved (one that is being actively explored). However, we do not feel the existence of such arbitrage opportunities (which in our opinion mimic those of real markets) presents a serious problem for the purposes for which we are using this model.

In the following list, we summarize the main directions in which members of our group are working to improve this model.

• Price diffusion. The variance of real prices obeys the relationship σ²(τ) = Dτ^(2H) to a good approximation for all values of τ, where σ²(τ) is the variance of price changes or returns computed on timescale τ. The Hurst exponent H is close to, and typically a little greater than, 0.5. In contrast, under Poisson order flow, as already discussed above, due to the dynamics of the continuous double auction price formation process, prices make a strongly anticorrelated random walk. This means that the function σ²(τ) is nonlinear: asymptotically H = 0.5, but for shorter times H < 0.5. Alternatively, one can characterize this in terms of a timescale-dependent diffusion rate D(τ), so that the variance of prices increases as σ²(τ) = D(τ)τ (see the sketch after this list). Refs. 52 and 53 show that the limits τ → 0 and τ → ∞ obey well defined scaling relationships in terms of the parameters of the model; in particular, D(0) ~ (μ²δ/α²)ε^(-1/2) and D(∞) ~ (μ²δ/α²)ε^(1/2). Interestingly, and for reasons we do not fully understand, the prediction of the short-term diffusion rate D(0) does a good job of matching the real data, as we have shown here, whereas D(∞) does a much poorer job.

• Market efficiency. The question of market efficiency is closely related to price diffusion. The anticorrelations mentioned above imply market inefficiency. We are investigating the addition of low-intelligence agents to correct this problem.

• Correlations in spread and price diffusion. We discussed in Section V the problems that autocorrelations in the spread and the price diffusion rate create for comparing the model with the data on a daily scale. This is related to the fact that this model does not correctly capture either the fat tails of price fluctuations or the long memory of volatility.

• Lack of dependence on granularity parameter. In Section VI, we discuss the fact that the model predicts more variation with the granularity parameter than we observe. Apparently the Poisson-based nondimensional coordinates work even better than one would expect. This suggests there is some underlying simplicity in the real data that we have not fully captured in the model.
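As referenced in the price diffusion item above, a minimal sketch of the timescale-dependent diffusion rate used to diagnose anticorrelation; an H below 1/2 appears as a D(τ) that falls with τ at short lags, whereas an uncorrelated walk gives a flat curve.

```python
import numpy as np

def diffusion_rate_curve(log_prices, taus):
    """D(tau) = Var[p(t + tau) - p(t)] / tau for a series of log prices.
    A flat curve corresponds to ordinary diffusion (H = 1/2); a curve
    decreasing at short lags signals anticorrelated increments (H < 1/2)."""
    p = np.asarray(log_prices)
    return [(tau, np.var(p[tau:] - p[:-tau]) / tau) for tau in taus]
```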

Although in this paper we are stressing the fact that we can make a useful theory out of zero-intelligence agents, we are certainly not trying to claim that intelligence does not play an important role in what financial agents do. Indeed, one of the virtues of this model is that it provides a benchmark to separate properties driven by the statistical mechanics of the market institution from those driven by conditional intelligent behavior.

1. Amihud, Y. & Mendelson, H. (1980) J. Financ. Econ. 8, 31-53.

2. Demsetz, H. (1968) Q. J. Econ., 82, 33-53.

3. Ho, T. & Stoll, H. R. (1981) J. Financ. Econ. 9, 47-73.

4. Stoll, H. R. (1978) J. Finance 33, 1133-1151.

5. Tinic, S. M. (1972) Q. J. Econ. 86, 79-93.

6. Lo, A. W., MacKinlay, A. C. & Zhang, J. (2002) J. Financ. Econ. 65, 31-71.

7. Copeland, T. E. & Galai, D. (1983) J. Finance 38, 1457-1469.

8. Easley, D. & O’Hara, M. (1987) J. Financ. Econ. 19, 69-90.

9. Easley, D., O’Hara, M. & Saar, G. (2001) J. Financ. Quant. Anal. 36, 25-51.

10. Foucault, T. (1999) J. Financ. Markets 2, 99-134.

11. Glosten, L. R. (1995) Competition and the Set of Allowable Prices (Columbia University, New York).

12. Glosten, L. R. & Milgrom, P. R. (1985) J. Financ. Econ. 14, 71-100.

13. Andersen, T. G. (1996) J. Finance 51, 169-204.

14. Angel, J. J. (1994) Limit vs. Market Orders (School of Business Administration, Georgetown Univ., Washington, DC).

15. Chakravarty, S. & Holden, C. W. (1995) J. Financ. Intermed. 4, 213-241.

16. Cohen, K. J., Maier, S. F., Schwartz, R. A. & Whitcomb, D. K. (1981) J. Polit. Econ. 89, 287-305.

17. Domowitz, I. & Wang, J. (1994) J. Econ. Dyn. Control 18, 29-60.

18. Foucault, T., Kadan, O. & Kandel, E. (2001) Limit Order Book as a Market for Liquidity (Center for Economic Policy Research, London), Discussion Paper 2889.

19. Harris, L. (1998) Financ. Markets Inst. Instr. 7, 1-75.

20. Parlour, C. A. (1998) Rev. Financ. Studies 11, 789-816.

21. Parlour, C. A. & Seppi, D. J. (2003) Rev. Financ. Studies 16, 301-343.

22. Evans, M. D. D. & Lyons, R. K. (2002) J. Polit. Econ. 110, 170-180.

23. Seppi, D. J. (1997) Rev. Financ. Studies 10, 103-150.

24. Choi, W. S., Lee, S. B. & Yu, P. I. (1998) in Advances in Investment Analysis and Portfolio Management, ed. Lee, C.-F. E. (JAI Press, Greenwich, CT), Vol. 5, pp. 105-122.

25. Roll, R. (1984) J. Finance 39, 1127-1139.

26. Glosten, L. R. & Harris, L. E. (1988) J. Financ. Econ. 21, 123-142.

27. Hasbrouck, J. (1988) J. Financ. Econ. 22, 229-252.

28. Madhavan, A., Richardson, M. & Roomans, M. (1997) Rev. Financ. Studies 10, 1035-1064.

29. George, T. J., Kaul, G. & Nimalendran, M. (1991) Rev. Financ. Studies 4, 623-656.

30. Huang, R. D. & Stoll, H. R. (1997) Rev. Financ. Studies 10, 995-1034.

31. Jang, H. & Venkatesh, P. C. (1991) J. Finance 46, 433-446.

32. Engle, R. F. (1982) Econometrica 50, 987-1007.

33. Bollerslev, T. (1986) J. Econometrics 31, 307-327.

34. Bollerslev, T., Chou, R. Y. & Kroner, K. F. (1992) J. Econometrics 52, 5-59.

35. Clark, P. K. (1973) Econometrica 41, 135-155.

36. Epps, T. W. & Epps, M. L. (1976) Econometrica 44, 305-321.

37. Tauchen, G. E. & Pitts, M. (1983) Econometrica 51, 485-505.

38. Lamoureux, C. G. & Lastrapes, W. D. (1994) J. Bus. Econ. Stat. 12, 253-260.

39. Gallant, A. R., Rossi, P. E. & Tauchen, G. E. (1992) Rev. Financ. Studies 5, 199-242.

40. Blume, M. E., Keim, D. B. & Patel, S. A. (1991) J. Finance 46, 49-74.

41. Reiss, P. C. & Werner, I. M. (1994) Transaction Costs in Dealer Markets: Evidence from the London Stock Exchange (National Bureau of Economic Research, Cambridge, MA).

42. Fleming, J., Kirby, C. & Ostdiek, B. (2001) J. Finance 56, 329-352.

43. Gallant, A. R., Rossi, P. E. & Tauchen, G. E. (1993) Econometrica 61, 871-908.

44. Mendelson, H. (1982) Econometrica 50, 1505-1524.

45. Cohen, K. J., Conroy, R. M. & Maier, S. F. (1985) in Market Making and the Changing Structure of the Securities Industry, eds. Amihud, Y., Ho, T. & Schwartz, R. A. (Rowman & Littlefield, Lanham, MD), pp. 93-110.

46. Bollerslev, T., Domowitz, I. & Wang, J. (1997) J. Econ. Dyn. Control 21, 1471-1491.

47. Bak, P., Paczuski, M. & Shubik, M. (1996) Price Variations in a Stock Market with Many Agents (Cowles Foundation Library, Yale University, New Haven, CT).

48. Eliezer, D. & Kogan, I. I. (1998) http://arxiv.org/abs/cond-mat/9808240.

49. Tang, L. H. & Tian, G. S. (1999) Physica A 264, 543-550.

50. Maslov, S. (2000) Physica A 278, 571-578.

51. Slanina, F. (2001) Phys. Rev. E 64, 056136.

52. Challet, D. & Stinchcombe, R. (2001) Physica A 300, 285-299.

53. Daniels, M. G., Farmer, J. D., Iori, G. & Smith, E. (2003) Phys. Rev. Lett. 90, 108102.

54. Smith, E., Farmer, J. D., Gillemot, L. & Krishnamurthy, S. (2003) Quant. Finance 3, 481-514.

55. Bouchaud, J.-P., Mezard, M. & Potters, M. (2002) Quant. Finance 2, 251-256.

56. Chiarella, C. & Iori, G. (2002) Quant. Finance 2, 346-353.

57. Barenblatt, G. I. (1987) Dimensional Analysis (Gordon & Breach, New York).

58. London Stock Exchange (2001) SETS: Four Years on the London Stock Exchange (London Stock Exchange, London).

59. Zovko, I. & Farmer, J. D. (2002) Quant. Finance 2, 387-392.

60. Bouchaud, J.-P., Gefen, Y., Potters, M. & Wyart, M. (2004) Quant. Finance 4, 176-190.

61. Lillo, F. & Farmer, J. D. (2003) Studies Nonlin. Dyn. Econ. 8, 1.

62. Beran, J. (1994) Statistics for Long-Memory Processes (Chapman & Hall, New York).

63. Gabaix, X., Gopikrishnan, P., Plerou, V. & Stanley, H. E. (2003) Nature 423, 267-270.