Farmer et al. 10.1073/pnas.0409157102.

Supporting Information

Files in this Data Supplement:

Supporting Text
Supporting Table 1
Supporting Table 2
Supporting Figure 5
Supporting Figure 6
Supporting Figure 7
Supporting Figure 8
Supporting Figure 9
Supporting Figure 10
Supporting Figure 11

Table 1. Summary statistics for stocks in the data set

Stock ticker	No. events (1,000s)	Average (per day)	Limit (1,000s)	Market (1,000s)	Deletions (1,000s)	Eff. limit (shares)	Eff. market (shares)	No. of days
AZN	608	1,405	292	128	188	4,967	4,921	429
BARC	571	1,318	271	128	172	7,370	6,406	433
CW.	511	1,184	244	134	134	12,671	11,151	432
GLXO	814	1,885	390	200	225	8,927	6,573	434
LLOY	644	1,485	302	184	159	13,846	11,376	434
ORA	314	884	153	57	104	12,097	11,690	432
PRU	422	978	201	94	127	9,502	8,597	354
RTR	408	951	195	100	112	16,433	9,965	431
SB.	665	1,526	319	176	170	13,589	12,157	426
SHEL	592	1,367	277	159	156	44,165	30,133	429
VOD	940	2,161	437	296	207	89,550	71,121	434

Fields from left to right: stock ticker symbol, total number of events (effective market orders + effective limit orders + order cancellations) in thousands, average number of events in a trading day, number of effective limit orders in thousands, number of effective market orders in thousands, number of order deletions in thousands, average limit order size in shares, average market order size in shares, and number of trading days in the sample.

Table 2. A summary of the bootstrap error analysis described in the text

Regression parameter	Estimated value	Standard error	Bootstrap error	Low	High
Spread intercept	0.06	0.21	0.29	0.25	0.33
Spread slope	0.99	0.08	0.10	0.09	0.11
Diffusion intercept	2.43	1.22	1.76	1.57	1.97
Diffusion slope	1.33	0.19	0.25	0.23	0.29

The columns (left to right) are the estimated value of the parameter, the standard error from the cross-sectional regression in Fig. 10, the one standard deviation error bar estimated by the bootstrapping method, and the one standard deviation low and high values for the extrapolation, as shown in Figs. 3 e and f and 4 e and f.

Supporting Figure 5

Fig. 5. Illustration of the procedure for measuring the price diffusion rate for Vodafone (VOD) on August 4, 1998. On the x axis, we plot the time t in units of ticks, and on the y axis, the variance of midprice diffusion V(t). According to the hypothesis that midprice diffusion is an uncorrelated Gaussian random walk, the plot should obey V(t) = Dt. To cope with the fact that points with larger values of t have fewer independent intervals and are less statistically significant, we use a weighted regression to compute the slope D.
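
The weighted fit described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name diffusion_rate, the use of overlapping intervals to estimate V(t), and the choice of weight (roughly the number of independent intervals at each lag) are all hypothetical. The line is forced through the origin to match V(t) = Dt.

```python
def diffusion_rate(midprices, max_lag):
    """Estimate D in V(t) = D * t by weighted least squares through the origin.

    midprices: midprice (or log midprice) sampled once per tick.
    max_lag:   largest lag t, in ticks, included in the fit.
    """
    n = len(midprices)
    num = 0.0  # accumulates w * t * V(t)
    den = 0.0  # accumulates w * t * t
    for lag in range(1, max_lag + 1):
        # Price changes over all (overlapping) intervals of length `lag`.
        diffs = [midprices[i + lag] - midprices[i] for i in range(n - lag)]
        mean = sum(diffs) / len(diffs)
        variance = sum((d - mean) ** 2 for d in diffs) / len(diffs)
        # Assumed weight: approximate count of independent intervals,
        # so large lags count for less than in an unweighted fit.
        weight = (n - 1) / lag
        num += weight * lag * variance
        den += weight * lag * lag
    return num / den
```

For a simulated random walk with unit-variance steps, the returned value should be close to 1.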

Supporting Figure 6

Fig. 6. Time series (Upper) and autocorrelation function (Lower) for daily price diffusion rate D_t for Vodafone. Because of long-memory effects and the short length of the series, the long-lag coefficients are poorly determined; the figure is simply to demonstrate that the correlations are quite large.
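
For reference, a sample autocorrelation function like the one in the lower panel can be computed as below. The estimator shown (deviations from the sample mean, normalized by the lag-0 sum) is a common convention and an assumption here; the source does not state which estimator was used, and the name autocorrelation is illustrative.

```python
def autocorrelation(series, max_lag):
    """Return sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares, used as normalizer
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]
```

With a series of only a few hundred daily values, coefficients at long lags rest on few pairs of points, which is why the caption calls them poorly determined.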

Supporting Figure 7

Fig. 7. Subsample analysis of regression of predicted vs. actual spread. To get a better feeling for the true errors in this estimation (as opposed to standard errors, which are certainly too small), we divide the data into subsamples (using the same temporal period for each stock) and apply the regression to each subsample. (a) The results for the intercept; (b) the results for the slope. In both cases, we see that progressing from right to left, as the subsamples increase in size, the estimates become tighter. (c and d) The mean and standard deviation for the intercept and slope. We observe a systematic tendency for the mean to increase as the number of bins decreases. (e and f) The logarithm of the standard deviation of the estimates plotted against log n, where n is the number of points in each subsample. The line is a regression based on binnings ranging from m = N to m = 10 (lower values of m tend to produce unreliable standard deviations). The estimated error bar is obtained by extrapolating to n = N. To test the accuracy of the error bar, the dashed lines are one standard deviation variation on the regression, whose intercepts with the n = N vertical line produce high and low estimates.
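
The extrapolation in panels e and f can be sketched generically as below. This is a simplified illustration, not the procedure as implemented by the authors: in the source each subsample is a time window and the estimator is a cross-sectional regression across stocks, whereas here the data are a flat list split into m contiguous blocks and the estimator is any function supplied by the caller. The names fit_line and subsample_error are hypothetical.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least squares; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def subsample_error(data, estimator, bin_counts):
    """Extrapolate the spread of subsample estimates to the full sample.

    For each m in bin_counts, split data into m blocks of n = N // m
    points, apply estimator to each block, and record the standard
    deviation of the m estimates. Fit log(std) against log(n) and
    evaluate the fitted line at n = N.
    Returns (extrapolated error, fitted scaling exponent).
    """
    total = len(data)
    log_n, log_sd = [], []
    for m in bin_counts:
        n = total // m
        estimates = [estimator(data[j * n:(j + 1) * n]) for j in range(m)]
        log_n.append(math.log(n))
        log_sd.append(math.log(statistics.stdev(estimates)))
    intercept, slope = fit_line(log_n, log_sd)
    return math.exp(intercept + slope * math.log(total)), slope
```

For independent data and the sample mean as estimator, the fitted exponent should be near -1/2; long-memory effects of the kind noted for Fig. 6 would make the decay slower.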

Supporting Figure 8

Fig. 8. Subsample analysis of regression of predicted vs. actual price diffusion (see Fig. 10), similar to Fig. 7. The scaling of the errors is much less regular than for the spread, so the error bars are less accurate.

Supporting Figure 9

Fig. 9. Market impact collapse under four kinds of axis rescaling. In each case, we plot a normalized version of the order size on the horizontal axis vs. a (possibly normalized) average market impact log(p_{t+1}) - log(p_t) on the vertical axis. (a) Collapse using nondimensional units based on the model. (b) Order size is normalized by its mean value for the sample. (c) Order size is normalized by the average daily volume. (d) Order size is multiplied by the current best midpoint price, making the horizontal axis the monetary value of the trade.
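
Rescalings (b), (c), and (d) are simple enough to write out directly. The sketch below is illustrative only: the function and argument names are hypothetical, and rescaling (a) is omitted because the nondimensional units depend on model parameters that are defined in the main paper rather than in this supplement.

```python
import math

def market_impact(p_before, p_after):
    """Log-price change log(p_{t+1}) - log(p_t) associated with one order."""
    return math.log(p_after) - math.log(p_before)

def rescale_order_sizes(sizes, midprices, avg_daily_volume):
    """Return order sizes under rescalings (b), (c) and (d) of Fig. 9.

    sizes:            order sizes in shares.
    midprices:        midpoint price at the time of each order.
    avg_daily_volume: average daily traded volume in shares.
    """
    mean_size = sum(sizes) / len(sizes)
    by_mean = [s / mean_size for s in sizes]              # (b) mean order size
    by_volume = [s / avg_daily_volume for s in sizes]     # (c) daily volume
    by_value = [s * p for s, p in zip(sizes, midprices)]  # (d) monetary value
    return by_mean, by_volume, by_value
```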

Supporting Figure 10

Fig. 10. The variance plot procedure used to determine error bars for mean market impact conditional on order size. The horizontal axis n denotes the number of points in the m different samples, and the vertical axis is the standard deviation of the m sample means. We estimate the error of the full sample mean by extrapolating n to the full sample length.

Supporting Figure 11

Fig. 11. The average market impact vs. order size plotted on a log-log scale. (Upper Left and Upper Right) Buy and sell orders in nondimensional coordinates; the fitted line has slope b = 0.26 ± 0.02 for buy orders and b = 0.23 ± 0.02 for sell orders. In contrast, Lower Left and Lower Right show the same thing in dimensional units, using British pounds to measure order size. Although the exponents are similar, the scatter among different stocks is much greater.
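
A slope on a log-log plot corresponds to the exponent b of a power law, impact = c * size^b. One way to estimate it is an ordinary least-squares fit to the logarithms, sketched below. The source does not say which fitting method produced the quoted slopes, so the method and the name power_law_exponent are assumptions.

```python
import math

def power_law_exponent(sizes, impacts):
    """Fit impact = c * size**b by least squares on logs; return (b, c).

    Both inputs must be strictly positive, for example binned order
    sizes and the average impact within each bin.
    """
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in impacts]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                # slope of the log-log line
    c = math.exp(my - b * mx)    # prefactor from the intercept
    return b, c
```

Applied to points generated exactly from size^0.25, the fit recovers b = 0.25 up to rounding.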