Abstract
This paper develops an inferential framework for matrix completion when missing is not at random and without the requirement of strong signals. Our development is based on the observation that if the number of missing entries is small enough compared to the panel size, then they can be estimated well even when missing is not at random. Taking advantage of this fact, we divide the missing entries into smaller groups and estimate each group via nuclear norm regularization. In addition, we show that with appropriate debiasing, our proposed estimate is asymptotically normal even for fairly weak signals. Our work is motivated by recent research on the Tick Size Pilot Program, an experiment conducted by the Securities and Exchange Commission (SEC) to evaluate the impact of widening the tick size on the market quality of stocks from 2016 to 2018. While previous studies were based on traditional regression or difference-in-difference methods that assume the treatment effect is invariant with respect to time and unit, our analyses suggest significant heterogeneity across units and intriguing dynamics over time during the pilot program.
Keywords: Missing not at random (MNAR), Weak signal-to-noise ratio, Multiple treatments, Tick size pilot program, Causal inference
1. Introduction
The problem of noisy matrix completion, in which we seek to reconstruct a low-rank matrix from partial and noisy observations of its entries, arises naturally in numerous applications. It has attracted a considerable amount of attention in recent years, and many impressive results have been obtained from both statistical and computational perspectives. See, e.g., Candes and Plan (2010); Mazumder et al. (2010); Koltchinskii et al. (2011); Negahban and Wainwright (2012); Chen et al. (2019a, 2020b); Jin et al. (2021); Xia and Yuan (2021); Bhattacharya and Chatterjee (2022) among many others. A common and crucial premise underlying these developments is that observations of the entries are missing at random. Although this is a reasonable assumption for some applications, it could be problematic for many others. In the past several years, there has been growing interest in investigating how to deal with situations where missing is not at random and to what extent the techniques and insights initially developed under the missing-at-random assumption can be extended to these cases. See, e.g., Agarwal et al. (2020, 2021); Athey et al. (2021); Bai and Ng (2021); Chernozhukov et al. (2023); Cahan et al. (2023); Xiong and Pelger (2023) among others.
This fruitful line of research is largely inspired by the development of synthetic control methods in causal inference. See, e.g., Abadie and Gardeazabal (2003); Abadie et al. (2010); Abadie (2021). The close connection between noisy matrix completion and synthetic control methods for panel data was first formalized by Athey et al. (2021), who showed that powerful matrix completion techniques such as nuclear norm regularization can be very useful for many causal panel data models where missing is not at random. It also helps bring together two complementary perspectives on noisy matrix completion: one focuses on statistical inference assuming a strong factor structure, and the other aims at recovery guarantees with minimal signal strength requirements. The main objective of this work is to further bridge the gap between these two schools of thought and develop a general and flexible inferential framework for matrix completion when missing is not at random and without the requirement of strong factors.
In particular, we shall follow Athey et al. (2021) and investigate how the technique of nuclear norm regularization can be used to infer individual treatment effects under a variety of missing mechanisms. One of the key observations behind our development is that if the number of missing entries is sufficiently small compared to the panel size, then they can be estimated well even when missing is not at random. For more general missing patterns with an arbitrary proportion of missingness, we can judiciously divide the missing entries into smaller groups and leverage this fact by applying nuclear norm regularization to a submatrix with a small number of missing entries. This is where our approach differs from that of Athey et al. (2021), who suggest applying the nuclear norm regularized estimation to the full matrix. We shall show that subgrouping is essential in producing more accurate estimates and more efficient inferences about individual treatment effects. It is worth noting that it is computationally more efficient to estimate all missing entries together, as suggested by Athey et al. (2021). But estimating too many missing entries simultaneously can be statistically suboptimal. In a way, our results suggest how to trade off computational cost against statistical efficiency.
Our proposal of subgrouping is similar in spirit to the approach taken by Agarwal et al. (2021), who suggest estimating the missing entries one at a time. For estimating a single missing entry, they propose a matching scheme that constructs multiple “synthetic” neighbors and averages the observed outcomes associated with each synthetic neighbor. Separating the observations into different sets of neighbors, however, could lead to a loss of efficiency. For example, when estimating the mean of an matrix with one missing entry, the estimation error of the approach from Agarwal et al. (2021) for the missing entry converges at the rate of , which is far slower than the rate of attained by our method.
Furthermore, we show that, with appropriate debiasing, our proposed estimate is asymptotically normal even with fairly weak signals. More specifically, the asymptotic normality holds if where is the smallest nonzero singular value of the mean of an matrix and is the variance of the observed entries. Our development builds upon and complements a series of recent works showing that statistical inference for matrix completion is possible with a low signal-to-noise ratio when the data are missing uniformly at random. See, e.g., Chen et al. (2019a, 2020b); Xia and Yuan (2021). Our results also draw an immediate comparison with the recent works by Bai and Ng (2021); Cahan et al. (2023), who developed an inferential theory for asymptotic principal component (APC) based approaches when the signal is much stronger, e.g., . It is worth pointing out that the nuclear norm regularization and the APC-based approach each has its own merits and requires a different treatment. For example, APC-based methods usually assume that the factors are random and impose moment conditions to ensure that the factor structure is strong and identifiable, whereas our development assumes that the factors are deterministic but incoherent and allows for weaker signals.
Our work is motivated by a number of recent studies on the Tick Size Pilot Program, an experiment conducted by the Securities and Exchange Commission (SEC) to evaluate the impact of widening the tick size on the market quality of small and illiquid stocks from 2016 to 2018. See, e.g., Albuquerque et al. (2020); Chung et al. (2020); Werner et al. (2022). The pilot consisted of three treatment groups and a control group: 1) the first treatment group was quoted in $0.05 increments but still traded in $0.01 increments, that is, the Q (quote) rule was applied; 2) the second treatment group was quoted and traded in $0.05 increments, that is, the Q (quote) and T (trade) rules were applied; 3) the third treatment group was quoted and traded in $0.05 increments and was also subject to the trade-at rule, that is, the Q (quote), T (trade), and TA (trade-at) rules were applied. The trade-at rule, in general, prevents price matching by exchanges that are not displaying the best price. The control group was quoted and traded in $0.01 increments. Previous studies (see, e.g., Chung et al., 2020) on the effects of the quote rule (Q), trade rule (T), and trade-at rule (TA) on the liquidity measure are based on traditional regression or difference-in-difference methods and assume that the treatment effect is invariant with respect to time and unit. As we shall demonstrate, this assumption is problematic for the Tick Size Pilot Program data, and there is significant heterogeneity in the treatment effect across both time and units. Indeed, more insights can be obtained using a potential outcome model with interactive fixed effects to capture such heterogeneity. To do so, we extend our methodology from estimating a single matrix to the simultaneous completion of multiple matrices, accounting for the multiple potential situations.
The remainder of this paper is organized as follows. Section 2 introduces the method of using the nuclear norm penalized estimation when missing is not at random and provides the convergence rates of the estimator. Section 3 discusses how to reduce bias and provides inferential theory using the debiased estimator. Section 4 shows how our proposed methodology can be applied to infer the treatment effect in the Tick Size Pilot Program and presents the empirical findings of our analysis. Finally, we conclude with a few remarks in Section 5. All proofs and simulation studies are relegated to the supplement due to the space limit.
In what follows, we use $\|\cdot\|_F$, $\|\cdot\|$, and $\|\cdot\|_*$ to denote the matrix Frobenius norm, spectral norm, and nuclear norm, respectively. In addition, $\|\cdot\|_{\max}$ denotes the entrywise maximum norm, and $\|\cdot\|_{2,\infty}$ the largest $\ell_2$ norm of all rows of the matrix, i.e., $\|A\|_{2,\infty}=\max_i\|A_{i\cdot}\|$. For any vector, $\|\cdot\|$ denotes its $\ell_2$ norm. For any set $S$, $|S|$ is the number of elements in $S$. We use $\circ$ to denote the Hadamard product, or the entry-by-entry product, between matrices of conformable dimensions. $a \lesssim b$ means $a \le Cb$ for some constant $C>0$, $a \gtrsim b$ means $a \ge cb$ for some constant $c>0$, and $a \asymp b$ means that both $a \lesssim b$ and $a \gtrsim b$ hold. $a \ll b$ indicates $a \le cb$ for some sufficiently small constant $c>0$, and $a \gg b$ indicates $b \le ca$ for some sufficiently small constant $c>0$.
2. Noisy Matrix Completion
Consider a panel data setting where is an approximately low rank matrix. We assume that where is a matrix of rank and is a low rank approximation error such that . We use as the cross-section index and as the time index. Following the convention of the matrix completion literature, we shall assume that the singular vectors of the low rank matrix are incoherent in that there is a such that where and denote the left and right singular vectors of , respectively. The incoherence condition requires the singular vectors to be de-localized, in the sense that their entries are not dominated by a small number of rows or columns. Instead of , we observe a subset of the entries of where is a noise matrix satisfying the assumption below.
Assumption [Noise].
The entries of the noise matrix are independent, identically distributed, zero-mean, sub-Gaussian random variables, i.e., their sub-Gaussian norm is bounded by some constant.
The independence assumption is common in the matrix completion literature. In our model, the dependence in can be explained by the low rank part, in which we allow arbitrary correlations. As noted in our empirical application, this low rank term works as an interactive fixed effect term and, together with additional control variables, may account for a large part of the dependence structure in . In addition, the simulation experiment in Section A.4 of the supplement shows that if the dependence in the noise terms is mild, our estimator and the corresponding inference method work quite well. However, because our proof strategy is based on the leave-one-out technique (see, e.g., Chen et al. (2019b, 2020a,b)), allowing dependence in the noise is challenging. Since this is beyond the scope of this paper, we leave it to future research. Additionally, to relax the identical distribution assumption, we present an extension of our theory to the case of heteroskedasticity in Section G.2 of the supplement.
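To summarize the setup in a single display, one standard way of writing the model described above is the following; the symbols $Y$, $M$, $L$, $R$, $E$, $U$, $V$, $r$, $\mu$, and the dimensions $n_1 \times n_2$ are our own generic placeholders and need not match the paper's notation.

```latex
% Sketch of the setup described above (our own generic notation):
% Y: observed n1 x n2 panel, M: its mean, L: rank-r part, R: approximation error,
% E: i.i.d. zero-mean sub-Gaussian noise, U/V: singular vectors of L.
\begin{align*}
  Y &= M + E, \qquad M = L + R, \qquad \operatorname{rank}(L) = r, \\
  L &= U \Sigma V^{\top} \quad \text{(singular value decomposition)}, \\
  \|U\|_{2,\infty} &\le \sqrt{\mu r / n_1}, \qquad
  \|V\|_{2,\infty} \le \sqrt{\mu r / n_2}
  \quad \text{(incoherence with parameter } \mu\text{)}.
\end{align*}
```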
Let indicate the observed entries: if and only if is observed. The goal of noisy matrix completion is to estimate from . A popular approach to do so is the nuclear norm penalization:
where is a tuning parameter. The properties of are by now well understood in the case of missing completely at random, especially when the entries of are independently sampled from a Bernoulli distribution. See, e.g., Koltchinskii et al. (2011); Chen et al. (2020b). Instead, we are interested here in the situation where is not random.
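For concreteness, the nuclear norm penalized estimator can be computed by a standard proximal gradient (singular value soft-thresholding) iteration. The sketch below is a generic implementation of this well-known scheme under the usual least squares loss on the observed entries, not the authors' code; the function names and the choice of solver are ours.

```python
import numpy as np

def svt(A, tau):
    """Singular value soft-thresholding: the proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

def nuclear_norm_complete(Y, omega, lam, n_iter=500, step=1.0):
    """Minimize 0.5 * ||P_omega(Y - M)||_F^2 + lam * ||M||_* by proximal gradient.

    Y     : observed panel (missing entries may hold NaN; they are never read)
    omega : boolean mask with True where the entry is observed
    lam   : tuning parameter
    """
    M = np.where(omega, Y, 0.0)               # simple initialization
    for _ in range(n_iter):
        grad = np.where(omega, M - Y, 0.0)    # gradient of the smooth part
        M = svt(M - step * grad, step * lam)  # proximal step
    return M
```

The same routine applies verbatim to the structured missing patterns considered below; only the mask changes.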
Situations in which missing is not at random arise naturally in many causal panel models. Consider, for example, the evaluation of a program that takes effect after time for the last units. If is the potential outcome under the control, then we do not have observations of its entries for and , e.g., , yielding a block missing pattern as shown in the left panel of Figure 1. A more general setting that often arises in causal panel data is staggered adoption, where units may differ in the time they are first exposed to the treatment, yielding a missing pattern as shown in the right panel of Figure 1. See Athey et al. (2021); Agarwal et al. (2021) for other similar missing patterns that are common in the context of recommendation systems and A/B testing.
Fig. 1. Two typical observation patterns of the potential outcomes under the control in the causal panel model: the blue area is the observed area, and the white area is the missing area. Missingness occurs because we cannot observe the potential outcomes under the control for the treated entries.
Note that if the entries are observed uniformly at random, then
for sufficiently large and . The right-hand side is minimized by , which justifies as a plausible estimate of . This intuition, however, no longer applies when is not random and has more structured patterns. Our proposal for overcoming this problem is to divide the missing entries into smaller groups and estimate each group via nuclear norm regularization. The main inspiration behind our method is the observation that is a good estimate of when there are only a few missing entries, even if they are missing not at random.
It is instructive to start with a single treated period, e.g., . In this case, the number of missing entries is . Denote by and the largest and smallest nonzero singular values of , respectively, and its condition number. The following theorem provides bounds for the estimation error of .
Theorem 2.1. Assume that
;
;
;
-
.
Then, with probability at least , we have

for some absolute constant .
Some immediate remarks are in order. Consider the situation where , and . Ignoring the logarithmic term, the signal-to-noise ratio requirement given by Assumption (i) reduces to , which is significantly weaker than those in the existing literature on matrix completion for MNAR (missing not at random) data, such as Agarwal et al. (2021); Bai and Ng (2021); Cahan et al. (2023). These papers assume a stronger signal-to-noise ratio, , as noted in Assumptions A and C of Bai and Ng (2021) and Assumption 6 of Agarwal et al. (2021).
More specifically, if there is a single missing entry, e.g., , Agarwal et al. (2021) suggest partitioning the submatrix into smaller matrices. In particular, their Theorem 2 states that the best estimation error for their estimate is given by
by setting . In contrast, under the assumptions of Agarwal et al. (2021), are bounded and hence the convergence rate of our estimator is
Theorem 2.1 serves as our building block for dealing with more general and common missing patterns, which we shall now discuss in detail.
Single Treated Period.
Note that Assumption (iii) of Theorem 2.1 restricts the number of missing entries from being too large compared to and . In particular, if and , then it requires that . To deal with a larger number of missing entries, we shall leverage this result by splitting the missing entries into small groups and estimating them separately, as illustrated in Figure 2.
Fig. 2. How to construct the submatrix: we divide the missing entries into groups. For each , we estimate the entries in using the nuclear norm penalized estimation on the submatrix constructed as described in the right panel.
Specifically, we split the missing entries into small groups, denoted by , and construct the submatrices as illustrated in Figure 2. For each , we estimate , the corresponding submatrix of , using the nuclear norm penalization:
(2.1)
where and is the corresponding submatrix of . We shall then assemble these estimated submatrices into an estimate of . Note that each missing entry appears in one and only one of the submatrices and can therefore be estimated accordingly. The entries from in Figure 2, e.g., the principal submatrix of , on the other hand, are estimated for all groups. We can estimate these entries by averaging all of these estimates. Let the smallest nonzero singular value of be , where is the submatrix of corresponding to . Denote by and the -th row of and -th row of , respectively. We can then derive the following bounds from Theorem 2.1.
Corollary 2.2. Assume that
;
;
;
;
-
There are constants such that
where and are the largest and smallest singular value of , respectively.
Then, with probability at least , we have

for some absolute constant .
The main difference from Theorem 2.1 lies in Assumptions (iii) and (v) of Corollary 2.2. Assumption (iii) specifies how large a block can be. In principle, we can always take , that is, recover one entry at a time so that this condition is trivially satisfied for sufficiently large and . However, there can be enormous computational advantages in creating groups as large as possible, because the number of subproblems that need to be solved decreases as the group size increases.
Assumption (v) can be viewed as an incoherence condition to ensure that the singular vectors of are not dominated by either the treated or the untreated units. It is easy to see that when there are few missing entries, e.g., , the condition is satisfied by virtue of the incoherence of . In general, if is exchangeable or if the treated units are selected uniformly at random, then this condition is satisfied with high probability, at least for sufficiently large , by means of matrix concentration inequalities (see, e.g., Tropp, 2015).
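To make the grouping-and-estimation procedure of this subsection concrete, a schematic implementation for a single treated period might look as follows. It reuses the `nuclear_norm_complete` routine from the sketch above; the group size, helper names, and bookkeeping are our own illustrative choices rather than the paper's prescription.

```python
import numpy as np

def estimate_missing_by_groups(Y, treated_units, t_miss, group_size, lam):
    """Impute Y[i, t_miss] for treated units i by splitting them into small groups.

    For each group, form the submatrix consisting of all fully observed
    (untreated) units plus the units in the group, and run the nuclear norm
    penalized estimator on that submatrix, so that each subproblem contains
    only a small number of missing entries.
    """
    n1, _ = Y.shape
    treated = set(treated_units)
    untreated = [i for i in range(n1) if i not in treated]
    groups = [list(treated_units[k:k + group_size])
              for k in range(0, len(treated_units), group_size)]
    estimates = {}
    for group in groups:
        rows = untreated + group                       # rows of the submatrix
        sub_Y = Y[rows, :]
        sub_omega = np.ones_like(sub_Y, dtype=bool)
        for j, i in enumerate(rows):
            if i in group:
                sub_omega[j, t_miss] = False           # this group's missing entries
        sub_M = nuclear_norm_complete(sub_Y, sub_omega, lam)
        for j, i in enumerate(rows):
            if i in group:
                estimates[(i, t_miss)] = sub_M[j, t_miss]
    return estimates
```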
Single Treated Unit.
A similar estimation strategy can also be used to deal with a single treated unit. Without loss of generality, let . Then the fully observed submatrix is . As in the case of a single treated period, we split the missing entries into smaller groups, denoted by , by periods, and estimate them separately as before. Similar to Corollary 2.2, we have the following bounds for the resulting estimate.
Corollary 2.3. Assume that
;
;
-
There are constants such that

Then, with probability at least , we have
for some absolute constant .
General Block Missing Pattern.
We can also apply the grouping and estimating procedure to general block missing structures such as that depicted in the left panel of Figure 1, e.g., , by estimating missing entries one period at a time (or one unit at a time). Denote by the groups of missing units (or periods). The following result again follows from Theorem 2.1:
Corollary 2.4. Assume that
;
;
;
;
- There are constants such that
Then, with probability at least , we have
for some absolute constant .
It is worth noting that both Corollary 2.2 and Corollary 2.3 can be viewed as special cases of Corollary 2.4. It is also of interest to compare the rates of convergence with those of Athey et al. (2021), who considered a direct application of the nuclear norm penalized estimation to the full matrix. Their Theorem 2 states that
ignoring the logarithmic factors and , and . In other words, the estimate could be inconsistent when . On the other hand, the convergence rate of our estimator is given by
up to a logarithmic factor when we assume . Hence, our estimator is consistent as long as diverges. Furthermore, the simulation results in Section A of the supplement also show that applying the nuclear norm penalized estimation to the submatrix indeed performs much better than applying it to the full matrix as long as and are not too small.
Staggered Adoption.
More generally, we can take advantage of our estimation strategy for staggered adoption, where there are a number of adoption time points, say , and corresponding groups of treated units, say . That is, for each , the units in adopt the treatment in the time period . We can utilize the strategy for block missing patterns to estimate the missing entries. More specifically, denote by the submatrix with missing entries corresponding to units in and time periods in , with the convention that , where . To estimate these missing entries, we can assemble a submatrix, denoted by , with units untreated prior to and time periods in , as well as units in and time periods in . As shown in Figure 3, is now the missing block of , and can be estimated as described in the previous case.
Fig. 3. How to construct the general block missing pattern: consider the case of and . When we estimate the missing entries in , we form the block missing matrix by assembling the four red blocks. Then, we can estimate the missing entries in using the estimation method for the general block missing pattern.
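The assembly just described amounts to stacking four observed blocks so that the entries to be imputed form the lower-right block. A hypothetical helper illustrating the bookkeeping (the argument names are ours):

```python
import numpy as np

def assemble_block_missing(Y, donor_units, target_units, pre_periods, post_periods):
    """Form the submatrix [[Y[donor, pre], Y[donor, post]],
                           [Y[target, pre], (missing)    ]]
    whose lower-right block collects the entries to be imputed."""
    rows = list(donor_units) + list(target_units)
    cols = list(pre_periods) + list(post_periods)
    sub = Y[np.ix_(rows, cols)]
    omega = np.ones(sub.shape, dtype=bool)
    omega[len(donor_units):, len(pre_periods):] = False   # the missing block
    return sub, omega
```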
Denote by the groups of missing units in , by the number of units that are untreated prior to , and by the smallest singular value of the submatrix . In addition, denote by the submatrix of corresponding to . The performance of the resulting estimate is given by Corollary 2.5.
Corollary 2.5. Assume that
;
-
There are constants such that

Then, with probability at least , we have
for some absolute constant .
It is worth comparing these rates of convergence with those of Bai and Ng (2021), who apply their TW algorithm to the full matrix. For all missing entries, the convergence rates of the estimators in Bai and Ng (2021) are . On the other hand, if we assume , the convergence rate of our estimator is , up to a logarithmic factor. Since and for all and , our convergence rate is faster than that of Bai and Ng (2021), except for the estimation of missing entries in part , for which both estimates have similar rates of convergence. This shows the advantage of exploiting submatrices for the imputation of missing entries.
Additionally, in Section G.1 of the supplement, we consider the case where there are sparse missing entries in the lower left, upper left, and upper right blocks of the block missing pattern.
3. Debiasing and Statistical Inferences
We now turn our attention to inferences. While the nuclear norm regularized estimator enjoys good rates of convergence, it is not directly suitable for statistical inferences due to the bias induced by the penalty. To overcome this challenge, we propose an additional projection step after applying the nuclear norm penalization in recovering missing entries from group :
(3.1)
where is the best rank-r approximation of . We now discuss how this enables us to develop an inferential theory for estimating the missing entries. To fix ideas, we shall focus on inferences about the average of a group of entries at a given time period, e.g., , where .
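The core operation in the debiasing step (3.1) is a projection onto rank-r matrices, i.e., a best rank-r approximation via the truncated SVD. A minimal sketch of this operation is given below; exactly which matrix is projected is specified in the paper's display (3.1), and the function name and the assumption that r is known (or pre-estimated) are ours.

```python
import numpy as np

def best_rank_r(A, r):
    """Best rank-r approximation of A (the projection used in the debiasing step):
    keep only the top r singular values and vectors of the SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :r] @ np.diag(s[:r]) @ Vt[:r, :]
```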
Block Missing Patterns.
We shall begin with general block missing patterns, e.g., if or . Note that both the single treated period and single treated unit examples from the previous section can be viewed as special cases with and , respectively.
Suppose that we are interested in inference for the average of a group of entries at time , where and . As before, we split the group of interest, , into smaller subgroups, denoted by with the convention that , construct the corresponding submatrices as illustrated in Figure 4, and construct if .
Fig. 4. How to construct the submatrix: the blue area is the observed area and the white area is the missing area. We estimate the entries in using the submatrix as described in the figure.
Recall that is the smallest nonzero singular value of the matrix . The following theorem establishes the asymptotic normality of the group average estimator, .
Theorem 3.1. Assume that
;
;
- There are constants such that
-
and for some constant where .
Then, we have

where
Staggered Adoption.
More generally, consider the case of staggered adoption, where there are a number of adoption time points, , and corresponding groups of treated units, . As in the previous situation, suppose that we are interested in inference for the group average at time . Denote by the number of units that are untreated until , and by the number of time periods during which is untreated.
We proceed by first splitting into smaller groups, denoted by with the convention that . In doing so, we want to make sure that all units in each subgroup have the same adoption time point, e.g., , as illustrated in Figure 5. Denote by and by the smallest singular value of the submatrix .
Fig. 5. Submatrix construction: for each , we make the submatrix by putting , and together. In addition, we estimate the entries in using the fully observed part . Denote by and the sets of units and time periods of , respectively.
Theorem 3.2. Assume that for any and ,
- there are constants such that
-
and for some constant .
Then, we have

where

with the convention that .
Variance Estimation.
In practice, to use the results above for inferences, we also need to estimate the variance. To this end, let be the SVD of . Denote by and . They can be viewed as estimates of rescaled left and right singular vectors. However, as such, they are significantly biased and the bias can be reduced by considering instead
We can then use and in place of the left and right singular vector in defining , leading to the following variance estimate
where , and . The following corollary shows that the asymptotic normality established in Theorem 3.2 continues to hold if we use this variance estimate.
Corollary 3.3. Suppose that the assumptions in Theorem 3.2 hold. In addition, suppose that for any ,
Then
Since Theorem 3.1 is a special case of Theorem 3.2, the variance estimator can also be used for Theorem 3.1. Specifically, it is enough to change from in to for Theorem 3.1.
4. Application to Tick Size Pilot Program
Our work was motivated by the analysis of the Tick Size Pilot Program, which we shall now discuss in detail to demonstrate how the proposed methodology can be applied in causal panel data models.
4.1. Data and Methods
Background.
In October 2016, the SEC launched the Tick Size Pilot Program to evaluate the impact of an increase in tick sizes on the market quality of stocks. As noted before, the pilot consisted of a control group and three treatment groups:
Control. stocks in the control group were quoted and traded in $0.01 increments;
Q rule. stocks in the Q rule group were quoted in $0.05 increments but still traded in $0.01 increments;
Q+T rule. stocks in this group were quoted and traded in $0.05 increments;
Q+T+TA rule. stocks in this group were also subject to the additional trade-at rule, a regulation that requires exchanges to display the NBBO (National Best Bid and Offer) when they execute a trade at the NBBO.
This pilot program has attracted considerable attention, and since its conclusion in 2018 there has been a growing number of studies on the impact of these changes on market quality, often represented by a liquidity measure such as the effective spread. See, e.g., Albuquerque et al. (2020); Chung et al. (2020); Griffith and Roseman (2019); Rindi and Werner (2019); Werner et al. (2022).
Data.
Data for the control variables were obtained from the Center for Research in Security Prices (CRSP), and the daily share-weighted dollar effective spread data from the Millisecond Intraday Indicators by Wharton Research Data Services (WRDS). A key control variable introduced by Chung et al. (2020) is TBC, which measures the extent to which the new tick size ($0.05) is a binding constraint on quoted spreads in the pilot periods. It is estimated by the percentage of quoted spreads during the day that are equal to or less than 5 cents, the new minimum quoted tick size under the Q rule. Specifically, we calculate, for each day, the percentage of NBBO updates with quoted spread less than or equal to 5 cents. Using the TBC variable, we can check the effect of an increase in the minimum quoted spread (from 1 cent to 5 cents) on the effective spread.
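As an illustration, with a table of daily NBBO updates, the TBC variable described above could be computed along the following lines; the column names and data layout are hypothetical, not those of the WRDS files.

```python
import pandas as pd

def daily_tbc(nbbo: pd.DataFrame) -> pd.Series:
    """TBC per day: fraction of NBBO updates whose quoted spread (ask - bid)
    is less than or equal to 5 cents.

    nbbo : one row per NBBO update, with columns 'date', 'ask', 'bid'
           (prices in dollars); hypothetical layout.
    """
    spread = nbbo["ask"] - nbbo["bid"]
    return (spread <= 0.05).groupby(nbbo["date"]).mean()
```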
A data-cleaning process similar to Chung et al. (2020) yields a total of stocks with in the control group, in the Q group, in the Q+T group, and in the Q+T+TA group. Following Chung et al. (2020), data from Oct 1, 2015 to Sep 30, 2016 were used as the pre-pilot periods and Nov 1, 2016 to Oct 31, 2017 as the pilot periods, i.e., and for daily data. See Chung et al. (2020) for further discussion of data collection. As is common in previous studies, we consider the daily effective spread in cents as a measure of liquidity. Denote by the potential outcome for stock at time under treatment with the convention that corresponds to the control, the Q rule, the Q + T rule, and the Q + T + TA rule, respectively. The four matrices have block missing patterns, as shown in Figure 6.
Fig. 6. Missing pattern in the pilot program: the blue area is the observed area and the white area is the missing area. For the potential outcomes under the control (), we can observe the outcomes of all units in the pre-pilot periods and those of the control group in the pilot periods. For the potential outcomes under treatment , we can only observe the outcomes of the corresponding treatment group in the pilot periods.
Model.
Previous studies of the effects of the quote (Q) rule, the trade (T) rule, and the trade-at (TA) rule on the liquidity measure are usually based on traditional regression or difference-in-difference methods that assume the treatment effect is constant across all units and time periods. For instance, Chung et al. (2020) postulated if unit receives treatment at time , where the potential outcomes
and
(4.1)
Here, and are unknown parameters, and is a set of control variables that includes typical stock characteristics such as stock prices and trading volumes, as well as TBC, a variable measuring the extent to which the new tick size ($0.05) is a binding constraint on the quoted spreads in the pilot periods. See Section B in the supplement for further details. It is worth noting that, in addition to the treatment effects ( and ), their differences are also of interest, as they represent the treatment effects of the quote rule, the trade rule, and the trade-at rule, respectively.
However, (4.1) fails to account for the significant heterogeneity in the treatment effects across units and time periods. To address this, we shall consider a more flexible model:
(4.2)
where is a -dimensional vector of (latent) unit-specific characteristics and is the corresponding vector of coefficients of at time in the potential situation . As we shall see later in this section, (4.2) allows us to gain more insight into the treatment effects of the pilot program.
One of the key assumptions of Model (4.2) is that the subspace spanned by the left singular vectors of for all is included in the subspace spanned by the left singular vectors of . Agarwal et al. (2020) propose a subspace inclusion test to check the validity of this assumption. We carried out this test on the pilot data, which confirms that this is a reasonable assumption.
We note that similar low-rank models have been considered earlier by Agarwal et al. (2020) and Chernozhukov et al. (2023). However, it is unclear how their methodologies can be adapted for the analysis of the Tick Size Pilot Program. For example, Chernozhukov et al. (2023) impose conditions on the missing pattern that are clearly violated by the pilot data, and Agarwal et al. (2020) only study the average treatment effect and so cannot be used to assess the heterogeneity or dynamics of the treatment effects across units and time periods, respectively.
Estimation.
We now discuss how the methodology from the previous sections can be applied to analyze the tick size program, and in particular to estimate and make inferences about (4.2). More specifically, we are interested in estimating the group-averaged treatment effects: for a group of treated units of interest ,
and their differences:
for . In particular, when is a single unit, this reduces to the individual treatment effect, and when is the group of all treated units, it becomes the cross-sectional averaged treatment effect. To this end, we shall derive estimates for under Model (4.2).
First, note that, for this particular application, one of the covariates (TBC) is only present for the pilot periods. Therefore, we cannot hope to estimate the regression coefficient using the pre-pilot data alone, as suggested by Bai and Ng (2021). Nonetheless, under (4.2), follow an interactive fixed effect model:
for some low rank components and therefore the regression coefficient can be estimated at the rate of . See Bai (2009) for details. This is much faster than that of the estimates of . For brevity, we shall, therefore, treat the regression coefficient as known in what follows, without loss of generality.
For , we can apply the method proposed in the previous sections to the potential outcome panel . As illustrated in Figure 6, it has a block missing pattern with if and only if or . As such, we can derive estimates for .
When , we can only observe if unit receives treatment and , so our method cannot be applied directly. Instead, we shall combine all observations from the pre-pilot periods with these observations to form a panel whose entry is if receives treatment and , is if , and is missing otherwise. Let be a matrix whose entry is if , and otherwise. can be viewed as the noisy observation of with a block missing pattern: if and only if unit receives treatment or . Under (4.2), where if and otherwise. Therefore, we can again apply our method to to obtain estimates for .
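Concretely, for a given treatment, the combined panel described above stacks every stock's pre-pilot observations together with the pilot-period observations of the stocks assigned to that treatment, leaving everything else missing. A sketch with hypothetical array and variable names:

```python
import numpy as np

def build_panel_for_treatment(Y_obs, assignment, d, pre_periods, pilot_periods):
    """Form the panel used to impute potential outcomes under treatment d (d >= 1).

    Y_obs        : N x T array of observed outcomes (NaN where unobserved)
    assignment   : length-N array; assignment[i] is the treatment unit i
                   receives during the pilot (0 = control)
    d            : treatment of interest
    pre_periods  : integer indices of pre-pilot periods (observed for all units)
    pilot_periods: integer indices of pilot periods
    Returns the assembled panel and its observation mask.
    """
    N, T = Y_obs.shape
    panel = np.full((N, T), np.nan)
    panel[:, pre_periods] = Y_obs[:, pre_periods]       # pre-pilot: everyone observed
    treated = np.where(np.asarray(assignment) == d)[0]  # units assigned to treatment d
    panel[np.ix_(treated, pilot_periods)] = Y_obs[np.ix_(treated, pilot_periods)]
    omega = ~np.isnan(panel)
    return panel, omega
```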
We shall then proceed to estimate the treatment effects by
Inferences.
We can also use the results from the last section to derive the asymptotic distribution for and . More specifically, let be a matrix that combines all observed outcomes: the first columns of consist of the potential outcomes under the control for the whole periods , the next columns the potential outcomes under the Q rule for the pilot periods , followed by those under the Q+T rule again for the pilot periods , and finally those under the Q+T+TA rule . Note that is also a rank- matrix. Let be its singular value decomposition. Denote by and the -th row vector of and -th row vector of , respectively. In addition, denote by the group of units treated by treatment with the convention that is the control group. Then, under suitable conditions, we have
and where
As before, the variance can be replaced by its estimate. Due to space constraints, we defer the formal statements and proofs, as well as the derivation of the variance estimator, to the supplement.
4.2. Empirical Findings
Fixed Effects vs Interactive Effects.
We begin with some exploratory analyses to illustrate the impact of the pilot program. The top left panel of Figure 7 gives boxplots of the difference in the effective spread, averaged over time, after versus before the pilot. There are a few units with differences that are much larger in magnitude than usual. For better visualization, the top right panel zooms in on differences between −10 cents and 10 cents. Taken together, it is clear that the three treatments have a significant impact on the effective spread.
Fig. 7. Top panels: boxplots of the difference in the averaged effective spread after and before the tick size program. Bottom panels: two stocks treated with the Q rule and with different treatment effects.
The treatment effect of the pilot, however, differs across units. The bottom panels of Figure 7 show barplots of the time series of the effective spread for two typical stocks. The impact of the treatment is much clearer for the stock depicted in the bottom right panel.
The difference in treatment effect among the units suggests that the interactive effect model is more suitable than the fixed effect model used in the previous studies. Note that the fixed effect model (4.1) can be viewed as a special case of the interactive effect model (4.2) with . We conducted a Hausman-type model specification test to further show that the fixed effect model is inadequate in capturing the heterogeneity of the treatment effect. More specifically, denote our estimator of by and the two-way fixed effect estimator of in Model (4.1) by . We considered the following test statistic for model specification:
where is the group of all treated stocks, , and is the estimator of the asymptotic variance of . Moreover, to test whether is time and unit invariant or not, we also considered the test statistic such that
where .
We derived the large-sample distributions of the test statistics under the null and the corresponding critical values using the Gaussian bootstrap method (see, e.g., Belloni et al., 2018). The null hypothesis that Model (4.1) is well specified and the null hypotheses that are time and unit invariant are all rejected at the 1% significance level, again indicating that Model (4.1) is misspecified and that vary across time and units.
To further illustrate the heterogeneity of the treatment effect, we compute the estimated unit-specific treatment effects averaged over time: and Figure 8 gives the kernel density estimates of these unit-specific treatment effects for the Q rule, the T rule, and the TA rule, respectively. It is evident from these density plots that there is a considerable amount of variation and skewness among the estimated treatment effects across units.
Fig. 8. Kernel density estimates of the estimated unit-specific treatment effects averaged over time.
Note that a key assumption behind the interactive effect model is that the unit-specific characteristics remain the same across all treatment groups as well as the control group, so that they can be learned from the pre-pilot periods and utilized for the estimation of during the pilot periods. This amounts to the assumption that the left singular space of is included in that of . To check the validity of this assumption, we carry out the subspace inclusion test for introduced in Agarwal et al. (2020); the test statistics are 0.15, 0.19, and 0.11, with corresponding critical values at the 95% level of 0.43, 0.48, and 0.28. Additionally, we also confirm that the ranks of and are the same for all using a standard rank estimation method (e.g., Ahn and Horenstein, 2013), which supports the validity of this assumption.
The rank test also indicates that is an appropriate choice for the pilot data. The associated is 0.79. This is to be compared with the fixed effect model (4.1), whose is 0.67 with the same degrees of freedom. This again suggests that the interactive effect model (4.2) is preferable.
Dynamics of Treatment Effects.
Next, we examine the dynamics of the treatment effects of the Q rule, the T rule, and the TA rule.
To better visualize the dynamics, we plot in Figure 9 the estimated daily treatment effects along with their 95% confidence intervals, adjusted with the Bonferroni correction. To gain further insights, we also plot in Figure 10 the weekly averages of the estimated daily treatment effects, again with their 95% confidence intervals adjusted with the Bonferroni correction. Note that to do so, we need to consider the estimator of the form
where is a week of interest. We can generalize the inferential theory from the previous section straightforwardly with the new variance:
where
Fig. 9. The dynamics of the daily cross-sectional average of : for the confidence band, we use the 95% uniform critical value, . The dots denote the daily cross-sectional average of .
Fig. 10. The dynamics of the weekly cross-sectional average of : for the confidence band, we use the 95% uniform critical value, . The dots denote the weekly cross-sectional average of .
and can be interpreted as the treatment effects of the T rule and the TA rule. As expected from the theory in the literature, the treatment effects of the T rule are positive most of the time. The rule has a negative effect on price improvements, as liquidity providers are less likely to offer them when the minimum possible price improvement is larger. For example, if the T rule raises the minimum possible price improvement to 5 cents, liquidity providers who would have been willing to provide less than 5 cents of price improvement are unlikely to offer any price improvement at all. Since the effective spread is “quoted spread − price improvement”, we can expect the treatment effect of the T rule to be positive. Here, we use the following definitions:
, and , where is the national best ask price at time , is the national best bid price at time , and is the transaction price.
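For reference, standard versions of these definitions, written in our own symbols ($A_t$: national best ask, $B_t$: national best bid, $P_t$: transaction price, $M_t$: midquote), are sketched below; the paper's exact conventions are those of the display above.

```latex
% Standard microstructure definitions (our own notation, consistent with the text):
\[
  \text{Quoted spread}_t = A_t - B_t, \qquad
  M_t = \tfrac{1}{2}(A_t + B_t), \qquad
  \text{Dollar effective spread}_t = 2\,\lvert P_t - M_t \rvert .
\]
% For a buyer-initiated trade the price improvement is A_t - P_t, so a larger
% price improvement means a smaller effective spread, which is the relation
% "effective spread = quoted spread - price improvement" invoked in the text.
```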
Interestingly, one can observe that the periods associated with large effects of the T rule usually correspond to large trading volumes. In particular, there were large trading volumes in November and early and mid-December of 2016, and in March, mid and late June, early August, early September, and late October of 2017, and, by and large, these periods coincide with periods of larger impact of the T rule. Overall, the correlation coefficient between the estimated effect of the T rule and the trading volume is 0.33. This suggests that the effect of the T rule becomes stronger when transactions are more active. This agrees with the well-known fact that price improvement is more likely to occur when stocks are actively traded, so the effect of the T rule through price improvement is amplified when trading is active.
Moreover, we find that the treatment effects of the TA rule are negative most of the time. The TA rule increases visible liquidity by exposing hidden liquidity because, under the TA rule, a venue must display the best bid or ask in order to execute incoming market orders at the NBBO. This implies a decrease in the quoted spread and less room for price improvements. Chung et al. (2020) expect that the effect on the quoted spread is likely to be greater than the effect on price improvements, so that the TA rule decreases the effective spread. Our result corroborates their conjecture. Further discussion of the empirical findings is given in Section B of the supplement.
5. Concluding Remarks
This article develops an inference framework for matrix completion when missing is not at random and without the need for strong signals. One of the key observations behind our development is that if the number of missing entries is small enough compared to the size of the panel, they can be estimated well even if missing is not at random. We judiciously divide the missing entries into smaller groups and use this observation to provide accurate estimates and efficient inferences. Moreover, we showed that our proposed estimate, even with fairly weak signals, is asymptotically normal with suitable debiasing. As an application, we studied the treatment effects in the Tick Size Pilot Program, an experiment conducted by the SEC to assess the impact of the tick size extension on the market quality of small and illiquid stocks from 2016 to 2018. While previous studies on this program were based on traditional regression or difference-in-difference methods that assume the treatment effect is invariant with respect to time and unit, we observed significant heterogeneity in treatment effects and gained further insights about the treatment effects in the pilot program using our estimation method. Lastly, we conducted simulation experiments to further demonstrate the practical merits of our methodology.
Acknowledgments
This research was supported by NSF Grants DMS-2015285 and DMS-2052955, and NIH Grant R01-HG01073.
Footnotes
Supplementary Materials and Conflict of Interest
The supplement includes all proofs and simulation studies as well as additional empirical findings. In addition, the authors report there are no competing interests to declare.
References
- Abadie A (2021). Using synthetic controls: Feasibility, data requirements, and methodological aspects. Journal of Economic Literature, 59(2):391–425.
- Abadie A, Diamond A, and Hainmueller J (2010). Synthetic control methods for comparative case studies: Estimating the effect of California's tobacco control program. Journal of the American Statistical Association, 105(490):493–505.
- Abadie A and Gardeazabal J (2003). The economic costs of conflict: A case study of the Basque Country. American Economic Review, 93(1):113–132.
- Agarwal A, Dahleh M, Shah D, and Shen D (2021). Causal matrix completion. arXiv preprint arXiv:2109.15154.
- Agarwal A, Shah D, and Shen D (2020). Synthetic interventions. arXiv preprint arXiv:2006.07691.
- Ahn SC and Horenstein AR (2013). Eigenvalue ratio test for the number of factors. Econometrica, 81(3):1203–1227.
- Albuquerque R, Song S, and Yao C (2020). The price effects of liquidity shocks: A study of the SEC's tick size experiment. Journal of Financial Economics, 138(3):700–724.
- Athey S, Bayati M, Doudchenko N, Imbens G, and Khosravi K (2021). Matrix completion methods for causal panel data models. Journal of the American Statistical Association, 116(536):1716–1730.
- Bai J (2009). Panel data models with interactive fixed effects. Econometrica, 77(4):1229–1279.
- Bai J and Ng S (2021). Matrix completion, counterfactuals, and factor analysis of missing data. Journal of the American Statistical Association, 116(536):1746–1763.
- Belloni A, Chernozhukov V, Chetverikov D, Hansen C, and Kato K (2018). High-dimensional econometrics and regularized GMM. arXiv preprint arXiv:1806.01888.
- Bhattacharya S and Chatterjee S (2022). Matrix completion with data-dependent missingness probabilities. IEEE Transactions on Information Theory, 68(10):6762–6773.
- Cahan E, Bai J, and Ng S (2023). Factor-based imputation of missing values and covariances in panel data of large dimensions. Journal of Econometrics, 233(1):113–131.
- Candes EJ and Plan Y (2010). Matrix completion with noise. Proceedings of the IEEE, 98(6):925–936.
- Chen J, Liu D, and Li X (2020a). Nonconvex rectangular matrix completion via gradient descent without ℓ2,∞ regularization. IEEE Transactions on Information Theory, 66(9):5806–5841.
- Chen Y, Chi Y, Fan J, Ma C, and Yan Y (2020b). Noisy matrix completion: Understanding statistical guarantees for convex relaxation via nonconvex optimization. SIAM Journal on Optimization, 30(4):3098–3121.
- Chen Y, Fan J, Ma C, and Yan Y (2019a). Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences, 116(46):22931–22937.
- Chen Y, Fan J, Ma C, and Yan Y (2019b). Inference and uncertainty quantification for noisy matrix completion. Proceedings of the National Academy of Sciences, 116(46):22931–22937.
- Chernozhukov V, Hansen C, Liao Y, and Zhu Y (2023). Inference for low-rank models. The Annals of Statistics, 51(3):1309–1330.
- Chung KH, Lee AJ, and Rösch D (2020). Tick size, liquidity for small and large orders, and price informativeness: Evidence from the tick size pilot program. Journal of Financial Economics, 136(3):879–899.
- Griffith TG and Roseman BS (2019). Making cents of tick sizes: The effect of the 2016 US SEC tick size pilot on limit order book liquidity. Journal of Banking & Finance, 101:104–121.
- Jin S, Miao K, and Su L (2021). On factor models with random missing: EM estimation, inference, and cross validation. Journal of Econometrics, 222(1):745–777.
- Koltchinskii V, Lounici K, and Tsybakov AB (2011). Nuclear-norm penalization and optimal rates for noisy low-rank matrix completion. The Annals of Statistics, 39(5):2302–2329.
- Mazumder R, Hastie T, and Tibshirani R (2010). Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11:2287–2322.
- Negahban S and Wainwright MJ (2012). Restricted strong convexity and weighted matrix completion: Optimal bounds with noise. The Journal of Machine Learning Research, 13(1):1665–1697.
- Rindi B and Werner IM (2019). US tick size pilot. Fisher College of Business Working Paper 2017-03-018.
- Tropp JA (2015). An introduction to matrix concentration inequalities. Foundations and Trends® in Machine Learning, 8(1–2):1–230.
- Werner IM, Rindi B, Buti S, and Wen Y (2022). Tick size, trading strategies, and market quality. Management Science, 69(7):3818–3837.
- Xia D and Yuan M (2021). Statistical inferences of linear forms for noisy matrix completion. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 83(1):58–77.
- Xiong R and Pelger M (2023). Large dimensional latent factor modeling with missing observations and applications to causal inference. Journal of Econometrics, 233(1):271–301.