Auxiliary Materials

VULCAN-ODIAC

VULCAN emissions data [Gurney et al., 2009] were scaled linearly by EPA region from 2002 to 2008. VULCAN emissions vary by the day of the week, with weekdays (Mon-Fri) having different emissions characteristics from weekends (Sat-Sun). Thus, the 2002 VULCAN data were shifted so that each day fell on the correct day of the week for 2008, and an additional day with a matching day of the week was added to account for the leap year. The scaled VULCAN emissions over the contiguous USA (aggregated to 1° x 1° spatial and 3-hourly temporal resolution) were then combined with the ODIAC emissions (available at 1° x 1° spatial and annual temporal resolution) [Oda and Maksyutov, 2011] for Canada, Mexico, and Alaska, where VULCAN data are not available.

Atmospheric CO2 Data

Continuous atmospheric CO2 data for the year 2008 were collected from the 35 towers listed in Table S1. The year 2008 was used because it included the large expansion of continuous measurement locations from the Mid-Continent Intensive (MCI) project [Ogle et al., 2006; Miles et al., 2012]. Observations were averaged to 3-hourly intervals. Data selection was carried out for each tower in a manner similar to that of Gourdji et al. [2012] (Table S1). Afternoon data were used for most short towers (<100 m), whereas data from all hours of the day were used for taller towers (>300 m). Additionally, nighttime data were used for several sites with complex terrain (e.g., Niwot Ridge, NWR).

Synthetic CO2 Data

The biospheric component of the synthetic data was modeled using output from the CASA-GFEDv2 terrestrial biospheric model at 1° x 1° resolution [Randerson et al., 1997; van der Werf et al., 2006], downscaled from monthly to 3-hourly resolution using net shortwave radiation and near-surface temperature data [Olsen and Randerson, 2004]. The synthetic biospheric concentration data were created by transporting the fluxes to the observation locations using the STILT-WRF model.
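
As an illustration of how synthetic observations of this kind can be assembled, the following is a minimal sketch, assuming the transport step can be written as a matrix product between a sensitivity matrix and a flux vector. It also includes the gap mirroring and white-noise steps described in the remainder of this section. The function name `synthetic_observations`, its arguments, and the array shapes are hypothetical and are not taken from the study's code.

```python
import numpy as np

def synthetic_observations(H, s_bio, s_ff, sigma2, valid, rng):
    """Build a synthetic CO2 observation vector (illustrative only).

    H      : (n, m) sensitivity of each observation to each flux element
    s_bio  : (m,) biospheric fluxes
    s_ff   : (m,) fossil fuel CO2 fluxes
    sigma2 : (n,) model-data mismatch variance for each observation
    valid  : (n,) boolean mask, False where the real record has a gap
    rng    : numpy.random.Generator used to draw the noise
    """
    z_bio = H @ s_bio                       # transported biospheric signal
    z_ff = H @ s_ff                         # transported fossil fuel signal
    # Zero-mean white noise whose variance may differ per observation
    noise = rng.normal(0.0, np.sqrt(sigma2))
    z = z_bio + z_ff + noise
    return z[valid]                         # mirror gaps in the real data
```

Passing a seeded generator such as `np.random.default_rng(0)` keeps the noise realization reproducible; setting `sigma2` to zeros recovers the noise-free signal.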
The data choices and gaps found in the real data were mirrored in the synthetic data. The synthetic FFCO2 concentration data were created using the merged VULCAN-ODIAC FFCO2 emissions product described above and the same STILT-WRF model. The use of STILT-WRF for generating synthetic observations is described in more detail in Gourdji et al. [2010]. Adding the two observation vectors (from biospheric and FFCO2 fluxes) resulted in the SD_BFO case. Transport model errors were simulated by adding white noise to the data with a mean of zero and a variance based on the model-data mismatch variance, which varies by observation location and month (Table S1). The model-data mismatch variances were optimized using the real data and the Restricted Maximum Likelihood approach described in more detail in the following section.

Model Selection Approach

The link between “detecting a signal” and “selecting a variable through BIC” (the Bayesian Information Criterion) stems from the notion that detection lies in the ability to distinguish a signal from a background, using a threshold, with a given set of observations. This notion differs from simply examining enhancements above a given background concentration, as it also incorporates the attribution of variations in the data to a specific signal or pattern (e.g., FFCO2 emissions patterns). Selection into the best model according to BIC is the requirement, or threshold, that ensures that the FFCO2 emissions patterns (from the selected region-months) explain a substantial portion of the variability in the observations.

The potential variables to be included in the model are FFCO2 emissions from various region-months. These variables are examined against the atmospheric data in the model selection step (detailed below) to determine whether they explain a sufficient portion of the observed variability in the data to warrant inclusion in the statistical model.
Model selection was carried out using the geostatistical inverse modeling (GIM) framework and BIC formulation modified from Gourdji et al. [2012]. The model selection framework is similar to that used in multiple linear regression, where the dependent variable here is atmospheric CO2 ("z") and the independent variables (covariates) are the various atmospheric signatures of FFCO2 emissions from different region-months ("HX"). A modification is made to the setup from Gourdji et al. [2012] in that the drift coefficients ("B_hat"), which are used to scale the covariates (columns in "HX"), are set equal to unity, because the goal of the current study was to examine the ability of the covariates (FFCO2 emissions from certain region-months) to explain the variability in the data. Setting the drift coefficients to unity ensured that the covariates in "HX" (FFCO2 emissions from a given region-month) could not be scaled using unrealistic combinations or representations of the region-months of FFCO2 emissions. Thus, the information gained from the model selection provided insight into the explanatory power of the actual FFCO2 emissions patterns (as represented by the VULCAN/ODIAC product).

To determine whether specific region-months are detected, various combinations of covariates are compared based on their BIC value. The BIC equation with the abovementioned modification becomes:

BIC = ln|PSI| + [(z^T)(PSI^-1)(z - HX1)] + p ln(n)    (S1)

where "z" are the (n x 1) atmospheric observations, "H" is an (n x m) Jacobian matrix representing the sensitivities of observations "z" to the underlying fluxes, "PSI" = HQH^T + R, "Q" and "R" are the (m x m) prior error covariance and (n x n) model-data mismatch covariance matrices, respectively, "X" is the (m x (p+19)) model matrix containing the "p" selected variables, "1" is the vector of ones that takes the place of the drift coefficients, and "^T" and "^-1" represent the transpose and inverse of a matrix, respectively.
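
For readers who want to evaluate Equation S1 numerically, the following is a minimal sketch, assuming dense NumPy arrays small enough to factor directly. It follows S1 term by term as written above, with the drift coefficients replaced by a vector of ones. The function names `build_psi` and `bic_unit_drift` and the toy matrices are hypothetical and are not part of the study's code.

```python
import numpy as np

def build_psi(H, Q, R):
    """PSI = H Q H^T + R, as defined for Equation S1."""
    return H @ Q @ H.T + R

def bic_unit_drift(z, H, X, Psi, p):
    """Evaluate S1 as written: ln|PSI| + z^T PSI^-1 (z - H X 1) + p ln(n).

    z   : (n,) observations
    H   : (n, m) sensitivity (Jacobian) matrix
    X   : (m, k) model matrix for one candidate set of columns
    Psi : (n, n) covariance matrix from build_psi
    p   : number of selected (penalized) variables in X
    """
    n = z.size
    ones = np.ones(X.shape[1])              # drift coefficients fixed at unity
    resid = z - H @ (X @ ones)              # z - H X 1
    sign, logdet = np.linalg.slogdet(Psi)   # stable log-determinant
    if sign <= 0:
        raise ValueError("Psi must be positive definite")
    fit = z @ np.linalg.solve(Psi, resid)   # z^T PSI^-1 (z - H X 1)
    return logdet + fit + p * np.log(n)

# Toy comparison of two candidate models (all values are made up)
rng = np.random.default_rng(0)
n_obs, n_flux = 6, 4
H_toy = rng.random((n_obs, n_flux))
Psi_toy = build_psi(H_toy, np.eye(n_flux), 0.1 * np.eye(n_obs))
X_one = rng.random((n_flux, 1))             # one candidate column
X_two = np.hstack([X_one, rng.random((n_flux, 1))])  # two candidate columns
z_toy = H_toy @ X_one[:, 0]                 # data generated by the first column
scores = {
    "one column": bic_unit_drift(z_toy, H_toy, X_one, Psi_toy, p=1),
    "two columns": bic_unit_drift(z_toy, H_toy, X_two, Psi_toy, p=2),
}
preferred = min(scores, key=scores.get)     # the lower BIC value is preferred
```

Using `slogdet` and `solve` rather than an explicit determinant and inverse is a numerical convenience in this sketch, not something stated in the text.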
"X" contains 19 columns that represent monthly and diurnal constants that are not included in the model selection. There are twelve monthly constants and eight diurnal constants (where each monthly constant also represents one of the diurnal constants for that month). "Q" is modeled as an exponentially decaying function in space and time, as was done by Gourdji et al. [2012], the temporal structure of "Q" allowed for covariance across the same time periods between different days but not across different time periods within or between days. The parameters in "Q" (Table S2) were held constant between cases and were optimized using the Restricted Maximum Likelihood (RML) approach on the CASA-GFEDv2 fluxes as done by Gourdji et al., [2010]. "R" was optimized using the same RML approach using the real atmospheric data from 2008. As described earlier, "X" contains 19 columns that represent constant terms and up to 132 columns that each represent the FFCO2 emissions (Vulcan/ODIAC inventory estimates) from one month and one region (Figure 1). As there are 12 months per year and 11 regions within the domain, the largest possible model would include all 132 columns in "X" representing all of the FFCO2 emissions for one year in NA, along with the 19 columns of constants. If the FFCO2 emissions patterns from a given region-month do not sufficiently help to explain the variability in the atmospheric CO2 signal (more than the penalty term in S1) that region is not included in the “best” model, as determined by BIC. The “best” model is determined using the branch and bound algorithm of Yadav et al. [2013] to identify the subset of the 132 explanatory variables that offers the best explanatory power with the most parsimonious model (i.e. lowest BIC value). Model Performance The variance explained by the “best” model selected using BIC “BIC Model” is compared to one including all 132 region-months of FFCO2 emissions “Full Model” for each case study (Table S3). 
The R^2 of the “Full Model” represents the total amount of variation that the FFCO2 emissions signal explains in the total CO2 signal. The variance explained is calculated based on a form of R^2 that accounts for the spatiotemporal correlation of the data:

R^2 = 1 - RSS/TSS    (S2)

where the residual sum of squares (RSS), the sum of the squared deviations of the data from the model, and the total sum of squares (TSS), the sum of the squared deviations of the data from the mean, are written to account for the flux and model-data mismatch covariance matrices, "Q" and "R", as well as the sensitivity of observations to fluxes, "H":

RSS = (z^T)((PSI^-1) - (PSI^-1)HX(((X^T)(H^T)(PSI^-1)HX)^-1)(X^T)(H^T)(PSI^-1))z    (S3)

TSS = (z^T)(PSI^-1)z    (S4)

where all other terms are as previously defined.

The goal of the model performance analysis is to verify whether BIC is able to select a limited model that includes only those region-months that substantially contribute to explaining the variability of the total FFCO2 emissions signal (i.e., the Full Model). As expected, the Full Model always explains more of the variability than the BIC Model, because the Full Model includes all of the possible covariates. In the SD_OFO case, the total FFCO2 emissions signal explains 100% of the variation in the synthetic observations (Table S3, Row 1, Column 4), whereas BIC selects 109 region-months in this case, which together explain 95% of the total signal. For the other examined cases, the Full Model explains a decreasing fraction of the total variability (Table S3, Column 4), as the FFCO2 signal is confounded by other signals (e.g., biospheric fluxes and/or model-data mismatch errors). In all cases, however, the BIC Model explains a large share (>80%) of the variability explained by the Full Model (ratio of Table S3, Column 2 to Column 4), while using a substantially lower number of covariates, p.
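
Equations S2 to S4 can be evaluated directly once "z", "H", "X", and "PSI" are available. The following is a minimal sketch, assuming dense NumPy arrays and a symmetric positive definite "PSI"; the function name `covariance_weighted_r2` is hypothetical and is not taken from the study's code.

```python
import numpy as np

def covariance_weighted_r2(z, H, X, Psi):
    """R^2 = 1 - RSS/TSS using Equations S2, S3 and S4.

    z   : (n,) observations
    H   : (n, m) sensitivity (Jacobian) matrix
    X   : (m, k) model matrix
    Psi : (n, n) symmetric positive definite covariance, H Q H^T + R
    """
    HX = H @ X
    psi_inv_z = np.linalg.solve(Psi, z)      # PSI^-1 z
    psi_inv_hx = np.linalg.solve(Psi, HX)    # PSI^-1 H X
    a = HX.T @ psi_inv_hx                    # X^T H^T PSI^-1 H X
    b = HX.T @ psi_inv_z                     # X^T H^T PSI^-1 z
    tss = z @ psi_inv_z                      # Equation S4
    rss = tss - b @ np.linalg.solve(a, b)    # Equation S3, expanded
    return 1.0 - rss / tss                   # Equation S2
```

The ratio of the value obtained with the BIC-selected columns to the value obtained with all columns corresponds to the comparison of Column 2 to Column 4 of Table S3 described above.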
This result supports the notion that detectability can be linked to selection into the model by BIC, as the covariates selected into the model by BIC consistently explain a large fraction of the variability from the FFCO2 signal. The models in the simpler cases explain a larger portion of the variability than those in the more complex cases. This is due to the reduced ability of FFCO2 emissions to explain the variability in the observed signal as the FFCO2 signal becomes more obscured. Accordingly, the ability to detect the FFCO2 emissions is also reduced in the more complex cases, as evidenced by the lower number of selected variables ("p") for those cases.

Supplemental References

Gourdji, S. M. et al. (2012), North American CO2 exchange: inter-comparison of modeled estimates with results from a fine-scale atmospheric inversion, Biogeosciences, 9(1), 457–475, doi:10.5194/bg-9-457-2012.

Gourdji, S. M., A. I. Hirsch, K. L. Mueller, V. Yadav, A. E. Andrews, and A. M. Michalak (2010), Regional-scale geostatistical inverse modeling of North American CO2 fluxes: a synthetic data study, Atmos. Chem. Phys., 10(13), 6151–6167, doi:10.5194/acp-10-6151-2010.

Gourdji, S. M., K. L. Mueller, K. Schaefer, and A. M. Michalak (2008), Global monthly averaged CO2 fluxes recovered using a geostatistical inverse modeling approach: 2. Results including auxiliary environmental data, J. Geophys. Res., 113, D21115, doi:10.1029/2007JD009733.

Gurney, K. R., D. L. Mendoza, Y. Zhou, M. L. Fischer, C. C. Miller, S. Geethakumar, and S. de la Rue du Can (2009), High Resolution Fossil Fuel Combustion CO2 Emission Fluxes for the United States, Environ. Sci. Technol., 43(14), 5535–5541, doi:10.1021/es900806c.

Miles, N. L., S. J. Richardson, K. J. Davis, T. Lauvaux, A. E. Andrews, T. O. West, V. Bandaru, and E. R. Crosson (2012), Large amplitude spatial and temporal gradients in atmospheric boundary layer CO2 mole fractions detected with a tower-based network in the U.S. upper Midwest, J. Geophys. Res., 117, 13 pp., doi:10.1029/2011JG001781.

Mueller, K. L., S. M. Gourdji, and A. M. Michalak (2008), Global monthly averaged CO2 fluxes recovered using a geostatistical inverse modeling approach: 1. Results using atmospheric measurements, J. Geophys. Res., 113(D21), doi:10.1029/2007JD009734.

Oda, T., and S. Maksyutov (2011), A very high-resolution (1 km x 1 km) global fossil fuel CO2 emission inventory derived using a point source database and satellite observations of nighttime lights, Atmos. Chem. Phys., 11(2), 543–556, doi:10.5194/acp-11-543-2011.

Ogle, S., K. Davis, A. Andrews, T. West, R. Cook, R. Parkin, J. Morisette, S. Verma, and S. Wofsy (2006), Science Plan: Mid-Continent Intensive Campaign, report, U.S. Global Change Res. Program, North Am. Carbon Program, Greenbelt, Md. Available from: http://www.nacarbon.org/nacp/mci.html.

Olsen, S. C., and J. T. Randerson (2004), Differences between surface and column atmospheric CO2 and implications for carbon cycle research, J. Geophys. Res., 109(D2), doi:10.1029/2003JD003968.

Randerson, J. T., M. V. Thompson, T. J. Conway, I. Y. Fung, and C. B. Field (1997), The contribution of terrestrial sources and sinks to trends in the seasonal cycle of atmospheric carbon dioxide, Glob. Biogeochem. Cycles, 11, 535, doi:10.1029/97GB02268.

Van der Werf, G. R., J. T. Randerson, L. Giglio, G. J. Collatz, P. S. Kasibhatla, and A. F. Arellano, Jr. (2006), Interannual variability in global biomass burning emissions from 1997 to 2004, Atmos. Chem. Phys., 6, 3423–3441.

Yadav, V., K. L. Mueller, and A. M. Michalak (2013), A backward elimination discrete optimization algorithm for model selection in spatio-temporal regression models, Environ. Model. Softw., 42, 88–98, doi:10.1016/j.envsoft.2012.12.009.