Abstract
Atmospheric state analysis is a difficult scientific problem due to the chaotic nature of the atmosphere. Data assimilation is a framework for generating an accurate state analysis of a physical system using probability density functions (PDFs) describing uncertainty of information on the state of the physical system. However, since PDFs cannot be deduced theoretically, those used in data assimilation of atmospheric state analysis are based on empirical tunings. This PDF uncertainty limits the theoretical consistency and accuracy of atmospheric state analysis and that of all atmospheric sciences. In this study, we constructed a highly accurate and theoretically consistent atmospheric state analysis by objectively estimating the PDFs of all datasets (forecasts and observations) under the Gaussian approximation. We show that an ensemble of data assimilations with 192 members using the four-dimensional variational method and sample statistics obtained with the data assimilation theory (Desroziers’ method) can generate more accurate objective Gaussian PDFs, including flow-dependent forecast error structures. Numerical experiments of atmospheric state analysis and forecasts using objective PDFs were conducted and compared with those using conventional empirical PDFs. The objective PDFs had smaller error variances for most data (about 34% of those of CNTL on average) and larger observation error correlations for satellite radiances, where the strongest correlation was greater than 0.8. The analysed atmospheric states are systematically different, such as a cooler (exceeding 1.2 K) and wetter (exceeding 1.2 g/kg) low troposphere in regions characterized by low-level clouds off the west coast of the continents. The theoretical consistency evaluated by the chi-square-based tests showed a clear improvement from 16 to 95%. The forecast accuracy was improved globally up to 9%, with 95% statistical significance. 
The tropical cyclone track forecast accuracy was also improved by about 20%.
Subject terms: Climate sciences, Environmental sciences, Natural hazards, Planetary science, Astronomy and planetary science
Introduction
Atmospheric state analysis is a difficult scientific problem due to the chaotic nature of the atmosphere1. In a chaotic system, small perturbations in an initial state grow rapidly; therefore, the state analysis of the chaotic system needs to be sufficiently accurate to detect such small perturbations. This requires detailed observations and a detailed numerical model description of the system. As a result, we need to solve a state analysis problem whose degrees of freedom are huge compared to our observation and computation ability. However, knowing the state of a physical system is essential to understanding the system. Data assimilation (DA) is a framework for generating an accurate state analysis based on the relationships between probability density functions (PDFs) describing the uncertainty of information on the state of a physical system2,3. Therefore, accurate PDFs are essential for accurate atmospheric state analysis. The information sources of atmospheric state analysis using DA in a state-of-the-art global numerical weather prediction (NWP) system are observations (such as earth-observing satellites), forecasts by a numerical model of the atmosphere, and physical laws, where the number of observations and degrees of freedom of the numerical model are about $10^{6}$ and $10^{8}$, respectively4. The PDFs used in NWP and climate reanalysis can be approximated by Gaussian distributions. Although the true PDF is unknown, the Gaussian approximation is supported as a first-order approximation by the central limit theorem, sample statistics on real data, and NWP accuracy using the Gaussian approximation. Improving PDFs under the Gaussian approximation has been an important research theme in DA. In addition, there are studies on DA using non-Gaussian background PDFs in high-dimensional systems5,6. Atmospheric state analysis using DA has provided a more accurate atmospheric state analysis than both observations and forecasts.
However, since PDFs cannot be deduced theoretically, those used in DA for atmospheric state analysis contain a huge number of empirical tuning parameters. When the PDF is approximated by a Gaussian distribution, the error covariance matrix (ECM) contains a huge number of empirical tuning parameters. This uncertainty limits the theoretical consistency and accuracy of atmospheric state analysis, NWP, climatological analysis and prediction, and other atmospheric sciences developed by referring to these analyses. The purpose of this study is to overcome this limitation. Since the ECM of each dataset is a component of one ECM that represents one joint PDF as a whole (hereafter referred to as the unity of the ECM), objective estimation of the ECMs of all datasets (all observations and background variables) is essential.
Here, we define objective estimation as follows. Objective estimation is based on the physical and mathematical properties (referred to as theoretical properties) of the object to be estimated. Since the theoretical properties of the estimation target are generally not completely known, the theoretical properties used for the actual objective estimation include approximations. The higher the accuracy of this approximation, the smaller the estimation error and the smaller the need to make additional adjustments to the estimation result. An estimation with higher approximation accuracy of the theoretical properties of the object used for estimation is more objective estimation. For example, let us compare the objectivity of the National Meteorological Center (NMC) method7 and the ensemble of DA (EnDA)8–10 as estimation methods of the background ECM (BECM). First, in both methods, an ensemble of forecast errors is approximately generated based on the property that the BECM is a covariance matrix of forecast errors. The theoretical property approximation that the NMC method uses for estimation is an approximation of the spatiotemporal uniformity of forecast error statistics. On the other hand, the approximation used by EnDA for the estimation is to represent the time evolution of the BECM using suboptimal ECMs, such as a BECM estimated by the NMC method. As explained later, it can be expected that this approximation will not strongly affect the estimation results, and previous research supports this. The BECM estimated by the NMC method is static and horizontally uniform. The BECM estimated by EnDA includes spatiotemporal structure. Therefore, the EnDA estimation is more objective than estimation by the NMC method.
ECMs used in the atmospheric state analysis by DA have been studied as a major research theme in atmospheric science and developed in NWP. In state-of-the-art DA for global atmospheric state analysis in NWP and climatological reanalysis, ECMs describing Gaussian PDFs have been constructed using statistical samples, such as ensemble forecasts, that follow these PDFs. A BECM describing the forecast error PDF has been constructed using two types of ensembles. The first is a forecast error ensemble approximated as the forecast differences between T-h and T + DT-h forecasts, assuming spatiotemporal homogeneity of the background error statistics (the NMC method)7, where one typical value set of T and DT are both 24. This ensemble is designed to represent the climatological structures of background errors, where a typical ensemble size is O (100). To construct a BECM from this ensemble, rescaling of error amplitudes and assumptions of correlation structures, such as geostrophic balance, are needed. The second is ensemble forecasts generated by ensemble Kalman filter (EnKF)11,12 or EnDA8–10 to represent spatiotemporal (flow-dependent) structures of background errors, where a typical ensemble size is O (100). To construct a BECM from this ensemble, rescaling of ensemble spreads and possibly modifying correlation structures are needed to maintain analysis accuracy. This is because a raw BECM derived directly from ensemble forecasts does not have sufficient accuracy for DA in NWP. Only a few studies have shown that the fully flow-dependent BECM improves forecast accuracy compared with the four-dimensional variational data assimilation (4D-Var) with a climatological BECM. Even in these studies, inflated ensemble spreads were used13–15. All other previous studies and operational NWP systems use BECMs that include rescaling and approximated climatological structures to obtain better accuracy than the 4D-Var with a climatological BECM10,16. 
These requirements of rescaling ensemble spreads and replacing correlation structures are likely due to the ensemble generation methods and the empirical tunings applied to the ECMs. For example, (a) EnKF using observation space localization cannot fully assimilate non-local observations, such as satellite radiance observations, which are the main observation information sources of global DA for NWP; (b) the three-dimensional variational data assimilation (3D-Var) with a climatological BECM cannot represent the flow dependence of the BECM; and (c) when the time evolution of the BECM is represented by ensemble forecasts without using an adjoint model, localization leads to physical imbalances17. Furthermore, if a BECM is objectively estimated using ensemble forecasts and observation ECMs (OECMs) are kept empirical, then empirical ad hoc tunings would be needed to balance them.
An OECM has been constructed mainly using an observation error ensemble approximated as the differences between observations and background fields (forecasts), that is, observation minus background (OmB), together with empirical tunings for each observation data type to maintain analysis and forecast accuracy. More accurate estimation of OECMs based on sample statistics obtained with the DA theory18 (the D05 method) has recently begun for a few selected observations19–22. The method uses three datasets (observations, forecasts, and analyses) to estimate the OECMs. However, DA requires an ECM that describes the overall relationships between errors, including those between different datasets; thus, an objective DA cannot be achieved with such objective estimation targeting only specific datasets. In fact, these studies needed empirical tunings to maintain analysis and forecast accuracy. Recently, the study in23 tried objectively constructing all ECMs (BECM and OECMs) in DA for global NWP, and showed that the analysis accuracy is generally improved compared to conventional empirical ECMs. However, in that study, a climatological BECM component was still used, and a single tuning parameter, introduced to balance the observation impacts in the variational bias correction, was still needed to avoid partial degradation of forecast accuracy in the low-level troposphere of the extratropical northern hemisphere (NH, north of 20° N). This empirical tuning likely originated from the climatological BECM since, although the magnitude of the BECM was objectively estimated, the covariance structures were climatological.
All existing ECM estimation methods, including the D05 method and EnDA, have estimation errors because the assumptions of each method are not fully satisfied in real NWP systems. In fact, the D05 method cannot estimate true ECMs when the Kalman gain used in the NWP system is not the true Kalman gain24, and the EnDA and EnKF also cannot estimate true BECMs when OECMs used in the DA are not the true OECMs. However, what is important for the ECM estimation is whether ECMs closer to the true ECMs than original ECMs can be obtained23. Previous studies have shown that this is the case for the EnDA and D05 methods10,19–23. We never have true ECMs, and a possible scientific approach is implicit one, as follows. (1) Estimate ECMs using imperfect estimation methods, (2) Use the estimated ECMs in NWP DA, and evaluate changes in analysis and forecast accuracy and theoretical consistency. (3) If these are improved, we can conclude that the estimated ECMs are more accurate than the original ECMs. Otherwise, the estimation has to be improved, such as adding estimation processes to better satisfy the assumption of the method, and then back to (1). This approach has been used in many studies, including those mentioned above. However, in these studies, only the ECMs of specific datasets, such as specific satellite sensors, were estimated, rather than the ECMs of all datasets (all observation datasets and background field variables). In this case, even if the objectively estimated ECM of the specific datasets becomes accurate, if the ECMs of the other datasets remain inaccurate, the ECM as a whole will not be accurate due to the unity of the ECM. For example, if only the OECMs of all observation datasets are objectively estimated, and the BECM remains inaccurate, the ECM as a whole will not be accurate.
This study aims to construct atmospheric state analysis using fully objective ECMs with higher accuracy and theoretical consistency than that with empirical ECMs. This has not been performed in previous studies, as described above. Here, fully objective ECMs means that the ECMs are estimated based on data assimilation (statistical inference) theory and that no empirical tuning (trial and error) is applied to the estimation. We objectively estimate the ECMs of all datasets (observations and forecasts) under the Gaussian approximation. A BECM is objectively estimated by EnDA with 192 members using 4D-Var, which can adequately assimilate non-local observations, such as satellite radiance observations, and solve state analysis problems without introducing artificial spatiotemporal divisions of the analysis space or observation data. OECMs are estimated using an ensemble of observation error samples generated by the differences between observations, forecasts, and analyses based on the DA theory following the D05 method. Numerical experiments of atmospheric state analysis and forecasting using the objective ECMs (referred to as TEST) and conventional empirical ECMs (referred to as CNTL) were conducted on the operational global NWP system of the Japan Meteorological Agency (JMA)25 to evaluate the effects of objective ECMs on analysis and forecast accuracy, and theoretical consistency. In this study, the strong constraint 4D-Var approach, which assumes that model errors are much smaller than background errors, is used, as in the previous studies cited above.
The remainder of this paper is organized as follows. “Methods” describes the objective ECM estimation methods and the design of the numerical experiments. “Results” presents the results of the ECM estimation and numerical experiments. “Summary and conclusions” provides the summary and conclusions. Appendix A describes how to verify the theoretical consistency of the estimated ECMs.
Methods
Atmospheric state analysis by DA
Atmospheric state analysis by DA2,3,26,27 is based on Bayes’ theorem, that is the identity between PDFs describing the uncertainty of information of the atmospheric state, as follows.
$$p(x \mid z) = \frac{p(z \mid x)\, p(x)}{p(z)} \qquad (1)$$
where $x$ represents the discretized atmospheric state (N-dimensional vector), and $z$ represents our information of the atmospheric state (Z = N + P-dimensional vector) that includes observations ($y$: P-dimensional vector) and a background state ($x^{b}$: N-dimensional vector). The typical values of N and P in the state-of-the-art NWP system are $O(10^{8})$ and $O(10^{6})$, respectively. $p(a)$ is the PDF that describes the probability of occurrence of $a$, and $p(a \mid b)$ is the conditional PDF that describes the probability of occurrence of $a$ under the occurrence of $b$. Here, $a$ and $b$ represent arbitrary events, such as $x$ and $z$. These PDFs can be adequately approximated using Gaussian distributions owing to the central limit theorem. This is because the observation process performs multiple operations on the input signal to generate the observed values, and the NWP models and observation operators perform a huge number of operations on the input data to generate the predicted fields and first guesses in the observation space. Here, both the inputs and operators have errors. Thus, the observed and predicted values are the sums of random variables whose probability distributions approach Gaussian distributions because of the central limit theorem, including its extended versions28,29. Indeed, the fact that the OmB statistics for many data are close to a Gaussian distribution is consistent with this. Ultimately, the validity of the Gaussian assumption must be evaluated based on the accuracy of the analysis and predicted fields obtained.
Under the Gaussian approximation, assuming there is no correlation between background and observation errors, the atmospheric state with a maximum probability can be estimated by minimizing the following cost function:
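$$J(x) = \frac{1}{2}\,(x - x^{b})^{T} B^{-1} (x - x^{b}) + \frac{1}{2}\,(\mathcal{H}(x) - y)^{T} R^{-1} (\mathcal{H}(x) - y) \qquad (2)$$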
where $B$ and $R$ are the BECM (N × N real symmetric matrix) and the OECM (P × P real symmetric matrix), respectively, $\mathcal{H}$ is the observation operator, and the superscripts T and −1 denote the transpose and inverse of a matrix, respectively. When the atmospheric general circulation model (AGCM) is included in $\mathcal{H}$, this method is called 4D-Var30–35. The cost function (2) can be effectively minimized using adjoint codes and numerical nonlinear minimization algorithms, such as the quasi-Newton and conjugate gradient methods36–38. Here, we also assumed that the background errors and observation errors are independent, and that $p(x)$ is a uniform distribution. This means that we do not use climatological information. Note that when the PDF of the background state is symmetric with respect to the state and the background state, as a Gaussian PDF is, Eq. (1) can be rewritten in an alternative form that is the same as the formulation in26. Note that in26, the prior PDF of the state is directly given as the Gaussian distribution whose mean is $x^{b}$ and whose ECM is $B$, and its relationship to the PDF of the background state is not given. Both formulations using Bayes' theorem result in the same cost function (2) when the background and observation errors are independent. Equation (1) is symmetrical with respect to the observations and the background field.
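As a minimal sketch of how the cost function behaves, the following scalar example (one state variable, one observation, a linear observation operator) evaluates the quadratic cost and its closed-form minimizer. All numerical values and the function names `cost` and `analysis` are hypothetical and chosen only for illustration; the operational system minimizes the full nonlinear cost function iteratively rather than in closed form.

```python
def cost(x, xb, y, B, R, h):
    """Scalar variational cost: background term plus observation term."""
    return 0.5 * (x - xb) ** 2 / B + 0.5 * (h * x - y) ** 2 / R


def analysis(xb, y, B, R, h):
    """Closed-form minimizer of cost(): background plus gain times innovation."""
    gain = B * h / (h * B * h + R)
    return xb + gain * (y - h * xb)


# Hypothetical values: background and observation in K, error variances in K^2.
xb, y = 280.0, 282.0
B, R, h = 1.0, 0.25, 1.0

xa = analysis(xb, y, B, R, h)
print(xa, cost(xa, xb, y, B, R, h))
```

Because the observation error variance is assumed smaller than the background error variance here, the analysis lies closer to the observation than to the background.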
Objective construction of ECMs
Here, the objective estimation methods of ECMs are described. An ECM representing a Gaussian PDF of an arbitrary stochastic variable can be approximated as follows:
$$P = \langle (s - \langle s \rangle)(s - \langle s \rangle)^{T} \rangle \approx F(X X^{T}) \qquad (3)$$
Here, $s$ is the stochastic variable, $P$ is its ECM, and $\langle a \rangle$ denotes the expectation value of $a$. We assume $\langle s \rangle$ is equal to its true value; the second equality is approximate. $X$ is the normalized ensemble deviation matrix given by the deviations of the E ensemble members from their ensemble average, scaled so that $XX^{T}$ is the sample covariance, where E is the number of ensemble members. Each column of $X$ represents each ensemble member obeying the PDF, and $F$ is the arbitrary operator acting on the raw ensemble-based ECM, $XX^{T}$, which introduces some external balances into the raw ECM to approximate $P$, such as spatial localization39–43, climatological average operations (the NMC method)7, and spatiotemporal averages18. The dimensions of $X$ for the background, analysis, and observation variables are N × E, N × E, and P × E′, respectively. Here, the number of ensemble members of the observation data is denoted as E′. In this study, the NMC method and the spatial localization are used for the BECMs in EnDA and deterministic DA, respectively. The spatiotemporal average is used for the OECM estimation. $X$ is given by the ensemble forecasts for the BECM estimation and by the observations, analyses, and forecasts for the OECM estimation. The details of the estimation are given below.
BECM construction by EnDA with perturbed observation method
For the BECM construction, we use the EnDA with the perturbed observation method8,9. In the perturbed observation method, first, OECMs are simulated because they have a simpler structure than that of BECM. Subsequently, the analysis ECM and BECM are generated by an ensemble of analyses and forecasts in DA cycles, where each DA member uses different observation error realizations obeying the observation error statistics, and each DA is independent. The EnDA can estimate the true BECM only when the OECMs used for the perturbed observation generation are the true OECMs, and this is not satisfied for real NWP DA since we never know the true OECM. Therefore, EnDA never estimates the true BECM (“Introduction”). However, since EnDA calculates the time evolution of ensemble members using the NWP model, the estimated BECM reflects approximately correct flow dependence of the BECM. This can be qualitatively shown in the formulation of the BECM estimation by EnDA, as follows:
$$B_{t} = M A_{t-1} M^{T}, \qquad A_{t} = (I - KH)\, B_{t}\, (I - KH)^{T} + K R_{t} K^{T},$$
$$B_{t} = L B_{t-1} L^{T} + M K R_{t-1} K^{T} M^{T} \approx \sum_{k=1}^{n} L^{k-1} M K R_{t-k} K^{T} M^{T} (L^{k-1})^{T} \qquad (4)$$
Here, $B_{t}$, $A_{t}$, and $R_{t}$ are the BECM and the analysis ECM estimated by the EnDA and the OECM used in the perturbed observation method for the analysis at time t. $K$ is the Kalman gain of the DA system, $H$ is the tangent linear (TL) observation operator, $M$ is the TL NWP model, $L = M(I - KH)$, the subscript denotes analysis time, $L^{k}$ denotes k successive applications of $L$, the time of each linear operator is defined as that of its argument, and n denotes the time satisfying $\|L^{n}\| \approx 0$, where $\|a\|$ is the Frobenius norm of $a$. The existence of such n is expected from the consideration of an idealized simple case and from many real case experiments. For an idealized consideration, we assume that ECMs and M are diagonal, H = M, most singular values of M are less than 1, and the components with singular values greater than 1 vary every time. Then, since the eigenvalues of $L$ are less than 1 on average, $L^{n}$ is close to 0 for large n. In real cases, as many data assimilation experiments have shown, two weeks are enough both for the spin-up period of observing system experiments44 and for forecasts to lose the information of the initial state45. Here, note that $B_{t}$ is a nonlinear function of $R$ because the trajectory of each TL operator depends on $R$. Equation (4) shows that $B_{t}$ is constructed as $R$ evolved by M, K, and L. Since $L$ causes large structure changes of its input within hours, which is known as nonmodal growth46,47, $B_{t}$ does not strongly depend on $R$. Therefore, Eq. (4) qualitatively shows that the EnDA can, to some extent, estimate a BECM closer to the true BECM than the original BECM used in the DA system, even when $R$ is not the true OECM. This interpretation of the EnDA agrees with the results of previous studies showing improved analysis accuracy by using the BECM estimated by the EnDA with a suboptimal OECM9,10,15. This also agrees with the well-known fact that 4D-Var with a climatological BECM estimated by the NMC method can achieve improved flow-dependent structures of the BECM and improved analysis accuracy by applying the TL model and its adjoint to the BECM.
The numerical experiments shown in later sections will present quantitative verification of the BECM estimation by the EnDA. Note that (4) does not include model error covariance matrix term because, in this study, the strong constraint 4D-Var approach that assumes model errors are much smaller than background errors is employed (“Introduction”).
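The loss of memory of the initial BECM in cycled EnDA can be illustrated with a scalar sketch. It assumes a time-invariant scalar model `m`, observation operator `h`, gain `k`, and perturbation variance `r` (all hypothetical values), and iterates the analysis update a = (1 − kh)² b + k² r followed by the forecast step b = m² a. With 6-h cycles, two weeks correspond to 56 cycles. This is an idealization under stated assumptions, not the operational EnDA.

```python
def cycle_becm(b0, r, m, h, k, n_cycles):
    """Cycle scalar analysis and forecast error variances for n_cycles."""
    b = b0
    for _ in range(n_cycles):
        a = (1.0 - k * h) ** 2 * b + k ** 2 * r  # analysis error variance
        b = m ** 2 * a                           # forecast (background) error variance
    contraction = m * (1.0 - k * h)              # scalar counterpart of M(I - KH)
    return b, contraction


# Hypothetical parameters: a growing model (m > 1) with a damping analysis step.
m, h, k, r = 1.2, 1.0, 0.5, 1.0
n_cycles = 56  # two weeks of 6-h assimilation cycles

b_from_small, contraction = cycle_becm(0.1, r, m, h, k, n_cycles)
b_from_large, _ = cycle_becm(10.0, r, m, h, k, n_cycles)
print(contraction, b_from_small, b_from_large)
```

Because the magnitude of the contraction factor is below 1, both runs converge to the same background error variance, which is determined by `r`, `m`, `h`, and `k` rather than by the starting value.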
In this study, we used a 192-member ensemble of 4D-Var. Since the ensemble size is much smaller than the degrees of freedom of the discretized atmosphere (N), additional information is needed to generate the BECM. We use spatial localization for the additional information formulated as a localization matrix that expresses that the error correlation between two spatiotemporal points is a decreasing function of the distance between them (locality of error statistics). The BECM is given by (3) as follows:
$$B = C \circ (X X^{T}) \qquad (5)$$
Here, $X$ is the normalized forecast (background) ensemble deviation matrix (N × E-dimensional) given by the deviations of the ensemble forecasts from their ensemble average, scaled so that $XX^{T}$ is the sample covariance. E is the number of ensemble members, and each column of $X$ represents each ensemble member obeying the background error statistics. $C$ is the spatial localization matrix (N × N-dimensional), whose elements are given by a Gaussian function of $d$ with length scale $D$, where $d$ is the physical distance between two components in the BECM, and $D$ is given as 750 km and 2 km in the horizontal and vertical directions, respectively. We determined the values of $D$ such that the distance at which the correlation obtained by the ensemble becomes indistinguishable from spurious correlations due to sampling noise is the half-width at half-maximum (HWHM) of the localization function. The procedure for determining the value of $D$ is as follows. First, we determined the minimum correlation value (Cmin) that can be calculated with 95% statistical significance (the null hypothesis of zero correlation can be rejected at the 5% significance level) using the 192-member ensemble as 0.142 based on the t-test for correlation48. Second, we estimated a correlation function and its standard deviation, both as functions of the distance s between two points, for temperature at 500 hPa from the raw ensemble. Then, the threshold distance at which the estimated correlation, allowing for a fluctuation of one standard deviation, falls to Cmin was estimated. The meaning of this threshold distance is that at greater distances, correlations estimated by the ensemble are indistinguishable from sampling errors, even when considering a fluctuation of one standard deviation. Finally, $D$ was determined such that this threshold distance was the HWHM of the localization function (the distance at which the localization function has a value of 0.5). The Gaussian function has been widely used in previous studies, the horizontal value of $D$ determined here is the same as that used in13,14, and the vertical value of $D$ is close to that of49. The operator $\circ$ denotes the elementwise product (Hadamard product, Schur product). We did not apply any additional tuning to (5), such as rescaling of variances.
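The significance threshold for ensemble correlations can be checked with a short calculation. The sketch below assumes a two-sided t-test with E − 2 degrees of freedom, for which the critical correlation is r = t / sqrt(t² + df); the quantile of the t distribution is obtained by numerical integration and bisection so that only the standard library is needed. The function names are hypothetical, and whether the original calculation used exactly this variant of the test is an assumption.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(t * t / df))


def t_cdf(t, df, steps=2000):
    """CDF for t >= 0 via composite Simpson integration of the density on [0, t]."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def critical_correlation(n_members, alpha=0.05):
    """Smallest |r| that rejects zero correlation in a two-sided t-test."""
    df = n_members - 2
    lo, hi = 0.0, 10.0
    for _ in range(60):  # bisection for the (1 - alpha/2) quantile of t
        mid = 0.5 * (lo + hi)
        if t_cdf(mid, df) < 1 - alpha / 2:
            lo = mid
        else:
            hi = mid
    t_crit = 0.5 * (lo + hi)
    return t_crit / math.sqrt(t_crit ** 2 + df)


c_min = critical_correlation(192)
print(c_min)
```

Under these assumptions the result is close to the value of 0.142 quoted above for the 192-member ensemble.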
To efficiently minimize the cost function and eliminate the inverse calculation, the square root of the BECM, L, is used in 4D-Var. The (i, m, α) component of L is given as follows:
$$L_{i,(m,\alpha)} = X_{im}\, S_{i\alpha} \qquad (6)$$
Here, the row and column of $L$ are denoted by i and (m, α), respectively43. Note that each element of $L$ can also be represented by the usual two indices by introducing a one-dimensional column index that enumerates the two-dimensional column indices (m, α); $X_{im}$ is the (i, m) component of $X$; $S_{i\alpha}$ is the (i, α) component of $S$; $S$ is the N × F matrix satisfying $C = SS^{T}$, where $C$ is the localization matrix in (5); and F is the number of modes used for the expression of $C$43. In this study, $S$ is constructed by eigenvalue decomposition of $C$ using the top 99% of eigenmodes. The resulting rank of the BECM, M = E × F, used in this study is about 3.7 million. Although it is difficult to directly apply operations such as inversion and eigenvalue decomposition to matrices of rank 3.7 million, variational methods can handle matrices of this order.
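The square-root construction can be verified on a toy problem. The sketch below uses hypothetical sizes (N = 3, E = 2, F = 2) and hypothetical matrix entries, builds the localization matrix from a chosen square root S instead of an eigenvalue decomposition, and checks that the product of the square root with its transpose reproduces the elementwise product of the localization matrix and the raw ensemble covariance.

```python
def matmul_t(a):
    """Return a @ a^T for a matrix stored as a list of rows."""
    return [[sum(p * q for p, q in zip(ri, rj)) for rj in a] for ri in a]


# Hypothetical toy sizes: N = 3 state variables, E = 2 members, F = 2 modes.
N, E, F = 3, 2, 2
X = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]  # normalized ensemble deviations (N x E)
S = [[1.0, 0.0], [0.8, 0.6], [0.0, 1.0]]    # square root of the localization (N x F)

C = matmul_t(S)    # localization matrix, C = S S^T (unit diagonal by construction)
raw = matmul_t(X)  # raw ensemble covariance, X X^T
localized = [[C[i][j] * raw[i][j] for j in range(N)] for i in range(N)]

# Column (m, alpha) of the square root holds X[i][m] * S[i][alpha].
L = [[X[i][m] * S[i][al] for m in range(E) for al in range(F)] for i in range(N)]
from_sqrt = matmul_t(L)

max_diff = max(abs(from_sqrt[i][j] - localized[i][j])
               for i in range(N) for j in range(N))
print(len(L[0]), max_diff)
```

The square root has E × F columns, which is why the rank of the localized BECM can greatly exceed the ensemble size.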
OECM construction using the D05 method
For OECM objective construction, we use the D05 method, as the same as23. The OECM is estimated based on (3) as follows:
| 7 |
Here, the normalized observation ensemble deviation matrix is P × E′-dimensional and is built from the observation error samples, and the expectation value is approximated by the spatiotemporal average. The auxiliary vectors in Eq. (7) must satisfy two conditions, each of which requires an expectation value to be zero. In the observation space, the forecast and analysis error vectors approximately satisfy these conditions (these relationships are summarized in Fig. 1 of18), and the OECM can be estimated as follows:
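$$R \approx \langle (y - \mathcal{H}(x^{a}))\,(y - \mathcal{H}(x^{b}))^{T} \rangle = \langle (\varepsilon - a)(\varepsilon - b)^{T} \rangle \qquad (8)$$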
Fig. 1.
Comparisons of the objective ECMs (TEST) and conventional empirical ECMs (CNTL). The diagonal components of the BECM (TEST) and the BECM (CNTL) in standard deviation (SD) are shown for (a) zonal wind (m/s), (b) water vapor mixing ratio (g/kg), and (c) temperature (K), where the blue and red lines show the standard deviations of BECMs of TEST and CNTL, respectively. The horizontal axis shows the values of the background error standard deviation, and the vertical axis shows pressure levels in hPa. Panel (d) shows the ratios of the objective observation error standard deviations against those of empirical ones, where the horizontal axis shows the observation dataset names (see “Methods”) and the vertical axis shows the values of the ratio of error standard deviations. Panel (e) and (f) show error correlations in the OECM of the Advanced Microwave Scanning Radiometer 2 (AMSR2) sensor from the Global Change Observation Mission—Water “SHIZUKU” (GCOM-W1) satellite and the Special Sensor Microwave Imager/Sounder (SSMIS) sensor from the Defense Meteorological Satellite Program F17 (DMSP F17) satellite, respectively. The horizontal and vertical axes show the channels of the sensor in the same order, and colour shades represent values of correlations.
Here, $\varepsilon$ denotes the observation error vector, and $a$ and $b$ are the analysis and forecast error vectors in the observation space (P-dimensional vectors), respectively. Although the D05 method has estimation errors, as mentioned in “Introduction”, the qualitative validity of the D05 method in a real suboptimal DA can be considered by rewriting Eq. (8) in the filtering form, that is, $R = \tilde{R}\,\tilde{D}^{-1} D$, where $\tilde{R}$ and $R$ are the OECM used in DA and that estimated by the D05 method, respectively, and $\tilde{D}$ and $D$ are the OmB covariance matrix used in DA and the true OmB covariance matrix, respectively. Therefore, the D05 method estimates the ECM by multiplying the original ECM by the coefficient matrix that represents the mismatch between the system's and the true OmB covariance matrices. The method basically works to correct the suboptimality of ECMs, where the coefficient matrix is not the exactly correct one for each ECM, but an averaged one for the OmB covariance matrices. The validity of OECMs estimated by this method has been demonstrated for some sensors in real NWP systems (“Introduction”). The numerical experiments shown in later sections present quantitative verification of this estimation. Note that OECMs estimated by the D05 method generally have small asymmetric components, and positive definiteness is not guaranteed due to sampling errors and the suboptimality of real DA systems. In this study, OECMs estimated by the D05 method were used in numerical experiments (“Results” and “Summary and conclusions”) after they were symmetrized and their positive definiteness was checked. All eigenvalues of ECMs estimated by the D05 method in this study were positive.
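The filtering interpretation can be illustrated with a scalar Monte Carlo sketch. It assumes a single directly observed variable (observation operator equal to 1), Gaussian errors, and hypothetical true and assumed variances. The D05 estimate is the sample mean of (observation minus analysis) times (observation minus background), and in this scalar setting it should match the assumed observation error variance multiplied by the ratio of the true to the assumed OmB variance.

```python
import random


def d05_estimate(r_true, b_true, r_used, b_used, n_samples=200_000, seed=0):
    """Scalar D05 estimate: mean of (y - xa) * (y - xb) over simulated errors."""
    rng = random.Random(seed)
    gain = b_used / (b_used + r_used)          # gain built from the assumed variances
    total = 0.0
    for _ in range(n_samples):
        eps_o = rng.gauss(0.0, r_true ** 0.5)  # observation error sample
        eps_b = rng.gauss(0.0, b_true ** 0.5)  # background (forecast) error sample
        omb = eps_o - eps_b                    # observation minus background
        oma = (1.0 - gain) * omb               # observation minus analysis
        total += oma * omb
    return total / n_samples


# Hypothetical variances: the DA system overestimates the observation error.
r_true, b_true, r_used, b_used = 1.0, 1.0, 2.0, 1.0

r_est = d05_estimate(r_true, b_true, r_used, b_used)
r_filter = r_used * (r_true + b_true) / (r_used + b_used)  # assumed R times OmB ratio
print(r_est, r_filter)
```

In this example the estimate does not recover the true variance, but it lies closer to it than the variance assumed in the DA system, which is the sense in which the method corrects suboptimal ECMs.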
Numerical experiment design
Numerical experiments were conducted on the global NWP system of the Japan Meteorological Agency25. First, the specifications of this NWP system are as follows. The spatiotemporal resolution of the AGCM is approximately 20 km horizontally, 100 layers vertically from the surface to 0.01 hPa (top of the mesosphere), and temporally integrated with a time step of 400 s. The horizontal resolution of the adjoint and tangent linear NWP models used in 4D-Var is reduced to about 60 km. The DA window length is 6 h. The BECM is constructed based on the NMC method7. The OECMs are given as diagonal matrices, where variances are constructed using empirical tunings. The observation datasets assimilated in the 4D-Var are described in25, and their impacts on NWP accuracy are described in50. The main observation datasets are summarized as follows: (1) temperature-sensitive microwave radiances (MW-T), (2) water vapor-sensitive microwave radiances (MW-WV), (3) infrared radiances by hyper-spectral sounders (HSS), (4) GNSS radio-occultation observations (GNSSRO), (5) infrared radiances by geostationary satellites (CSR), (6) atmospheric motion vector (AMV), (7) GNSS surface observations (GNSSSF), (8) aviation observations (AVIATION), (9) radiosonde observations (SONDE), (10) surface pressure observations (PS), and (11) wind profiler observations (WPR). The variational bias correction scheme (VarBC)51,52 is used for the bias correction of radiance observations. In this study, the horizontal resolution of the AGCM and adjoint and tangent linear NWP models were reduced to 60 km and 120 km, respectively, to reduce the computational cost.
Second, we extended this JMA NWP system to generate objective ECMs, as follows. An ensemble of atmospheric states obeying the background error statistics was constructed using EnDA with the perturbed observation method8,9. As described in “BECM construction by EnDA with perturbed observation method”, EnDA generates an improved BECM by filtering suboptimal ECMs (B and R). In this study, the ECMs used in EnDA are the same as those used in CNTL. In the perturbed observation method, first, an ensemble of observation data is generated by simulating OECMs, which have simpler error structures (nearly diagonal) than the analysis and forecast (background) ECMs. Second, analysis and forecast ensembles satisfying the analysis and forecast PDFs, respectively, are generated in an ensemble of DA cycles using 4D-Var with different observation data realizations (EnDA)9. We constructed an EnDA with 192 members, as in15. Using this ensemble, we constructed an objective BECM based on Eq. (5). The OECMs were constructed using the D05 method, as in23, where non-diagonal components are also estimated. The estimated inter-channel error correlations for satellite radiance observations were directly introduced into DA as non-diagonal OECMs, and the estimated horizontal correlation lengths were applied as objective thinning distances of radiance observations sensitive to temperature, as in23. In this study, this results in four times denser use of these data. As described in the introduction, we aim to realize objective DA by constructing objective ECMs and demonstrate its accuracy and theoretical consistency. Achieving improved atmospheric state analysis while keeping the observation information constant is not the purpose of this study, although, as shown in23, the objective DA also solves this problem as a result.
Numerical experiments of the analysis and forecast cycles using these two NWP systems were conducted to evaluate the effects of the objective ECMs on NWP accuracy and validity (Table 1). We refer to the numerical experiment using the new objective ECMs as TEST, where the objective thinning distances of the radiance observations sensitive to temperature were also used, and that using the empirical ECMs as CNTL. In TEST, no empirical ECMs were used. The experimental period started on July 10, 2014, and ended on August 31, 2014, where the first 10 days were the spin-up period of EnDA, and the next 12 days were the spin-up period of the DA experiments. Only the data in August 2014 were used for the ECM estimation and verification of atmospheric state analysis and forecast accuracy. We also conducted two supplemental experiments to evaluate specific aspects of the objective ECMs (Table 1). The first supplemental experiment used the objective BECM and the empirical OECMs, where the amplitude of the BECM was rescaled using an inflation coefficient with a value of 2 for the standard deviation to balance the empirical OECMs, as in15. The experiment that applied these changes to TEST is referred to as TEST-oB. The second supplemental experiment was designed to clarify the ability of the objective ECMs to analyse tropical cyclones. In this experiment, operationally used synthesized typhoon observation data were eliminated. These data include typhoon central sea surface pressure estimates and wind profiles describing typical typhoon structures in the western Pacific, which are estimated from the central pressure and 15 m/s wind radius analysed by weather forecasters, assuming gradient wind balance and a typical typhoon structure25. These data are similar to NCEP's Tropical Cyclone Vital Statistics Record (TC-VITALS)53 in that they are tropical cyclone estimates, but the estimated variables and estimation methods are different. The other settings are the same as those of TEST.
This experiment is referred to as TEST-noST. Synthesized typhoon observation data are needed to compensate for the shortage of observation data for tropical cyclones. If the BECM can express the flow-dependent structure adequately, then accurate analysis would be possible without the synthesized typhoon observation data.
Table 1.
Summary of experiments.
| Experiment | BECM | OECM | Synthesized typhoon observation data |
|---|---|---|---|
| CNTL | NMC with empirical tuning | OmB statistics with empirical tuning | Assimilated |
| TEST | EnDA | D05 | Assimilated |
| TEST-oB | EnDA | OmB statistics with empirical tuning | Assimilated |
| TEST-noST | EnDA | D05 | Not assimilated |
Impact verification of estimated ECMs on forecast skill
To validate the objective ECMs, we compared the forecast accuracy of the experiments using these ECMs with that of CNTL. We use the fifth-generation atmospheric reanalysis of the European Centre for Medium-Range Weather Forecasts (ERA5)54,55 as reference analyses to calculate the forecast errors. Since error correlations between ERA5 analyses and JMA forecasts would be small and negligible compared to those between analyses and forecasts of JMA, this verification using ERA5 as reference analyses is reliable15. Furthermore, the ERA5 data are preferable as validation data because they are calculated using a state-of-the-art NWP system and data is available not only for the troposphere but also up to 1 hPa.
We use the normalized root mean square error difference ($D_{\mathrm{RMSE}}$) to compare two schemes (CNTL and TEST), which is defined as follows.

$$D_{\mathrm{RMSE}} = \frac{\mathrm{RMSE}_{\mathrm{CNTL}} - \mathrm{RMSE}_{\mathrm{TEST}}}{\mathrm{RMSE}_{\mathrm{CNTL}}} \qquad (9)$$

Here, $\mathrm{RMSE}_{\mathrm{CNTL}}$ and $\mathrm{RMSE}_{\mathrm{TEST}}$ are the root mean square errors (RMSEs) of CNTL and TEST, respectively. We calculate $D_{\mathrm{RMSE}}$ for each pressure level and each region: the globe, the extratropical northern hemisphere (NH, north of 20° N), the tropics (between 20° N and 20° S), and the extratropical southern hemisphere (SH, south of 20° S). Statistical significance is evaluated based on the t-test of paired samples for mean differences under serial dependence56. If the null hypothesis that the mean RMSE of TEST and the mean RMSE of CNTL are equal is rejected at the 5% significance level, then we simply say that the difference between the mean RMSE of TEST and the mean RMSE of CNTL has 95% statistical significance. As shown in the next section, the differences in RMSEs between TEST and CNTL are large enough to obtain 95% statistical significance for the 1-month experimental period. This is because the t-score is proportional to both the differences in the RMSEs and the square root of the number of samples. We have confirmed that even with an experimental setup with smaller RMSE differences, such as that of23, the experimental verification results are almost the same between one- and two-month experiments.
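The verification procedure above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the verification code used in this study: the normalized difference is assumed to take the form (RMSE of CNTL minus RMSE of TEST) divided by the RMSE of CNTL, serial dependence is handled with a lag-1 autocorrelation (AR(1)) effective sample size, which is one common choice, and the function names and synthetic RMSE series are hypothetical.

```python
import numpy as np

def normalized_rmse_difference(rmse_cntl, rmse_test):
    """Assumed form: positive when TEST has the smaller mean RMSE."""
    return (np.mean(rmse_cntl) - np.mean(rmse_test)) / np.mean(rmse_cntl)

def paired_t_serial(rmse_cntl, rmse_test):
    """Paired t-score with an AR(1)-based effective sample size (an assumption)."""
    d = np.asarray(rmse_cntl) - np.asarray(rmse_test)  # paired differences
    n = d.size
    a = d - d.mean()
    r1 = np.sum(a[:-1] * a[1:]) / np.sum(a * a)        # lag-1 autocorrelation
    r1 = min(max(r1, 0.0), 0.99)                       # keep the correction conservative
    n_eff = n * (1.0 - r1) / (1.0 + r1)                # effective number of samples
    t_score = d.mean() / np.sqrt(d.var(ddof=1) / n_eff)
    return t_score, n_eff

# Synthetic example: 62 forecasts in which TEST is about 5% better than CNTL.
rng = np.random.default_rng(0)
rmse_cntl = 1.0 + 0.05 * rng.standard_normal(62)
rmse_test = 0.95 * rmse_cntl + 0.01 * rng.standard_normal(62)

d_rmse = normalized_rmse_difference(rmse_cntl, rmse_test)
t_score, n_eff = paired_t_serial(rmse_cntl, rmse_test)
```

The t-score grows with both the mean paired difference and the square root of the effective sample size, which is why a large RMSE difference can reach significance within one month of forecasts.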
Two other forecast accuracy verifications were also conducted: verification using radiosonde observations as truth and verification based on the observation minus background statistics for all assimilated observations. Since the results of these two verifications agreed with those of the verification using reference analyses (ERA5), and the reference analysis verification is superior to the other verifications in coverage and ability to detect fast-growing forecast errors, we mainly use the reference analysis verification. We also present a verification of the tropical cyclone track forecast accuracy. We used the best track data of tropical cyclones from the National Oceanic and Atmospheric Administration/National Hurricane Center (NOAA/NHC) and those from JMA as truth for the eastern and western Pacific, respectively.
Verification of estimated ECMs by theoretical consistency
We also verify the estimated ECMs based on theoretical consistency. Two indices are used. These indices evaluate how well the cost function of 4D-Var for an analysis field follows the chi-squared distribution, as theoretically expected. The first index uses the value of the cost function as is, and the second index uses the value of the cost function corrected by a scalar coefficient. Both indices take values from 0 to 100% and reach 100% when the theoretical consistency is perfectly satisfied. The detailed derivation of these indices is presented in Appendix A.
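The chi-squared behaviour that these indices rely on can be checked in a toy linear-Gaussian setting. The Python sketch below is not the index defined in Appendix A. It only illustrates the standard result that twice the minimum of the cost function follows a chi-squared distribution whose degrees of freedom equal the number of observations when the specified covariances are correct. The matrices `B_true`, `R_true`, and `H` and the function `two_j_min` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, n_cases = 4, 3, 5000       # state size, number of observations, sample size

B_true = 0.5 * np.eye(n) + 0.5   # hypothetical background error covariance
R_true = 0.3 * np.eye(p)         # hypothetical observation error covariance
H = rng.standard_normal((p, n))  # hypothetical linear observation operator

def two_j_min(d, B, R):
    """Twice the minimum of the cost function for innovation d, given B and R."""
    S = H @ B @ H.T + R          # innovation covariance implied by B and R
    return float(d @ np.linalg.solve(S, d))

# Innovations drawn from the true error statistics.
S_true = H @ B_true @ H.T + R_true
innovations = rng.standard_normal((n_cases, p)) @ np.linalg.cholesky(S_true).T

# Correct covariances: the mean of 2*J_min / p is close to 1.
ratio_true = np.mean([two_j_min(d, B_true, R_true) for d in innovations]) / p
# Standard deviations inflated by 2 (variances by 4): the ratio drops to about 0.25.
ratio_inflated = np.mean([two_j_min(d, 4 * B_true, 4 * R_true) for d in innovations]) / p
```

A ratio far from 1 signals that the specified covariances are inconsistent with the actual innovations. A single scalar rescaling can repair the amplitude in this toy case but not errors in the covariance structure.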
Results
Comparisons of objective and empirical ECMs
Figure 1 shows comparisons between the ECMs of TEST and CNTL. The diagonal components of the BECM of TEST are smaller than those of CNTL (Fig. 1a–c), and the standard deviation ratio (TEST/CNTL) is about 50%. This value agrees with the results of a previous study that objectively estimated the diagonal components of a BECM by the D05 method using radiosonde point samples23. Note that in Fig. 1b, the difference between TEST and CNTL is difficult to see at altitudes above 200 hPa because the amount of water vapour is much smaller (less than 1%) in the upper troposphere than in the lower troposphere. The diagonal components of the OECM of TEST are also generally smaller than those of CNTL (Fig. 1d). Especially large changes in the diagonal components of the OECMs are seen for satellite-based observations, including radiance observations, compared with other direct observations. For example, the average ratio of standard deviations for temperature-sensitive microwave radiance observations (MW-T) is about 50%, that for the water vapor-sensitive microwave radiance observations (MW-WV) is about 15%, and that for the conventional observation datasets, such as SONDE and AVIATION, is about 80%, where the average for all observation datasets is about 34%. The observation error correlations between the different wavelengths of radiance observations in TEST, which are ignored in CNTL, are large and must be treated correctly in DA (Fig. 1e,f). The horizontal observation error correlation distances are much smaller than the empirical distances (figures are not shown). These results agree with the previous study23. These large standard deviations of the empirical ECMs are due to the low ability of the observation minus background statistics, on which the empirical ECMs are based, to separate observation and background errors.
These standard deviation ratios between the objective and empirical ECMs would explain the unknown origin of the empirical inflation coefficient values of approximately 2 applied to the observation error standard deviations estimated by the D05 method in several previous studies10,19–22 to maintain the analysis and forecast accuracy. That is, empirical inflation would be needed to balance the excessively large standard deviations of their empirical ECMs. The differences between the objectively and empirically derived ECMs shown here are expected to affect the analysis and forecast fields. Thus, we consider these aspects below.
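The difference between the observation minus background statistics and the D05 diagnostic can be illustrated with a toy linear-Gaussian sketch in Python. This is not the estimation code of this study: the covariances `B` and `R`, the identity observation operator, and the sample size are hypothetical, and the gain is built from the true covariances, the ideal case in which the D05 relation holds exactly. The covariance of the observation minus background departures mixes both error sources, whereas the cross-covariance between the analysis and background departures isolates the observation error covariance, including its off-diagonal (inter-channel) part.

```python
import numpy as np

rng = np.random.default_rng(2)
p, n_samples = 3, 50000

H = np.eye(p)                           # observations located at the state points
B = 2.0 * np.eye(p)                     # hypothetical background error covariance
R = np.array([[1.0, 0.8, 0.0],
              [0.8, 1.0, 0.0],
              [0.0, 0.0, 0.5]])         # hypothetical correlated observation errors

S = H @ B @ H.T + R                     # covariance of the background departures
K = B @ H.T @ np.linalg.inv(S)          # gain built from the true B and R

e_b = rng.multivariate_normal(np.zeros(p), B, size=n_samples)  # background errors
e_o = rng.multivariate_normal(np.zeros(p), R, size=n_samples)  # observation errors

d_ob = e_o - e_b @ H.T                  # observation minus background
d_oa = d_ob @ (np.eye(p) - H @ K).T     # observation minus analysis

omb_cov = d_ob.T @ d_ob / n_samples     # approximates H B H^T + R (not separable)
R_d05 = d_oa.T @ d_ob / n_samples       # approximates R (D05 diagnostic)
```

In this toy case, `omb_cov` has diagonal values near 3 and 2.5, while `R_d05` recovers values near 1 and 0.5 together with the 0.8 inter-channel covariance. With a suboptimal gain, the diagnostic is only approximate.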
Monthly averaged analysis field changes by the objective PDFs
Figure 2 shows the differences between the monthly averaged analysis fields of TEST and CNTL. The differences in the 500 hPa temperature are in the range of 0.3 to 0.6 K over wide areas (Fig. 2c). Such changes were also found in23, and this is natural since especially large changes of the OECMs are seen for the satellite radiances (Fig. 1d), which have wide observation coverage. The differences in the 300 hPa zonal wind are about 0.6 to 1.5 m/s (Fig. 2d). There are relatively large differences in the easterly wind regions in the tropics and the boundary regions of the westerly and easterly winds in the extratropics. Figure 2a,b show that the average analysis fields of TEST at 925 hPa are cooler and wetter than those of CNTL in regions with large amounts of low-level clouds, which are off the west coasts of California, Peru, and Africa. These regions correspond to regions with large uncertainty in atmospheric state analysis because accurate expression of these clouds is difficult for current AGCMs57–61. These clouds are formed below the stable layer in the planetary boundary layer; therefore, these simultaneous changes in water vapor and temperature are reasonable. This change is consistent with the results of the ECM objective estimation study using a climatological BECM23. These areas also correspond to cold ocean currents with relatively large mixing in the ocean mixing layer, which are not accurately given to the AGCM as its boundary condition. This is because the sea surface temperature interacting directly with the atmosphere is the water temperature of a very thin layer (micrometers to a few meters), and such a small and fast process cannot be represented in SST analysis and ocean numerical models. These regions are also the regions with large root mean square differences (RMSDs) between the analysis fields of TEST and CNTL (Fig. 3a,d) and correspond to the regions with large background error standard deviations in the objective BECM (Fig. 3b,e).
In contrast, no such correspondence is observed in the background error standard deviations in the BECM derived by the NMC method (Fig. 3c,f). These changes imply that increased observation information is adequately assimilated by the objective ECMs. Which analysis is more accurate will be evaluated by forecasting from each analysis, as shown below; what is important here is that the two analyses with and without empirical tunings have systematic differences, and these are not small, for example, compared to the standard deviations in the BECM (Fig. 1).
Fig. 2.
Changes in the monthly averaged analysis fields between TEST and CNTL. The monthly averaged differences between TEST and CNTL (TEST minus CNTL) are shown with colour shades for (a) temperature (K) at 925 hPa, (b) water vapor mixing ratio (g/kg) at 925 hPa, (c) temperature (K) at 500 hPa, and (d) zonal wind velocity (m/s) at 300 hPa. The monthly average field of the corresponding quantity in CNTL is also shown with black contour lines in each panel. The grey areas show below ground. This figure was generated with GrADS v2.0.2 (http://cola.gmu.edu/grads/grads.php).
Fig. 3.
Comparisons of the monthly mean background error standard deviation and analysis field RMSDs for TEST and CNTL. The monthly averaged RMSDs between analyses of TEST and CNTL (a,d), the monthly averaged background error standard deviations of TEST (b,e) and CNTL (c,f) are shown with colour shades. The left and right columns show water vapor mixing ratio (g/kg) and temperature (K) at 925 hPa, respectively. The black contours show the monthly mean fields of the CNTL in panels (a) and (d) and the monthly ensemble mean fields of EnDA in panels (b) and (f). The grey areas show below ground. This figure was generated with GrADS v2.0.2 (https://cola.gmu.edu/grads/grads.php).
These changes in the monthly mean field would be due to the fact that in TEST, more observational information is assimilated into the analysis field due to the diagnosed new ECMs (Fig. 1) than in CNTL, so the influence of the model climate field has become smaller. This is consistent with the fact that changes are large in areas that are difficult to represent by NWP models, as mentioned above. It should be noted that the changes in the mean field are small compared to the root mean squared differences and are not inconsistent with the strong constraint 4D-Var assumption and no bias assumption of DA.
Forecast accuracy improvement by the objective PDFs
Figure 4 shows the normalized forecast RMSE differences defined in Eq. (9) (“Impact verification of estimated ECMs on forecast skill”) between TEST and CNTL. We can see that TEST has smaller forecast RMSEs for most physical quantities, regions, pressure levels, and forecast times. The maximum error reduction rate exceeds 9% with 95% statistical significance, which is as large as the improvement brought about by introducing physical laws in DA (i.e., 4D-Var against 3D-Var; figures not shown). These improvements persist for more than 5 days. The magnitude and duration of these improvements are larger and longer than those when only the BECM is objectively estimated using EnDA15 or when the amplitudes of the BECM and OECM of all datasets are objectively estimated23. This demonstrates the importance of objectively estimating the entire ECM, including the structure of the BECM. The improvements in the extratropical southern hemisphere (SH, south of 20° S) and the tropics are larger than those in NH due to the importance of satellite observations and the flow-dependent BECM there. This is because the more complex mass-wind balances in the tropics than in the extratropics, the active baroclinic instability in the winter hemisphere, and the shortage of direct observations (such as radiosondes) in the tropics and SH require highly accurate flow-dependent ECMs. Contrary to the general forecast accuracy improvement, Fig. 4 also shows minor forecast accuracy degradations for relative humidity (RH) in the stratosphere (around 100 hPa) globally and for temperature in the tropics in the upper troposphere (around 400–200 hPa). These degradations may be pseudo-degradations due to the small error growth rate of these quantities, as partly shown in23 for RH in the stratosphere. The similar degradation patterns in the comparisons between 4D-Var and 3D-Var (figures not shown) support this consideration.
Other verifications using observations as truth and the observation minus background statistics agree with Fig. 4 (figures not shown). These results show that atmospheric state analysis with objective ECMs has higher accuracy than that with empirical ECMs.
Fig. 4.
The normalized forecast RMSE differences between TEST and CNTL. The colour shades show the normalized forecast RMSE differences between TEST and CNTL calculated by Eq. (9) (“Impact verification of estimated ECMs on forecast skill”), where the red (blue) colour shades denote that the forecast RMSEs of TEST are smaller (larger) than those of CNTL. The line hatch shows 95% statistical significance, and the dotted hatch shows one sigma (68%) statistical significance. The columns show the scores averaged for the globe (GB), NH, the tropics (TP), and SH, respectively, from left to right. The rows show the scores for zonal wind (U), meridional wind (V), temperature (T), and relative humidity (RH) from top to bottom, respectively. The vertical and the horizontal axes represent the pressure level (hPa) and the forecast time 0–5 days, respectively.
Theoretical consistency of DA using objective Gaussian PDFs
Figure 5 shows the theoretical consistency measured by the two chi-square-based indices (see “Verification of estimated ECMs by theoretical consistency” and Appendix A). The first index, which uses the value of the cost function as is, shows that TEST has significantly better theoretical consistency than CNTL, where the values of TEST and CNTL are 95% and 16%, respectively. Notably, the high theoretical consistency of TEST is achieved without any empirical tunings. The second index, which uses the value of the cost function corrected by a scalar coefficient, also shows that TEST has significantly better theoretical consistency than CNTL, where the values of TEST and CNTL are 95% and 70%, respectively. Since the second index represents the hypothetical theoretical consistency when the best single scalar coefficient for multiplying the ECM is applied, the consistency of CNTL under the second index is better than that under the first. However, CNTL is still significantly worse than TEST. The theoretical consistency of TEST is also better than that of the DA system reported in23, where ECMs (BECM and OECM) objectively estimated by the D05 method were tuned using a single empirical parameter.
Fig. 5.

Theoretical consistency of DA. The theoretical consistency of DA measured by the two chi-square-based indices is shown for each experiment. The blue and orange bars show the index using the value of the cost function as is and the index using the value corrected by a scalar coefficient, respectively (see “Verification of estimated ECMs by theoretical consistency” and Appendix A). The horizontal axis shows the experiment names, and the vertical axis shows the theoretical consistency in percent.
Tropical cyclone track forecast accuracy
Figure 6 shows the tropical cyclone track forecast RMSEs for TEST and CNTL. The track forecast errors of TEST are smaller than those of CNTL by about 20% at 48 h in the western Pacific (Fig. 6a) and by about 38% at 12 h in the eastern Pacific (Fig. 6b), with 95% statistical significance. Here the sample sizes are 14 and 64, respectively. These statistically significant results are due to the large difference in the typhoon track forecast errors between TEST and CNTL. This improvement is consistent with the improvements in forecast accuracy shown in Fig. 4. The smaller improvements at early forecast times in Fig. 6a compared to Fig. 6b are due to the synthesized typhoon observation data (see “Methods”), which are used only in the western Pacific. When only the BECM is objectively estimated (TEST-oB), smaller improvements are seen (Fig. 6c) compared to TEST. Figure 6d shows that even when the synthesized typhoon observation data are not assimilated (TEST-noST), the objective ECMs realize better accuracy than CNTL after 12 h. The degradation near the initial time would be within the analysis errors of the typhoon centre position analysis. These results agree with the atmospheric state forecast accuracy shown in Fig. 4.
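A track forecast RMSE of this kind can be computed as in the following Python sketch. It assumes that the track error is the great-circle distance between the forecast and best-track centre positions on a spherical Earth, which is a common convention but is not stated explicitly here. The function names, the Earth radius value, and the example positions are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation, an assumption)

def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def track_rmse_km(forecast_positions, best_track_positions):
    """RMSE of track errors over paired (lat, lon) samples at one forecast time."""
    squared = [great_circle_km(f[0], f[1], b[0], b[1]) ** 2
               for f, b in zip(forecast_positions, best_track_positions)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical centre positions (lat, lon) for two cases at one forecast time.
best_track = [(20.0, 130.0), (25.0, 135.0)]
forecast_cntl = [(20.0, 131.0), (25.0, 136.0)]  # 1.0 degree longitude error
forecast_test = [(20.0, 130.8), (25.0, 135.8)]  # 0.8 degree longitude error

rmse_cntl = track_rmse_km(forecast_cntl, best_track)
rmse_test = track_rmse_km(forecast_test, best_track)
improvement = (rmse_cntl - rmse_test) / rmse_cntl  # fractional error reduction
```

Using the same set of cases for both experiments (equal sampling) keeps the two RMSEs comparable.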
Fig. 6.
Tropical cyclone track forecast RMSE. Tropical cyclone (TC) track forecast errors in the western Pacific for (a) TEST, (c) TEST-oB, and (d) TEST-noST. (b) Tropical cyclone track forecast errors in the eastern Pacific for TEST. The horizontal axis shows the forecast time (h), the left vertical axis shows the track forecast error (km), and the right vertical axis shows the number of samples. The red and blue lines show the track forecast errors of each TEST experiment and CNTL, respectively. The red and blue dots show the number of samples of each TEST experiment and CNTL, respectively, where the blue dots are hidden behind the red dots because TEST and CNTL have the same number of samples (equal sampling). The two rows of triangles in each panel indicate statistical significance, where the top (bottom) row is the result with (without) considering the temporal correlation between samples. Green (black) triangles indicate statistical significance above (below) 95%.
Summary and conclusions
In this study, we have constructed the objective DA by objectively estimating the PDFs of all datasets (observations and forecasts) under the Gaussian approximation and using them in DA without empirical tunings. The BECM was constructed using an ensemble of 4D-Var with 192 members, and the OECM was constructed using the D05 method. The numerical experiments of atmospheric state analysis and forecast using these objective PDFs show the following results compared to those with the empirical PDFs. (1) The standard deviations of objective ECMs are smaller than those of empirical ECMs, where the standard deviation ratio values are about 50% for the BECM, about 50% for the temperature-sensitive radiances, and about 15% for the water vapor-sensitive radiances. (2) The analysed atmospheric states are systematically different, such as cooler and wetter low troposphere in regions characterized by low-level clouds off the west coast of the continents. (3) Forecast accuracy is improved for most variables, regions, pressure levels, and forecast times up to about 9% with 95% statistical significance. (4) Theoretical consistency evaluated by chi-square-based tests shows clear improvement. (5) Tropical cyclone track forecast accuracy is also improved globally, with 95% statistical significance. These results show that objective PDFs improve the theoretical consistency and accuracy of atmospheric state analysis. This is the first result of the fully objective atmospheric state analysis.
The objective analysis would be essential and contribute extensively to all atmospheric sciences, from everyday NWP to future climate predictions. The objective DA would also realize efficient development of operational NWP since it would be free from empirical tunings and compensation error problems. Furthermore, since the state analysis of physical systems with huge degrees of freedom is a common scientific problem in various fields other than atmospheric science, our results would contribute to such science areas.
Finally, we discuss future studies. First, the objective PDFs enable us to diagnose the DA system using various statistical quantities, such as Shannon’s entropy, degrees of freedom for signals, and the Kullback–Leibler divergence. Diagnostics using these quantities have been difficult due to the empirical ECMs that significantly distort the theoretical relationships. Second, the objective PDFs also enable us to study DA with next-order accuracy, where small differences of PDFs from the Gaussian distribution can be estimated. Such differences cannot be correctly estimated in DA systems with empirical ECMs. For example, this is essential for studies on DA using a non-Gaussian background PDF and a model error PDF. Third, the objective PDFs for various coupled DA, such as atmosphere–ocean coupling, atmosphere–land coupling, dynamics–physical and chemical process coupling, would be essential for the state analysis of the earth system. This study would be the essential starting point for these future studies.
Acknowledgements
The author is grateful to all those individuals who contributed to developing the JMA global NWP system and to the anonymous reviewers. The author also thanks his colleagues for discussions and support in various situations.
Author contributions
T.I. conducted all of this research.
Funding
This work was supported by JSPS KAKENHI Grants JP17K05658 and JP22K03726.
Data availability
All information needed to evaluate the conclusions of this paper is described in the main text. The data of the numerical simulations are available from the author upon reasonable request. The copyright of the code of the original NWP system used here belongs to the Japan Meteorological Agency. Since the NWP system is huge and complex, and its input and output data are huge, experiments with it need to be performed under a collaborative framework.
Competing interests
The author declares no competing interests.
Footnotes
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The online version contains supplementary material available at 10.1038/s41598-024-71849-7.
References
- 1. Lorenz, E. N. Deterministic nonperiodic flow. J. Atmos. Sci. 20, 130–141 (1963).
- 2. Kalnay, E. Atmospheric Modeling, Data Assimilation and Predictability (Cambridge University Press, 2003).
- 3. Lewis, J. M., Lakshmivarahan, S. & Dhall, S. Dynamic Data Assimilation (Cambridge University Press, 2006).
- 4. Bauer, P., Thorpe, A. & Brunet, G. The quiet revolution of numerical weather prediction. Nature 525, 47–55. 10.1038/nature14956 (2015).
- 5. Leeuwen, V., Künsch, H. R., Nerger, L., Potthast, R. & Reich, S. Particle filters for high-dimensional geoscience applications: A review. Q. J. R. Meteorol. Soc. 145, 2335–2365. 10.1002/qj.3551 (2019).
- 6. Poterjoy, J. Implications of multivariate non-Gaussian data assimilation for multiscale weather prediction. Mon. Weather Rev. 150, 1475–1493. 10.1175/MWR-D-21-0228.1 (2022).
- 7. Parrish, D. F. & Derber, J. C. The National Meteorological Center’s spectral statistical interpolation analysis system. Mon. Weather Rev. 120, 1747–1763 (1992).
- 8. Houtekamer, P. L., Lefaivre, L., Derome, J., Ritchie, H. & Mitchell, H. L. A system simulation approach to ensemble prediction. Mon. Weather Rev. 124, 1225–1242 (1996).
- 9. Fisher, M. ‘Background error covariance modelling.’ in Proceedings of the ECMWF Seminar on recent developments in data assimilation for atmosphere and ocean (45–64), 8–12 September 2003. ECMWF. http://www.ecmwf.int/publications/.
- 10. Bonavita, M., Isaksen, L. & Hólm, E. On the use of EDA background error variances in the ECMWF 4D Var. Q. J. R. Meteorol. Soc. 138, 1540–1559. 10.1002/qj.1899 (2012).
- 11. Evensen, G. Sequential data assimilation with a nonlinear quasi geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99, 10143–10162 (1994).
- 12. Houtekamer, P. L. et al. Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Weather Rev. 133, 604–620. 10.1175/MWR2864.1 (2005).
- 13. Buehner, M., Houtekamer, P. L., Charette, C., Mitchell, H. L. & He, B. Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part I: Description and single observation experiments. Mon. Weather Rev. 138, 1550–1566. 10.1175/2009MWR3157.1 (2010).
- 14. Buehner, M., Houtekamer, P. L., Charette, C., Mitchell, H. L. & He, B. Intercomparison of variational data assimilation and the ensemble Kalman filter for global deterministic NWP. Part II: One-month experiments with real observations. Mon. Weather Rev. 138, 1567–1586. 10.1175/2009MWR3158.1 (2010).
- 15. Ishibashi, T. Network structure of atmospheric perturbations. Mon. Weather Rev. 151, 1849–1861. 10.1175/MWR-D-22-0242.1 (2023).
- 16. Clayton, A. M., Lorenc, A. C. & Barker, D. M. Operational implementation of a hybrid ensemble/4DVAR global data assimilation system at the Met Office. Q. J. R. Meteorol. Soc. 139, 1445–1461. 10.1002/qj.2054 (2013).
- 17. Fairbairn, D., Pring, S. R., Lorenc, A. C. & Roulstone, I. A comparison of 4DVar with ensemble data assimilation methods. Q. J. R. Meteorol. Soc. 140, 281–294. 10.1002/qj.2135 (2014).
- 18. Desroziers, G., Berre, L., Chapnik, B. & Poli, P. Diagnosis of observation, background and analysis error statistics in observation space. Q. J. R. Meteorol. Soc. 131, 3385–3396 (2005).
- 19. Weston, P., Bell, W. & Eyre, J. Accounting for correlated error in the assimilation of high resolution sounder data. Q. J. R. Meteorol. Soc. 140, 2420–2429. 10.1002/qj.2306 (2014).
- 20. Bormann, N. et al. Enhancing the impact of IASI observations through an updated observation error covariance matrix. Q. J. R. Meteorol. Soc. 142, 1767–1780. 10.1002/qj.2774 (2016).
- 21. Eresmaa, R., Letertre-Danczak, J., Lupu, C., Bormann, N. & McNally, A. P. The assimilation of Cross-track Infrared Sounder radiances at ECMWF. Q. J. R. Meteorol. Soc. 143, 3177–3188 (2017).
- 22. Campbell, W. F., Satterfield, E. A., Ruston, B. & Baker, N. L. Accounting for correlated observation error in a dual formulation 4D variational data assimilation system. Mon. Weather Rev. 145, 1019–1032. 10.1175/MWRD160240.1 (2017).
- 23. Ishibashi, T. Improvement of accuracy of global numerical weather prediction using refined error covariance matrices. Mon. Weather Rev. 148, 2623–2643 (2020).
- 24. Ménard, R. Error covariance estimation methods based on analysis residuals: Theoretical foundation and convergence properties derived from simplified observation networks. Q. J. R. Meteorol. Soc. 142, 257–273. 10.1002/qj.2650 (2016).
- 25. JMA, Outline of the operational numerical weather prediction at the Japan Meteorological Agency. Appendix to WMO Technical Progress Report on the Global Data-processing and Forecasting System (GDPFS) and Numerical Weather Prediction (NWP), Japan Meteorological Agency, Tokyo, Japan, accessed 21 May 2020, http://www.jma.go.jp/jma/jmaeng/jma-center/nwp/outline2013-nwp/index.htm, (2013).
- 26. Lorenc, A. C. Analysis methods for numerical weather prediction. Q. J. R. Meteorol. Soc. 112, 1177–1194 (1986).
- 27. Tsuyuki, T. & Miyoshi, T. Recent progress of data assimilation methods in meteorology. J. Meteorol. Soc. Jpn. 85B, 331–361. 10.2151/jmsj.85B.331 (2007).
- 28. Chung, K. L. A Course In Probability Theory (Elsevier, 2000).
- 29. Hamilton, J. D. Time Series Analysis (Princeton University Press, 2020).
- 30. Sasaki, Y. Proposed inclusion of time evolution terms, observational and theoretical in numerical variational objective analysis. J. Meteorol. Soc. Jpn. 47, 115–124 (1969).
- 31. Sasaki, Y. Some basic formalisms in numerical variational analysis. Mon. Weather Rev. 98, 875–883 (1970).
- 32. Thompson, P. Reduction of analysis error through constraints of dynamical consistency. J. Appl. Meteorol. 8, 738–742 (1969).
- 33. Rabier, F., Järvinen, H., Klinker, E., Mahfouf, F. & Simmons, A. The ECMWF operational implementation of four dimensional variational assimilation. I: Experimental results with simplified physics. Q. J. R. Meteorol. Soc. 126, 1143–1170 (2000).
- 34. Mahfouf, F. & Rabier, F. The ECMWF operational implementation of four dimensional variational assimilation. II: Experimental results with improved physics. Q. J. R. Meteorol. Soc. 126, 1171–1190 (2000).
- 35. Klinker, E., Rabier, F., Kelly, G. & Mahfouf, F. The ECMWF operational implementation of four dimensional variational assimilation. III: Experimental results and diagnostics with operational configuration. Q. J. R. Meteorol. Soc. 126, 1191–1215 (2000).
- 36. Navon, I. M. & Legler, D. M. Conjugate gradient methods for large scale minimization in meteorology. Mon. Weather Rev. 115, 1479–1502 (1987).
- 37. Zou, X. et al. Numerical experience with limited-memory quasi-Newton and truncated Newton methods. SIAM J. Optim. 3, 582–608 (1993).
- 38. Fisher, M. Minimization algorithms for variational data assimilation. Annual Seminar on Recent Developments in Numerical Methods for Atmospheric Modelling, Shinfield Park, Reading, 7–11 September 1998, ECMWF, 364–385 (1998).
- 39. Gaspari, G. & Cohn, S. Construction of correlation functions in two and three dimensions. Q. J. R. Meteorol. Soc. 125, 723–757 (1999).
- 40. Houtekamer, P. L. & Mitchell, H. L. A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev. 129, 123–137 (2001).
- 41. Houtekamer, P. L. & Mitchell, H. L. Ensemble Kalman filtering. Q. J. R. Meteorol. Soc. 131, 3269–3289 (2005).
- 42. Lorenc, A. C. The potential of the ensemble Kalman filter for NWP—A comparison with 4DVAR. Q. J. R. Meteorol. Soc. 129, 3183–3203 (2003).
- 43. Ishibashi, T. Tensor formulation of ensemble-based background error covariance matrix factorization. Mon. Weather Rev. 143, 4963–4973 (2015).
- 44. Bauer, P., Radnóti, G., Healy, S. & Cardinali, C. GNSS radio occultation constellation observing system experiments. Mon. Weather Rev. 142, 555–572. 10.1175/MWR-D-13-00130.1 (2014).
- 45. Zhang, F. et al. What is the predictability limit of midlatitude weather? J. Atmos. Sci. 76, 1077–1091. 10.1175/JAS-D-18-0269.1 (2019).
- 46. Buizza, R. & Palmer, T. N. The singular-vector structure of the atmospheric global circulation. J. Atmos. Sci. 52, 1434–1456. 10.1175/1520-0469(1995)052<1434:TSVSOT>2.0.CO;2 (1995).
- 47. Gelaro, R., Buizza, R., Palmer, T. N. & Klinker, E. Sensitivity analysis of forecast errors and the construction of optimal perturbations using singular vectors. J. Atmos. Sci. 55, 1012–1037. 10.1175/1520-0469(1998)055<1012:SAOFEA>2.0.CO;2 (1998).
- 48. Philips, J. L. How to Think About Statistics (Freeman, 1988).
- 49. Kuhl, D. D., Rosmond, T. E., Bishop, C. H., McLay, J. & Baker, N. L. Comparison of hybrid ensemble/4DVar & 4DVar within the NAVDAS-AR data assimilation framework. Mon. Weather Rev. 141, 2740–2758. 10.1175/MWR-D-12-00182.1 (2013).
- 50. Ishibashi, T. Adjoint-based observation impact estimation with direct verification using forward calculation. Mon. Weather Rev. 146, 2837–2858 (2018).
- 51. Derber, J. C. & Wu, W. S. The use of TOVS cloud cleared radiances in the NCEP SSI analysis system. Mon. Weather Rev. 126(8), 2287–2299. 10.1175/1520-0493(1998)126<2287:TUOTCC>2.0.CO;2 (1998).
- 52.Dee, D. P. Variational bias correction of radiance data in the ECMWF system. In Proceedings of the ECMWF Workshop on Assimilation of High Spectral Resolution Sounders in NWP, June 28. ECMWF (2004).
- 53.NCEP, Format of the tropical cyclone vital statistics records (“TCVitals”). NCEP/Environmental Modeling Center, accessed 4 July 2024. http://www.emc.ncep.noaa.gov/mmb/data_processing/tcvitals_description.htm (2011).
- 54.Hersbach, H. & Dee, D. P. ERA5 reanalysis is in production. ECMWF Newsl.147, 7 (2016). [Google Scholar]
- 55.Hersbach, H. et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc.146, 1999–2049. 10.1002/qj.3803 (2020). [Google Scholar]
- 56.Wilks, D. S. Test for differences of mean under serial dependence. In Statistical Methods in the Atmospheric Sciences, 2nd ed. 143–146 (Elsevier Academic Press, 2011).
- 57.Teixeira, J. & Hogan, T. F. Boundary layer clouds in a global atmospheric model: Simple cloud cover parameterizations. J. Clim.15, 1261–1276. (2002). [Google Scholar]
- 58.Kawai, H. et al. Significant improvement of cloud representation in the global climate model MRI-ESM2. Geosci. Model Dev.12, 2875–2897. 10.5194/gmd-12-2875-2019 (2019). [Google Scholar]
- 59.Kawai, H., Koshiro, T. & Webb, M. J. Interpretation of factors controlling low cloud cover and low cloud feedback using a unified predictive index. J. Clim.30(22), 9119–9131 (2017). [Google Scholar]
- 60.Konsta, D. et al. Low-level marine tropical clouds in six CMIP6 models are too few, too bright but also too compact and too homogeneous. Geophys. Res. Lett.49(11), e2021GL097593 (2022). [Google Scholar]
- 61.Chiba, J. & Kawai, H. Improved SST-shortwave radiation feedback using an updated stratocumulus parameterization. WGNE Blue Book Res. Act. Earth. Syst. Modell.51, 403 (2021). [Google Scholar]
- 62.Bennett, A. F., Leslie, L. M., Hagelberg, C. R. & Powers, P. E. Tropical cyclone prediction using a barotropic model initialized by a generalized inverse method. Mon. Weather Rev.121, 1714–1729 (1993). [Google Scholar]
- 63.Chapnik, B., Desroziers, G., Rabier, F. & Talagrand, O. Diagnosis and tuning of observational error in a quasi operational data assimilation setting. Q. J. R. Meteorol. Soc.132, 543–565 (2006). [Google Scholar]
- 64.Sadiki, W. & Fischer, C. A posteriori validation applied to the 3D VAR Arpege and Aladin data assimilation systems. Tellus.57 A, 21–34 (2005). [Google Scholar]
- 65.Talagrand, O. A posteriori evaluation and verification of analysis and assimilation algorithms. In Proceedings of Workshop on Diagnosis of Data Assimilation Systems, November 1998, ECIVIWF, Reading, 17–28 (1999).
- 66.Desroziers, G. & Ivanov, S. Diagnosis and adaptive tuning of observation error parameters in variational assimilation. Q. J. R. Meteorol. Soc.127, 1433–1452 (2001). [Google Scholar]
Data Availability Statement
All information needed to evaluate the conclusions of this paper is described in the main text. The numerical simulation data are available from the author upon reasonable request. The copyright of the code of the original NWP system used here belongs to the Japan Meteorological Agency. Because the NWP system is large and complex, and its input and output data are very large, it needs to be run under a collaborative framework.





