Data in Brief. 2022 Oct 12;45:108669. doi: 10.1016/j.dib.2022.108669

Extending the global high-resolution downscaled projections dataset to include CMIP6 projections at increased resolution coherent with the ERA5-Land reanalysis

Thomas Noël a, Harilaos Loukos a, Dimitri Defrance a, Mathieu Vrac b, Guillaume Levavasseur c
PMCID: PMC9679486  PMID: 36425992

Abstract

This paper describes the extension of the previous CMIP5-based high-resolution climate projections dataset with additional projections based on the more recent CMIP6 experiment. The downscaling method and data processing are the same, but the reference dataset is now the ERA5-Land reanalysis (compared to ERA5 previously), which allows the resolution of the new downscaled projections to increase from 0.25° x 0.25° to 0.1° x 0.1°. The extension comprises 5 climate models and includes 2 surface variables at daily resolution: air temperature and precipitation. Three greenhouse gas emissions scenarios based on the Shared Socioeconomic Pathways are available: one with mitigation policy (SSP1-2.6), an intermediate one (SSP2-4.5), and one without mitigation (SSP5-8.5).

Keywords: High-resolution, Projections, CMIP6, ERA5-Land, Downscaling, Climate change, Adaptation, Impact modeling


Specifications Table

Subject: Climatology; Global and Planetary Change
Specific subject area: Climate change; natural disasters. Evolution of near-surface air temperature and precipitation.
Type of data: Data cube (raster x time) in NetCDF
How data were acquired: CMIP6 model projections and reanalysis data were obtained from the Copernicus Climate Change Service and Earth System Grid Federation data nodes. A trend-preserving statistical downscaling method (CDF-t) was applied using the ERA5-Land reanalysis for calibration: 0.1° × 0.1° spatial resolution, daily temporal resolution, calibration period 1981–2010, historical (1951–2014) and future (2015–2100) periods, for 5 models and 3 scenarios (SSP1-2.6, SSP2-4.5 and SSP5-8.5).
Data format: NetCDF, a set of software libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data, extensively used in atmospheric and oceanic sciences.
Description of data collection: Simulated near-surface air temperature and precipitation data from 5 climate models were downloaded from the Earth System Grid Federation and the Copernicus Climate Change Service, together with ERA5-Land reanalysis data for the same variables over the period 1981–2010.
Data source location: Global scale, land surfaces only. Downloaded from the Earth System Grid Federation (https://esgf-node.ipsl.upmc.fr/search/cmip6-ipsl/) and the Copernicus Climate Change Service (https://cds.climate.copernicus.eu).
Data accessibility: Accessible through the Earth System Grid Federation (ESGF) under a research license at https://esgf-node.ipsl.upmc.fr/projects/cmip6-adjust/

Value of the Data

  • In this second version, the dataset of high-resolution climate projections is extended to include projections from the recent CMIP6 experiment. While the downscaling method and procedure are unchanged, the reference dataset is now the ERA5-Land reanalysis (compared to ERA5 previously), from the Copernicus Climate Change Service, which allows the spatial resolution to increase from 0.25° x 0.25° to 0.1° x 0.1°. A major advantage of this dataset is thus that it provides an extension of the ERA5-Land reanalysis into the future.

  • The dataset is global over land surfaces, comprises 5 climate models so that model uncertainty can be addressed, and includes 2 surface variables at daily resolution: air temperature and precipitation. To sample future climate uncertainty from anthropogenic forcing, three greenhouse gas emissions scenarios are available: one with mitigation policy (SSP1-2.6), an intermediate one (SSP2-4.5) and one without mitigation (SSP5-8.5).

1. Data Description

This paper describes the extension of a previous global dataset of high-resolution downscaled CMIP5 projections of essential surface climate variables with projections from the recent CMIP6 experiment. While the downscaling method and procedure are unchanged, the reference dataset is now the ERA5-Land reanalysis (compared to ERA5 previously), which allows the spatial resolution to be increased.

The new high-resolution climate projections dataset covers the globe over land at a 0.1° x 0.1° spatial resolution and at daily temporal resolution for 2 surface variables. It comprises 5 models from the CMIP6 experiment [1] with simulations for the historical period (1951-2014) and the 21st century (2015 to 2100) under 3 emissions scenarios: one with mitigation policy (Shared Socioeconomic Pathway 1-2.6 or SSP1-2.6), an intermediate one (Shared Socioeconomic Pathway 2-4.5 or SSP2-4.5) and one with no mitigation (Shared Socioeconomic Pathway 5-8.5 or SSP5-8.5). The 2 downscaled land surface variables are air temperature and precipitation. The combination of models and scenarios represents 15 climate projections (5 models x 3 scenarios) for each variable. Other variables, models and emissions scenarios could be added in the near future. The data are stored in chunks of 10 to 15 years by model and variable according to Earth System Grid Federation (ESGF) conventions, and the total volume is approximately 12 TB.
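As a quick check of the dataset dimensions described above, the following sketch enumerates the model and scenario combinations. The model names are those listed in Section 2.1; the scenario identifiers (ssp126, ssp245, ssp585) and the variable short names (tas, pr) follow common CMIP6 usage and are assumptions made here for illustration, not names taken from the dataset files.

```python
from itertools import product

# Model names as listed in Section 2.1 of this paper.
MODELS = ["GFDL-ESM4", "IPSL-CM6A-LR", "MPI-ESM1-2-HR", "MRI-ESM2-0", "UKESM1-0-LL"]
# Assumed identifiers (common CMIP6 usage), not read from the dataset itself.
SCENARIOS = ["ssp126", "ssp245", "ssp585"]
VARIABLES = ["tas", "pr"]  # near-surface air temperature, precipitation

# One projection per (model, scenario) pair, for each variable.
projections = list(product(MODELS, SCENARIOS))
runs = list(product(VARIABLES, MODELS, SCENARIOS))

print(len(projections), "projections per variable")
print(len(runs), "variable/model/scenario combinations in total")
```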

The data were produced with a statistical downscaling method using the ERA5-Land reanalysis [2] for calibration (see next section for details). The advantage of the downscaled data is the removal of model biases at a spatial resolution more compatible with the requirements of assessments and modeling of the impacts of climate change. In other words, the method corrects the climatology (distribution) of model values to make them comparable with a reference observational dataset [3], which in this case is the ERA5-Land reanalysis. In the following subsections, we recall the file naming conventions, then illustrate the bias removal over the historical period and the differences in the climate change signal at the end of the 21st century.

1.1. File name conventions

The adopted conventions, the same as in the previous version, were adapted from the EURO-CORDEX Data Reference Syntax (DRS) for adjusted projections in order to produce a DRS for CMIP6 adjusted projections [4], as the climate modeling community has not defined an official one. Note that we kept the terms “bias-adjustment” and “adjustment” so as not to change the existing syntax, even though, strictly speaking, we are producing downscaled projections. For more details and final naming examples we refer the reader to the previous paper.

1.2. Historical simulations

Here we compare the differences with the ERA5-Land reanalysis of both the original model (interpolated on the reanalysis grid and referred to as “interpolated”) and downscaled simulations (referred to as “downscaled”). We first look at the historical 30-year calibration period (1981-2010) for both temperature and precipitation. We also look at the 1951-1980 period but for simulations only (since there is no reanalysis data) to see the differences in a 30-year period different from the calibration period.

Fig. 1 illustrates the spatial differences between the interpolated and downscaled data by comparing their bias relative to the ERA5-Land reanalysis over the calibration period. For temperature, the ensemble mean of the five models shows a bias from -2°C to 1°C for the interpolated data. In some areas, such as western North America, the bias is above +5°C. We can further notice that temperature in mountainous areas (the Himalayas and the Andes) is often overestimated at the top by GCMs because of the poor representation of topography. For precipitation, the interpolated data represent the precipitation amounts well, with the exception of the tropics. In Asia, monsoon precipitation is underestimated in China and India, while the adjacent area of Indonesia shows an overestimation. In South America, precipitation is overestimated over the Amazon basin and underestimated in the western part of the continent. In the downscaled data, the models have temperature and precipitation comparable to the ERA5-Land reanalysis over the whole world.

Fig. 1. Comparison between the interpolated data and the downscaled data averaged over the calibration period (1981-2010), for temperature (°C) (top) and precipitation (mm/day) (bottom). Difference from the reanalysis data for the interpolated data (left) and the downscaled data (right).

Fig. 2a shows the cumulative distribution functions (CDFs) empirically estimated from monthly values averaged over the globe for the interpolated and downscaled data over the calibration period. The interpolated data CDFs are spread around the ERA5-Land reanalysis CDF, with both overestimations and underestimations. This spread is larger for precipitation than for temperature. For the downscaled data, the difference from the reanalysis CDF is greatly reduced. While the CDFs are almost indistinguishable for temperature, there are small differences for low monthly precipitation amounts. This is because downscaling is performed at a daily scale (see next section) and grid point by grid point, while the CDFs are estimated on monthly and spatially averaged data. The day-to-day (temporal) and spatial variability of the model data are preserved by the downscaling method; however, residual biases can appear in monthly and spatial averages.
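To make the notion of an empirically estimated CDF concrete, here is a minimal sketch. It assumes the monthly global averages have already been computed; the numbers are hypothetical and chosen only for illustration, and the helper name `empirical_cdf` is not taken from the authors' processing chain.

```python
from bisect import bisect_right


def empirical_cdf(sample):
    """Return a function x -> fraction of sample values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        return bisect_right(ordered, x) / n

    return cdf


# Hypothetical monthly global-mean temperatures (degC), for illustration only.
reanalysis = [13.1, 13.4, 14.0, 14.6, 15.2, 15.7, 15.9, 15.8, 15.1, 14.4, 13.7, 13.2]
# A stand-in for an interpolated model with a uniform warm bias.
interpolated = [t + 0.8 for t in reanalysis]

cdf_ref = empirical_cdf(reanalysis)
cdf_mod = empirical_cdf(interpolated)

# At a given threshold, the warm-biased series has a smaller share of months below it,
# so its CDF lies to the right of the reference CDF.
print(cdf_ref(14.3), cdf_mod(14.3))
```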

Fig. 2a. Cumulative distribution functions of global domain mean monthly averages over the calibration period (1981-2010), for temperature (top) and precipitation (bottom) and for interpolated (left) and downscaled data (right), from each model (gray) and the ERA5-Land reanalysis (black). Fig. 2b. Cumulative distribution functions of the global domain mean monthly averages over the 1951-1980 period, for temperature (top) and precipitation (bottom), and for interpolated (left) and downscaled data (right) from each model.

Fig. 2b is the same as Fig. 2a but over the 1951-1980 period. We can see the same type of changes between the interpolated and downscaled data, and the same features among the variables, as over the calibration period. In the interpolated data, the spread of the CDFs is again larger for precipitation than for temperature. The reduction of the CDF spread in the downscaled data is apparent for both temperature and precipitation.

1.3. Projections at the end of the century

Here we illustrate changes by the end of the century, over the 2071-2100 period under scenario SSP5-8.5, by comparing the interpolated and downscaled simulations for both variables. Results for scenarios SSP1-2.6 and SSP2-4.5 are similar, though progressively less pronounced for the lower-emission scenarios, and are not shown.

Fig. 3 shows maps of the spatial differences between the interpolated and downscaled data of the ensemble mean. For temperature, all continents are affected, with differences of a few degrees. The effect of downscaling is mostly observed in mountainous regions such as the Himalayas, the Andes and the Rockies. For precipitation, we can see differences in the tropical areas, with increases in the northern part (West Africa, Asia) and decreases in the southern Amazon basin and Indonesia. In extra-tropical areas, differences are smaller, between -2 and +2 mm/day.

Fig. 3. Ensemble mean averaged over the end of the century (2071-2100), for temperature (°C) (top) and precipitation (mm/day) (bottom). Interpolated data (left), downscaled data (center), and difference between downscaled and interpolated data (right).

Fig. 4 illustrates the climate sensitivity of each model, represented by the shift in the CDF of globally averaged monthly values. The shift is evaluated by subtracting, for each model, the value of the median (q50) of the corresponding CDF for present climate. For temperature, the CDFs of each model show almost identical shifts for interpolated and downscaled data. For precipitation, the CDFs of each model show that downscaled data have overall a larger shift compared to interpolated model data, particularly in the higher values. This illustrates that the downscaling method preserves the warming trend of the original model data while it modifies their precipitation changes towards wetter and more intense values.
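The shift described above can be sketched as follows. This is only an illustration under stated assumptions: the values are hypothetical, the function name `shifted_values` is introduced here, and the exact period used as "present climate" is not specified in the text.

```python
from statistics import median


def shifted_values(future, present):
    """Future values expressed relative to the present-climate median (q50)."""
    q50 = median(present)
    return sorted(v - q50 for v in future)


# Hypothetical globally averaged monthly temperatures (degC), illustration only.
present_interp = [13.0, 14.0, 15.0, 16.0, 17.0]
future_interp = [16.5, 17.5, 18.5, 19.5, 20.5]
# Stand-in for downscaled data: same warming, different absolute level.
present_down = [12.0, 13.0, 14.0, 15.0, 16.0]
future_down = [15.5, 16.5, 17.5, 18.5, 19.5]

shift_interp = shifted_values(future_interp, present_interp)
shift_down = shifted_values(future_down, present_down)

# Removing each dataset's own present-day median makes the two shifts comparable
# even though the absolute values differ by a constant bias.
print(shift_interp == shift_down, median(shift_interp))
```

Subtracting the median of the same dataset's present-day CDF removes the absolute bias, so what remains is the change signal of each dataset, which is the quantity compared between interpolated and downscaled data.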

Fig. 4. Shift (see text) of Cumulative Distribution Functions of global domain mean monthly averages for 2071-2100. Results for temperature (left) and precipitation (right), and for interpolated (light gray) and downscaled data (dark gray) from each model.

2. Experimental Design and Recommended Use

As in the first version, four datasets are used in this dataset update:

  • The reanalysis data used as reference for calibrating the statistical algorithm over a training period. The reanalysis grid sets the final resolution of the downscaled projections.

  • The original model climate projections come in a variety of spatial resolutions (typically between 2.5°x2.5° and 0.9°x0.9°) and are referred to as “raw”.

  • The raw data interpolated on the reanalysis grid and referred to as “interpolated”.

  • The downscaled data obtained from the interpolated data and the reanalysis data used for statistical calibration (both on the same grid) and referred to as “downscaled”.

The raw and reanalysis data are input data that need to be sourced. The interpolated data is an intermediary dataset needed by the methodology, while the downscaled data is the final dataset. These datasets correspond to the four-step process (data sourcing, remapping, downscaling, quality control) used in our processing and recalled below.

2.1. Data sourcing

The reanalysis data in this update is the ERA5-Land reanalysis [2]. ERA5-Land is the latest climate reanalysis being produced by ECMWF as part of implementing the EU-funded Copernicus Climate Change Service (C3S), providing hourly data on atmospheric, land-surface and sea-state parameters together with estimates of uncertainty from 1981 to the present day. ERA5-Land data are available on the C3S Climate Data Store on regular latitude-longitude grids at 0.1° x 0.1° resolution. We compute the daily data from the ERA5-Land hourly data for both variables.
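The text does not detail how the daily values are derived from the hourly data. The sketch below shows one plausible aggregation, assuming a daily mean for temperature and a daily total for precipitation, and assuming the hourly precipitation is given as per-hour amounts rather than running accumulations. The function name and the sample values are hypothetical.

```python
def daily_from_hourly(hourly, how):
    """Aggregate a flat list of hourly values into daily values.

    how="mean" averages each 24-hour block (assumed here for temperature);
    how="sum" totals each block (assumed here for precipitation amounts).
    """
    if len(hourly) % 24 != 0:
        raise ValueError("expected a whole number of days of hourly data")
    days = [hourly[i:i + 24] for i in range(0, len(hourly), 24)]
    if how == "mean":
        return [sum(day) / 24 for day in days]
    if how == "sum":
        return [sum(day) for day in days]
    raise ValueError("how must be 'mean' or 'sum'")


# Two hypothetical days: temperature in degC, precipitation in mm per hour.
temp_hourly = [10.0] * 24 + [12.0] * 12 + [16.0] * 12
precip_hourly = [0.0] * 24 + [0.5] * 4 + [0.0] * 20

temp_daily = daily_from_hourly(temp_hourly, "mean")
precip_daily = daily_from_hourly(precip_hourly, "sum")
print(temp_daily, precip_daily)
```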

The new climate simulations come from the Coupled Model Intercomparison Project Phase 6 (CMIP6) experiment [1]. They support the Sixth Assessment Report (AR6) of the Intergovernmental Panel on Climate Change (IPCC). We use projections from 3 emissions scenarios: SSP1-2.6 (mitigation policy aligned with a 2°C pre-Paris Agreement target), SSP2-4.5 (intermediate scenario) and SSP5-8.5 (no mitigation policy). Daily data of the necessary variables are extracted from the Copernicus Climate Change Service, which hosts a subset of the CMIP6 archive. The data cover the period from January 1951 to December 2100, with the historical period ending in 2014 and the SSPs starting the following year. The five models have different spatial resolutions, ranging from 0.9° to 2.5°.

Because of the increase in resolution (a factor of 6.25 in the number of grid cells, from 0.25° to 0.1° in each direction), the downscaling procedure requires substantial computational resources. We therefore limited the simulations to the 5 climate models selected by the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP, [5]): GFDL-ESM4, IPSL-CM6A-LR, MPI-ESM1-2-HR, MRI-ESM2-0, and UKESM1-0-LL. These five models are considered a good choice in terms of climate sensitivity (i.e., the magnitude of the warming signal at the end of the century): they are good representatives of the full CMIP6 ensemble, as they include three models with low climate sensitivity (GFDL-ESM4, MPI-ESM1-2-HR, MRI-ESM2-0) and two models with high climate sensitivity (IPSL-CM6A-LR, UKESM1-0-LL).

2.2. Data processing

The overall processing of the data is the same as in the previous version and comprises 4 sequential tasks: remapping, downscaling, standardization and quality control. Remapping is a preliminary task required by the downscaling methodology and consists of spatially interpolating the raw simulations onto the ERA5-Land grid (0.1° x 0.1°). The downscaling method applied is a quantile mapping (QM)-based method called the “Cumulative Distribution Function transform” (CDF-t) method [6], [7], [8], [9], [10]. The variables are downscaled at a daily resolution over the 1951-2100 period using 1981-2010 as the calibration period. The precipitation variable is downscaled with a specific version of CDF-t referred to as “Singularity Stochastic Removal” (SSR), which addresses rainfall occurrence and intensity issues [11]. Standardization consists of rewriting output data files and related metadata to comply with standards used by the climate modeling community (e.g., the Climate and Forecast metadata convention and the Data Reference Syntax). We conduct two types of quality control: we first verify data compliance with the climate community's standards, data consistency and metadata, and then check for outlier values in the downscaled data. For a detailed description of those 4 tasks we refer the reader to the previous paper.
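For readers unfamiliar with CDF-t, the sketch below illustrates its core idea for a single grid point, following the formulation in [6]: the CDF of the downscaled variable over the target period is estimated as F_Sf(x) = F_Sh(F_Gh^{-1}(F_Gf(x))), where S denotes the reference, G the model, h the calibration period and f the target period, and a quantile mapping between F_Gf and F_Sf then yields the downscaled values. This is a simplified illustration, not the authors' implementation: the function names, the evaluation grid, the synthetic data and the mean pre-shift (used here to align the ranges of model and reference) are assumptions, and the SSR treatment of precipitation is omitted.

```python
import math
import random
from bisect import bisect_left, bisect_right
from statistics import mean


def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of sample values <= x."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)


def quantile(sorted_sample, p):
    """Generalized inverse of the empirical CDF at probability p."""
    n = len(sorted_sample)
    k = min(n - 1, max(0, math.ceil(p * n) - 1))
    return sorted_sample[k]


def cdft_sketch(ref_hist, mod_hist, mod_fut, n_grid=2000):
    """Simplified CDF-t for one variable at one grid point."""
    # Assumed pre-shift: align the model's calibration-period mean with the reference.
    shift = mean(ref_hist) - mean(mod_hist)
    gh = sorted(v + shift for v in mod_hist)
    gf_values = [v + shift for v in mod_fut]
    gf = sorted(gf_values)
    sh = sorted(ref_hist)

    # Regular grid of values on which F_Sf is estimated.
    lo, hi = min(sh[0], gf[0]), max(sh[-1], gf[-1])
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]

    # F_Sf(x) = F_Sh(F_Gh^{-1}(F_Gf(x))), non-decreasing in x.
    f_sf = [ecdf(sh, quantile(gh, ecdf(gf, x))) for x in grid]

    # Quantile mapping: each model value v is sent to F_Sf^{-1}(F_Gf(v)).
    out = []
    for v in gf_values:
        u = ecdf(gf, v)
        i = min(bisect_left(f_sf, u), n_grid - 1)
        out.append(grid[i])
    return out


# Synthetic daily temperatures (degC): a warm-biased model that warms by about 3 degC.
rng = random.Random(0)
ref_hist = [rng.gauss(10.0, 2.0) for _ in range(3000)]
mod_hist = [rng.gauss(13.0, 2.0) for _ in range(3000)]
mod_fut = [rng.gauss(16.0, 2.0) for _ in range(3000)]

downscaled = cdft_sketch(ref_hist, mod_hist, mod_fut)
model_change = mean(mod_fut) - mean(mod_hist)
downscaled_change = mean(downscaled) - mean(ref_hist)
print(round(model_change, 2), round(downscaled_change, 2))
```

In this synthetic case the downscaled series sits close to the reference level while keeping approximately the model's change signal, which is the trend-preserving behaviour described in the text.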

2.3. Recommended use

This dataset was validated at the global level for the needs of this paper and is provided “as-is”. This means that, for local applications, it is incumbent on the user to evaluate whether the dataset fits the purpose of their study. The quality of the present data is driven by the reference dataset, the selected climate models and the methodology applied.

ERA5-Land is a reanalysis dataset providing a consistent view of the evolution of land variables over several decades at an enhanced resolution compared to ERA5. ERA5-Land was produced by replaying the land component of the ECMWF ERA5 climate reanalysis. Nonetheless, this dataset is not of equal quality for every variable or region, as it depends on the quality and volume of the observations it is based upon. Users are advised to assess how well ERA5-Land represents weather variables over the historical period before using the downscaled projections, in order to identify any potential shortcomings. This can be done through comparison with other datasets and a literature review.

The 5 climate models selected here are claimed to be a representative subset of the whole CMIP6 ensemble [5]. However, even the whole CMIP6 ensemble (as CMIP5 previously) is an ensemble of opportunity [12] and as such is not designed to sample the space of possible values in the best possible way. Users have to be aware that values beyond the ensemble envelope are possible. More generally, climate models have been improved since the last IPCC assessment, but can still have several shortcomings, for example in the representation of extremes. A literature review of impact studies in the user's region of interest is thus an essential preliminary task before analyzing the information derived from this dataset.

Concerning the methodology, we note that, for some climate models, multivariate properties (e.g., inter-variable correlations) might be biased or inappropriate at the local scale and, thus, need to be corrected. CDF-t is a univariate method (i.e., each variable is downscaled separately) and in theory does not guarantee preservation of inter-variable correlation. In practice, correlation is preserved from the climate model simulations to be downscaled (e.g., [13]). This means that if the climate model simulations have realistic dependence structures (e.g., correlations) between temperature and precipitation, the variables downscaled with CDF-t will mostly keep the realistic dependencies. However, it also means that if the climate model has inappropriate dependencies, the resulting downscaled time series will also preserve them and, thus, might be unrealistic. The user can evaluate this aspect by analyzing the simulations over the historical period.
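The following idealized sketch illustrates why a univariate, quantile mapping-like correction tends to carry over the model's inter-variable dependence: applying a strictly increasing transform to each variable separately leaves their rank correlation unchanged. The transforms below are simple stand-ins chosen for illustration, not CDF-t itself, and all data are synthetic.

```python
import math
import random


def ranks(values):
    """Rank of each value (0 = smallest); assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, index in enumerate(order):
        result[index] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def rank_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Synthetic model output in which warm days tend to be drier.
rng = random.Random(1)
temp = [rng.gauss(15.0, 5.0) for _ in range(500)]
precip = [math.exp(-0.1 * t + rng.gauss(0.0, 0.5)) for t in temp]

# Stand-ins for univariate corrections: one increasing transform per variable.
temp_adj = [0.9 * t - 1.5 for t in temp]
precip_adj = [p ** 0.8 for p in precip]

before = rank_correlation(temp, precip)
after = rank_correlation(temp_adj, precip_adj)
print(round(before, 3), round(after, 3))
```

The same check, applied to model and reanalysis data over the historical period, is one way a user could judge whether the dependence inherited from the model is realistic.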

Ethics Statement

Not applicable.

CRediT Author Statement

Thomas Noël: Methodology, Software, Computation and Data curation; Harilaos Loukos: Supervision, Funding acquisition, Writing and editing; Dimitri Defrance: Writing, figures; Mathieu Vrac: Methodology, Review and editing; Guillaume Levavasseur: Resources, Software and Data publication.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that have or could be perceived to have influenced the work reported in this article.

Acknowledgments

To process the data, this study benefited from the IPSL mesocenter ESPRI facility that is supported by CNRS, SU, and Ecole Polytechnique partly funded by IS-ENES3 project. We also thank the Institut Pierre Simon Laplace for assistance with the Synda software.

Footnotes

Refers to: Thomas Noël, Harilaos Loukos, Dimitri Defrance, Mathieu Vrac, Guillaume Levavasseur, A high-resolution downscaled CMIP5 projections dataset of essential surface climate variables over the globe coherent with the ERA5 reanalysis for climate change impact assessments, Data in Brief, Volume 35, 2021, 106900, ISSN 2352-3409, https://doi.org/10.1016/j.dib.2021.106900 (https://www.sciencedirect.com/science/article/pii/S2352340921001840).


References

  • 1. Eyring V., Bony S., Meehl G.A., Senior C.A., Stevens B., Stouffer R.J., Taylor K.E. Overview of the coupled model intercomparison project phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev. 2016;9:1937–1958. doi: 10.5194/gmd-9-1937-2016.
  • 2. Muñoz-Sabater J., Dutra E., Agustí-Panareda A., Albergel C., Arduini G., Balsamo G., Boussetta S., Choulga M., Harrigan S., Hersbach H., Martens B., Miralles D.G., Piles M., Rodríguez-Fernández N.J., Zsoter E., Buontempo C., Thépaut J.-N. ERA5-Land: a state-of-the-art global reanalysis dataset for land applications. Earth Syst. Sci. Data. 2021;13:4349–4383. doi: 10.5194/essd-13-4349-2021.
  • 3. Galmarini S., Cannon A.J., Ceglar A., Christensen O.B., de Noblet-Ducoudré N., Dentener F., Doblas-Reyes F.J., Dosio A., Gutierrez J.M., Iturbide M., Jury M., Lange S., Loukos H., Maiorano A., Maraun D., McGinnis S., Nikulin G., Riccio A., Sanchez E., Solazzo E., Toreti A., Vrac M., Zampieri M. Adjusting climate model bias for agricultural impact assessment: How to cut the mustard. Clim. Serv. 2019;13:65–69. doi: 10.1016/j.cliser.2019.01.004.
  • 4. Levavasseur G., Noël T. Data Reference Syntax (DRS) for bias-adjusted C3S-CMIP5 simulations. 2021. doi: 10.31223/X5389H.
  • 5. Lange S. ISIMIP3b bias adjustment fact sheet. 2021;40. https://www.isimip.org/gettingstarted/isimip3b-bias-correction/. Accessed September 8, 2021.
  • 6. Michelangeli P., Vrac M., Loukos H. Probabilistic downscaling approaches: application to wind cumulative distribution functions. Geophys. Res. Lett. 2009;36:L11708. doi: 10.1029/2009GL038401.
  • 7. Vrac M., Drobinski P., Merlo A., Herrmann M., Lavaysse C., Li L., Somot S. Dynamical and statistical downscaling of the French Mediterranean climate: uncertainty assessment. Nat. Hazards Earth Syst. Sci. 2012;12:2769–2784. doi: 10.5194/nhess-12-2769-2012.
  • 8. Vautard R., Noël T., Li L., Vrac M., Martin E., Dandin P., Cattiaux J., Joussaume S. Climate variability and trends in downscaled high-resolution simulations and projections over Metropolitan France. Clim. Dyn. 2013;41:1419–1437. doi: 10.1007/s00382-012-1621-8.
  • 9. Famien A.M., Janicot S., Delfin Ochou A., Vrac M., Defrance D., Sultan B., Noël T. A bias-corrected CMIP5 dataset for Africa using the CDF-t method - a contribution to agricultural impact studies. Earth Syst. Dyn. 2018;9:313–338. doi: 10.5194/esd-9-313-2018.
  • 10. Bartók B., Tobin I., Vautard R., Vrac M., Jin X., Levavasseur G., Denvil S., Dubus L., Parey S., Michelangeli P.A., Troccoli A., Saint-Drenan Y.M. A climate projection dataset tailored for the European energy sector. Clim. Serv. 2019;16. doi: 10.1016/j.cliser.2019.100138.
  • 11. Vrac M., Noël T., Vautard R. Bias correction of precipitation through singularity stochastic removal: Because occurrences matter. J. Geophys. Res. 2016;121:5237–5258. doi: 10.1002/2015JD024511.
  • 12. Tebaldi C., Knutti R. The use of the multi-model ensemble in probabilistic climate projections. Phil. Trans. R. Soc. A. 2007;365:2053–2075. doi: 10.1098/rsta.2007.2076.
  • 13. François B., Vrac M., Cannon A.J., Robin Y., Allard D. Multivariate bias corrections of climate simulations: which benefits for which losses? Earth Syst. Dynam. 2020;11:537–562. doi: 10.5194/esd-11-537-2020.



Articles from Data in Brief are provided here courtesy of Elsevier
