Mexico's High Resolution Climate Database (MexHiResClimDB): a new daily high-resolution gridded climate dataset for Mexico covering 1951–2020
Abstract. This work presents Mexico's High Resolution Climate Database (MexHiResClimDB), a newly developed gridded, high-resolution climate dataset comprising daily, monthly and yearly precipitation and temperature (Tmin, Tmax, Tavg). This new database provides the largest temporal coverage of these climate variables at the highest spatial resolution (20 arc sec, or 560 m on Mexico's CCL projection) among the currently available gridded datasets for Mexico, and its development has made it possible to analyze the country's climate extremes for the 1951–2020 period. By comparing the spatial distribution of precipitation from the MexHiResClimDB with other gridded data (Daymet, L15, CHIRPS and PERSIANN-CDR), it was found that this new dataset is the only one that adequately represents the spatial variation of extreme precipitation events, in particular the precipitation that occurred during September 15–16 of 2013, caused by the presence of Tropical Storm Manuel in the Pacific Ocean and Hurricane Ingrid (Category 1) in the Gulf of Mexico. With this new database it was possible to summarize extreme events of precipitation and temperature in Mexico for the 1951–2020 period, a summary that was not available before: the wettest year was 1958, the wettest day 1970-09-26, and September of 2013 the wettest month. It was also found that eight out of the ten days with the highest Tmin occurred in 2020, that the two months with the highest Tmin were July and August of 2020, and that the six years with the highest Tmin were 2015–2020. When Tmax was analyzed, it was found that the hottest day was 1998-06-15, that June of 1998 was the hottest month and 2020 the hottest year, and that the four hottest years occurred between 2011 and 2020. Nationwide (and considering 1961–1990 as the baseline period), Tmin, Tavg and Tmax have increased, with their anomalies rising sharply in recent years and reaching values above 1.0 °C in 2020. At the same time, precipitation has decreased in recent years, which, combined with the increase in temperature, will have severe impacts on water availability that need to be analyzed in detail, for example at the watershed level. This new database provides a tool to quantify in detail the spatio-temporal variability of climate throughout Mexico.
The entire MexHiResClimDB dataset is available on Figshare (DOI: 10.6084/m9.figshare.c.7689428, Carrera-Hernández, 2025a).
Status: final response (author comments only)
RC1: 'Comment on essd-2025-100', Anonymous Referee #1, 30 May 2025
General comments: The author presents a newly developed gridded, high-resolution climate dataset comprising daily, monthly and yearly precipitation and temperature for Mexico, which they have developed from station data using Kriging with External Drift on a local neighborhood (KEDl) interpolation. The study presents a new dataset that can be very useful in understanding different aspects of climate change and its impacts at relatively high spatial and temporal resolution across Mexico.
This dataset appears to be a major improvement over existing datasets for Mexico. The station counts for the 1999-2020 period are much higher than in CRU and GPCC datasets; it would be helpful to demonstrate that they are higher than the Daymet dataset. We conducted a visual comparison of the MexiHiRes January and July 1981-2020 normals to the WorldClim, CHELSA, Daymet, and US PRISM products, and found that the MexiHiRes surfaces were relatively free of artefacts and appeared to be more credible than WorldClim, CHELSA, and Daymet, and generally more compatible with the adjacent US PRISM normals.
Our main concerns relate to the filtering of the original datasets—specifically, the criteria used, the quality of the input data, and the details of the interpolation method. The data and methods section needs substantial revision. Clear explanations, potentially a flowchart, and a discussion of how each step in the process might influence the results would be highly beneficial. The validation and comparison presented are limited and only applied to specific extreme cases. The manuscript lacks discussion of potential limitations of the datasets. It is not sufficiently demonstrated that the newly generated datasets are better than existing alternatives, despite multiple claims by the author, though we are confident this can be done with more convincing analyses.
Although this paper requires major revisions, it appears that this is an exciting and necessary improvement to climate data available for Mexico, and we commend the author for the effort.
Major comments:
- Description of method: The author mentions in multiple instances that this is better data. However, the methods section is not clear on the interpolation of the station data. How the observations are handled, how the interpolation works, what the benefits or limitations of the chosen method are, and why the method was chosen all need to be clearly explained.
- Validation and comparison: The intercomparison with other datasets wasn’t convincing. The author needs to make a more comprehensive comparison of their new dataset to other datasets, not only for selected extreme events. Further, they have to present how their data represents climate differently than other datasets and why it is so. I also suggest that the author discusses their methodological and data limitations.
- Climate normals: The focus of this paper is on the daily time series. However, the climate normals are also an important contribution and deserve some description, validation, and intercomparison.
- Content Refinement: The Introduction is ambiguous and there are several instances where concise refinement of the writing is required. The author has to present the need for this work and how this dataset will be better than existing datasets. The overall writing can be more concise.
Other comments:
In the comments below, L indicates the line number and * indicates a major comment.
- L3: remove terms like largest, highest, etc. and provide specific values (e.g. the exact temporal coverage), especially in the abstract.
- *L10-17: not clear if the author is just presenting these values or saying that these values are more realistic than values from other datasets. I am not clear what the author is trying to say with 'a summary that was not available before'. It is available from other datasets like ERA5-Land. The reliability of the values can be different, but it is available.
- L12-17: for which period are these values true: 1951-2020? Specify.
- L18-19: generic
- L39: define for the first use
- L46-48: redundant
- L46-48: superfluous. This whole paragraph can be summarized in a few sentences; much of the detail may not be relevant to the paper.
- *L63 (whole paragraph): I suggest that the author create a table with the name of each dataset, its resolution (spatial, temporal), data period, region and citation. Such a table would give this exact information from this and the previous paragraphs and would be more succinct.
- L75: “monthly surfaces of precipitation, Tmin and Tmax for the 1910–2009 period (i.e. 12 surfaces in total)”. If these were normals, wouldn’t this be 36 surfaces? But also, the term “monthly surfaces” is ambiguous about whether it is a monthly gridded time series or a climatological average/normal. As suggested, a table would be more succinct and explicit.
- *L80: what about Daymet, ERA5 or other regional and global datasets that cover Mexico and are available at daily resolution, even if not at the same spatial resolution as this dataset? What problem does their spatial resolution create that MexHiResClimDB will better address? The importance of this work and the value it adds need to be explained in more detail.
- L93: use numerals for XXth
- L95: what is the link between the description of the study area and the datasets the author developed?
- L97-99: move this paragraph after the data description
- L100: what does 'aforementioned' refer to? Provide the name of the dataset. Throughout the paper, try to use 'aforementioned' less often as it creates confusion.
- L100-101: rewrite: how many stations were there originally, how many were used, and what were the criteria used to select the stations?
- L108: what might have caused the smaller number of stations after 2021?
- *Figure 2. The station coverage in this dataset is much better than in the CRUts and GPCC datasets, which report a dramatic (>90%) decline in station coverage after 1998. This is a compelling advantage of the MexiHiRes product and it would be useful to highlight it.
- *L110 onwards to paragraph: need more description of the process and how this method works. For example, what happens to the temperature lapse rate and what value is used? A flowchart with all the steps involved, from data collection to final output, would be useful.
- *L125-128: can the author show how sensitive the results are to the values used?
- L129-133: can be more succinct
- L134: recommend re-expressing 1.227×10⁶ as 28 months so readers don’t have to do the math. It’s a lot of computation!
- L129: Is 26 GB the minimum requirement? Not clear.
- L146: “if R²=0.80, then the model explains 80% of the variability”. Reword: it explains 80% of the variance, not the variability. Since variance is calculated from squared deviations, this interpretation would overestimate the amount of variability/dispersion that is explained.
- L170: is it value or variable, be specific.
- L172: it’s true that MAE for raw precipitation (in mm) is meaningless, since an error of 50 mm has a different significance in a wet vs. a dry climate. However, MAE is meaningful if precipitation is log-transformed prior to the analysis of error. By the same token, I would expect that the other metrics (R², COE, and IOA) are confounded by raw precipitation values in mm; wouldn’t they primarily represent wetter regions, and wet anomalies, where absolute errors (in mm) are larger?
- *L200-239: I suggest the author move the description of these datasets to introduction and summarize there. Here they directly present the results of comparison.
- *L290-291: This statement is not well justified, being based on a selected extreme event and a single coefficient. How does this statement compare with what is presented in Figure 5? I recommend computing the relative Root Mean Square Error to explore the relative goodness of each dataset for both temperature and precipitation (see the sketch after this list of comments).
- Fig 7. The panels are labelled a,b,c,b,c.
- Fig 7. It is very hard to see the distributions at typical printing size. Recommend altering the color scheme to better stand out against the background.
- L311: not clear what the author is trying to say here
- *L325-326: similar comments as for precipitation and temperature. I suggest the author add an average comparison as well as a relative comparison among the datasets to provide a better perception of how the new dataset is better than existing datasets, if it is.
- L328: no need to write the full form again, be consistent
- L330: “it is not possible to obtain cross-validation values for L15 or Daymet”. This is a reason to use some other metric for intercomparison.
- L335: “neither L15 nor Daymet were capable of showing the temperature extremes that were obtained through the MexHiResClimDB”. This is not apparent from figure 9. The maximum values seem similar to L15.
- Figure 9: it is not clear what the author is trying to illustrate with these plots. It is already clear that their data has longer temporal coverage across Mexico than the other mentioned datasets. It does not provide information on how much better their dataset is compared to the others.
- Figure 11: describe what the climate stripes bars represent and what additional information they provide beyond the anomaly plots. Otherwise, remove them.
- L393-L395: this has only been demonstrated for a few specific events, so the statement should likely be qualified as such.
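A minimal sketch, in R, of the two metrics suggested in the comments on L172 and L290-291 above (MAE on log-transformed precipitation and a relative RMSE). Here 'obs' and 'pred' are hypothetical vectors of observed and predicted daily precipitation in mm, and the 0.1 mm offset is only an assumption to handle zero-rainfall days; this is not part of the manuscript's actual validation code.

```r
# MAE on log-transformed precipitation, so that a given absolute error (mm)
# does not carry the same weight in wet and dry climates
mae_log <- function(obs, pred, offset = 0.1) {
  mean(abs(log(obs + offset) - log(pred + offset)), na.rm = TRUE)
}

# Relative RMSE: RMSE normalised by the mean observed precipitation
rrmse <- function(obs, pred) {
  sqrt(mean((pred - obs)^2, na.rm = TRUE)) / mean(obs, na.rm = TRUE)
}
```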
Citation: https://doi.org/10.5194/essd-2025-100-RC1
- AC2: 'Reply on RC1', Jaime J. Carrera-Hernandez, 14 Jun 2025
RC2: 'Comment on essd-2025-100', Anonymous Referee #2, 05 Jun 2025
The author develops a complete daily gridded dataset of precipitation and temperature for Mexico at a very high resolution considering the extent of the spatial domain. The research is mostly correct; however, I have some concerns about the method used to develop the daily grids and the validation process.
The choice of an interpolation method is not easy, especially for precipitation, since it can yield very different results depending on the method and the parameters. KED is a trustworthy option when used for a single timestep, for example to show the daily precipitation/temperature of one day/event (as shown in the examples of extreme events). However, when the same method is applied to a long-term climate time series without further corrections, temporal biases are introduced that can lead to unwanted inhomogeneities in both the temporal and spatial dimensions. This is not new and it is a basic consideration in gridded dataset creation, as shown in the wide (not cited) scientific literature (e.g. https://doi.org/10.1002/wat2.1555, https://doi.org/10.1002/joc.1322, https://doi.org/10.1029/2008JD010100, https://doi.org/10.1559/152304085783914686). I recommend the author make a deeper review of the requirements for creating reliable grids, starting with some of the datasets that are cited in the article. The actual problem with creating a single grid for each day, independently from the previous and following ones, is that the number and location of neighboring observations change with time, which leads to biases that have a significant impact on, for example, the analysis of trends or even the aggregation at coarser temporal scales. In this regard, my second main concern is the validation approach.
It is fine to check the errors by series with the proposed statistical tests; however, the complete series of observations is compared with the corresponding predictions without considering the number of missing values or the differences by elevation range, month, season, etc. It is therefore difficult to see biases that would help interpret how reliable the results are. In addition, the kriging process usually provides a variance dataset, measuring the uncertainty associated with the prediction at every specific location. It would be useful to see an associated gridded dataset with the error/uncertainty for each day and variable, as done in many other datasets, to account for the reliability of each prediction and let the user decide how to use the information.
Here are the minor and specific comments, line by line:
Introduction:
L30: “along the migratory route of the Monarch butterfly” Maybe this is too specific.
L44: Terraclimate is regularly updated and now it is available until 2024.
L66: For CONUS, I think that PRISM deserves to be mentioned since it was one of the first and still one of the more reliable gridded datasets (https://prism.oregonstate.edu)
Methods:
L101-102: what was the final number of stations?
L108: regarding the outliers, wasn't any additional quality control performed? There's a lot of scientific literature on this.
L108-109: how many stations were discarded?
L110: As mentioned before about the use of KED independently for each day, it can generate further problems in long-term trends and temporal aggregations. Also, why the 30 nearest stations and a 140 km radius? The number of observations can vary greatly under these conditions (a minimal sketch of such a local-neighborhood KED is given after these comments). Lastly, was the internal coherence of temperature (TMAX > TMIN) checked after the interpolation, for each day? The above problems are especially important if a proper quality control was not performed to check the spatial coherence of the data (and I read nothing in this regard).
L120-121: Is the code publicly available?
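A minimal sketch of what such a local-neighborhood KED could look like in R with the gstat package, since the manuscript reports using R scripts within GRASS. This is not the author's actual code: 'stations' is assumed to be a SpatialPointsDataFrame with the daily observations and station elevation, 'dem_grid' a SpatialPixelsDataFrame holding the elevation drift, and the variogram initial values are illustrative only; the 30-station / 140 km neighborhood follows the values quoted in the comment above.

```r
library(sp)
library(gstat)

# Sample variogram of Tmin residuals with elevation as external drift
v_emp <- variogram(tmin ~ elev, stations)

# Fit a spherical model; initial psill/range/nugget values are illustrative
v_mod <- fit.variogram(v_emp, model = vgm(psill = 5, model = "Sph",
                                          range = 100000, nugget = 1))

# KED on a local neighborhood: at most 30 stations within a 140 km radius
# (map units assumed to be metres, as in a projected CCL grid)
ked <- krige(tmin ~ elev, stations, newdata = dem_grid, model = v_mod,
             nmax = 30, maxdist = 140000)

# 'ked' holds the prediction (var1.pred) and the kriging variance (var1.var);
# the latter could be released as the per-day uncertainty grid suggested above
summary(ked$var1.pred)
```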
Validation:
I expected a more complete validation, since only daily data over the whole series were checked here. For example, how did the interpolation work at different elevation ranges, or in different months? Did the method correctly predict the number of dry/wet days? Do monthly (or other) averages and standard deviations agree between predicted and observed values? These are the basic checks for any gridded dataset.
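A minimal sketch of the kind of stratified checks described above, assuming a hypothetical cross-validation table 'cv' with columns date, elev, obs and pred (daily precipitation in mm) and an assumed 0.1 mm wet-day threshold (the threshold actually used in the manuscript is asked about below).

```r
library(dplyr)

wet_thr <- 0.1  # assumed wet-day threshold (mm)

cv_summary <- cv %>%
  mutate(month     = format(date, "%m"),
         elev_band = cut(elev, breaks = seq(0, 4500, by = 500))) %>%
  group_by(month, elev_band) %>%
  summarise(bias     = mean(pred - obs),          # mean error per stratum
            sd_obs   = sd(obs),
            sd_pred  = sd(pred),
            wet_obs  = mean(obs  >= wet_thr),     # observed wet-day frequency
            wet_pred = mean(pred >= wet_thr),     # predicted wet-day frequency
            .groups  = "drop")
```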
Figure 5: I am not sure how to interpret these graphics since, for example, R² needs complete series of predictions and observations to be compared, but here there is one value per day/month/year.
L242: why not compare monthly or annual aggregates, or even trends? That would be more useful than comparing extreme events, which are not common (by definition), and users may need a more regular use of the dataset.
L274-277: This comparison is not fair, since predictions are being compared with observations but only in the case of this dataset is it known that the observation did not participate in the interpolation; this is not known for the rest of the datasets. Furthermore, not all of them were built with the same observations, so it is hard to justify better results for the new dataset.
L298: what does this function do?
L300: what was the threshold for considering a dry day? 0 mm / 0.1 mm / 0.001 mm?
L309: I don't see a tendency in that table.
L326-327: again, this is not a validation, just a comparison with other datasets that were not constructed with the same procedure. The only validation must be against the observations.
L332-335: a visual comparison does not guarantee a correct validation.
L347: Fig. 10 shows absolute values, but these are not trends. If you want to show trends you should calculate some statistics (Mann-Kendall, Sen’s slope) with their corresponding reliability value (p-value); see the sketch after these comments.
L350-351: this is not an acceptable way to indicate that there is a trend. Without statistical validation, this complete section must be removed.
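A minimal sketch of the trend statistics requested in the two comments above, using the 'trend' R package as one possible choice (the 'Kendall' or 'zyp' packages would also work); 'tmax_yearly' is a hypothetical numeric vector of national-mean annual Tmax for 1951-2020, not data from the manuscript.

```r
library(trend)

mk <- mk.test(tmax_yearly)     # Mann-Kendall test; p-value in mk$p.value
ss <- sens.slope(tmax_yearly)  # Sen's slope estimate with confidence interval

# Report the trend magnitude only if the Mann-Kendall test is significant
if (mk$p.value < 0.05) {
  cat("Significant trend:", ss$estimates, "degC per year\n")
}
```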
Citation: https://doi.org/10.5194/essd-2025-100-RC2
- AC3: 'Reply on RC2', Jaime J. Carrera-Hernandez, 14 Jun 2025
AC1: 'Comment on essd-2025-100', Jaime J. Carrera-Hernandez, 13 Jun 2025
I would like to thank both reviewers for their thorough reviews and their positive comments on this new database. I think that the following paragraphs address their general comments, but I have also commented on their specific suggestions and observations.
To improve the validation section (which is an issue raised by both reviewers) I will use data from automatic stations operated by Mexico’s Meteorological Service located throughout the country (although their spatial and temporal coverage is limited). I plan to compare daily, monthly and yearly data from 2010, 2013 and 2015 (these stations started to be deployed around 2005, although some started to operate in 2010).
I will follow the suggestion given by R1 to add relative comparisons between the datasets. Although the focus of the MexHiResClimDB is daily data, the monthly, yearly and normal values are also important. I will use these aggregation times to compare the values provided by the MexHiResClimDB with other datasets, as suggested by R1; this will also address the question raised by R2.
I will also create a figure to show the MexHiResClimDB normals alongside the normals obtained with other gridded datasets, as suggested by R1, who “conducted a visual comparison of the MexiHiRes January and July 1981-2020 normals to the WorldClim, CHELSA, Daymet, and US PRISM products, and found that the MexiHiRes surfaces were relatively free of artefacts and appeared to be more credible than WorldClim, CHELSA, and Daymet, and generally more compatible with the adjacent US PRISM normals.”
I will add a flowchart to show the workflow that was followed to develop this database (starting from how the data were downloaded and processed, and the R scripts used within GRASS to undertake the interpolations).
In addition, I have already modified Fig. 10; it now includes the five-year moving average of the monthly and yearly anomalies (with respect to 1961-1990) of the four climate variables in order to show whether the months are warming up or not.
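A minimal sketch of the anomaly and five-year moving-average computation described in this reply, assuming a hypothetical data frame 'tavg_yr' with columns year and value (national-mean yearly Tavg); this is not the author's actual script.

```r
library(dplyr)
library(zoo)

# Baseline climatology for the 1961-1990 reference period
baseline <- tavg_yr %>%
  filter(year >= 1961, year <= 1990) %>%
  summarise(clim = mean(value)) %>%
  pull(clim)

# Yearly anomalies and their centred five-year moving average
tavg_yr <- tavg_yr %>%
  mutate(anomaly     = value - baseline,
         anomaly_5yr = zoo::rollmean(anomaly, k = 5, fill = NA, align = "center"))
```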
Citation: https://doi.org/10.5194/essd-2025-100-AC1
- AC4: 'Reply on AC1', Jaime J. Carrera-Hernandez, 14 Jun 2025
Data sets
Mexico's High Resolution Climate Database (MexHiResClim): Daily Tmin for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462808
Mexico's High Resolution Climate Database (MexHiResClim): Daily Tavg for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462835
Mexico's High Resolution Climate Database (MexHiResClim): Daily Tmax for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462820
Mexico's High Resolution Climate Database (MexHiResClim): Daily Precip for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462796
Mexico's High Resolution Climate Database (MexHiResClim): Monthly Tmin for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28124789
Mexico's High Resolution Climate Database (MexHiResClim): Monthly Tavg for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462769
Mexico's High Resolution Climate Database (MexHiResClim): Monthly Tmax for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462679
Mexico's High Resolution Climate Database (MexHiResClim): Monthly Precip for 1951-2020. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28462787
Mexico's High Resolution Climate Database (MexHiResClim): Yearly data for Tmin, Tavg, Tmax and Precipitation. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28074998
Mexico's High Resolution Climate Database (MexHiResClim): Monthly and yearly normals (1951-1980) for Tmin, Tavg, Tmax and Precipitation. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28464398
Mexico's High Resolution Climate Database (MexHiResClim): Monthly and yearly normals (1961-1990) for Tmin, Tavg, Tmax and Precipitation. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28464458
Mexico's High Resolution Climate Database (MexHiResClim): Monthly and yearly normals (1971-2000) for Tmin, Tavg, Tmax and Precipitation. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28464461
Mexico's High Resolution Climate Database (MexHiResClim): Monthly and yearly normals (1981-2010) for Tmin, Tavg, Tmax and Precipitation. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28464488
Mexico's High Resolution Climate Database (MexHiResClim): Monthly and yearly normals (1991-2020) for Tmin, Tavg, Tmax and Precipitation. J. J. Carrera-Hernández, https://dx.doi.org/10.6084/m9.figshare.28074998