The first rainfall erosivity database in Mexico: facing challenges of leveraging legacy climate data

Varón-Ramírez, Viviana Marcela; Gómez-Latorre, Douglas Andrés; Arroyo-Cruz, Carlos Eduardo; Gómez-Tagle, Alberto; Prado Pano, Blanca Lucía; Gutierrez Llantoy, Ronald Roger; Lobo-Luján, Deyanira; Guevara, Mario

doi:10.5194/essd-2025-306

Preprints

05 Aug 2025

Status: a revised version of this preprint was accepted for the journal ESSD and is expected to appear here in due course.

Abstract. Soil water erosion (SWE) is the dominant soil degradation driver on a global scale. For quantifying SWE, erosivity is an index that reflects the potential (i.e., the energy) of rainfall to cause SWE. To support large-scale SWE studies and the assessment of the SWE process at the national scale in Mexico, the objectives of this research are a) to develop the first Mexican rainfall time series database for three climate normals (CNs; 1968–1997, 1978–2007, and 1988–2017) leveraging legacy climate data, and b) to estimate rainfall erosivity across continental Mexico by using daily rainfall time series. The workflow has three methodological stages: 1) development of the daily rainfall time series database, 2) identification of the best empirical relationship to estimate daily rainfall erosivity, and 3) estimation of the rainfall erosivity across Mexican territory. We compiled and harmonized 5410 rainfall time series (RTS) well distributed across the Mexican territory. We performed quality control and assurance, homogeneity analysis (using the standard normal homogeneity test), and data gap-filling (using the proportion method). Then, we tested three combinations of the α and β coefficients, proposed by three authors, in a power model to estimate rainfall erosivity; in this step, we used three validation databases (global, national, and local scales). Finally, we estimated the annual rainfall erosivity for all three CNs with multiple combinations of α and β coefficients. As the main results, the new database includes 1370, 1678, and 1676 RTS for the three CNs, respectively, together with their corresponding rainfall erosivity. The best parameter combination is the one proposed by Richardson et al. (1983) for all three validation databases. For the global and national databases, we observe a positive bias (mean error of 956 and 324 MJ mm ha^-1 h^-1 yr^-1, respectively); in contrast, for the local database, results show a negative and larger bias (mean error of -3699 MJ mm ha^-1 h^-1 yr^-1).
Regarding the erosivity estimation across the Mexican territory, the median values of rainfall erosivity for the three CNs were 3245, 3070, and 3327 MJ mm ha^-1 h^-1 yr^-1, respectively. The statistical distribution of the erosivity values was right-skewed for the three CNs, with high erosivity values reaching >12000 MJ mm ha^-1 h^-1 yr^-1 in all three CNs. The behavior of rainfall erosivity throughout the year was similar for the three CNs, with September making the highest contribution. The new database provides daily climatological data and analysis across Mexican territory over a multi-year period (1968 to 2017). Rainfall erosivity results support the study of SWE at the national scale by identifying areas with higher susceptibility to soil loss due to rainfall action and by providing a more spatially dense and well-documented rainfall erosivity database. Following the FAIR principles (Findability, Accessibility, Interoperability, and Reusability) for scientific data, this database is available from a scholarly accepted repository https://doi.org/10.6073/pasta/e0dc8bd3501f8c19bb750e853c3289cb (Varón-Ramírez et al., 2025) for public consultation.
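
The power model named in the abstract can be illustrated with a minimal sketch. It assumes the common form R_d = α · P_d^β for the erosivity of a day with rainfall depth P_d (mm), and it assumes that annual erosivity is the per-year sum of daily values averaged over the years of a climate normal; neither detail is spelled out on this page. The value β = 1.81 is taken from the second referee's remark about the power model, while `ALPHA` and the rainfall values are hypothetical placeholders, not the coefficients or data of the study. The mean error is computed as estimate minus reference, so a positive value means overestimation.

```python
from statistics import mean

def daily_erosivity(p_mm, alpha, beta):
    """Power model: erosivity of one day from its rainfall depth in mm."""
    return alpha * p_mm ** beta if p_mm > 0 else 0.0

def annual_erosivity(daily_by_year, alpha, beta):
    """Sum daily erosivity within each year, then average over the years."""
    totals = [
        sum(daily_erosivity(p, alpha, beta) for p in days)
        for days in daily_by_year.values()
    ]
    return mean(totals)

def mean_error(estimated, reference):
    """Bias as mean(estimate - reference); positive means overestimation."""
    return mean(e - r for e, r in zip(estimated, reference))

ALPHA = 0.2   # hypothetical placeholder, NOT a coefficient from the study
BETA = 1.81   # approximate exponent quoted in the referee comment

# Hypothetical daily rainfall (mm) for two years of one station
rain = {1988: [0.0, 12.5, 30.0, 4.2], 1989: [8.0, 0.0, 45.5, 15.0]}
r_annual = annual_erosivity(rain, ALPHA, BETA)
print(f"Sketch annual erosivity: {r_annual:.1f}")
```

Because β is greater than 1, a single heavy day contributes far more than several light days with the same total depth, which is why the choice of coefficients matters so much in the validation step.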

Received: 22 May 2025 – Discussion started: 05 Aug 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: closed

RC1:
'Comment on essd-2025-306', Anonymous Referee #1, 07 Jan 2026

I would first like to thank the authors for the effort invested in compiling and ‘harmonizing’ multiple historical data sources, which are not always easily accessible, particularly for researchers who do not work in Mexico. I also appreciate that the manuscript focuses not so much on a purely historical climatic analysis, but rather on how the R factor (rainfall erosivity) is estimated, whose utility—and potential future users of this dataset—is a key aspect of the work.
The manuscript “The first rainfall erosivity database in Mexico: facing challenges of leveraging legacy climate data” by Viviana Marcela Varón-Ramírez and colleagues provides a detailed dataset of historical precipitation time series for Mexico, applied to the estimation of rainfall erosivity. The dataset itself, as well as the calibration using several available empirical models, is interesting and represents a solid starting point. However, the applicability of the dataset would benefit from being presented more clearly, particularly in terms of its potential users and intended applications.
I do not have major concerns regarding the core content of the manuscript, and I have provided specific comments throughout the text that I hope the authors will find helpful and intuitive to address. My main concern—which may require more substantial work—does not relate to the dataset itself or its calibration, but rather to the discussion section. In its current form, the discussion is unsatisfactory and does not allow the reader to properly assess the potential usefulness or relevance of the dataset.
The discussion needs to be completely restructured in a more organized and focused manner, selecting and developing the strongest points of the article (some suggestions are provided in the annotated manuscript). In this sense, I expect that this manuscript will be accepted subject to the revisions, which fall between minor and major. If the authors address the chaotic way in which they currently present their results, I believe this paper can make it through and be a valuable asset for people in need of R-factor data, maps, etc.
If possible, it would be highly valuable for the authors to incorporate the suggestions provided in the manuscript so that the significant effort invested in compiling historical data for Mexico can be communicated more clearly and effectively. Revising the discussion may also require supporting it with a broader range of references than are currently included, depending on the final focus the authors choose to adopt.

Specific comments:
In my opinion, the authors can follow my comments and suggestions more easily by checking them in the annotated copy of their original manuscript. I hope the editor finds this suitable given the ‘dataset paper’ structure of this manuscript.

Typing errors are listed in the annotated manuscript.

Citation: https://doi.org/10.5194/essd-2025-306-RC1
- AC1: 'Reply on RC1', Viviana Marcela Varón Ramírez, 02 May 2026
  
  We thank the reviewer for their comments. We have addressed each comment, and our responses have been compiled in the attached PDF.
  
  Citation: https://doi.org/10.5194/essd-2025-306-AC1
RC2:
'Comment on essd-2025-306', Anonymous Referee #2, 06 Apr 2026

1. Significance

This manuscript presents the first national-scale daily rainfall time series and rainfall erosivity database for Mexico, covering three climate normals (1968-1997, 1978-2007, and 1988-2017). Mexico lacks publicly available sub-hourly rainfall data at the national level, which makes this contribution genuinely unique: no comparable effort exists in terms of spatial coverage, daily temporal resolution, rigorous quality control, and length of the study period for this country.
The database has broad potential uses: as input to the USLE/RUSLE model for national-scale soil loss estimates, as a reference for validating global products (GloREDa, GloRESatE), for climate trend studies, hydrological modelling, territorial planning, and soil conservation. The authors explicitly identify these applications, which is appropriate.
Two aspects deserve attention regarding significance:

The coverage of the Mediterranean California ecoregion is very limited (absent in CN3, very sparse in CN1 and CN2). The authors acknowledge this, but it is not clear to what extent this limits the database's utility for that region.

The most recent period available ends in 2017. Given that the SMN source data were available up to 2022, the authors should explain in more detail why the climate normals are not extended or why a more recent supplementary period is not included.
2. Data Quality

The methodological workflow is robust: quality control, homogeneity analysis (Alexandersson standard normal homogeneity test), data gap-filling (proportions method using the climatol package), and subsequent validation (McCuen test, 10% threshold). The use of a WMO-recommended framework (WMO 2020, 2023) is a significant strength. RMSE estimates for the gap-filling process are provided by ecoregion and month (Table A4).
However, the following issues must be addressed:
2.1 Uncertainty propagation

The highest RMSE values correspond to the Tropical Rain Forest ecoregion (up to 12.11 mm in October for CN3) and the Great Plains in CN3 (13.17 mm in July). These values are considerable. The authors should discuss their implications for the quality of erosivity estimates in those ecoregions, given that erosivity depends non-linearly on precipitation (power model with β ≈ 1.81). No uncertainty estimates propagated from the gap-filling error to the final R factor are provided. This would be an important addition for end users.
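
To illustrate why the non-linearity matters, here is a minimal sketch of how an error in daily rainfall carries over to daily erosivity under a power model R = α · P^β. It is an illustration of the referee's point under stated assumptions, not the authors' uncertainty analysis: it treats the RMSE as a simple shift δ applied to one day's rainfall, and the 50 mm rainfall depth is a hypothetical example value. The coefficient α cancels in relative terms, so only β is needed.

```python
def relative_erosivity_error(p_mm, delta_mm, beta):
    """Exact relative change of alpha * P**beta when P shifts by delta."""
    if p_mm <= 0 or p_mm + delta_mm < 0:
        raise ValueError("rainfall must stay non-negative and p_mm positive")
    return ((p_mm + delta_mm) / p_mm) ** beta - 1.0

def first_order_relative_error(p_mm, delta_mm, beta):
    """Linearised estimate: dR/R is approximately beta * dP/P."""
    return beta * delta_mm / p_mm

BETA = 1.81      # approximate exponent quoted in the comment above
P_DAY = 50.0     # hypothetical daily rainfall in mm
RMSE = 12.11     # gap-filling RMSE (mm) cited for Tropical Rain Forest, CN3

exact = relative_erosivity_error(P_DAY, RMSE, BETA)
linear = first_order_relative_error(P_DAY, RMSE, BETA)
print(f"exact: {exact:.1%}, first-order: {linear:.1%}")
```

For β greater than 1 the exact relative error on an overestimated day exceeds the first-order value, so a given rainfall error is amplified in the erosivity estimate rather than merely passed through.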
2.2 Numerical inconsistencies

Several numerical inconsistencies in the RTS counts must be resolved:

The abstract reports 1,678 RTS for CN2, but Table 1 shows a total of 1,679, and Section 3.1 states 1,676. This inconsistency appears in multiple locations and must be corrected and unified throughout the manuscript.

For CN3, the abstract cites 1,676 RTS, while Table 1 indicates 1,683, and Section 3.1 also states 1,683. The authors must review and reconcile these figures.

Table A3 shows "RS for data gap-filling process" as 1,479 / 1,776 / 1,723, whereas Table 1 shows 1,479 / 1,774 / 1,721 as "RTS before data gap-filling". This discrepancy of 2 units for CN2 and CN3 is unexplained.
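
The discrepancies listed above can be tabulated with a short check. This sketch uses only the counts quoted in this comment; the dictionary layout and function name are illustrative choices, not part of the manuscript or its code.

```python
# RTS counts as quoted in the referee comment, by climate normal and source
reported = {
    "CN2": {"abstract": 1678, "Table 1": 1679, "Section 3.1": 1676},
    "CN3": {"abstract": 1676, "Table 1": 1683, "Section 3.1": 1683},
}

def spread(by_source):
    """Largest difference between any two reported counts."""
    values = list(by_source.values())
    return max(values) - min(values)

# Keep only the climate normals whose sources disagree
inconsistent = {cn: spread(src) for cn, src in reported.items() if spread(src) > 0}

# Series entering gap-filling: Table A3 versus Table 1, for CN1, CN2, CN3
table_a3 = [1479, 1776, 1723]
table_1 = [1479, 1774, 1721]
gap_fill_diff = [a - b for a, b in zip(table_a3, table_1)]

print(inconsistent)
print(gap_fill_diff)
```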
2.3 Data accessibility

The data are deposited in the Environmental Data Initiative (EDI) repository with a permanent DOI (https://doi.org/10.6073/pasta/e0dc8bd3501f8c19bb750e853c3289cb), and the R code is available via Zenodo. This complies with the FAIR principles. It is strongly recommended that the authors:

Include a detailed README file in the EDI repository describing variables, units, missing value conventions, and known limitations.

Clarify whether gap-filled values are distinguished from original observations in the released files (e.g., via a flag column indicating imputed vs. observed values). This is essential for users conducting trend analyses.
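
One way such a flag could look is sketched below. The column names (`date`, `rain_mm`, `flag`) and the flag values are hypothetical and do not describe the layout of the released EDI files; the sketch only assumes that both the original series (with missing days) and the gap-filled series are available.

```python
import csv
import io

def flag_series(dates, observed, filled):
    """Merge observed and gap-filled values, marking where each value came from.

    `observed` uses None for missing days; `filled` has a value for every day.
    """
    rows = []
    for day, obs, fil in zip(dates, observed, filled):
        if obs is None:
            rows.append({"date": day, "rain_mm": fil, "flag": "imputed"})
        else:
            rows.append({"date": day, "rain_mm": obs, "flag": "observed"})
    return rows

def to_csv(rows):
    """Serialise the flagged rows to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["date", "rain_mm", "flag"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical three-day example with one missing observation
dates = ["2017-09-01", "2017-09-02", "2017-09-03"]
observed = [5.0, None, 0.0]
filled = [5.0, 7.3, 0.0]
rows = flag_series(dates, observed, filled)
print(to_csv(rows))
```

With a column like this, a user running a trend analysis can drop or down-weight the imputed days with a single filter instead of re-deriving them from the raw station records.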
3. Presentation Quality

The manuscript is well organised and follows a logical structure. The workflow diagram (Figure 2) is helpful for understanding the methodological process. The figures are generally of good quality. Figure 8 (R factor map for CN3) and Figure 9 (comparison with Panagos et al., 2017) are informative. However, the following aspects need attention:
Figures:

Figure 6 (verification of the three models) has four panels. The caption states that panel (d) corresponds to the Michoacán database. However, the x-axis is labelled "EI30 Factor" when the values shown are actually EI30 estimated from daily-resolution data — not from high-resolution sub-hourly data. This may cause confusion and the legend/caption should be clarified.

Figures A3a, A3b, and A3c show the number of locations with erosive rainfall for each day of the year across the three climate normals, but the text only discusses CN3 in detail. A more explicit comparative discussion among the three normals would be beneficial.
Language and abbreviations:

Lines 99-100: "The country is located between latitudes 14°W and 32°N and longitudes 86°W and 118°W" -latitude values cannot be expressed in °W. This appears to be a typographical error (should read °N).

The abbreviations "RTS" and "RS" are used interchangeably throughout the manuscript to refer to the same concept (rainfall time series). These must be unified.
References:

The reference list is broad and appropriate. However, Cortés (1991) -a master's thesis- serves as the national-scale validation database and plays a central role in the study. Given its importance, the authors should ensure that this source is either fully accessible or that the methods used to obtain the data from it are described in greater detail.
4. Re-usability of the Dataset

Based on the information provided in the manuscript and the data deposited in EDI, a user with basic knowledge of R and climatology would be able to re-use the database. The code available in Zenodo facilitates reproducibility. The database columns are described in Section 7. Nonetheless, the following improvements are recommended to facilitate re-use:

Include a detailed README file in the EDI repository.

Clearly flag imputed versus observed values in the gap-filled daily time series files.
Summary of Recommendations

Major (must be addressed before acceptance):

Resolve all numerical inconsistencies in RTS counts across the abstract, main text, and tables.

Discuss uncertainty propagation from gap-filling errors to the R factor, especially for ecoregions with high RMSE.

Correct the typographical error in latitude/longitude notation at lines 99–100.
Minor (recommended):

Include a detailed README file in the EDI repository.

Clarify the availability and accessibility of the Cortés (1991) dataset.

Unify the use of abbreviations (RTS vs. RS) throughout the manuscript.

Expand the comparative discussion among climate normals in Appendix Figures A3a–c.

Improve the legend of Figure 6d to avoid ambiguity regarding temporal resolution.

Citation: https://doi.org/10.5194/essd-2025-306-RC2
- AC2: 'Reply on RC2', Viviana Marcela Varón Ramírez, 02 May 2026
  
  We thank the reviewer for their comments. We have addressed each comment, and our responses have been compiled in the attached PDF.
  
  Citation: https://doi.org/10.5194/essd-2025-306-AC2

Data sets

Daily rainfall series and rainfall erosivity in Mexico for three climatic normals (1968-1997, 1978-2007, and 1988-2017) V. M. Varón-Ramírez et al. https://doi.org/10.6073/pasta/e0dc8bd3501f8c19bb750e853c3289cb

Model code and software

Rainfall-Erosivity-Mexico: Rainfall-Erosivity-Mexico V. M. Varón-Ramírez https://doi.org/10.5281/zenodo.15468097

Viewed

Total article views: 1,859 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
1,499	295	65	1,859	70	110

Views and downloads (calculated since 05 Aug 2025)

Month	HTML	PDF	XML	Total
Aug 2025	288	36	8	332
Sep 2025	816	10	8	834
Oct 2025	66	31	4	101
Nov 2025	45	27	3	75
Dec 2025	43	29	11	83
Jan 2026	34	22	5	61
Feb 2026	24	36	7	67
Mar 2026	42	37	7	86
Apr 2026	83	41	6	130
May 2026	47	22	5	74
Jun 2026	10	2	1	13
Jul 2026	1	2	0	3

Latest update: 14 Jul 2026

Short summary

This research focuses on Mexico's soil water erosion (SWE) using rainfall data to estimate erosivity. A database of daily rainfall series was developed for three climate normals –CNs– (1968–1997, 1978–2007, 1988–2017) with over 5,000 series. We found median erosivity values of 3245, 3070, and 3327 MJ mm ha^-1 h^-1 yr^-1 for the three CNs. The resulting publicly available datasets of rainfall series and erosivity help better understand SWE and rainfall patterns across Mexico.
