the Creative Commons Attribution 4.0 License.
CAMELS-COL: A Large-Sample Hydrometeorological Dataset for Colombia
Abstract. Catchment Attributes and Meteorology for Large-Sample Studies (CAMELS-COL) is a large-sample hydrological dataset for Colombia that integrates daily meteorological and hydrological time series with comprehensive catchment attributes. The dataset comprises daily precipitation, evapotranspiration, and temperature from CHIRPS and MSWX satellite sources and streamflow records from 347 gauging stations covering the range 1981–2022. Additionally, CAMELS-COL provides a wide range of catchment attributes, including physiographic characteristics, climatic indices, hydrological signatures, land cover, geology, and soil properties, derived primarily from official governmental sources. CAMELS-COL follows the standardized framework of previous CAMELS datasets, such as those developed for Brazil and Chile, to ensure consistency with global hydrological datasets. By incorporating Colombian catchments across diverse hydroclimatic regions, including the Andean, Amazonian, and Caribbean basins, this dataset extends the CAMELS initiative into tropical environments, offering a unique resource for hydrological research. The analysed basins show low flood susceptibility. An analysis of the aridity index reveals that 74.7 % of basins have a subhumid climate, 20.7 % are semiarid, and only 4.6 % are classified as humid. Flow elasticity to precipitation is highest in the Amazon and Orinoco, highlighting their greater streamflow sensitivity to rainfall changes. The Base Flow Index underscores groundwater’s crucial role in stabilizing and regulating surface water, particularly in forested basins. The dataset supports studies on hydrological processes, extreme hydroclimatic events, climate change impacts, and water resource management. Public availability encourages scientific collaboration and facilitates the inclusion of Colombian catchments in continental and global hydrological analyses. The dataset is accessible at https://doi.org/10.5281/zenodo.15554735 (Jimenez et al., 2025).
Status: open (until 08 Aug 2025)
CC1: 'Comment on essd-2025-200', Ather Abbas, 28 Jun 2025
Thank you for presenting this new dataset to the scientific community. I have a small suggestion regarding the computational efficiency of using the data. The static catchment features are provided in .xlsx format, which requires installing an additional package (openpyxl) when working in a Python environment. I recommend offering this data in .csv format instead, as it does not require any extra dependencies and is also more computationally efficient to read. Additionally, .csv files are compatible with Microsoft Excel, offering equivalent readability and ease of use for end users.
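To illustrate the dependency point, here is a minimal sketch of reading a static-attribute table from .csv using only the Python standard library. The column names and values in `SAMPLE_CSV` and the helper `read_attributes` are hypothetical and do not reflect the actual CAMELS-COL file layout.

```python
import csv
import io

# Hypothetical excerpt of a static-attribute table; column names and
# values are illustrative only, not taken from the CAMELS-COL files.
SAMPLE_CSV = """gauge_id,area_km2,elev_mean
13077030,125.4,1820.0
32157060,980.2,310.5
"""


def read_attributes(handle):
    """Return {gauge_id: {column: float}} from an open CSV file handle."""
    table = {}
    for row in csv.DictReader(handle):
        gauge_id = row.pop("gauge_id")  # keep the identifier as text
        table[gauge_id] = {name: float(value) for name, value in row.items()}
    return table


# io.StringIO stands in for open("attributes.csv", newline="").
attributes = read_attributes(io.StringIO(SAMPLE_CSV))
print(attributes["13077030"]["area_km2"])
```

No third-party package is needed here, whereas reading the equivalent .xlsx file from Python relies on an extra reader such as openpyxl.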
Citation: https://doi.org/10.5194/essd-2025-200-CC1
RC1: 'Comment on essd-2025-200', Anonymous Referee #1, 20 Jul 2025
Jimenez et al. describe a new large-scale dataset of catchment attributes for Colombia, analyzing 347 catchments across different topographic and climatological gradients following the CAMELS initiative. CAMELS-COL is a very valuable dataset for both national and global hydrological studies, enhancing data availability across the tropics, where such datasets remain limited. However, the methods would benefit from further details to improve the clarity of the manuscript, as described below:
Major comments:
Lines 129–133: I wonder whether any quality control of the streamflow data was considered. The methods should explain this carefully. If no quality control was performed, please state why and explain the resulting limitations for users of the dataset.
Line 137: Did you correct the gauge locations? This is a common issue identified in previous CAMELS studies. If so, please explain the process used to reduce this uncertainty/error. Also, what was the resolution and source of the stream network used to delineate the drainage areas?
Line 139: “Basins with inconsistent delineations were excluded, particularly in cases where inconsistencies were attributed to limitations of the Digital Elevation Model (DEM) used.”
How was the delineation process evaluated? Please include supplementary figures to show the results.
Did you compare the watershed delineations with IDEAM’s official hydrological zonification or global datasets such as hydroSHEDS (https://www.hydrosheds.org/products)?
Please check the watershed delineation for streamflow gauge 13077030. I see some inconsistencies in the shapefile.
Please check overlapping inconsistencies between watershed limits in .shp files. For example, areas associated with streamflow gauges: e.g., 32157060 vs 32207010, 13017010 vs 13037010 or 29037020 vs 13067020
Line 141-144: Authors state that “Additionally, in line with the methodology used in CAMELS-DE, only basins containing at least one central pixel with available gridded weather data are considered. This decision ensures consistency with previous CAMELS-datasets, which included basins where gridded or satellite-based climate data could be reliably assigned within the basin boundaries.” How many gauges were excluded based on this criterion? Also, would it be possible to extract climatic data from IDEAM gauges for those excluded basins?
Soil texture descriptors are important metrics for hydrological studies and are typically included in CAMELS datasets. Therefore, I suggest including sand, clay, and silt fractions, as well as soil density, using datasets such as SoilGrids (Poggio et al., 2021).
Although the Curve Number (CN) is a typical descriptor of precipitation partitioning between surface runoff and infiltration, its interpretation at an aggregated scale, particularly in large basins, may be unclear. I encourage the authors to provide further details about how CN contributes to the CAMELS project and discuss its limitations. Additionally, it would be helpful to show the locations and characteristics of dams in Colombia, which significantly influence hydrological processes.
Minor comments:
Line 13: Please add the range of areas for the 347 basins
Line 17-18: Consider including the characteristics of climatic (MAP ranging between X and X) and topography (e.g., 0 to X m.a.s.l.) gradients to provide a more detailed context for CAMELS-COL. This is particularly important due to the lack of hydrometeorological datasets for wetter tropical regions (e.g., Fig 1 in Kratzert et al., 2023)
Line 18: I suggest using the term “hydrological areas” instead of “basins”, following the IDEAM classification. Please briefly describe how these “hydrological areas” were defined.
Line 19: “low flood susceptibility” is too general a statement. Please specify the criteria used for this statement.
Lines 49-56: I recommend including a figure showing the relationship between two hydrometeorological features (e.g., Q/P vs P/PET) of Colombia in comparison to the rest of the world (CAMELS studies). This would help to illustrate the hydrological characteristics of the country and the relevance of CAMELS-COL dataset.
Line 88-89: Please add references of those datasets.
Lines 116-117 and Figures 1b-1d: I encourage the authors to use global datasets to describe the climatology of Colombia. Although IDEAM is the official national dataset, the temperature and precipitation data used in CAMELS-COL are derived from remote-sensing products. Additionally, I am curious about why the warmest areas are in the lowlands of the Magdalena and Orinoco regions rather than in the northern Caribbean region, which is the driest area of the country. Please verify this.
Line 124: Please clarify which MapBiomas classification level was used, and include its reference.
Table 1. Based on a review of previous CAMELS studies, I suggest adding the following attributes to enhance the CAMELS-COL dataset:
Gauge_record_start, gauge_record_end, and gauge_n_obs (as used in CAMELS-CL)
Percentage of missing values in streamflow observations.
Catchments mean slope (e.g., CAMELS-BR)
Elev_med, elev_max, elev_min (CAMELS-CL)
Water table depth (from Fan et al., 2013)
Please provide gauge_lat and gauge_long in WGS84 system
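As a sketch of how the suggested record attributes could be derived, the following assumes a gap-marked daily series (one value per consecutive day, `None` for missing days). The function name `record_summary`, the key `missing_pct`, and the sample values are hypothetical; the percentage is computed here between the first and last valid observation, which is one possible convention.

```python
from datetime import date, timedelta


def record_summary(start, flows):
    """Summarise a daily streamflow series that begins on `start`.

    `flows` holds one value per consecutive day; None marks a missing day.
    Returns None when the series has no valid observation.
    """
    valid = [i for i, q in enumerate(flows) if q is not None]
    if not valid:
        return None
    span_days = valid[-1] - valid[0] + 1  # days between first and last record
    return {
        "gauge_record_start": start + timedelta(days=valid[0]),
        "gauge_record_end": start + timedelta(days=valid[-1]),
        "gauge_n_obs": len(valid),
        # Assumed convention: gaps counted only inside the recorded span.
        "missing_pct": 100.0 * (1 - len(valid) / span_days),
    }


# Hypothetical six-day series starting on 1 January 1981.
summary = record_summary(date(1981, 1, 1), [None, 3.2, 2.9, None, 3.1, None])
print(summary)
```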
Please provide final datasets in .txt or .csv format. Additionally, I encourage authors to merge in one single dataset the .xlsx files (catchment information, geologic_characteristics, land_cover, soil, among others). This would make the work of future CAMELS-COL users easier.
Please include the streamflow network and locations of streamflow gauges in the catchment boundaries data (.shp files).
Figure 1.
Please use color-blind friendly palette
There seems to be an inconsistency: the figure shows 348 selected gauges, but only 347 basins are reported. Please clarify.
Consider adding the streamflow network and watersheds delineation.
Consider adding a histogram with the number of basins by area, elevation, mean annual precipitation. See CAMELS-BR and CAMELS-CL as a reference.
Line 129: Did the analysis include any transboundary basins? Please clarify.
Line 136. I am curious if there are changes in watersheds delineation using a DEM with a 90 m spatial resolution.
Line 145: Please remove the extra period “.”
Line 158. Why was MSWX selected to obtain temperature data?
Line 161: Why did you choose to calculate evapotranspiration (ET) based on Tmax and Tmin rather than using a remote-sensing-derived ET product? Consider using different methods.
Line 167: Why did you compute Tmean as the average of Tmax and Tmin when MSWX provides data every 3 hours?
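The difference between the two ways of obtaining Tmean can be sketched as follows. The eight 3-hourly temperatures are hypothetical values chosen for illustration, not MSWX data, and both function names are assumptions.

```python
def tmean_from_extremes(temps):
    """Daily mean as the midpoint of the daily maximum and minimum."""
    return (max(temps) + min(temps)) / 2


def tmean_from_all(temps):
    """Daily mean as the average of all sub-daily values."""
    return sum(temps) / len(temps)


# Hypothetical 3-hourly temperatures (deg C) for one day, eight time steps.
temps_3h = [18.0, 17.0, 16.0, 20.0, 26.0, 28.0, 24.0, 19.0]

print(tmean_from_extremes(temps_3h))  # midpoint of 28.0 and 16.0
print(tmean_from_all(temps_3h))       # average of the eight values
```

When the diurnal cycle is asymmetric, as in this made-up day, the two estimates differ (22.0 versus 21.0 degrees C here).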
Lines 177-178: This sentence is unclear. Please revise for clarity.
Line 182 and 188: What is “GSC”? GSC or GCS?
Lines 228–231: Why was the aridity index selected to describe the climatology of Colombia? I also wonder why it was computed as PET/MAP instead of the more typical formulation MAP/PET. This index can be extracted directly from Zomer et al. (2022). Also, please correct the typo in "Ponce."
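The two formulations are reciprocals of each other, so class thresholds defined for one cannot be applied to the other without inversion. A minimal sketch, using hypothetical annual totals rather than values from the dataset:

```python
def aridity_map_over_pet(map_mm, pet_mm):
    """Aridity index as MAP/PET (larger values mean wetter conditions)."""
    return map_mm / pet_mm


def aridity_pet_over_map(map_mm, pet_mm):
    """Aridity index as PET/MAP (larger values mean drier conditions)."""
    return pet_mm / map_mm


# Hypothetical annual totals in millimetres.
map_mm, pet_mm = 2000.0, 1250.0
ai_wet = aridity_map_over_pet(map_mm, pet_mm)
ai_dry = aridity_pet_over_map(map_mm, pet_mm)
print(ai_wet, ai_dry)
```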
Line 251: Please include a reference for the Whitebox tool. What was the software used to run this tool? R, Python, GRASS?
Table 3: Please extend the table description. I guess “S” means slope.
Lines 269-278: What are the limitations of concentration times for large basins?
Line 260 and 317-324: While compactness indices are commonly used to indicate a watershed’s response, categorizing basins as having “low,” “medium,” or “high” flood risk requires integrating additional factors such as soil properties, land cover, precipitation patterns, and socio-economic features. I encourage the authors to clarify the basis for this classification or reconsider the wording to avoid oversimplification.
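For context, a compactness index depends on basin geometry alone. The sketch below assumes the Gravelius coefficient, one common choice; the manuscript may use a different index, and the function name and test geometry are hypothetical.

```python
import math


def gravelius_compactness(perimeter_km, area_km2):
    """Gravelius coefficient: basin perimeter divided by the perimeter
    of a circle with the same area. A perfect circle gives 1.0."""
    return perimeter_km / (2.0 * math.sqrt(math.pi * area_km2))


# Sanity check with a circle of radius 10 km.
radius_km = 10.0
kc_circle = gravelius_compactness(2 * math.pi * radius_km,
                                  math.pi * radius_km ** 2)

# A 40 km x 10 km rectangle has the larger coefficient (more elongated).
kc_rectangle = gravelius_compactness(100.0, 400.0)
print(kc_circle, kc_rectangle)
```

Only perimeter and area enter the calculation, which is why soil, land cover, and rainfall information would have to come from elsewhere in any flood-risk classification.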
Line 317-340: I suggest presenting your results in the same order as the data and methods sections.
Figures 3 - 7
Please add streamflow network
Please use color-blind friendly palette
In the legend, all circles appear to be the same size—please correct or clarify this if circle size is intended to represent a variable.
Please consider removing the background DEM to enhance the clarity of the figures.
Lines 354-355: Natural non-forest formations such as wetlands or paramo vegetation play a large role in hydrological dynamics. Please check this statement “Additionally, 86% of the catchments have less than 20% of their area associated with natural non-forest formations, indicating that these landscapes have a minimal role in the general hydrological dynamics.”
Line 449. Previously you described that the highest temperatures are observed in the Magdalena region. Please check.
Line 513-516: How did you compute the Base Flow Index (BFI)? Clarify the method used for obtaining those values. For example, if a Recursive Digital Filter was implemented, please state the empirical equation and the parameter’s values used. This statement “In Colombia, 76% of river basins have a Base Flow Index (BFI) between 0.6 and 0.8, suggesting that groundwater resources play a crucial role in the stability, availability, and regulation of surface water sources” requires further analysis, as temporal dynamics (daily or seasonal variations) are not examined. Additionally, what does the correlation between forest cover and BFI suggest in hydrological terms? I guess that the Amazon forest in Colombia is within regions with shallow water table depth.
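As an example of the level of detail requested, here is a sketch of one widely used recursive digital filter, the one-parameter Lyne-Hollick filter. This is an assumption for illustration only; the manuscript does not state which method was used. The filter parameter `alpha = 0.925` is a commonly quoted default, the filter is applied in a single forward pass (multi-pass variants exist), and the flow values are hypothetical.

```python
def lyne_hollick_baseflow(flow, alpha=0.925):
    """Single forward pass of the Lyne-Hollick filter.

    Quickflow: f[t] = alpha * f[t-1] + 0.5 * (1 + alpha) * (Q[t] - Q[t-1]),
    constrained to 0 <= f[t] <= Q[t]; baseflow is b[t] = Q[t] - f[t].
    Assumption: the first value is treated as entirely baseflow.
    """
    quick_prev = 0.0
    baseflow = [flow[0]]
    for t in range(1, len(flow)):
        quick = alpha * quick_prev + 0.5 * (1 + alpha) * (flow[t] - flow[t - 1])
        quick = min(max(quick, 0.0), flow[t])  # keep quickflow within [0, Q]
        baseflow.append(flow[t] - quick)
        quick_prev = quick
    return baseflow


def base_flow_index(flow, alpha=0.925):
    """BFI: total filtered baseflow divided by total streamflow."""
    return sum(lyne_hollick_baseflow(flow, alpha)) / sum(flow)


# Hypothetical daily flows (m3/s) with one small event.
daily_flow = [1.0, 1.0, 5.0, 3.0, 2.0, 1.5, 1.2, 1.0]
bfi = base_flow_index(daily_flow)
print(round(bfi, 3))
```

Reporting the equation, the value of `alpha`, and the number of passes would let readers reproduce the BFI values.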
Lines 546-448: These statements require more extensive analysis than those conducted in this study.
Line 556 – Section “Water Resources and Climate Change in Colombia”: While the topic is important, this section feels somewhat disconnected from the rest of the manuscript. Consider strengthening the link between climate change and the purpose or future application of CAMELS-COL.
I recommend adding a section discussing study limitations and potential future developments or updates for CAMELS-COL.
Line 604: The conclusion section is currently too long and reiterates several points already covered in the results. Please re-structure and highlight the main results and implications of the manuscript.
References
Fan, Y., Li, H., Miguez-Macho, G., 2013. Global Patterns of Groundwater Table Depth. Science 339, 940–943. https://doi.org/10.1126/science.1229881
Kratzert, F., Nearing, G., Addor, N., Erickson, T., Gauch, M., Gilon, O., Gudmundsson, L., Hassidim, A., Klotz, D., Nevo, S., Shalev, G., Matias, Y., 2023. Caravan - A global community dataset for large-sample hydrology. Sci Data 10, 61. https://doi.org/10.1038/s41597-023-01975-w
Poggio, L., De Sousa, L.M., Batjes, N.H., Heuvelink, G.B.M., Kempen, B., Ribeiro, E., Rossiter, D., 2021. SoilGrids 2.0: producing soil information for the globe with quantified spatial uncertainty. SOIL 7, 217–240. https://doi.org/10.5194/soil-7-217-2021
Zomer, R.J., Xu, J., Trabucco, A., 2022. Version 3 of the Global Aridity Index and Potential Evapotranspiration Database. Sci Data 9, 409. https://doi.org/10.1038/s41597-022-01493-1
Citation: https://doi.org/10.5194/essd-2025-200-RC1
Data sets
CAMELS-COL: hydrometeorological time series and catchment attributes for 347 catchments in Colombia D. A. Jimenez et al. https://doi.org/10.5281/zenodo.15554735