the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
A harmonized 2000–2024 dataset of daily river ice concentration and annual phenology for major Arctic rivers
Abstract. River ice plays a critical role in Arctic freshwater routing, navigation safety, and biogeochemical exchange. However, consistent, daily-resolved observations across the pan-Arctic remain scarce. Here we present a harmonized, multi-decadal dataset of daily river ice concentration (RIC) and annual phenology (freeze-up, breakup, and ice duration) for the six largest Arctic rivers—Yukon, Mackenzie, Ob, Yenisey, Lena, and Kolyma—covering hydrological years 2001–2024. Built from >590,000 MODIS Terra/Aqua scenes, our workflow integrates a scalable threshold-based classifier on Google Earth Engine with dual-satellite daily synthesis, temporal-window cloud reclassification, and a high-latitude dark-period correction. Technical validation against higher-resolution optical imagery shows a mean RIC accuracy of 0.83 across basins. Phenological metrics derived from MODIS agree with in situ records with mean absolute errors (MAE) of 10.8 days for freeze-up and 11.4 days for breakup (improving to 8.4 days relative to the onset of ice drift), and with Landsat-based river-section phenology with MAE of 10.5 days (freeze-up) and 16.0 days (breakup). RIC correlates strongly with surface air temperature (mean Pearson r = −0.91) and increases systematically with latitude. Trend analysis from 2001 through 2024 shows delayed freeze-up in over 66 % of river segments, earlier breakup in more than 65 %, and shorter ice seasons in over 65 %. On average, freeze-up is delayed by 9.0 days, breakup occurs 7.8 days earlier, and ice duration shortens by 14.1 days over the study period. These basin-consistent, temporally resolved records provide an open benchmark for diagnosing cryospheric change in Arctic river corridors and for constraining model–data intercomparisons. The river-ice dataset is available via Zenodo (https://doi.org/10.5281/zenodo.17054619, Qiu et al., 2025).
- Preprint
(4629 KB) - Metadata XML
-
Supplement
(5564 KB) - BibTeX
- EndNote
Status: final response (author comments only)
- CC1: 'Comment on essd-2025-607', Laurent de Rham, 05 Dec 2025
-
RC1: 'Comment on essd-2025-607', Anonymous Referee #1, 07 Jan 2026
The authors filtered the ‘Simplified GRWL Vector Product’ by the attribute 'width_max' ≥ 250m, retaining only those river segments whose maximum mapped width (at mean discharge) exceeds 250m. This implies that, within each retained segment, portions of the river may still have actual widths less than 250m. Subsequently, the authors applied a 250m buffer on both sides of these segments, resulting in a fixed 500m vector polygon mask. For river reaches where the actual width exceeds 500m during ice period, this approach poses no significant issues. However, for narrower reaches (actual width < 500 m), the fixed 500m mask introduces considerable uncertainty. The buffered polygon not only encompasses the main channel but also extends into floodplain beaches and adjacent non-channel areas on both banks. In pre-freeze-up and post-breakup periods, snow accumulation is common on these non-channel areas. Since the Normalized Difference Snow Index (NDSI) cannot reliably distinguish between snow and ice, such snow-covered regions are likely misclassified as river ice. To address this limitation, the authors should quantitatively assess: (1) The proportion of analyzed river segments where the actual river width is less than 500 m; (2) The potential impact of this fixed-width masking on the accuracy of the final river ice concentration (RIC) estimates, particularly the risk of overestimation due to inclusion of snow-covered non-channel areas.
Citation: https://doi.org/10.5194/essd-2025-607-RC1 -
RC2: 'Comment on essd-2025-607', Anonymous Referee #1, 07 Jan 2026
Zhang et al. (2024) applied a uniform RIC threshold of 0.2 to identify ice-on and ice-off dates along the Yenisey River. Building on this approach, the authors introduce asymmetric thresholds to improve detection sensitivity and physical realism: the Break-Up Date (BUD) is defined as the date when RIC falls by ≥60% from the climatological winter peak (computed over December 1 to February 1), while the Freeze-Up Date (FUD) is set as the first date when RIC exceeds 30% above the ice-free baseline. However, the selection of these specific RIC percentages appears somewhat arbitrary. To strengthen the threshold justification, the authors are recommended to select hydrological stations with ground-based observations of freeze-up and break-up dates across different basins. A sensitivity analysis should then be conducted, comparing remotely sensed BUD and FUD derived from various RIC thresholds against these in-situ records. This would enable a more robust, data-driven determination of optimal RIC thresholds tailored to physical processes and regional variability.
Citation: https://doi.org/10.5194/essd-2025-607-RC2 -
RC3: 'Comment on essd-2025-607', Anonymous Referee #1, 07 Jan 2026
The authors validated MODIS-based RIC estimates using medium-resolution (30 m) Landsat 7/8/9 and high-resolution (10 m) Sentinel-2 imagery. However, it remains unclear whether the fixed 500m buffer mask used for MODIS was also applied to these higher-resolution datasets. It is recommended that the authors compute RIC from Landsat and/or Sentinel-2 imagery using two distinct masking approaches for various river segments: (1) the fixed 500m mask, and (2) a more accurate mask delineating the actual ice-season river channel extent. This comparison would clarify the potential impact of inadvertently including snow-covered floodplain and non-channel areas on RIC estimates.
Citation: https://doi.org/10.5194/essd-2025-607-RC3 -
RC4: 'Comment on essd-2025-607', Anonymous Referee #2, 19 Feb 2026
Review Summary
This manuscript describes a new dataset calculating river ice concentration and ice phenology across major Arctic rivers using 500 m resolution MODIS data. The paper is clearly written, and the dataset is clearly structured and includes relevant information needed for others to use the data. I think the paper could benefit from having a stronger justification about why this particular method is better than other existing methods (and what sorts of analyses you could do with this dataset, that you couldn’t do with existing Landsat-based methods). I also have some concerns, described below, about how the water masks were developed, particularly in sub-MODIS pixel width rivers, as well as with the use of single reflectance or NDSI thresholds when there are so many mixed land/water pixels.
Major comments
I have three major comments regarding the paper.
- My understanding of how water masks were generated is that first, centerlines associated with rivers wider than 250 m were extracted from GRWL. The centerlines were then buffered by 250 m (to generate 500 m-wide river polygons), and then NDSI was evaluated within this mask for all rivers, regardless of original river width. There are several potential challenges that I see from this approach. In the case of river reaches whose ‘average’ width is close to 250 m, there are likely sections that are narrower than 250 m. For those reaches, and reaches closer in width to 500 m, the 500 m spatial resolution MODIS pixels observing the buffered area (used for the water mask) are likely to observe a lot of land, rather than water. It is thus possible that the change in NDSI being observed is snowmelt on land rather than ice breakup. While one could argue that snowmelt and ice breakup should be correlated, this is not always the case. For example, in braided rivers, sometimes snow melts off the land and sandbars prior to ice breakup – or sometimes one small channel breaks up but the rest of the braids are still frozen and appear white (or didn’t have water in them during freeze-up). Point being, breakup detection is complicated in narrow rivers, particularly if any of them are braided! I would be curious to know what fraction of the studied river reaches are in the range of 250-500 m wide, and how overall accuracy (RIC, breakup date, or freeze-up date) in these reaches compares to overall accuracy in the wider reaches. On the other side of things, there are rivers whose width is much wider than 500m, and those areas are thus potentially being excluded by the 500 m buffer water mask being used. If there are large islands in these wider rivers, the water mask could be picking up land rather than water. The work may be improved by a more detailed water mask.
- As discussed above, depending on river width, mixed pixels (i.e. a pixel that observes both water and land areas) may be more common or less common. Thus, depending on how sensitive NDSI is to vegetation on the landscape, for example, it is possible that use of a single NDSI threshold to differentiate between water/land and snow/ice may be challenging or less accurate. I think additional validation of narrow reaches, as previously mentioned, could help determine if a more complicated thresholding approach is needed. Alternatively, if there is research showing that NDSI is not particularly sensitive to differences in changes in vegetation (brown vs green), that could also help support the current approach.
- The validation approach (Landsat 7/8/9 and Sentinel-2) relies on an NDSI threshold of 0.4 and a blue-band threshold of 0.075. All these satellites have slightly different band centers and can have slight variations instrument to instrument – thus single thresholds may not be appropriate. While NDSI is an index and is thus likely less sensitive to differences between sensors, the blue band could be more sensitive in this regard. The Harmonized Landsat and Sendinel-2 (HLS - link) dataset already somewhat solves the multi-sensor problem and could be a good dataset to switch to. Plus, HLS has data every three days at the equator and is more frequent at higher latitudes. Alternatively, their methods for dataset harmonization could be used to adjust the datasets the authors are already using. In the section beginning on line 252, they mention Landsat 7 has the highest consistency with MODIS. I am curious if the thresholds mentioned above were originally developed for Landsat 7, or one of the other Landsat sensors.
Minor comments
Overall:
- Throughout the manuscript average values are typically reported without an accompanying quantification of data spread around that mean (e.g., standard deviation or interquartile range). Adding a measure of spread would help contextualize variability in the data.
- Overall, since many of the figures have acronyms, it could be useful to make sure each acronym is spelled out in each figure caption. It is not required, but I think doing this can make the figures a little more standalone and thus easier for readers to digest.
Figures:
- Figure 1. Clear figure overall! The colors used for the Yukon and Kolyma rivers are a little challenging to see on the map.
- Figure 2. This flow chart was really helpful!
- Figure 3. The boxplots are very clear, though it could be helpful to remind the reader what accuracy metric is being used for the y axis title. (i.e., ‘Overall Accuracy’, rather than just ‘Accuracy’). In the scatterplots, given the color scheme and the number of points, it is a little challenging to tease out actual trends vs. differences in dot density. Also, just looking at the scatterplots, Sentinel-2 almost always has a high RIC, regardless of MODIS’s RIC, yet the boxplots look pretty good for Sentinel-2. I imagine this is related to how the scatterplots were generated, but it is something to think about. The legends are also a little small to read.
- Figure 4. Nice color scheme and clear labels. Definitely not required, but if the authors wished to show patterns in breakup and freeze-up a little more clearly, they could divide each panel into two sub-panels. That way the y axis could zoom in to the correct time of year for each sub-panel (breakup vs. freeze-up).
- Figure 5. The time series panels are very clear. To me, the scatterplot relationships look less linear, and more like a logistic regression (see Figure 5 in Yang et al. 2020). Would this benefit from assessing whether a different statistical relationship could be a better fit?
- Figures 6 – 9. I reviewed a printed color copy of the paper, and it was difficult to differentiate between continents and oceans. Perhaps this looks ok on the computer, but I do wonder if the contrast between continents and oceans could be increased on these figures.
- Figure 10. Same comment as Figure 4.
- Figure 11. The ice duration plots are very clear. On the line plots, it could be helpful to add a measure of spread in freeze-up/breakup date across all 24 years, around the mean freeze-up and breakup dates. Like in Figure 1, the colors for the Yukon and Kolyma rivers may be challenging to read.
Introduction
- Line 72. The text mentions that existing datasets are currently deficient across the pan-Arctic region, particularly lacking those that have ‘high temporal and adequate spatial resolution.’ The introduction could be strengthened with a bit more discussion of why existing datasets do not meet the existing need. I.e., what sorts of questions can you answer with this new MODIS dataset that you could not answer with the coarser temporal resolution Landsat datasets or some of the temperature-based breakup/freeze-up models?
- Line 75. Towards the end of the introduction, river ice concentration is mentioned for the first time. However, ice concentration can mean different things to different people – spatial coverage fraction, concentration of ice in a water sample (e.g., in mg/L). It could be useful to add one short sentence describing what is meant by ‘river ice concentration’.
Data sources
- Line 120. ‘Despite this extensive dataset, an average of 4 days per year during the 24 years lacked usable data’ – what level of cloud cover on a given day means that data from that day are not usable? I imagine that for any given pixel, more than 4 days would have missing or cloudy data, particularly in the Fall, when it is cloudier (at least in AK, where I am more familiar). A brief clarification would be helpful.
Methodology
- Line 164. The state_1km band used to filter out clouds is a bit-wise band with several bits related to cloud presence/absence (cloud state, cirrus information, cloud shadow information, etc.). For reproducibility, it would be good to include which bits were used to filter out cloud-impacted pixels.
Validation
- Section 4.2. Validation can be a challenge when other satellite or in situ methods define breakup and freeze-up in different ways and on many different spatial scales (point, pixel, river reach) and I applaud the author’s effort in compiling this validation data. I think, here, the impact (and novelty) of the paper could potentially be made clearer if it were able to quantify how much better MODIS is at detecting freeze-up and breakup relative to other methods (like the existing Landsat data or simple temperature-based models) when compared against in situ records. Given that the in situ records themselves are not perfectly comparable to the satellite data, I am open to pushback on this suggestion by the authors.
Discussion
- Line 311. There is a space missing after ‘S10).’
Data availability
- Links to data and code all work. Data is well described, including units.
Citation: https://doi.org/10.5194/essd-2025-607-RC4
Data sets
A harmonized 2000–2024 dataset of daily river ice concentration and annual phenology for major Arctic rivers Jiahui Qiu et al. https://doi.org/10.5281/zenodo.17054619
Viewed
| HTML | XML | Total | Supplement | BibTeX | EndNote | |
|---|---|---|---|---|---|---|
| 352 | 188 | 28 | 568 | 90 | 28 | 44 |
- HTML: 352
- PDF: 188
- XML: 28
- Total: 568
- Supplement: 90
- BibTeX: 28
- EndNote: 44
Viewed (geographical distribution)
| Country | # | Views | % |
|---|
| Total: | 0 |
| HTML: | 0 |
| PDF: | 0 |
| XML: | 0 |
- 1
Thank-you for this global scale work on river ice the transparency of methods and sharing data with the community. As reported in the abstract, the mean absolute error (MAE) values for the three metrics (freeze-up, breakup, duration) are larger numbers than the trend values. Some discussion of the results is warranted in-so-far as the robustness of reported trends within the modelling framework error. A colleague with a climate background refers to this as "signal versus noise" issue.