the Creative Commons Attribution 4.0 License.
OpenRainER: an open-source dataset for studying the opportunistic sensing of rainfall in Emilia-Romagna, Italy
Abstract. We present the OpenRainER dataset of precipitation measurements, available open-source on Zenodo repository at https://doi.org/10.5281/zenodo.10593848. The dataset contains mainly precipitation related measurements over the region of Emilia-Romagna in northern Italy. Inside OpenRainER, measurement from the commercial microwave link network managed by Lepida S.c.p.A. are published, consisting in 1 min time series of the transmitted signal level and of the received signal level over 151 radio links. These data are primarily generated for link quality monitoring; however, they can be opportunistically exploited for weather monitoring as there is a well known direct relationship between rainfall intensity and decrease of the received signal level. The data are stored in NetCDF format and can be processed and converted into rainfall intensity time series along each radio link path using open-source tools developed within the COST Action OpenSense framework. We also provide concurrent data from the regional operational rain gauge network and two weather radars, as a reference for calibration and validation purposes. OpenRainER has peculiar characteristics with respect to similar open datasets: (1.) The links are distributed over very different types of terrain, including plains, hills, valleys, and mountain ridges (up to 2000 m a.s.l., and both densely inhabited cities and rural areas. (2.) While conventional CML networks show a frequency distribution inversely proportional to the link path length, here links operate at 24.6 or 25.6 GHz. This offers a wider range of sensitivities for testing classification and retrieval algorithms.
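The relationship mentioned in the abstract is commonly modelled as a power law between specific attenuation k (dB/km) and rain rate R (mm/h), k = a R^b. The sketch below applies it to a single link and shows why links of different lengths at the same frequency have different sensitivities. The coefficients `a` and `b`, the dry-weather loss, and both function names are illustrative assumptions; they are not values or functions taken from OpenRainER or from the OpenSense tools, and rain is assumed uniform along the path.

```python
def path_attenuation_db(rain_rate_mm_h, length_km, a=0.15, b=1.0):
    """Rain-induced attenuation over the whole path, A = a * R**b * L (dB).

    a and b are placeholder coefficients chosen for illustration only.
    """
    return a * rain_rate_mm_h ** b * length_km


def rain_rate_from_levels(tsl_dbm, rsl_dbm, dry_loss_db, length_km, a=0.15, b=1.0):
    """Invert the power law for one TSL/RSL sample of one link."""
    # Loss in excess of the (assumed known) dry-weather loss is attributed to rain.
    rain_att_db = max((tsl_dbm - rsl_dbm) - dry_loss_db, 0.0)
    specific_att = rain_att_db / length_km  # dB/km
    return (specific_att / a) ** (1.0 / b)


# Same frequency and rain rate, different path lengths: under uniform rain
# the longer link accumulates proportionally more attenuation.
for length in (2.0, 10.0):
    print(length, "km ->", path_attenuation_db(10.0, length), "dB")
```

In practice the dry-weather loss has to be estimated from the data and further corrections apply, so this is only the core of the retrieval, not a processing chain.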
Status: final response (author comments only)
- RC1: 'Comment on essd-2026-160', Anonymous Referee #1, 05 Jun 2026
- RC2: 'Comment on essd-2026-160', Anonymous Referee #2, 09 Jun 2026
The paper describes a dataset of microwave link data for rainfall estimation, complemented by an extensive reference dataset. The two-year dataset of 151 microwave links sampled at 1-min resolution represents a valuable contribution to reproducible research on opportunistic sensing. The paper is well structured and clearly written. It contains relevant descriptions of the available data as well as of the rainfall activity during the covered period. All the figures are well designed with clear legends and titles. I recommend the paper for publication in ESSD. My suggestions for improvements are only minor and rather technical.
L37-46 The whole history of collaboration is not directly related to the paper and too lengthy.
L54-56 Different lengths with the same frequencies are useful not only for testing dry-wet strategies but also, e.g., for wet antenna correction algorithms.
Section 2.2.: Please provide at least an approximate quantization of TSL and RSL, as calculated in Appendix A, already here. As I understand Appendix A, the quantization of RSL is around 1 dB; however, I am not sure what the quantization of TSL is.
L121: Consider rephrasing so that it is clear you mean at least one sublink of each CML, not at least one sublink in the whole dataset.
Table 3 - For completeness consider also providing statistics for the rainfall variable in this table. A reader going through the paper just briefly might miss that weather stations are actually also rain gauges.
Figure 7: For better comparison, consider using the same y-axis range in panels (b) and (c).
Figure 9 title: According to the guidelines, the figure title should be brief and need not (and should not) describe what is apparent from the legend and the axis titles.
Section 3.3: Please provide information on the selection of parameters in the CML processing chain. Have you calibrated them, or used values from the literature?
L215: Provide a reference to pycomlink.
Appendix A:
Appendix A is substantially more difficult to read than the main sections of the paper, and it is unclear what exactly is being investigated (“this behavior”). Stating the objective of this analysis clearly at the beginning would help. Are you trying to reconstruct the quantization of the original raw data (to which you do not have access)? Or are you trying to quantify the quantization error of the 1-min RSL data from this dataset? Why is TSL data quantization not considered?
L256-L256, I believe “0+k” should be entirely in the lower index of RSL. This applies to Equation A1 and also to the following description on L256. Also provide units.
Equation A3: I am convinced that the units are wrong in this equation. As RSL is not unitless the equation has to result in adding different units no matter what units of N are.
Citation: https://doi.org/10.5194/essd-2026-160-RC2
- RC3: 'Comment on essd-2026-160', Anonymous Referee #3, 09 Jun 2026
In this manuscript, the authors present OpenRainER, an open dataset of continuous precipitation measurements across the Emilia-Romagna region from 2021 to 2022. The dataset integrates 1-minute raw transmitted and received signal levels from 151 commercial microwave links, alongside concurrent observations from 319 rain gauges and three weather radar products.
Overall, the manuscript describes a dataset that is highly relevant and worth publishing. Because raw CML data are typically restricted by network operators, the authors' effort to publish this collection openly on Zenodo is valuable. It provides the research community with a practical, much-needed resource to independently test and validate rainfall estimation algorithms.
Still, the manuscript’s current presentation requires significant improvement to fully realize the dataset's value and ensure its practical accessibility. Specifically, the authors must refine their data visualizations, clarify the mathematical notation in the appendix detailing the proprietary pre-processing steps, and address several compliance issues regarding ESSD formatting and visual accessibility guidelines prior to publication.
1. Dataset and Usability
The dataset successfully fulfills its primary objective of providing a highly useful, novel, and unique resource for benchmarking precipitation retrievals within the Earth system sciences.
It contains, in addition to the main focus (opportunistic sensors with CMLs), highly valuable reference data, and it seems to be reproducible and already integrated into the OpenSense repository tools. The time span (2 years), the resolution (1 min), and the number of CMLs provide a comprehensive benchmark for the opportunistic sensing community, which is strengthened even further by the radar and gauge data.
- Comprehensive Concurrent Data: The dataset provides an excellent, highly complete "trifecta" of concurrent measurements. It pairs 1-minute data from 151 CMLs with measurements from 319 operational rain gauges and three different radar products.
- High Data Availability: The dataset is rich and robust for a two-year continuous period. 139 of the 151 CMLs and 225 of the 319 rain gauges have a data availability of 95% or higher.
- The published data comply with the format specified by the OpenSense community standards and have already been employed in applications.
Issues
- The authors state two main differences from other datasets. The first, (1) “The links are distributed over very different types of terrain,” seems correct and is a valuable contribution; it is even mentioned in the abstract. However, if there is value here, I would expect it to be referred to more in the paper (beyond a general representation of the metadata).
For the CML network in Section 2 (Figures 1 and 2), the authors only show an aggregation/summary of results over all links. Although this is a data paper and does not require deep analysis, providing an example of the statistics and of specific events would help the reader justify using this dataset and appreciate its uniqueness.
- Claim (2): “While conventional CML networks show a frequency distribution inversely proportional to the link path length, here links operate at 24.6 or 25.6 GHz. This offers a wider range of sensitivities for testing classification and retrieval algorithms.” I am not sure this is worth mentioning as a main point of uniqueness with respect to the other published datasets and wireless networks in general. Operators usually work in specific frequency ranges, and CML data from a specific source usually come in only a few bands. Also, from a quick look at the other datasets (OpenMRG, OpenMesh), their frequency bands contain a comparable number of CMLs, also with diverse lengths, so I would not mark it as a core value of the presented dataset, but rather as a feature.
If there are specific characteristics that are novel and can be derived from the data's specific setup/topology regarding frequency vs. length, they should be emphasized in the analysis part.
Still, I think the variety of the dataset as mentioned, together with the new region and the terrain (which should be exemplified), is enough to emphasize the value of the data.
- The code reference for GitHub is only for the main library (https://github.com/OpenSenseAction), which is not specific to the OpenRainER dataset. It may contain a few examples, but I would suggest providing more focused notebooks that demonstrate the specific published data instead of the entire library. Such examples are already available at https://github.com/OpenSenseAction/opensense_example_data/tree/main/OpenRainER, which can be a solid base for data visualization, downloading, and usage.
2. Data Presentation
The data presentation and visualizations do a great job of showing the dataset's physical setup and data availability. The CML metadata histograms (Figure 2) clearly split plain vs. mountain conditions across path length, altitude, inclination, and roughness. Temporal coverage is also handled well: histograms for CMLs and rain gauges (Figures 3 & 6) and a monthly bar chart for weather radars (Figure 5) together give users a clear, honest picture of the network's reliability over the two years.
Issues:
- CML metadata presentation: The NetCDF variables are never listed anywhere in the paper. A simple table showing exactly what's included would save users from having to download the data just to find out. This is especially important for a data journal like ESSD.
- CML data visualization (Figure 2): The histograms show statistics about mountain links, but they could be extended to describe the data further. Figure 2 is too small to read comfortably and needs to be enlarged (see Reviewer 1 comments). The authors could also show what those links look like physically; for instance, a 2D elevation profile of one extreme mountain link would make the steep inclination tangible. For Figure 3, the logarithmic y-axis is misleading: with only 151 CMLs in total, a linear scale would show the availability distribution much more honestly.
- Radar Modifications: The manuscript only says the radar processing chain had "minor modifications" over two years. Please spell out exactly what changed and how it might affect the data. A citation to the Arpae processing chain or a validation study would also help (see Reviewer 1).
- Terminology & Rain Gauges: "Rain Gauges" and "Weather Stations" refer to the same 319 sites; please pick one term and use it consistently. Add the 15-minute precipitation availability stats to Table 3 so readers don't miss the dataset's core variable (see Reviewers 1 & 2). Also, it is unclear why 50% was chosen as the completeness threshold. Is this a standard value in the field? Please justify or cite it. Finally, Figures 7 and 9 need some visual improvement (see Reviewer 2).
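To make the question about the completeness threshold concrete, the sketch below computes per-day availability and keeps only days that reach a given threshold. The data layout (one list of samples per day, with `None` marking a gap) and both function names are hypothetical; only the 50% figure comes from the comment above, and how the manuscript actually applies it is not stated here.

```python
def availability(samples):
    """Fraction of non-missing samples; None marks a missing value."""
    if not samples:
        return 0.0
    valid = sum(1 for s in samples if s is not None)
    return valid / len(samples)


def days_counted(daily_samples, threshold=0.5):
    """Return the days whose availability reaches the threshold."""
    return [day for day, samples in daily_samples.items()
            if availability(samples) >= threshold]


# Hypothetical gauge with 96 fifteen-minute slots per day.
daily = {
    "2022-05-01": [0.0] * 96,                # complete day
    "2022-05-02": [0.2] * 48 + [None] * 48,  # exactly half missing
    "2022-05-03": [0.0] * 10 + [None] * 86,  # mostly missing
}
kept = days_counted(daily, threshold=0.5)
```

Rerunning the same statistics with a stricter threshold (for example 0.8) is a cheap way to show how sensitive the reported numbers are to this choice.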
Analysis:
The authors established a regional rainfall baseline using the reference RGs and weather radars, then evaluated the CMLs through a qualitative storm case study and a preliminary quantitative comparison against the adjusted radar data. This comparison demonstrated the CML data's potential to track rainfall, but also revealed significant underestimations and false alarms when the raw data is processed without complex, site-specific calibration.
Overall, for a dataset paper, the analysis should not be too long, and the authors' approach of providing one high-level aggregated result and a specific CML case study is a good start. However, the current analysis does not add much new information compared to existing works. To truly demonstrate the value and uniqueness of this dataset, I strongly suggest adding an analysis that leverages the authors' claimed contributions, i.e., stratifying the performance by dataset features like complex terrain and varying link lengths operating at the same frequencies. Furthermore, since the dataset is already two years old and the `pycomlink` package is already being utilized, these tools can be directly applied without too much effort to demonstrate usability for real-world products. For example, generating 2D CML-derived rainfall maps to compare against the radar rainfall maps would be a standard addition that significantly raises the value of the data.
3. Additional Comments/Written Quality
The manuscript is generally readable but would benefit from a careful revision pass — there are a moderate number of typos, grammar issues, and repeated words throughout. Previous reviews already flag several of them, and I strongly recommend a thorough proofread before the next submission.
Just from going through the abstract alone:
CML is never formally defined; on first mention, write "commercial microwave link (CML) network."
L2: "on Zenodo repository" → "on the Zenodo repository"
L3: "precipitation related" → "precipitation-related"
L4: "measurement from" → "measurements from"
L4: "consisting in 1 min time series" → "consisting of 1-min time series"
L5: "radio 5 links" → "radio links" (stray line number)
L7: "well known" → "well-known"
L7: "decrease of the received" → "decrease in the received"
L8: "peculiar" sounds like something is strange or odd; better to use "distinctive" or "unique" characteristics. Point (2) here + ‘,’.
It is of course not the reviewer's responsibility to catch every issue, but even from a quick read there are noticeable word repetitions and some phrasing that could simply be cleaner. The other reviewers point out additional mistakes as well.
L106: "frequency are" → "frequencies are"
L178: "between between" → "between"
L196: "event occurred" → "event that occurred"
L208: "explains" → "explain"
L219: "Results shows" → "Results show"
LXX: "Days are accounted if" → "Days are accounted for if"
As mentioned, I recommend the authors do a full end-to-end proofread, ideally with a native English speaker or grammar tool, before resubmission.
Citation: https://doi.org/10.5194/essd-2026-160-RC3
- EC1: 'Comment on essd-2026-160', Tobias Gerken, 17 Jun 2026
I would like to thank the reviewers for their work and helpful and comprehensive comments aimed at making this a useful contribution to the community.
Based on the reviews, I am inviting the authors to respond and revise their work. Please take note of the numerous comments that are aimed at increasing the accessibility and description of the dataset and underlying data.
Citation: https://doi.org/10.5194/essd-2026-160-EC1
Data sets
OpenRainER Elia Covi and Giacomo Roversi https://doi.org/10.5281/zenodo.10593848
The authors present the OpenRainER data set. The data set consists of commercial microwave link (CML), weather radar, and rain gauge data with the main goal to improve the usability of the CML data for rainfall monitoring.
The data set offers a significant contribution to the community involved in rainfall monitoring with CMLs. As an open data set in a field where the raw data are often undisclosed, this CML data set can be of high value in benchmark studies. It is uniquely different from the other three open data sets mentioned in the paper, and the addition of weather radar and rain gauge data completes the data set.
The data itself is of sufficiently high quality, though the limitations, especially of the CML data, should be elaborated on to improve the usability. The manuscript with the description and analyses, in my view, currently does not sufficiently support the data set, other than having it published on Zenodo. I have outlined my major concerns regarding these points below.
Preprocessing CML data
The TSL and RSL time series are processed in-house by the data owner. Even though the details are not disclosed, it would be valuable to be as precise as possible. Is it just the sampling frequency and quantization that are unknown? Because that is the only thing the appendix focusses on. If so, mention this concretely in the main text (L120-126). If the authors suspect other forms of preprocessing were done it would be helpful to know why the authors think so, and if they could share some of these processing steps they suspect have been done based on their experience with this data set.
The appendix on the in-house processing is an important addition and I appreciate that the authors have looked into this; however, in my view it is not sufficiently well written and is difficult to understand at this point. Unfortunately, the example in Line 259 does not make it much clearer to me. Maybe it is clear to the authors, but I suggest having a colleague (not involved in this exact field of research) read through it and iterate together to make it more understandable.
Some examples of what is unclear to me in the appendix:
L251-257: As I understand it, within 1 minute N samples are recorded, and each of these samples takes a quantized value somewhere between RSL_0 and RSL_K. So K is simply the number of possible quantization values? Then shouldn’t RSL_0 + k in Equation A1 actually be RSL_{0+k} (same in L257)? If K is the quantization value itself, then I would expect Equation A1 to be the sum of N_k(RSL_k) and not N_k(RSL_0 + k). Otherwise the range of RSL_0 to RSL_K mentioned in L254 does not make sense.
L259: Doesn’t N_k already vary between 0 and K, instead of between 0 and N?
Eq. A3: Follow the same format as in Eq. A1 and write it out completely. E.g., if K=1, I would expect Eq. A3 to be 1/N * (N_0 * RSL_0 + N_1 * (RSL_0 + 1)) based on the format in Eq. A1.
Having someone outside of your direct field of research read through this appendix will hopefully improve the readability and make it more accessible to those outside the direct CML community too. Finally, I would suggest making Figures A1 and possibly A2 a lot larger (wider), as it is currently very difficult to deduce anything from these time series or to follow along when the authors refer to these figures. For example, the bulk 1 dB quantization step in L251 is not deducible from Fig. A1 with the current vertical axis.
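For concreteness, the reading described in these comments can be written out as below. This is a sketch of the interpretation under discussion, not the manuscript's Eq. A1 or A3: it assumes N raw samples per minute, of which N_k fall on quantization level RSL_{0+k}, and a 1 dB step between adjacent levels. Writing the step with its unit also keeps the sum dimensionally consistent.

```latex
% Interpretation sketch only; symbols follow the comments above.
% N   : number of raw samples within one minute
% N_k : number of those samples on quantization level RSL_{0+k}
\overline{\mathrm{RSL}}
  = \frac{1}{N}\sum_{k=0}^{K} N_k\,\mathrm{RSL}_{0+k},
\qquad \sum_{k=0}^{K} N_k = N .

% Special case K = 1, assuming RSL_1 = RSL_0 + 1 dB:
\overline{\mathrm{RSL}}
  = \frac{1}{N}\Bigl(N_0\,\mathrm{RSL}_0 + N_1\,(\mathrm{RSL}_0 + 1\,\mathrm{dB})\Bigr)
  = \mathrm{RSL}_0 + \frac{N_1}{N}\,\mathrm{dB}.
```

Under these assumptions the 1-min average can only take values spaced 1/N dB apart between two adjacent levels, which is the kind of statement the appendix could make explicit.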
Additional analyses
To gain more trust in this opportunistic data source, some additional analyses would be advisable. Section 3.3 on rainfall comparison is rather limited at this moment. The authors state that the data set has some peculiarities compared to other open CML data sets, such as the presence of orography and the use of nearly identical CML operating frequencies. However, nothing is done with these features at this moment.
It would be fairly straightforward to split the results in Figure 11 by orography / elevation, for example. The same goes for link length. Since the frequency is predominantly the same, how does the accuracy of the CML rainfall estimates vary with link length? This would quickly give a user some insight into the quality of the data that makes this data set unique.
Similarly, in L162 you state that gauges are “the most accurate ground-truth reference against which CML-based rainfall retrievals can be compared in our case”. However, no such analysis is currently performed. To gain confidence in the (reference) data, a short comparison between CMLs and the nearest gauge, similar to what is done for radar, would be useful. A comparison between the gauges themselves, such as a simple correlogram or double mass curve, could also yield some insight into the quality of the reference data itself and potentially spot outliers.
Added value of split between convective and stratiform – Section 3.1
The split between stratiform and convective is interesting to get an idea of the local climatology, but is not really used in any analysis at the moment. So what is the exact added value to this description of the dataset? A quick addition could be to split the results from Fig 11 in time and show the performance of CMLs vs. weather radar in stratiform and convective conditions.
In general I am not suggesting to do many different analyses, as that is not the goal of such a data paper, but simply to split the analyses currently done by some of the characteristics mentioned in the paper to get an idea of how the CML data performs in different conditions.
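As a sketch of what such a stratification could look like, the snippet below groups per-link results by terrain class and link-length class and reports the mean relative bias against the radar reference. The records, the 8 km class boundary, and the function names are hypothetical placeholders; the numbers are not taken from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-link records:
# (terrain class, path length in km, CML rainfall total in mm, radar total in mm)
links = [
    ("plain", 3.2, 410.0, 520.0),
    ("plain", 12.5, 480.0, 540.0),
    ("mountain", 4.1, 390.0, 610.0),
    ("mountain", 15.0, 560.0, 650.0),
]


def length_class(length_km, edge_km=8.0):
    """Split links into two length classes at an arbitrary boundary."""
    return "short" if length_km < edge_km else "long"


def relative_bias_by_group(records):
    """Mean of (CML - radar) / radar for each (terrain, length class) group."""
    groups = defaultdict(list)
    for terrain, length_km, cml_mm, radar_mm in records:
        key = (terrain, length_class(length_km))
        groups[key].append((cml_mm - radar_mm) / radar_mm)
    return {key: mean(values) for key, values in groups.items()}


for key, bias in sorted(relative_bias_by_group(links).items()):
    print(key, f"{bias:+.2f}")
```

The same grouping key could be extended with a stratiform/convective flag per event to cover the split discussed for Section 3.1.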
Specific comments on content:
L42: So you have 1 minute data but receive it every 15 minutes?
L47-56: I would switch the order of this paragraph, and put it before L37-46, so that after describing the other data sets you first describe what makes the OpenRainER data set unique.
L51: Arguably, “wet-antenna attenuation” is not the most frequency-dependent effect, so I would give another example. A much larger effect could be the presumed uniform attenuation along the path.
L57: Nice to mention other publicly available datasets! Hyperlinks in the text don’t read nicely though. A small table would fit this better. You can then add some extra columns, such as whether these data are freely available, which could benefit the community too.
L85: I would make “Study Area” Section 2, and “Datasets” a separate section 3.
L126: To make the main text self-contained, explicitly mention the interesting characteristics you find in your Appendix here.
Fig 2: Make this and all other figures a lot larger, as they are difficult to read now. Moreover, stacked bar charts are generally not a good idea, since the reader is never sure whether one of the bars is covering the other. Instead, put the orange and blue bars next to each other. If you really must have a single bar per bin, use different widths so that the heights of both bars are always visible. Finally, it would be useful to briefly define the term roughness in the caption as well.
Fig3: To enhance the usability, this data would better fit on a map so it becomes clear which links have less data, which would especially be of importance if they are all in the same regions.
L142: Should it be “the one only covered by SPC”? Would be good to mention the mosaic codes in your README / dataset here.
L145: Would be good to add the resolution to the README on Zenodo too.
L147-154: If there is a publication describing the entire processing chain from the Arpae, or some comparison / validation studies on the radar data it would be valuable to mention that here for reference.
L158: What did these minor modifications do? Try to be as specific as possible to give the reader/user confidence in the data.
Fig6: Similar to Fig3, a map with the gauge locations colored by availability would be useful.
Table 3: For completeness, adding precipitation to this table would be nice.
L174: It would be useful to know which stations, or at least how many stations, operate at temporal resolutions below 15 minutes. Possibly add this to Table 3 or, again, show it on a map.
L198: Explain shortly how the averaging of all radar pixels is done. Weighting the length of the CML in the radar pixel?
L201-203: This hypothesis can be fairly quickly checked by coarsening the CML resolution to 15 minutes too, and/or shifting the timestamp to the beginning/end of each interval.
L204: “which reflect the different spatial sampling performed by rain gauges, CML and radars”. Not necessarily true, or in any case not only true! It could be due to rainfall variability in time, differences in altitude, etc.
L210: “however it captures the temporal dynamics of rainfall better than the rain gauge”. How so? The timing of the rain gauge seems to agree more with the timing of the radar.
Fig8: Panels b and c show the number of days. This is a discrete variable, hence a discrete rather than a continuous colormap would be more useful.
L223-224: In my opinion this conclusion is too easily drawn. To give the user of this data set some more confidence in the data it would be very helpful to simply stratify these results, as mentioned above in the general comments, by mountains/plains, link length, stratiform/convective.
L241: Lepida S.c.p.A. (the company that owns the CML network). -> mobile network operator Lepida S.c.p.A.
L249: by the CML network -> in a network management system
L250: What is basically constant? Can you mention a range, i.e. -0.1 to +0.1 dB?
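The check suggested for L201-203 above can be sketched as follows: 1-min CML values are averaged over 15-min windows and stamped at either the start or the end of each window before being compared with the 15-min gauge series. The function name, the input layout (a list of `(timestamp, value)` pairs), and the convention that a sample stamped exactly on a window boundary belongs to the window starting there are assumptions made for illustration; they are not taken from the manuscript or from pycomlink.

```python
from datetime import datetime, timedelta


def coarsen(series, minutes=15, label="end"):
    """Average (timestamp, value) samples over fixed windows.

    label="start" stamps each mean with the start of its window,
    label="end" with the end of its window.
    """
    step = timedelta(minutes=minutes)
    sums = {}
    for t, value in series:
        # Start of the window containing t, counted from midnight.
        midnight = t.replace(hour=0, minute=0, second=0, microsecond=0)
        start = midnight + ((t - midnight) // step) * step
        total, count = sums.get(start, (0.0, 0))
        sums[start] = (total + value, count + 1)
    shift = step if label == "end" else timedelta(0)
    return [(start + shift, total / count)
            for start, (total, count) in sorted(sums.items())]


# Hypothetical 1-min rain rates (mm/h): dry for 15 min, then 6 mm/h for 15 min.
t0 = datetime(2022, 1, 1, 0, 0)
one_min = [(t0 + timedelta(minutes=i), 0.0 if i < 15 else 6.0) for i in range(30)]

start_labelled = coarsen(one_min, label="start")
end_labelled = coarsen(one_min, label="end")
```

Comparing both labelled versions against the gauge series shows directly whether an apparent lag is a timestamp-convention artefact or a real difference in timing.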
Textual comments:
L3: measurement -> measurements
L4: in 1 min -> of 1 min
L10: I would use “unique” instead of “peculiar”
L10: datasets: -> datasets.
L16: hereinafter -> hereafter
L17: by the -> by a, or in a
L21: “well known” is subjective, remove
L65: OpenSense has been -> OpenSense, a COST Action project that ended in October 2025, brought together..
L138: Joss and Waldvogel (1988) -> (Joss and Waldvogel, 1988)
L182: lapse? Do you mean interval?
L213: to to -> to
L214: estimate -> estimates
Comments on the README on Zenodo: