This work is distributed under the Creative Commons Attribution 4.0 License.
A benchmark laboratory calibration dataset for tipping-bucket rain gauges: comparison of manual burette and automated methods
Abstract. Reliable calibration data are essential for ensuring the accuracy and traceability of precipitation measurements obtained from tipping-bucket rain gauges (TBRGs), which are widely used in hydrological and meteorological monitoring networks. Although manual burette-based calibration remains the most commonly applied approach, its reproducibility is often limited by operator dependency and changes in discharge conditions during experiments. Automated calibration devices have been developed to address these limitations, yet publicly available benchmark datasets that allow transparent comparison between manual and automated calibration methods remain scarce.
This paper presents a benchmark laboratory calibration dataset for tipping-bucket rain gauges generated under controlled conditions using two calibration approaches: a conventional manual burette method and an automated calibration device (PRC-20AP). Calibration experiments were conducted at five target rainfall intensities (10, 20, 30, 50, and 100 mm h⁻¹), with a target total rainfall of 20 mm and 15 repeated trials for each intensity. For every trial, the dataset reports elapsed time, measured total rainfall, measured rainfall intensity, and corresponding relative errors.
In addition to raw measurements, the dataset includes intensity-wise summary statistics and a comprehensive uncertainty evaluation following the Guide to the Expression of Uncertainty in Measurement (GUM). Type A, Type B, combined, and expanded uncertainties at 95 % coverage are provided to support quantitative assessment of measurement repeatability and reliability. All data are released in machine-readable spreadsheet formats with detailed documentation of variables, units, and calculation conventions to facilitate reuse.
The dataset is publicly available through a persistent DOI and is intended to serve as a reference benchmark for laboratory calibration of tipping-bucket rain gauges. Potential applications include calibration protocol validation, uncertainty budgeting, intercomparison of calibration methods, and the development and evaluation of automated calibration technologies for precipitation measurement.
Status: open (until 09 May 2026)
-
RC1: 'Comment on essd-2025-791', Anonymous Referee #1, 24 Mar 2026
-
AC1: 'Reply on RC1', Bokjin Jang, 03 Apr 2026
-
The method focuses on a benchmark-making procedure using a particular calibration device (PRC-20AP), designed and installed by the author and his colleagues (https://doi.org/10.1016/j.flowmeasinst.2025.103063). The paper proposes this tool as a widely accepted benchmark for other calibration techniques, thereby offering a quasi-standard for data quality, building on the development of tipping-bucket-gauge-based rainfall intensity and rain depth measurement, the related WMO proposals, etc.
Regarding the novel PRC-20AP device, the paper presents news, provided this tool is accepted and widely applied. Since there is a gap in the field of high-accuracy calibration, the dataset describing the device's calibration performance can have potential in use, so in my opinion the attached data have potential as well.
The article is appropriate to support the publication of the data set.
Response:
Thank you for your careful evaluation of our manuscript and dataset, and for your positive comments regarding their potential usefulness and suitability for data publication.
The primary objective of this study is not to introduce a new calibration device itself, but to provide a benchmark-oriented dataset and evaluation framework for assessing the calibration performance of tipping-bucket rain gauges under controlled rainfall intensity conditions. The PRC-20AP system serves as the experimental basis for generating the dataset.
As noted by the reviewer, the broader applicability of this approach depends on its future adoption. In this regard, we clarify that the dataset is not intended to serve as a formal standard, but rather as a benchmark reference to support comparison among different calibration methods. This clarification has been incorporated into the revised manuscript.
The dataset is constructed based on repeated experiments and uncertainty analysis, and is designed to support various applications, including comparative evaluation of calibration methods, validation of new calibration systems, and analysis of repeatability and uncertainty.
Thank you again for your valuable comments.
-
The dataset’s files were available at the given site and opened correctly. The data set contains simple Excel worksheets without cell interconnections or formulae. The worksheets seem adequate for comparing an actual measurement method’s statistics against the proposed benchmark values shown in the tables. A prepared worksheet with automatic calculation would be helpful for users, with reference to the potential applications mentioned in Chapter 6. In its current form I find it a little poor.
There are error estimations and a separation of error sources in the article. The error calculation follows the principles of the Guide to the Expression of Uncertainty in Measurement (GUM), the principle at the core of several related national standards. The article takes these standards into consideration.
The data set is unique as a set of reference calibration-benchmark parameters; it is complete and clear, and if used as a comparison for other calibration processes, or directly in calibration, it can be useful.
Response:
Thank you for your detailed and constructive evaluation of the dataset. We appreciate your positive comments regarding its accessibility, completeness, and potential usefulness, as well as your recognition of the uncertainty analysis based on GUM principles.
We agree with the reviewer that the original dataset had limitations in terms of usability, particularly due to the absence of embedded calculations and automated analysis tools. In response, the dataset has been substantially revised to improve its usability and practical applicability, as summarized below:
(1) Introduction of an interactive analysis interface
An integrated Main Analysis worksheet has been introduced, allowing users to directly select rainfall intensity conditions and obtain key performance metrics, including total rainfall, rainfall intensity, relative error, repeatability, and uncertainty, as well as comparisons between manual and automated calibration methods.
(2) Implementation of formula-based calculations
The summary worksheets (Summary_Manual and Summary_Automated) have been reconstructed using formula-based calculations. Mean values, standard deviation, and uncertainty components are now automatically derived from the raw data, ensuring consistency and full traceability.
(3) Addition of structured comparison worksheets
Dedicated comparison worksheets (Total_Rainfall_Comparison and Rainfall_Intensity_Comparison) have been added to present side-by-side evaluation results and improvement metrics between calibration methods.
(4) Integration into a unified dataset structure
Previously separated data files have been reorganized into a single integrated spreadsheet file, combining raw data, summary statistics, analysis tools, and documentation within a coherent structure.
(5) Enhancement of usability and documentation
A comprehensive README worksheet and descriptive annotations have been added to each worksheet, providing clear explanations of data structure, variable definitions, calculation procedures, and usage instructions to support direct application of the dataset.
These improvements have been reflected in the revised manuscript. In particular, the updated dataset structure and workflow are described in Section 2.4 (Data files and structure) and Section 2.5 (Variables and units).
In addition, Section 5 (Data availability and access) has been revised to clearly state that the improved dataset has been updated in the Mendeley Data repository. The updated dataset is provided as a single integrated spreadsheet file that enables users to directly access, analyze, and compare calibration results without additional preprocessing.
Regarding uncertainty estimation, the dataset consistently follows the Guide to the Expression of Uncertainty in Measurement (GUM), including the separation of Type A and Type B components and the calculation of combined and expanded uncertainty.
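For illustration, the Type A/Type B combination described above can be sketched in a few lines. This is a minimal sketch only: the gauge resolution of 0.1 mm, the rectangular distribution for the resolution term, and the coverage factor k = 2 for approximately 95 % coverage are illustrative assumptions, not the dataset's actual parameters.

```python
import math

def expanded_uncertainty(trials, resolution=0.1, k=2.0):
    """GUM-style sketch: combine a Type A (repeatability) component with an
    assumed Type B (resolution) component and expand with coverage factor k."""
    n = len(trials)
    mean = sum(trials) / n
    # Type A: experimental standard deviation of the mean of the repeated trials
    s = math.sqrt(sum((x - mean) ** 2 for x in trials) / (n - 1))
    u_a = s / math.sqrt(n)
    # Type B: assumed rectangular distribution over +/- half the 0.1 mm resolution
    u_b = (resolution / 2) / math.sqrt(3)
    # Combined standard uncertainty, then expanded uncertainty (k = 2, ~95 %)
    u_c = math.sqrt(u_a ** 2 + u_b ** 2)
    return k * u_c
```

A user could apply this to any 15-trial column of the dataset, e.g. `expanded_uncertainty([19.9, 20.1, 20.0, ...])`, and compare the result with the tabulated expanded uncertainty.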
With these revisions, we believe that the dataset has been significantly improved in terms of usability, transparency, and practical applicability for calibration comparison and benchmarking purposes.
Thank you again for your valuable comments.
-
Investigating the text, I could not find traces of inconsistencies, implausible assertions, or any other issues. The data set in this context is so simple and so short that testing the data was basically a repetition of the calculations. I found no issues in the tables.
Response:
Thank you for your careful evaluation of the manuscript and dataset, and for confirming that no inconsistencies or issues were identified.
As noted by the reviewer, the dataset has a relatively simple structure. This simplicity is intentional, as the dataset is designed to ensure transparency and reproducibility of the calculation process. All statistical results are derived from repeated experimental data, allowing users to directly verify the results through straightforward calculations.
In addition, the dataset is structured to maintain clear traceability between raw data and derived statistics, enabling users to fully understand and validate the data processing workflow.
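The straightforward verification mentioned above can be sketched as follows; the function name and the argument layout are illustrative, not the dataset's actual column headers.

```python
def trial_metrics(elapsed_min, total_mm, target_mm, target_intensity_mmh):
    """Re-derive the reported quantities for one trial: rainfall intensity
    from total rainfall and elapsed time, and the two relative errors (%)."""
    intensity = total_mm / (elapsed_min / 60.0)  # mm/h
    err_total = (total_mm - target_mm) / target_mm * 100.0
    err_intensity = (intensity - target_intensity_mmh) / target_intensity_mmh * 100.0
    return intensity, err_total, err_intensity
```

For example, a trial recording 19.8 mm in 60 min against a 20 mm target at 20 mm/h yields an intensity of 19.8 mm/h and a relative error of −1.0 % for both quantities, which can be checked directly against the spreadsheet values.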
Thank you again for your valuable comments.
-
The data set is usable, but I do not find it user-friendly, because to use the tables the cells must be re-formulated, which requires a lot of work from the user. One proposal should be mentioned: rounding the statistical parameters to four decimal digits would be better. In my opinion, if one decides to make a benchmark for TBR devices, the 15 repetitions and the selected rainfall intensities are enough (it's OK). I think that completing the Excel files with an input worksheet, some hidden calculation sheets, and an output sheet where the results are compared automatically to the benchmark values would be useful. The metadata could have been a little more informative.
The length of the article is appropriate, well structured, and clear enough to understand. I found the language of the text consistent and precise, but I am not a native speaker, so I am not the best person to judge this. The authors show a minimum of formulas, the variables are listed correctly, and in general the representation of quantities and their units is clear and correct. The tables' appearance and arrangement are clear too. Regarding the figures, wider columns would make Figs. 1b, 2b, 3a, and 3b more easily readable.
Finally: By reading the article and downloading the data set, would you be able to understand and (re-)use the data set in the future?
I think I could.
Response:
Thank you for your detailed and constructive evaluation of the dataset, including its usability, metadata, and overall presentation quality. We also appreciate your positive comments regarding the structure, clarity, and readability of the manuscript.
In response to the reviewer’s comments, the following revisions have been made.
(1) Improvement of dataset usability
To address the usability limitations of the original dataset, an interactive Main Analysis worksheet has been introduced, together with formula-based summary worksheets and dedicated comparison worksheets. In addition, the dataset has been reorganized into a single integrated spreadsheet file, enabling direct use without additional preprocessing.
(2) Adjustment of numerical precision
The numerical representation of statistical variables has been revised to avoid over-precision while maintaining practical relevance. Most statistical values (e.g., rainfall and rainfall intensity) are presented with two decimal places, while time-related variables are reported at higher precision (up to four decimal places) to maintain calculation accuracy.
(3) Enhancement of metadata and documentation
The README worksheet has been expanded, and descriptive annotations have been added to each worksheet to clearly explain data structure, variable definitions, calculation procedures, and usage instructions from a user-oriented perspective.
(4) Improvement of figure readability
The readability of Figs. 1, 2, and 3 has been improved by adjusting their layout and size for clearer visualization.
These revisions have been reflected in both the dataset and the revised manuscript. In particular, improvements related to dataset structure and usability are described in Section 2.4 (Data files and structure) and Section 5 (Data availability and access), while variable definitions and numerical representation are clarified in Section 2.5 (Variables and units). In addition, the improvements in figure readability have been applied to Figs. 1–3.
Thank you again for your valuable comments.
- The revised manuscript and updated dataset will be provided in the revision stage.
Citation: https://doi.org/10.5194/essd-2025-791-AC1
-
AC2: 'Comment on essd-2025-791 (Suggested reviewers)', Bokjin Jang, 03 Apr 2026
In response to the editor’s request, we would like to suggest the following potential reviewers for our manuscript:
1. Prof. Dongsu Kim (dongsu-kim@dankook.ac.kr)
Dankook University, Republic of Korea
Expertise: hydrology, hydraulic measurements, and uncertainty analysis
2. Prof. Jooheon Lee (leejh@joongbu.ac.kr)
Joongbu University, Republic of Korea
Expertise: hydrology and rainfall-related studies
I confirm that there are no conflicts of interest with the suggested reviewers.
Thank you for your consideration.
Citation: https://doi.org/10.5194/essd-2025-791-AC2
RC2: 'Comment on essd-2025-791', Anonymous Referee #2, 05 Apr 2026
General comment
The manuscript illustrates data derived from the laboratory calibration of a tipping-bucket rain gauge using manual devices (burettes) and an automatic calibration system developed by the author. The dataset also contains the quantification of uncertainty following the Guide to the Expression of Uncertainty in Measurement (GUM), including the Type A, Type B, combined, and expanded uncertainties at 95 % coverage level. The results are proposed as a benchmark laboratory calibration dataset for tipping-bucket rain gauges.
Laboratory calibration is a fundamental initial step to ensure the accuracy of precipitation measurements, regardless of the specific instrument or its measurement principle. Tipping-bucket rain gauges are known to underestimate rainfall as the rain rate increases, so correction of the systematic mechanical bias must always be implemented based on the results of laboratory calibration.
However, there are a number of concerns about the presented method and the usability of the resulting dataset. In my opinion, this means the manuscript cannot be accepted for publication.
Firstly, I disagree that using burettes is suitable for calibrating rain gauges in a laboratory, let alone as a benchmark method. I am aware that it is still used in some countries, but it is rather outdated and has been replaced for at least 20 years by various manual or automatic methods. Note that the distinction between manual and automatic refers only to the system's ability to generate multiple reference flow rates: multiple burettes are used to generate individual flow rates (equivalent to rainfall intensity values), whereas an automatic system switches between different flow rates simply by changing the pump settings.
The main issue is that using burettes does not ensure a constant reference flow of water to the instrument during the test, which makes it impossible to quantify the error. Constant flow rates can be obtained either manually, based on the pressure-balancing principle of a Mariotte bottle, or automatically, by activating one or more precision pumps. Therefore, comparing calibration results obtained using burettes with those from an automatic calibration system has little relevance.
Second, the specific rain gauge used in the laboratory tests is not declared; only the measurement resolution – or equivalent bucket size – is provided (0.1 mm). Since not only each rain gauge model but even each individual instrument of the same model behaves differently, the proposed dataset can hardly be used as a benchmark for any comparison. At least the specific model, manufacturer, collector area, and bucket resolution should be given to allow the described tests to be reproduced and the results compared.
Third, a standard method already exists for laboratory calibration of catching-type rain gauges, published as the European Standard EN 17277:2019 “Hydrometry - Measurement requirements and classification of rainfall intensity measuring instruments”. No mention is made in the manuscript of this standard, no reason is provided for not using the standard methodology in the laboratory calibration performed by the author, and no criticism is offered that would make the proposed method preferable over the standard one. The duration of the tests performed in this study and the use of a single value of the percentage relative bias for each tested rainfall intensity are two main factors that make the adopted approach non-compliant with the available standard.
Finally, the error curve included in the dataset does not have the typical form expected for reproducing the systematic mechanical bias of commonly used tipping-bucket rain gauges. Since the rain gauge model is not specified, this may be due to some specific feature of the gauge tested, but I have tested many gauges in the laboratory and cannot think of an instrument that would show such a curve after laboratory calibration. In particular, the fact that the error is roughly constant between 10 and 20 mm/h and then abruptly decreases cannot be explained by the dynamics of the tipping-bucket mechanism alone. Therefore, some bias in the calibration procedure may have affected the results, and serious concerns arise about the reliability of the proposed dataset.
Detailed comments
I’ll just mention a couple of aspects that would need to be considered in revising the manuscript. The text is not easy to follow, being unnecessarily repetitive in some aspects and lacking essential information.
References are listed with many inaccuracies: some are non-existent or cited with inaccurate details and cannot easily be retrieved by the reader. A few examples are given below, but the whole list of references must be carefully revised.
Line 455: The correct reference is: Sevruk, B. and Hamon, W. R.: International comparison of national precipitation gauges with a reference Pit gauge. Instruments and Observing Methods Report No. 17, WMO/TD-No. 38, 1984.
Line 458: The mentioned paper cannot be found at the indicated reference (Journal of Applied Meteorology, 40, 1687–1693, 2001); I could not find any paper with the same title by the indicated authors.
Line 465: The reference is inaccurately reported.
… and so on.
Citation: https://doi.org/10.5194/essd-2025-791-RC2
AC3: 'Reply on RC2', Bokjin Jang, 06 Apr 2026
[Response to General Comment]
We sincerely thank the reviewer for the thorough evaluation of our manuscript and for providing constructive and insightful comments.
We understand that the main concerns raised by the reviewer can be summarized as follows:
(1) the structural limitations and suitability of the manual burette-based calibration method,
(2) insufficient information on the rain gauge and the use of the term “benchmark dataset”,
(3) the lack of reference to and compliance with the European Standard EN 17277:2019,
(4) concerns regarding the reliability of the results, particularly the atypical error patterns with increasing rainfall intensity, and
(5) issues related to manuscript clarity and reference accuracy.
We address each of these points below and have revised the manuscript accordingly.
(1) Limitations and suitability of the burette-based calibration method
As pointed out by the reviewer, the manual burette-based calibration method has inherent structural limitations, particularly in maintaining a constant reference flow during experiments. We fully agree that this limitation may affect rainfall intensity reproducibility and repeatability.
Specifically, due to the decreasing hydraulic head during the experiment, the discharge rate gradually decreases over time, making it difficult to provide a constant reference flow throughout the test. This can influence the quantitative interpretation of calibration errors.
We also acknowledge that more advanced methods, such as constant-head systems (e.g., Mariotte bottle) or pump-controlled automated systems, provide more stable flow conditions and are increasingly adopted in recent studies.
However, the inclusion of the burette-based method in this study is not intended to propose it as a standard or recommended calibration technique. Rather, it is included as a representative conventional manual approach still used in some regions, in order to enable a direct comparison with automated calibration under identical experimental conditions.
The objective of this study is not to identify the best calibration method, but to examine how differences in flow control conditions (non-constant vs. controlled constant flow) influence rainfall measurement outcomes and their interpretation.
In this context, the non-ideal flow characteristics of the burette-based method are not merely a limitation, but a key experimental condition that reveals method-dependent differences in calibration results.
We also acknowledge that the use of the term “benchmark dataset” in this context may have led to misunderstanding. Therefore, this term has been revised to “reference dataset” to clarify that it does not represent a standard calibration method.
In addition, the limitations and applicability of the burette-based method have been explicitly addressed in the revised manuscript (Section 7: Limitations).
(2) Rain gauge specification and benchmark terminology
We agree with the reviewer that the specifications of the tipping-bucket rain gauge used in this study were not sufficiently described in the original manuscript.
Even for instruments with identical nominal resolution, individual gauges may exhibit different measurement characteristics. Therefore, providing detailed instrument information is essential for reproducibility and interpretation.
Accordingly, the revised manuscript now includes the manufacturer, model name, collector area, and bucket resolution of the rain gauge used in the experiments.
We also agree that describing the dataset as a “benchmark” may be misleading, given that the results are dependent on a specific instrument and experimental setup.
Therefore, the term has been revised to “reference dataset”, emphasizing that the dataset is intended for comparative analysis of calibration methods rather than as a universal standard.
In addition, we have clarified throughout the manuscript that the purpose of this study is not to evaluate the absolute performance of a specific instrument, but to analyze method-dependent differences in calibration outcomes.
(3) Consideration of EN 17277:2019
We agree with the reviewer that standardized laboratory calibration procedures for catching-type rain gauges have been established, such as the European Standard EN 17277:2019, which provides requirements for performance evaluation of rainfall intensity measuring instruments.
However, the objective of this study is not to perform calibration in compliance with standard certification procedures, but to investigate differences in calibration outcomes between manual and automated methods under identical and controlled experimental conditions.
To ensure a consistent and direct comparison, experimental conditions such as rainfall intensity levels, target total rainfall, and the number of repeated tests were fixed across both calibration methods.
As noted by the reviewer, certain aspects of the experimental design (e.g., test duration and evaluation metrics) are not fully compliant with EN 17277:2019. Therefore, the results of this study should not be interpreted as standard-based performance evaluations.
This point has been clarified in the revised manuscript, and the role of EN 17277:2019 has been explicitly acknowledged in the Introduction.
(4) Reliability of results and error pattern
We understand the reviewer’s concern that the observed error curves differ from the typical behavior expected for tipping-bucket rain gauges, particularly the apparent reduction in error at higher rainfall intensities.
Our analysis indicates that this behavior is primarily associated with the flow characteristics of the calibration process rather than with the mechanical behavior of the rain gauge itself.
In the burette-based method, the discharge rate decreases progressively due to the reduction in hydraulic head, resulting in effective rainfall intensities that are lower than the preset values.
This effect becomes more pronounced at higher nominal rainfall intensities, leading to a systematic bias in rainfall intensity estimation. At the same time, this bias may produce an apparent decrease in relative error for total rainfall.
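The head-decline mechanism can be illustrated with a simple falling-head simulation. This is a hedged sketch under an idealized Torricelli outflow assumption; the burette cross-section, orifice area, and discharge coefficient below are illustrative values, not the actual geometry of the apparatus.

```python
import math

def drain(h0, A=8e-4, a=2e-6, cd=0.6, dt=1.0, t_end=600.0):
    """Euler-integrate the falling head h (m) of a draining burette under
    Torricelli outflow Q = cd * a * sqrt(2 g h); return (initial, final) Q."""
    g = 9.81
    h = h0
    q0 = cd * a * math.sqrt(2 * g * h)  # discharge at the start of the test
    t = 0.0
    while t < t_end and h > 0:
        q = cd * a * math.sqrt(2 * g * h)
        h -= q / A * dt   # the falling head reduces the driving pressure
        t += dt
    qf = cd * a * math.sqrt(2 * g * max(h, 0.0))
    return q0, qf
```

Because the discharge scales with the square root of the remaining head, the effective flow rate (and hence the effective rainfall intensity) delivered late in a burette test is systematically lower than at the start, consistent with the behavior described above.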
Although this mechanism was partially described in the original manuscript, we acknowledge that it may not have been sufficiently clear, potentially leading to misinterpretation as abnormal instrument behavior.
Therefore, we have revised the Results and Discussion section to more clearly explain the underlying mechanism and to distinguish between instrument behavior and calibration-induced effects.
(5) Minor comments (manuscript clarity and references)
We appreciate the reviewer’s comments regarding redundancy and clarity in the manuscript. The text has been revised to reduce unnecessary repetition and improve overall readability.
We also acknowledge the issues identified in the reference list. The entire set of references has been carefully reviewed and corrected to ensure accuracy and consistency.
Citation: https://doi.org/10.5194/essd-2025-791-AC3
Data sets
Laboratory calibration dataset for tipping-bucket rain gauges: manual burette vs automated device Bokjin Jang https://doi.org/10.17632/czzzth6z26.3
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 179 | 64 | 39 | 282 | 21 | 31 |