the Creative Commons Attribution 4.0 License.
Water quality dataset in China
Jingyu Lin
Peng Wang
Jinzhu Wang
Youping Zhou
Xudong Zhou
Hao Zhang
Yanpeng Cai
Zhifeng Yang
Abstract. Water data is a crucial asset for sustainable water resource management. However, the availability of China’s water datasets lags far behind modern expectations for open geoscientific data. This dataset is part of the China Water Data Archive (CWDA), an upcoming national collection covering all aspects of water-related data, intended to boost data sharing in China. The CWDA aims to provide free, clean, non-sensitive, coherent, and reliable water data within China for global researchers, to support national and global water resources management and the United Nations-Water Integrated Monitoring Initiative for Sustainable Development Goals 6 and 14. In this paper, we used Python and R to collect, tidy, reorganize, and curate the publicly available inland and coastal/ocean surface water quality data in China, following a series of data quality dimensions (integrity, completeness, consistency, and accuracy). As the most comprehensive, publicly available, handy, and clean water quality dataset in China so far, it included water quality data for daily, weekly, and monthly in the period of 1980–2022, with 17 indicators for over 330,000 observations at 2384 sites from inland to coastal/ocean areas. This dataset will greatly support works relevant to the assessment, modelling, and projection of water quality, ocean biomass, and biodiversity in China.
Jingyu Lin et al.
Status: final response (author comments only)
CC1: 'Comment on essd-2023-151', Rui Li, 19 May 2023
This manuscript is a valuable contribution to water environment research because it shares many valuable water quality datasets across China. However, the shared dataset does not seem to agree well with the information in the manuscript. We found that the authors did not share the full dataset. I suggest that the authors resubmit the full dataset in the revised version.
Citation: https://doi.org/10.5194/essd-2023-151-CC1
CC2: 'Reply on CC1', Jingyu Lin, 20 May 2023
Thanks for the comments. The full dataset should be available for download at the private link https://figshare.com/s/4f4af7fa7b8457467ea7. Otherwise, please let us know which data are missing so that we can supplement the dataset in the next version.
Citation: https://doi.org/10.5194/essd-2023-151-CC2
CC3: 'Comment on essd-2023-151', Evelyn Uuemaa, 16 Jun 2023
The paper aims to improve the availability of water quality data in China by adding weekly and 3-monthly averages to the global water quality dataset GRQA (Virro et al., 2021). The weekly and 3-monthly averages were extracted from PDFs, which raises additional data quality issues. The authors needed to geocode the points semi-automatically and validate them against stream and watershed datasets. Water quality data are quite scarce, and therefore any attempt to improve their availability is highly welcome.
The paper is in general well written, clearly structured, and sufficiently illustrated with tables and figures. My main concern is that the main part of the dataset consists of weekly and 3-monthly water quality indicators that carry no information about how many samples are in the averages, nor any basic statistics (range, variance, etc.) of the original data from which the averages were obtained. Without this information, the value and use of the data are severely limited. Also, how adequate is the average for water quality data? Water quality data usually do not follow a normal distribution, so the average may be quite biased. This should be addressed in the paper. Moreover, I believe that the paper is more appropriate for a local/regional journal or data repository than for Earth System Science Data, because it only covers data for China.
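To illustrate the concern about averaging skewed water quality data, here is a minimal sketch using made-up concentration values (hypothetical, not taken from the dataset). It shows how a single high value pulls the mean well above the median, and how reporting the sample count and spread next to each average, as the reviewer requests, lets users judge its representativeness. The function name `summarize` is illustrative only.

```python
import statistics

def summarize(values):
    """Return the summary statistics that could accompany each reported average."""
    return {
        "n": len(values),                      # number of samples behind the average
        "mean": statistics.mean(values),
        "median": statistics.median(values),   # robust to a few extreme values
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }

# Hypothetical right-skewed concentrations (e.g. mg/L); one high value dominates the mean.
samples = [0.2, 0.3, 0.3, 0.4, 0.5, 3.0]
stats = summarize(samples)
print(stats["n"], round(stats["mean"], 3), round(stats["median"], 3))
```

Here the mean (about 0.78) is more than twice the median (0.35), which is the kind of bias the comment warns about when only averages are published.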
The compilation of the dataset is partly insufficiently described, and it is not possible to fully understand the criteria the authors used to include, exclude, or recode some measurements. Please see my additional comments on this in the attached file.
The data must be properly deposited in an open data repository with a DOI and relevant metadata. Currently, the DOI indicated in the paper is not working.
RC1: 'Comment on essd-2023-151', Anonymous Referee #1, 21 Jul 2023
General Comment
Lin et al. derived a new dataset of surface water quality in China from three sources. Given the limited water quality data for China in current global datasets, the dataset presented in this study represents a significant contribution to the water quality community. However, I found that the current version of the manuscript reads more like a technical report documenting how the dataset was derived. The authors should perform more analyses with the new dataset to demonstrate its reliability and usability. I am not asking the authors to perform novel analyses or come up with new insights on water quality based on the dataset, but I think it would be very helpful to include more common analyses (e.g., seasonality, trends). For this reason, I recommend a major revision before publication. Please also see the additional comments below.
Major Comment
The authors should clarify the number of sites with daily, weekly, and monthly observations, respectively. The authors state that observations are available for the period 1980-2022, but the temporal coverage likely differs greatly among sites, so the record length at each site would be another useful metric.
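The two metrics requested here (sites per sampling frequency and record length per site) are straightforward to derive from a long-format table. The sketch below assumes a hypothetical record layout of `(site_id, frequency, date)` tuples with invented values; it is not the dataset's actual schema, and the helper names are illustrative.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records (site_id, frequency, observation date); not from the dataset.
records = [
    ("S1", "weekly", date(2004, 1, 5)),
    ("S1", "weekly", date(2010, 6, 7)),
    ("S2", "monthly", date(1990, 3, 1)),
    ("S2", "monthly", date(2022, 12, 1)),
    ("S3", "daily", date(2021, 1, 1)),
]

def coverage(rows):
    """Return {(site_id, frequency): (first_date, last_date, record_length_years)}."""
    dates = defaultdict(list)
    for site, frequency, day in rows:
        dates[(site, frequency)].append(day)
    return {
        key: (min(days), max(days), (max(days) - min(days)).days / 365.25)
        for key, days in dates.items()
    }

def sites_per_frequency(cov):
    """Count distinct sites that report at each sampling frequency."""
    sites = defaultdict(set)
    for site, frequency in cov:
        sites[frequency].add(site)
    return {frequency: len(s) for frequency, s in sites.items()}

cov = coverage(records)
print(sites_per_frequency(cov))
for (site, frequency), (first, last, years) in sorted(cov.items()):
    print(site, frequency, first, last, round(years, 1))
```

Keying on `(site_id, frequency)` rather than on the site alone keeps the counts correct if a site reports at more than one frequency.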
Minor Comments
Line 34: Need to introduce SDG before using the acronym.
Line 55-57: This statement is confusing. What do you mean by “different metadata information”?
Line 96: I suggest: “Data presented in this paper…”.
Line 90-97: As I understand it, CWDA is already a public data archive and the authors have added new water quality data to it. If so, please focus on describing the water quality data presented in this study in more detail.
Line 182: What does “messed into” mean? Mixed?
Line 185: Please clarify the meaning of “未检出” (“not detected”) and “河南信阳徐桥” (Xuqiao, Xinyang, Henan). And is the latter the only station removed?
Line 188: Do you mean the dataset is provided with two versions?
Line 216: The statement about the outliers is ambiguous. It is unclear whether the authors are arguing that the data are less affected by outliers or not. In addition, further explanation and a count of the outliers would be very helpful.
Figure 4: I think it would be better to use different colors to represent the sites from different sources.
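One common way to quantify outliers, as requested for Line 216, is Tukey's 1.5 × IQR rule. The sketch below applies it to invented dissolved oxygen readings; it is one possible screening rule offered for illustration, not the method the authors used, and the function name `iqr_outliers` is hypothetical.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles (Python 3.8+)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical dissolved oxygen readings (mg/L); 45.0 looks like a unit or entry error.
do_mg_l = [6.8, 7.1, 7.4, 6.9, 7.2, 7.0, 45.0, 7.3]
flagged = iqr_outliers(do_mg_l)
print(f"{len(flagged)} of {len(do_mg_l)} values flagged: {flagged}")
```

Reporting the number of flagged values per indicator and site, rather than only stating that outliers exist, would make the claim verifiable.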
Citation: https://doi.org/10.5194/essd-2023-151-RC1
RC2: 'Comment on essd-2023-151', Anonymous Referee #2, 10 Aug 2023
This paper reconstructed the historical water quality data in inland, coastal, and ocean areas of China. This dataset would be useful for further water quality research in China. However, the paper does not apply the dataset to any research question, and the reliability of the dataset is not demonstrated. Overall, the manuscript is clearly organized, but I think it should be reconsidered after major revision.
Specific comments
Line 39-40: “Amongst the water quality data” what “is a key aspect used...”? Or do you mean “water quality data is a key aspect…”?
Table 2: “spatial resolution” to “Spatial resolution”
Citation: https://doi.org/10.5194/essd-2023-151-RC2
RC3: 'Comment on essd-2023-151', Anonymous Referee #3, 14 Aug 2023
The study introduces a water quality dataset for China by reorganizing and consolidating data from various sources, including the Global River Water Quality Archive (Virro et al. 2021), China National Environmental Monitoring Centre, and National Marine Environmental Monitoring Center. The dataset holds significant potential interest for the community; however, the manuscript's overall quality is low. I recommend that the authors undertake a comprehensive revision of the manuscript before proceeding to its resubmission.
The Introduction section would benefit from a thorough rewrite, while the Data & Methodology section should be augmented with additional details. Moreover, the Results section should encompass independent validation and dataset intercomparison. It is recommended to include a foundational analysis of basic consistency or continuity, thus substantiating the reliability of the processing undertaken. Lastly, meticulous attention to English grammar should be given during the manuscript's revision process.
Specifically,
1. Introduction Section: The current presentation of the introduction begins with a discussion of water data, yet it lacks a central focus on water quality. Notably absent are clear definitions of water quality indicators and their potential significance. To enhance this section, I propose a restructuring along the following lines:
a. Establish a fundamental academic context surrounding water quality, incorporating key indicators that are widely recognized.
b. Emphasize the critical importance of maintaining high water quality standards across various domains.
c. Address the existing landscape of water quality datasets and their application examples, highlighting the shortcomings.
d. Convey the distinctive innovations and contributions that this study brings to the field. This will lend greater clarity and engagement to the introduction, better aligning it with the study's objectives and significance.
2. Data Section: Given that the raw data was collected rather than generated by this study, please provide additional details and context for the original datasets, such as sensors, quality maintenance methods, etc.
3. Methodology: It's imperative to elaborate on the data cleaning process. Explain the methods employed to remove abnormal values and ensure data consistency from different data sources.
4. Results: Introduce the selected water quality indicators and consider including a summary of these indicators along with a figure of temporal coverage variation, following Virro et al. (2021). Any analysis in that previous paper can be followed, as this paper is closely related to it.
5. Dataset Assessment: Present comprehensive assessments of the dataset, including its spatial and temporal consistency. Address questions regarding spatiotemporal overlap between data sources and the congruence of processed outputs from different sources.
6. Language and Grammar: Carefully edit and proofread the manuscript for English grammar and language usage.
Technical Issues
Line 34: 'SDG' should be clarified.
Line 37: “China aims at maintaining water resources while improving resources management. To achieve the United Nation’s SDGs and President Xi’s version of Chinese Dream, it is important to compile water data from inland to coastal/ocean areas” -> China is committed to the preservation of water resources while simultaneously advancing resource management methodologies. To effectively accomplish the United Nations' Sustainable Development Goals (SDGs) and align with China's comprehensive policy plan, it is crucial to systematically compile water-related data across both inland and coastal/oceanic domains.
Line 39: “Amongst the water quality data is a key aspect used to identify the pollutions in the Source-to-Sea (S2S) aquatic continuum for sustaining water resources and sanitation services” ->
Within the context of the Source-to-Sea (S2S) aquatic continuum, water quality data emerges as a pivotal factor in discerning pollution levels. This information plays a critical role in the preservation of water resources and the provision of sanitation services.
Line 42: what does ‘accelerated dataset’ mean here?
Line 45: The inclusion of Chinese water quality data within the comprehensive global dataset is very limited, and there is a notable absence of data originating from coastal and oceanic regions.
Line 54: Besides -> Moreover
I won’t continue editing sentence by sentence, but I strongly recommend that the authors use professional English editing to revise the manuscript.
Line 60: "if..." then what?
Line 65: This paragraph introduces several papers that were withdrawn without providing the corresponding references or links. The writing here reads more like storytelling than an academic literature review. The authors should focus on the data, review the previous datasets, their applications, and their drawbacks, and finally state the contributions of this work.
Line 185: those characters are not explained in English.
Line 202: This reference is missing from the reference list; I suggest double-checking the whole manuscript to avoid such issues.
Figure 4: All points along the coast are clustered. I suggest including regional maps to show the points clearly, and then marking the locations of all regional maps on a national map, which can be drawn smaller than the current version. The sites from different data sources should be marked with different colors.
Abstract: The DOI is not working, and the dataset link and data reference should be provided in the abstract; please double-check the ESSD data policy.
Conclusion and reference list: The conclusion is too general and the cited references are limited, which further lowers the quality of the manuscript.
Reference
Virro, Holger, et al. "GRQA: global river water quality archive." Earth System Science Data 13.12 (2021): 5483-5507.
Citation: https://doi.org/10.5194/essd-2023-151-RC3
RC4: 'Comment on essd-2023-151', Anonymous Referee #4, 24 Aug 2023
Water quality data are important for modeling biogeochemical cycles in aquatic ecosystems, assessing the drivers of interannual changes in water quality, and making policy on catchment management and utilization. Nonetheless, publicly available water quality data in China are still very limited. To address this issue, Lin et al. provide a clean, editable, and sharable national water quality dataset across inland and coastal/oceanic regions in China by compiling three previous datasets from the public and the government. It includes daily, weekly, and monthly water quality data for the period 1980-2022, with 330,000 observations for 17 indicators at 2384 sites.
The paper is well organized and the methods for producing this dataset are described clearly. In particular, this dataset is urgently needed by researchers in environmental science, climate change, biogeochemical cycles, etc. I recommend accepting this manuscript after a minor revision.
Please see my specific comments below:
L27-28: I suggest changing the original text to “it included daily, weekly, and monthly water quality data in the period of 1980-2022, with over 330,000 observations for 17 indicators at 2384 sites from inland to coastal/ocean areas.”
L29: change ‘works’ to ‘studies’
L34: Give an explanation on “SDG” (full name)
L42: Recognition of importance of aquatic systems to ** has accelerated the arising of local and national water datasets, for example, datasets for United States *.
L53: covering China or covering whole China
L63: delete “there are”
L64-65: these datasets are not publicly available **
L109: spanning over period 1898-2020, or spanning from 1898 to 2020.
L165: which converted ***, we validated **
L176: ** a single table and then imported into ArcGIS **
Fig. 1a & 2a are confusing. What does the black line mean? Does it denote the cumulative percentage of missing values? Do the bars denote the percentage or the number of missing values? What does the right y-axis mean?
Fig. 3: Please provide a y-axis title with units, and label each subplot (e.g., a, b, c, …).
L256-257: with 330,000 observations for 17 indicators at 2384 sites.
Citation: https://doi.org/10.5194/essd-2023-151-RC4
AC1: 'Comment on essd-2023-151', Jingyu Lin, 31 Oct 2023
We greatly appreciate the constructive comments from the four reviewers, the two community reviewers, and the editorial board, as well as the editor's very helpful overarching guidance on the revision of our manuscript. We have made our best efforts to address all comments, and we believe these modifications will significantly increase the reliability of our paper. Please find the updated version of the paper, with tracked changes, in the attachment.
Viewed
HTML | PDF | XML | Total | BibTeX | EndNote |
---|---|---|---|---|---|
970 | 394 | 72 | 1,436 | 14 | 25 |