the Creative Commons Attribution 4.0 License.
A merged continental planetary boundary layer height dataset based on high-resolution radiosonde measurements, ERA5 reanalysis, and GLDAS
Jianping Guo
Jian Zhang
Tianmeng Chen
Kaixu Bai
Jia Shao
Yuping Sun
Ning Li
Jingyan Wu
Rui Li
Jian Li
Qiyun Guo
Jason B. Cohen
Panmao Zhai
Xiaofeng Xu
Fei Hu
Abstract. The planetary boundary layer (PBL) is the lowermost part of the troposphere and governs the exchange of momentum, mass, and heat between the surface and the atmosphere. Radiosonde measurements have been used extensively to estimate the PBL height (PBLH), but because of their low spatial coverage and temporal resolution, they cannot describe the diurnal variation of PBLH across the globe. To fill this data gap, this paper produces a PBLH dataset over global land that is temporally continuous through the course of a day, by applying machine learning algorithms to integrate high-resolution radiosonde measurements, ERA5 reanalysis, and the GLDAS product. The dataset covers the period from 2011 to 2021 with a temporal resolution of 3 h and a horizontal resolution of 0.25° × 0.25°. The radiosonde dataset contains around 180 million profiles from 370 stations across the globe. The machine learning model takes 18 parameters derived from ERA5 reanalysis and GLDAS as input variables, and the PBLH biases between radiosonde observations and ERA5 reanalysis serve as the learning targets. The input variables are presumed to represent land properties, near-surface meteorological conditions, terrain elevation, lower-tropospheric stability, and solar cycles. Once trained, the model was used to predict the PBLH bias at other grid cells across the globe from parameters acquired or derived from ERA5 and GLDAS. The merged PBLH is then the sum of the predicted PBLH bias and the PBLH retrieved from ERA5 reanalysis. Overall, the merged high-resolution PBLH dataset is globally consistent with the PBLH retrieved from radiosonde observations in both magnitude and spatiotemporal variation, with a mean bias as low as –0.9 m.
The dataset and related codes are publicly available at https://doi.org/10.5281/zenodo.6498004 (Guo et al., 2022) and are relevant to a wide range of scientific research and applications, including air quality, convection initiation, climate, and climate change.
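The merging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the abstract does not name the algorithm, so an ordinary least-squares fit stands in for the machine learning model, all data are synthetic, and every variable and function name below is hypothetical. Only the structure follows the abstract: the target is the radiosonde-minus-ERA5 PBLH bias at stations, and the merged PBLH is the ERA5 PBLH plus the predicted bias.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (hypothetical): 500 collocated station samples, 18 predictors.
n_samples, n_features = 500, 18
X_station = rng.normal(size=(n_samples, n_features))
pblh_era5_station = rng.uniform(200.0, 2000.0, size=n_samples)  # metres

# Fabricated "truth" for the demo only: radiosonde PBLH differs from ERA5
# by a predictor-dependent bias plus noise.
true_weights = rng.normal(scale=30.0, size=n_features)
pblh_radiosonde = (
    pblh_era5_station
    + X_station @ true_weights
    + rng.normal(scale=20.0, size=n_samples)
)

# Learning target: PBLH bias = radiosonde PBLH - ERA5 PBLH.
bias_target = pblh_radiosonde - pblh_era5_station


def fit_linear(X, y):
    """Least-squares fit with an intercept (stand-in for the unnamed ML model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    """Predict the bias from predictors using the fitted coefficients."""
    A = np.column_stack([X, np.ones(len(X))])
    return A @ coef


coef = fit_linear(X_station, bias_target)

# Apply the trained model at grid cells that have no radiosonde station.
n_grid = 1000
X_grid = rng.normal(size=(n_grid, n_features))
pblh_era5_grid = rng.uniform(200.0, 2000.0, size=n_grid)

# Merged PBLH = ERA5 PBLH + predicted bias.
pblh_merged_grid = pblh_era5_grid + predict_linear(coef, X_grid)

# Sanity check at the stations: the correction should reduce the error.
pblh_merged_station = pblh_era5_station + predict_linear(coef, X_station)
mae_era5 = np.mean(np.abs(pblh_era5_station - pblh_radiosonde))
mae_merged = np.mean(np.abs(pblh_merged_station - pblh_radiosonde))
print(f"MAE ERA5: {mae_era5:.1f} m, MAE merged: {mae_merged:.1f} m")
```

Learning the bias rather than the PBLH itself keeps the reanalysis as the baseline, so the model only has to capture the correction. The error check here is in-sample and on synthetic data, so it says nothing about the skill of the published dataset.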
Jianping Guo et al.
Status: closed
RC1: 'Comment on essd-2022-150', Anonymous Referee #1, 23 May 2022
General comments:
This study developed a state-of-the-science method to derive a global-wide PBLH dataset merging in situ observations and reanalysis dataset, which has optimized the performance of a so-called “data fusion” technology and provided critical data for climate research. There are no obvious flaws in the methodology, and the final output is informative enough to compensate for the disadvantages of current atmospheric datasets existing as the spatial-temporal discrepancy. Despite the good structure and comprehensive analysis, the authors are required to answer or address the following questions or comments. After that, I think this manuscript can be accepted for publication.
Specific comments:
- Line 123: It is suggested that the authors explain a little bit more of the relationship by a gradient of terrain or lower-tropospheric stability induced underestimation of the PBLH.
- The title of the paper is “…ERA5 reanalysis, and GLDAS”. However, GLDAS didn’t occur until the last paragraph. It is suggested that the authors can add some descriptions of GLDAS.
- Line 154: please clarify if the interpolation is based on altitude or elevation.
- Line 158: It seems to me not correct to say spatially even coverage. The coverage in Australia is substantially not even especially in Figure 1d.
- Line 173: Any reference for the definition of LST?
- Line 207: how did the authors match the stational PBLH and gridded PBLH in the comparison?
- Line 259: Please specify clearly if all the data from 2011-2021 were included in the model training stage. Were they divided by the measuring time (e.g., 0000, 0006…)?
- A simple question: What is the merit of ~100/200 m improvement of PBLH (compared with the raw method) considering the future application of this dataset? Any impacts on climate-scale studies?
Technical corrections:
- Line 99 and 116: the definition of ERA-5 should be moved ahead.
- Please keep it consistent by using either ERA5 or ERA-5 in the whole manuscript.
- Line 233: in the main text, the authors mentioned that Table 2 shows the correlation coefficients between PBLH and each variable, but the caption of Table 2 says that it is a correlation coefficient with PBLH bias between radiosonde and ERA5 reanalysis, which is easy to be misinterpreted. Please address.
- Line 242, please use subscripts or other notations to mark PBLH-M and PBLH-E in the equation. Otherwise, it will be easy to be recognized as a minus.
Citation: https://doi.org/10.5194/essd-2022-150-RC1
AC1: 'Reply on RC1', Jian Zhang, 02 Sep 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-150/essd-2022-150-AC1-supplement.pdf
RC2: 'Comment on essd-2022-150', Anonymous Referee #2, 08 Jul 2022
The authors have compared radiosonde-based PBL height estimates with PBL heights derived from the ERA5 reanalysis for over 300 available land stations, showing a significant bias in ERA5 PBL heights. A machine learning routine is developed to predict the ERA5 PBL height bias based on numerous input parameters, and this bias is subtracted from the ERA5 PBL height to produce a corrected dataset. This produces an immediately useful and relevant dataset that can be applied in many future studies. The work is novel, well-constructed, and succinctly explained in the paper. There are a few non-structural fixes that could improve the manuscript, but no major issues with the work, so I would only call these minor revisions.
Notes:
Line 203: can you add some detail on what you mean by the ‘second level’?
In equation 2, it appears that PBLH-M and PBLH-E are mis-formatted as PBLH – M and PBLH – E (where M and E are variables being subtracted). This is probably a formatting error, but is initially very confusing.
Line 260: I worry that randomly dividing the data can cause an issue if certain geographic regions are underrepresented in the training data. I would recommend dividing your stations into specific regions (for example: valleys, mountains, coastal, continental, tropical, polar…) and ensuring that a subset for each region is then randomly drawn for each training/validation pool. An easier solution may be just to show that the randomly selected data already chosen for training represents these differing types of regions using a map and/or histogram.
Figure 2: Dividing by calendar season for stations on both sides of the equator is not recommended, since you are lumping winter with summer, autumn with spring, etc… It would be better to combine similar seasons, so that southern hemisphere DJF is combined with northern hemisphere JJA, etc… This would better illustrate seasonal biases.
Figure 8: the panels are too small for a meaningful comparison between PBLH-R and PBLH-M (comparing the dots to the shading). I recommend making larger maps available as supporting material, or showing this comparison some other way
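A minimal sketch of the hemisphere-aware season grouping suggested in the Figure 2 note, under stated assumptions: the function name is hypothetical, seasons are the three-month meteorological seasons, and stations exactly on the equator are treated as Northern Hemisphere. It is an illustration only, not code from the manuscript.

```python
def local_season(month: int, latitude: float) -> str:
    """Return the local season for a calendar month at a given latitude.

    Northern Hemisphere: DJF winter, MAM spring, JJA summer, SON autumn.
    Southern Hemisphere (latitude < 0): the same months map to the opposite
    season, so SH DJF is grouped with NH JJA as "summer".
    """
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")

    northern = {
        12: "winter", 1: "winter", 2: "winter",
        3: "spring", 4: "spring", 5: "spring",
        6: "summer", 7: "summer", 8: "summer",
        9: "autumn", 10: "autumn", 11: "autumn",
    }
    opposite = {
        "winter": "summer", "summer": "winter",
        "spring": "autumn", "autumn": "spring",
    }

    season = northern[month]
    # Flip the season for Southern Hemisphere stations.
    return opposite[season] if latitude < 0 else season


# Usage: a January sounding is winter in Beijing but summer in Sydney.
print(local_season(1, 39.9))   # winter
print(local_season(1, -33.9))  # summer
```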
Citation: https://doi.org/10.5194/essd-2022-150-RC2
AC2: 'Reply on RC2', Jian Zhang, 02 Sep 2022
The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2022-150/essd-2022-150-AC2-supplement.pdf
Data sets
A Harmonized Global Continental High-resolution Planetary Boundary Layer Height Dataset Covering 2017-2021 Jianping GUO; Jian ZHANG; Jia SHAO https://doi.org/10.5281/zenodo.6498004
Model code and software
A Harmonized Global Continental High-resolution Planetary Boundary Layer Height Dataset Covering 2017-2021 Jianping GUO; Jian ZHANG; Jia SHAO https://doi.org/10.5281/zenodo.6498004