the Creative Commons Attribution 4.0 License.
Corrected event dataset of FY-4A LMI, 2019–2023
Abstract. The Lightning Mapping Imager (LMI) aboard the Fengyun-4A (FY-4A) satellite, once one of the only two geostationary lightning detection payloads operating in space, has accumulated a substantial volume of observational data. Extensive efforts have been made to correct lightning geolocation deviations, including payload misalignment correction, cloud-top height parallax correction, and thermal deformation correction. These measures have substantially improved the geolocation accuracy of LMI observations. However, individual correction schemes are not necessarily applicable across the entire LMI field of view; furthermore, a comprehensive, unified correction dataset has yet to be established, which has limited the wider utilization of LMI data. To address the remaining systematic geolocation deviations in current LMI lightning products, we propose a new correction method based on reference data from the World Wide Lightning Location Network (WWLLN). Using ground-based lightning observations as a benchmark, the LMI field of view is subdivided into 400 subregions arranged in a 20 × 20 grid. Within each subregion, sensitivity experiments are conducted to match spaceborne LMI detections with ground-based lightning events, thereby quantifying the systematic deviation for each subregion. A weighted curve-fitting approach is then applied to the coordinate deviations derived from the matched events to obtain a correction curve for each subregion. These subregional correction curves are subsequently mapped back to the image coordinates, enabling an analysis of the spatiotemporal variability of lightning geolocation deviations across the full LMI coverage. Finally, the fitted curve values are applied as correction terms to the original data, resulting in the construction of a new, refined correction dataset. Building upon the existing Level-2 lightning products, this method significantly enhances the geolocation accuracy of LMI observations. 
The coordinate deviations between LMI detections and ground-based lightning network observations exhibit pronounced convergence in both the zonal and meridional components, indicating a substantial improvement in geolocation performance. In addition, a domain-wide assessment of geolocation accuracy reveals that, except for regions such as Xinjiang and Mongolia where lightning occurrence is too sparse to support robust curve fitting, the geolocation accuracy across most of the LMI field of view is relatively stable, with an average error of approximately 15 km (about 1.5 pixels), achieving high practical accuracy. The corrected dataset is publicly available at https://doi.org/10.11888/Atmos.tpdc.303312 (Zhang et al., 2026).
Status: final response (author comments only)
- RC1: 'Comment on essd-2026-70', Rupraj Biswasharma, 26 Apr 2026
- CC1: 'Comment on essd-2026-70', Yang Zhao, 27 Apr 2026
General Comments
This manuscript addresses the systematic geolocation bias in the LMI aboard the Fengyun-4A satellite. The research framework is sound, the proposed method offers meaningful improvements over existing approaches, and the public release of the corrected dataset is a valuable contribution to the community. Overall, the manuscript meets the publication standards of ESSD. That said, several issues related to methodological transparency, result interpretation, and writing quality need to be addressed before the paper is ready for publication.
Specific Comments
- The detection efficiency of WWLLN is spatially non-uniform and is relatively low in regions with sparse station coverage (e.g., the Tibetan Plateau, Xinjiang, and Mongolia). Although the authors mention in Section 2.3 that the mean location accuracy of WWLLN is approximately 10 km, the following issues are insufficiently discussed: How does the intrinsic location error of WWLLN propagate into the final correction results? Does it significantly affect the geolocation accuracy assessment (approximately 15 km)? The manuscript notes that sparse lightning occurrence in Xinjiang and Mongolia leads to fitting difficulties, but makes no attempt to disentangle genuinely sparse lightning from low WWLLN detection efficiency. Given that this distinction has direct implications for the reliability of the correction in these regions, a brief discussion would be warranted.
- A temporal threshold of 3 seconds is selected for matching LMI events with WWLLN flashes. Since individual lightning flashes typically occur within a fraction of a second, a 3-second window is physically quite broad. Please clarify the rationale for this choice. Specifically, is this intended to account for clock synchronization uncertainties between the satellite platform and the ground-based network, or to capture a statistically sufficient number of regional events?
- The smoothing parameter λ in Equation (1) is a key parameter controlling the trade-off between data fidelity and curve smoothness (line 175). However, the manuscript does not specify the value of λ or the rationale behind its selection (e.g., cross-validation, the L-curve method). Nor does it discuss the sensitivity of the correction results across different subregions to the choice of λ. It is also unclear whether a single uniform value of λ is applied to all 400 subregions or whether it is determined adaptively for each subregion. The authors should either justify this choice in the text or present a sensitivity test in a supplementary appendix.
- The assessment of correction performance (Figures 6 and 7) relies exclusively on WWLLN data, which is the same reference used to construct the correction curves. This is, in effect, a circular validation, as the same reference data are used both to derive the correction and to assess its accuracy. We strongly recommend including at least one independent dataset for cross-validation, such as BLNET or ADTD. If access to such data is restricted, this limitation should be stated plainly, along with a candid discussion of how it affects confidence in the reported results.
- The manuscript mentions that additional constraints are applied to downweight time intervals with extremely small standard deviations (lines 191–194) in order to prevent unrealistically large weights. However, neither the specific form of these constraints nor the threshold values are provided. This information is essential for reproducibility and should be included in the main text or in an appendix.
- The color scale in Figure 5 is set to [−0.3, 0.3], with values exceeding the range annotated as “0.3<” and “−0.3>”. However, no information is provided regarding the proportion of pixels falling outside this range or their geographic distribution. It would be helpful to report this in the caption or the main text.
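The comments above on the choice of λ, on the constraint applied to very small standard deviations, and on circular validation can be illustrated together with a small sketch. It is not the authors' method: the manuscript's Equation (1) and weight constraints are not reproduced here, so the sketch swaps in a discrete penalized smoother (a Whittaker smoother) that has the same fidelity-versus-smoothness structure, uses a hypothetical floor on the standard deviation, and runs on synthetic data. The function names, the floor value, and the λ grid are all assumptions made for illustration.

```python
import numpy as np

def floored_weights(sigma, sigma_floor):
    """Inverse-variance weights with a floor on sigma.

    The floor is a hypothetical stand-in for the manuscript's unspecified
    constraint; it keeps bins with tiny spread from getting huge weights.
    """
    sigma = np.maximum(np.asarray(sigma, dtype=float), sigma_floor)
    return 1.0 / sigma**2

def whittaker_smooth(y, w, lam):
    """Minimise sum_k w_k (y_k - f_k)^2 + lam * sum_k (second difference of f)^2."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    D = np.diff(np.eye(len(y)), n=2, axis=0)  # (n-2, n) second-difference operator
    return np.linalg.solve(np.diag(w) + lam * D.T @ D, w * y)

def holdout_rmse(y, w, lam, test_idx):
    """Fit with the held-out points given zero weight, then score on them."""
    y = np.asarray(y, dtype=float)
    w_train = np.array(w, dtype=float)  # copy so the caller's weights are untouched
    w_train[test_idx] = 0.0
    f = whittaker_smooth(y, w_train, lam)
    return float(np.sqrt(np.mean((f[test_idx] - y[test_idx]) ** 2)))

# Synthetic deviations in 24 time bins (arbitrary units, not LMI data).
rng = np.random.default_rng(0)
hours = np.arange(24)
sigma = rng.uniform(0.001, 0.05, size=24)
y = 0.2 * np.sin(2 * np.pi * hours / 24) + rng.normal(0.0, 0.03, size=24)
w = floored_weights(sigma, sigma_floor=0.01)

# Sensitivity sweep: score each candidate lambda on points excluded from the fit.
test_idx = np.arange(2, 24, 4)
for lam in (1e-2, 1e0, 1e2, 1e4):
    print(f"lambda={lam:g}  hold-out RMSE={holdout_rmse(y, w, lam, test_idx):.4f}")
```

Reporting a table of this kind per subregion, or the λ chosen by such a hold-out score, would answer the question of whether one uniform value is adequate. Note that holding out WWLLN matches only tests the fit; it does not remove the dependence on WWLLN as the sole reference.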
Technical Corrections
- Line 56: The citation “(Peterson and Mach et al. (2022)” contains a formatting error and should be corrected.
- Line 120: The section number “2.3” should be “2.2”.
- There are two typographical issues worth noting: a spurious colon at the end of line 200, and a Chinese period ("。") at the end of line 235. Both should be removed. Please also verify whether the captions of Figures 1, 3, and 7 require a terminal period “.” in accordance with journal style.
- The abstract runs longer than necessary. Tightening the abstract to focus on three elements — the principle of the correction method, the quantified improvement in accuracy, and the key value of the dataset — would improve its impact.
- The Conclusion (Section 6) repeats much of what is already in the abstract. Rather than recapping the methodology, the authors could use this space to look ahead. For instance, a brief discussion of how this dataset might support AI-based lightning analysis or serve as a benchmark for numerical model evaluation would connect naturally with the remarks in the Introduction (lines 97–98).
- There are multiple instances where a space is missing between a word and a citation bracket (e.g., Lines 34, 37, 43, 48, and 59).
Citation: https://doi.org/10.5194/essd-2026-70-CC1
- RC2: 'Comment on essd-2026-70', Anonymous Referee #2, 17 May 2026
The authors describe a geolocation correction methodology for the FY-4A lightning mapping imager (LMI). They compare to ground-based lightning detection (WWLLN) in post-processing to produce a uniformly QC’d dataset for the full period of operation of the satellite.
The manuscript is clear and the availability of the data and its characteristics is most welcome.
I have a few minor comments below, which are all easily corrected.
Line 49: It will be possible to apply the authors’ correction method to FY-4C LMI data, but not in real-time since it relies on other lightning data. Is such a post-processed dataset planned? More generally, will lessons learned from FY-4A mitigate the need for such a method due to improved operational algorithms for FY-4C?
Line 75: How are thermal distortions (forced by the diurnal cycle) related to launch-induced payload displacement? Presumably those would occur even in the absence of any launch effects.
Line 129: what were the other “ground-based lightning observations” used by Fan et al. (2018) for comparison with WWLLN?
Line 131: “of [WWLLN is] around 10 km”
Line 141: please provide a reference for “marginal effect analysis”
Line 179: “the weight assigned to the 𝑘-th data point” is defined as omega sub k with an overbar, but the formula also has a tilde above the overbar. In the definition of Eq. 2, there is an additional notation that adds a superscript “raw” to omega. Please clarify how each of these symbols are related to one another, and their possibly varied role in the spline fitting process. For example, is Eq. 2 only a starting estimate for omega?
Section 3.1: matching of WWLLN strokes to LMI events is described here. I would have expected matching to group centroids. In the case of a bright optical pulse comprising many pixels, does the authors’ fitting procedure include only those events that fall within the 3 s and 30 km space/time window? Are the events outside this window simply discarded in the fitting analysis?
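To make the question concrete, the event-level matching described above might look like the following sketch. The 3 s and 30 km thresholds come from the manuscript as quoted in this comment; everything else is an assumption for illustration: the dictionary layout, the function names, the nearest-in-space tie-break, and the choice to drop unmatched events rather than attach them to a matched group.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def match_events(lmi, ref, dt_max=3.0, dist_max_km=30.0):
    """Match each LMI event to the nearest reference stroke inside the window.

    lmi, ref: dicts of NumPy arrays 't' (seconds), 'lat', 'lon' (degrees);
    ref must be sorted by time. Returns (lmi_idx, ref_idx, dlat, dlon) tuples.
    Events with no stroke inside the window are dropped (an assumption).
    """
    ref_t = np.asarray(ref["t"])
    matches = []
    for i, (t, lat, lon) in enumerate(zip(lmi["t"], lmi["lat"], lmi["lon"])):
        lo = np.searchsorted(ref_t, t - dt_max, side="left")
        hi = np.searchsorted(ref_t, t + dt_max, side="right")
        if lo == hi:
            continue  # no reference stroke within the time window
        d = haversine_km(lat, lon, ref["lat"][lo:hi], ref["lon"][lo:hi])
        j = int(np.argmin(d))
        if d[j] <= dist_max_km:
            k = int(lo + j)
            matches.append((i, k, float(lat - ref["lat"][k]), float(lon - ref["lon"][k])))
    return matches

# Tiny synthetic example: only the first LMI event has a stroke in the window.
lmi = {"t": np.array([10.0, 50.0, 100.0]),
       "lat": np.array([30.0, 30.0, 10.0]),
       "lon": np.array([110.0, 110.0, 100.0])}
ref = {"t": np.array([9.0, 60.0, 101.0]),
       "lat": np.array([30.1, 30.0, 12.0]),
       "lon": np.array([110.1, 110.0, 100.0])}
matches = match_events(lmi, ref)
```

Matching to group centroids instead would amount to replacing the per-event coordinates in `lmi` with one radiance-weighted position per group before calling the same routine.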
Please mark the four subregions shown in Fig. 4 on Fig. 3
Line 243: I don’t see an east-to-west propagation of the error pattern. To me it looks more like a steady amplification of regions that always have a relative maximum of errors.
Line 289: Typically, more samples should lead to improved location, but the conclusion here is that a low number of samples leads to better fits. Why? Does this suggest a shortcoming in some part of the fitting approach?
Line 291: I’m not convinced that the large errors in the NE are due to the large ground footprint of the detector elements. If that were the case, the NW part of the domain should exhibit the same pattern.
Line 300: While I don’t object to distribution of a CSV file format, which is easily understood and machine readable (if lacking in terms of metadata), it is not true that NetCDF files will be larger. Naively reading the 20190406.csv dataset (49.2 MB) with pandas and writing it back to NetCDF with xarray produced a slightly smaller file (48.7 MB), even when naively using 64-bit ints and floats for all columns. However, the file can easily be made much smaller. Year, month, day, hour, minute and second can all be stored in far fewer than 64 bits, and packing the floats as either 16- or 32-bit floats, or as ints using scale_factor and add_offset, will save further space. Internal zlib compression can also be used on NetCDF files and will be transparently decoded by the library. Finally, the authors might also consider using the CF metadata standard to encode time as a single variable.
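The packing suggestion can be sketched with NumPy alone. The example below shows the usual scale_factor and add_offset arithmetic for storing a bounded float (here latitude) as a 16-bit integer, and the worst-case rounding error that results. The helper names, the variable name `lat`, and the compression level are assumptions; the dataset's actual column names are not given in this comment.

```python
import numpy as np

def pack_params(vmin, vmax, nbits=16):
    """scale_factor and add_offset mapping [vmin, vmax] onto signed nbits integers."""
    scale_factor = (vmax - vmin) / (2**nbits - 1)
    add_offset = vmin + 2 ** (nbits - 1) * scale_factor
    return scale_factor, add_offset

def pack(values, scale_factor, add_offset, dtype=np.int16):
    """Quantise floats to integers; values are clipped to the integer range."""
    info = np.iinfo(dtype)
    q = np.round((np.asarray(values, dtype=float) - add_offset) / scale_factor)
    return np.clip(q, info.min, info.max).astype(dtype)

def unpack(packed, scale_factor, add_offset):
    """Recover approximate floats, as a NetCDF library does on read."""
    return packed.astype(np.float64) * scale_factor + add_offset

# Latitude spans at most [-90, 90] degrees.
sf, off = pack_params(-90.0, 90.0)
lat = np.array([-90.0, 0.0, 45.123, 90.0])
lat_packed = pack(lat, sf, off)
max_err = np.max(np.abs(unpack(lat_packed, sf, off) - lat))  # at most sf / 2

# Hypothetical per-variable encoding carrying the same parameters plus zlib.
encoding = {
    "lat": {"dtype": "int16", "scale_factor": sf, "add_offset": off,
            "zlib": True, "complevel": 4},
}
```

With xarray and a NetCDF4 backend installed, a dictionary of this form can be passed as `ds.to_netcdf(path, encoding=encoding)`, and the values are unpacked transparently on read. Whether 16 bits is enough depends on the precision the authors want to preserve relative to the reported geolocation accuracy.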
Citation: https://doi.org/10.5194/essd-2026-70-RC2
Data sets
Corrected event dataset of FY-4A LMI in Northern Hemisphere (2019-2023, March-September) Yuansheng Zhang, Xiushu Qie, Rubin Jiang, Dongjie Cao, Jing Yang, Dongfang Wang, Mingyuan Liu, Dongxia Liu, Zhuling Sun, Hongbo Zhang, and Shanfeng Yuan https://doi.org/10.11888/Atmos.tpdc.303312
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 224 | 72 | 21 | 317 | 18 | 26 |
A valuable dataset is presented in this manuscript, which also addresses an important problem in satellite lightning geolocation. My concerns relate to the dependence of the corrections on reference datasets. In addition, spatial correctness and the validation strategy need to be addressed before the dataset can be considered fully reliable for broad scientific applications.
Minor Comments