Corrected event dataset of FY-4A LMI, 2019–2023

Zhang, Yuansheng; Qie, Xiushu; Jiang, Rubin; Cao, Dongjie; Yang, Jing; Wang, Dongfang; Liu, Mingyuan; Liu, Dongxia; Sun, Zhuling; Zhang, Hongbo; Yuan, Shanfeng

doi:10.5194/essd-2026-70

Preprints

https://doi.org/10.5194/essd-2026-70

20 Mar 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. The Lightning Mapping Imager (LMI) aboard the Fengyun-4A (FY-4A) satellite, once one of the only two geostationary lightning detection payloads operating in space, has accumulated a substantial volume of observational data. Extensive efforts have been made to correct lightning geolocation deviations, including payload misalignment correction, cloud-top height parallax correction, and thermal deformation correction. These measures have substantially improved the geolocation accuracy of LMI observations. However, individual correction schemes are not necessarily applicable across the entire LMI field of view; furthermore, a comprehensive, unified correction dataset has yet to be established, which has limited the wider utilization of LMI data. To address the remaining systematic geolocation deviations in current LMI lightning products, we propose a new correction method based on reference data from the World Wide Lightning Location Network (WWLLN). Using ground-based lightning observations as a benchmark, the LMI field of view is subdivided into 400 subregions arranged in a 20 × 20 grid. Within each subregion, sensitivity experiments are conducted to match spaceborne LMI detections with ground-based lightning events, thereby quantifying the systematic deviation for each subregion. A weighted curve-fitting approach is then applied to the coordinate deviations derived from the matched events to obtain a correction curve for each subregion. These subregional correction curves are subsequently mapped back to the image coordinates, enabling an analysis of the spatiotemporal variability of lightning geolocation deviations across the full LMI coverage. Finally, the fitted curve values are applied as correction terms to the original data, resulting in the construction of a new, refined correction dataset. Building upon the existing Level-2 lightning products, this method significantly enhances the geolocation accuracy of LMI observations. 
The coordinate deviations between LMI detections and ground-based lightning network observations exhibit pronounced convergence in both the zonal and meridional components, indicating a substantial improvement in geolocation performance. In addition, a domain-wide assessment of geolocation accuracy reveals that, except for regions such as Xinjiang and Mongolia where lightning occurrence is too sparse to support robust curve fitting, the geolocation accuracy across most of the LMI field of view is relatively stable, with an average error of approximately 15 km (about 1.5 pixels), achieving high practical accuracy. The corrected dataset is publicly available at https://doi.org/10.11888/Atmos.tpdc.303312 (Zhang et al., 2026).
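The subregional step described in the abstract can be pictured with a minimal sketch. It assumes that matched LMI/WWLLN pairs are already available as image coordinates plus zonal and meridional offsets, and it reduces the per-subregion curve fit to a simple per-cell mean. The image size (`N_COLS`, `N_ROWS`), the function names and the sample values are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict

GRID = 20          # 20 x 20 subregions, as stated in the abstract
N_COLS = 600       # hypothetical image width in pixels (assumption)
N_ROWS = 400       # hypothetical image height in pixels (assumption)


def subregion_index(col, row, n_cols=N_COLS, n_rows=N_ROWS, grid=GRID):
    """Map a pixel coordinate to its (i, j) subregion in a grid x grid layout."""
    i = min(int(col * grid / n_cols), grid - 1)
    j = min(int(row * grid / n_rows), grid - 1)
    return i, j


def mean_offsets(matches):
    """Average (dx, dy) deviation and sample count per subregion.

    `matches` is an iterable of (col, row, dx, dy) tuples, one per matched pair.
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for col, row, dx, dy in matches:
        acc = sums[subregion_index(col, row)]
        acc[0] += dx
        acc[1] += dy
        acc[2] += 1
    return {cell: (sx / n, sy / n, n) for cell, (sx, sy, n) in sums.items()}


# Three synthetic matched pairs: two fall in cell (0, 0), one in cell (19, 19).
demo = [(10, 10, 0.2, -0.1), (20, 15, 0.4, -0.3), (590, 390, -0.5, 0.5)]
result = mean_offsets(demo)
```

The per-cell sample count is returned alongside the mean because cells with few matches (the sparse regions the abstract mentions) are exactly where any fitted correction is least constrained.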

Received: 27 Jan 2026 – Discussion started: 20 Mar 2026

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1: 'Comment on essd-2026-70', Rupraj Biswasharma, 26 Apr 2026
A valuable dataset is presented in this manuscript, and it also addresses an important problem in satellite lightning geolocation. My concerns relate to the corrections that depend on reference datasets. In addition, spatial correctness and the validation strategy need to be addressed before the dataset can be considered fully reliable for broad scientific applications.
Physics-based corrections (e.g., payload misalignment, cloud-top height parallax, thermal deformation) were applied, followed by an additional empirical correction based on WWLLN. This combination is reasonable, but it remains unclear why a substantial residual empirical correction is still required after applying physically based models, and whether the residuals arise from CTH uncertainties, instrument/navigation limitations, or incomplete physical representation.

The authors produced a “unified correction dataset” using subregional (20 × 20) curve fitting, but this depends on lightning occurrence. This raises questions about spatial consistency, especially in data-sparse regions.

The manuscript relies on WWLLN, which has known biases: low and spatially variable detection efficiency, a bias toward strong CG strokes, and a location uncertainty of several km. The authors should therefore quantify the residual error after physics-based correction alone, justify the need for the empirical step, and discuss how WWLLN limitations affect the robustness of the final dataset. The authors partially justify the use of WWLLN (by its global coverage); however, this should be balanced against its known limitations in detection efficiency and accuracy.

Recent studies (e.g., Wu et al., 2024, https://doi.org/10.1016/j.jastp.2024.106194) using higher-quality regional networks, such as the National Lightning Monitoring Network, have shown significant discrepancies in FY-4A LMI performance. The authors should:

Discuss the trade-off between coverage and data quality,

Evaluate how WWLLN limitations affect correction accuracy, and

Consider (or discuss) the role of regional high-DE networks for validation or hybrid approaches. The exclusion of oceanic regions further indicates that the correction is not uniformly applicable, which should be explicitly acknowledged.

The authors should include case studies comparing lightning locations before and after correction with CTT/CTH and/or radar reflectivity, to demonstrate improved alignment with convective structures. Where possible, comparison with regional networks such as the China Lightning Detection Network or National Lightning Monitoring Network would strengthen validation.

Please add a discussion of the limitations that remain after correction, including residual errors (~15 km), dependence on WWLLN, and implications for different applications (storm-scale vs climatology). This is particularly important given reduced reliability in low-lightning regions, as acknowledged by the authors.

It is not clear whether the correction varies with viewing geometry (e.g., satellite zenith angle) or latitude; this should be discussed.

Minor Comments
The description of the matching criteria between FY-4A LMI events and WWLLN should be clarified, specifically the choice of spatial and temporal thresholds.

Please clearly explain how the 20 × 20 subregional division was selected. Are the results sensitive to grid size?

The authors should explicitly state whether day–night differences in LMI detection efficiency were considered in the correction process.

Uncertainty estimates associated with the fitted correction functions are not clearly presented and should be included.

The reported ~15 km residual error should be discussed in terms of its implications for different applications (e.g., storm-scale vs climatological studies).

The manuscript would benefit from a brief comparison with other space-based lightning sensors, such as the GOES-R Geostationary Lightning Mapper or TRMM Lightning Imaging Sensor, to place the results in context.

The terminology distinguishing event, group, and flash should be defined clearly at the beginning for consistency.
Citation: https://doi.org/10.5194/essd-2026-70-RC1
CC1: 'Comment on essd-2026-70', Yang Zhao, 27 Apr 2026
General Comments
This manuscript addresses the systematic geolocation bias in the LMI aboard the Fengyun-4A satellite. The research framework is sound, the proposed method offers meaningful improvements over existing approaches, and the public release of the corrected dataset is a valuable contribution to the community. Overall, the manuscript meets the publication standards of ESSD. That said, several issues related to methodological transparency, result interpretation, and writing quality need to be addressed before the paper is ready for publication.
Specific Comments
The detection efficiency of WWLLN is spatially non-uniform, and is relatively low in regions with sparse station coverage (e.g., the Tibetan Plateau, Xinjiang, and Mongolia). Although the authors mention in Section 2.3 that the mean location accuracy of WWLLN is approximately 10 km, the following issues are insufficiently discussed: How does the intrinsic location error of WWLLN propagate into the final correction results? Does it significantly affect the geolocation accuracy assessment (approximately 15 km)? The manuscript notes that sparse lightning occurrence in Xinjiang and Mongolia leads to fitting difficulties, but makes no attempt to disentangle genuinely sparse lightning from the low WWLLN detection efficiency in the same regions. Given that this distinction has direct implications for the reliability of the correction in these regions, a brief discussion would be warranted.
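One rough way to frame the propagation question is to assume that the LMI and WWLLN location errors are independent and add in quadrature. This is an assumption made here for illustration, not a result from the manuscript, and it treats the quoted ~10 km and ~15 km averages as if they were standard deviations:

```latex
\sigma_{\mathrm{obs}}^{2} \approx \sigma_{\mathrm{LMI}}^{2} + \sigma_{\mathrm{WWLLN}}^{2}
\quad\Longrightarrow\quad
\sigma_{\mathrm{LMI}} \approx \sqrt{15^{2} - 10^{2}}\ \mathrm{km}
= \sqrt{125}\ \mathrm{km} \approx 11.2\ \mathrm{km}
```

Under that assumption, a sizeable share of the reported residual could come from the reference network itself. The independence assumption is weakened wherever the same WWLLN data are used both to fit and to assess the correction.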

A temporal threshold of 3 seconds is selected for matching LMI events with WWLLN flashes. Since individual lightning flashes typically occur within a fraction of a second, a 3-second window is physically quite broad. Please clarify the rationale for this choice. Specifically, is this intended to account for clock synchronization uncertainties between the satellite platform and the ground-based network, or to capture a statistically sufficient number of regional events?
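A minimal sketch of window-based matching may help show what the temporal threshold controls. It uses the 3 s window quoted here and the 30 km radius mentioned in RC2; the nearest-neighbour rule, the function names and the sample coordinates are assumptions made for illustration, not the authors' procedure.

```python
import bisect
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def match_events(lmi_events, wwlln_strokes, max_dt_s=3.0, max_dist_km=30.0):
    """Pair each LMI event with the nearest WWLLN stroke inside both windows.

    Both inputs are lists of (time_s, lat, lon).
    Returns a list of (lmi_index, wwlln_index, distance_km).
    """
    order = sorted(range(len(wwlln_strokes)), key=lambda k: wwlln_strokes[k][0])
    times = [wwlln_strokes[k][0] for k in order]
    pairs = []
    for i, (t, lat, lon) in enumerate(lmi_events):
        # Binary search restricts candidates to the +/- max_dt_s time window.
        lo = bisect.bisect_left(times, t - max_dt_s)
        hi = bisect.bisect_right(times, t + max_dt_s)
        best = None
        for k in order[lo:hi]:
            _, wlat, wlon = wwlln_strokes[k]
            d = haversine_km(lat, lon, wlat, wlon)
            if d <= max_dist_km and (best is None or d < best[1]):
                best = (k, d)
        if best is not None:
            pairs.append((i, best[0], best[1]))
    return pairs


lmi = [(100.0, 30.0, 110.0), (200.0, 25.0, 115.0)]
wwlln = [(101.5, 30.1, 110.1), (100.2, 31.0, 111.0), (210.0, 25.0, 115.0)]
pairs = match_events(lmi, wwlln)
```

In this synthetic case only the first LMI event finds a partner: one candidate is inside the time window but too far away, and the second LMI event has no stroke within 3 s. Rerunning with a smaller `max_dt_s` is a simple way to probe how sensitive the match count is to the window.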

The smoothing parameter λ in Equation (1) is a key parameter controlling the trade-off between data fidelity and curve smoothness (line 175). However, the manuscript does not specify the value of λ or the rationale behind its selection (e.g., cross-validation, the L-curve method). Nor does it discuss the sensitivity of the correction results across different subregions to the choice of λ. It is also unclear whether a single uniform value of λ is applied to all 400 subregions or whether it is determined adaptively for each subregion. The authors should either justify this choice in the text or present a sensitivity test in a supplementary appendix.
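To make the role of λ concrete, here is a minimal sketch of how a penalised fit responds to different values. Equation (1) is not reproduced on this page, so the code swaps in a discrete penalised least-squares (Whittaker-type) smoother as a stand-in for the manuscript's weighted curve fit. The synthetic diurnal signal, noise level, time axis and λ values are all assumptions chosen for illustration.

```python
import numpy as np


def whittaker_smooth(y, w, lam):
    """Minimise sum_k w_k (y_k - z_k)^2 + lam * sum_k (z_k - 2 z_{k+1} + z_{k+2})^2."""
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    n = y.size
    d2 = np.diff(np.eye(n), n=2, axis=0)       # (n-2) x n second-difference operator
    lhs = np.diag(w) + lam * d2.T @ d2
    return np.linalg.solve(lhs, w * y)


def roughness(z):
    """Sum of squared second differences, i.e. the penalty term without lam."""
    return float(np.sum(np.diff(z, n=2) ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0.0, 24.0, 49)                  # hypothetical time axis, hours
truth = 0.2 * np.sin(2 * np.pi * t / 24.0)      # synthetic diurnal deviation
obs = truth + rng.normal(0.0, 0.05, t.size)     # noisy "matched" deviations
weights = np.ones_like(obs)

# lam = 0 reproduces the data; larger lam trades fidelity for smoothness.
fits = {lam: whittaker_smooth(obs, weights, lam) for lam in (0.0, 1.0, 100.0)}
```

A sensitivity test of the kind requested would repeat such a fit over a range of λ values, per subregion, and report how much the resulting correction changes.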

The assessment of correction performance (Figures 6 and 7) relies exclusively on WWLLN data, which is the same reference used to construct the correction curves. This is, in effect, a circular validation, as the same reference data are used both to derive the correction and to assess its accuracy. We strongly recommend including at least one independent dataset for cross-validation, such as BLNET or ADTD. If access to such data is restricted, this limitation should be stated plainly, along with a candid discussion of how it affects confidence in the reported results.

The manuscript mentions that additional constraints are applied to downweight time intervals with extremely small standard deviations (lines 191–194) in order to prevent unrealistically large weights. However, neither the specific form of these constraints nor the threshold values are provided. This information is essential for reproducibility and should be included in the main text or in an appendix.
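For orientation, one common form such a constraint can take is a floor on the standard deviation before inverse-variance weights are computed. The sketch below is purely hypothetical: the floor value, the normalisation and the function name are assumptions, since the manuscript's actual constraint is not given on this page.

```python
def capped_weights(sigmas, sigma_floor=0.05):
    """Inverse-variance weights with a floor on sigma, normalised to mean 1.

    `sigma_floor` is a hypothetical threshold, not the manuscript's value.
    """
    raw = [1.0 / max(s, sigma_floor) ** 2 for s in sigmas]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]


sigmas = [0.20, 0.10, 0.001, 0.15]      # one interval with a near-zero spread
uncapped = [1.0 / s ** 2 for s in sigmas]
capped = capped_weights(sigmas)
```

Without the floor, the interval with a near-zero spread would outweigh the others by several orders of magnitude; with it, the largest-to-smallest weight ratio stays bounded by the ratio of the largest sigma to the floor, squared.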

The color scale in Figure 5 is set to [−0.3, 0.3], with values exceeding the range annotated as “0.3<” and “−0.3>”. However, no information is provided regarding the proportion of pixels falling outside this range or their geographic distribution. It would be helpful to report this in the caption or the main text.
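The requested proportion is cheap to compute. A minimal sketch, assuming the plotted field is available as a flat list of values with NaN as the fill value (the variable names and sample values are hypothetical):

```python
def out_of_range_fraction(values, limit=0.3):
    """Fractions of non-NaN values above +limit and below -limit."""
    finite = [v for v in values if v == v]      # NaN is the only value != itself
    n = len(finite)
    above = sum(v > limit for v in finite) / n
    below = sum(v < -limit for v in finite) / n
    return above, below


demo = [0.1, -0.35, 0.5, 0.0, float("nan"), 0.31, -0.2, 0.29]
above, below = out_of_range_fraction(demo)
```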

Technical Corrections
Line 56: The citation “(Peterson and Mach et al. (2022)” contains a formatting error and should be corrected.

Line 120: The section number “2.3” should be “2.2”.

There are two typographical issues worth noting: a spurious colon at the end of line 200, and a Chinese period ("。") at the end of line 235. Both should be removed. Please also verify whether the captions of Figures 1, 3, and 7 require a terminal period “.” in accordance with journal style.

The abstract runs longer than necessary. Tightening the abstract to focus on three elements — the principle of the correction method, the quantified improvement in accuracy, and the key value of the dataset — would improve its impact.

The Conclusion (Section 6) repeats much of what is already in the abstract. Rather than recapping the methodology, the authors could use this space to look ahead. For instance, a brief discussion of how this dataset might support AI-based lightning analysis or serve as a benchmark for numerical model evaluation would connect naturally with the remarks in the Introduction (lines 97–98).

There are multiple instances where a space is missing between a word and a citation bracket (e.g., Lines 34, 37, 43, 48, and 59).
Citation: https://doi.org/10.5194/essd-2026-70-CC1
RC2: 'Comment on essd-2026-70', Anonymous Referee #2, 17 May 2026

The authors describe a geolocation correction methodology for the FY-4A lightning mapping imager (LMI). They compare to ground-based lightning detection (WWLLN) in post-processing to produce a uniformly QC’d dataset for the full period of operation of the satellite.

The manuscript is clear and the availability of the data and its characteristics is most welcome.

I have a few minor comments below, which are all easily corrected.

Line 49: It will be possible to apply the authors’ correction method to FY-4C LMI data, but not in real-time since it relies on other lightning data. Is such a post-processed dataset planned? More generally, will lessons learned from FY-4A mitigate the need for such a method due to improved operational algorithms for FY-4C?

Line 75: How are thermal distortions (forced by the diurnal cycle) related to launch-induced payload displacement? Presumably those would occur even in the absence of any launch effects.

Line 129: what were the other “ground-based lightning observations” used by Fan et al. (2018) for comparison with WWLLN?

Line 131: “of [WWLLN is] around 10 km”

Line 141: please provide a reference for “marginal effect analysis”

Line 179: “the weight assigned to the 𝑘-th data point” is defined as omega sub k with an overbar, but the formula also has a tilde above the overbar. In the definition of Eq. 2, there is an additional notation that adds a superscript “raw” to omega. Please clarify how these symbols are related to one another, and their possibly varied roles in the spline fitting process. For example, is Eq. 2 only a starting estimate for omega?

Section 3.1: matching of WWLLN strokes to LMI events is described here. I would have expected matching to group centroids. In the case of a bright optical pulse comprising many pixels, does the authors’ fitting procedure only include those events that are in the 3 s and 30 km space/time window? Are the events outside this window simply discarded in the fitting analysis?

Please mark the four subregions shown in Fig. 4 on Fig. 3

Line 243: I don’t see an east-to-west propagation of the error pattern. To me it looks more like a steady amplification of regions that always have a relative maximum of errors.

Line 289: Typically, more samples should lead to improved location, but the conclusion here is that a low number of samples leads to better fits. Why? Does this suggest a shortcoming in some part of the fitting approach?

Line 291: I’m not convinced that the large errors in the NE are due to the large ground footprint of the detector elements. If that were the case, the NW part of the domain should exhibit the same pattern.

Line 300: While I don’t object to distribution of a CSV file format, which is easily understood and machine readable (if lacking in terms of metadata), it is not true that NetCDF files will be larger. Naively reading the 20190406.csv dataset (49.2 MB) with pandas and writing it back to NetCDF with xarray produced a slightly smaller file (48.7 MB), even when naively using 64-bit ints and floats for all columns. However, the file can easily be made much smaller. Year, month, day, hour, minute and second can all be stored in far fewer than 64 bits, and packing the floats as either 16- or 32-bit floats, or as ints using scale_factor and add_offset, will save further space. In addition, internal zlib compression can be used on NetCDF files and will be transparently decoded by the library. Finally, the authors might also consider using the CF metadata standard to encode time as a single variable.
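The packing idea can be illustrated without any NetCDF library. The sketch below applies CF-style `scale_factor` / `add_offset` packing to a few synthetic longitudes; the value range (60° to 150°), the helper names and the sample data are assumptions for illustration and do not describe the published files.

```python
import numpy as np


def pack_int16(values, vmin, vmax):
    """Pack floats into int16 using CF-style scale_factor / add_offset."""
    scale = (vmax - vmin) / (2 ** 16 - 2)       # leave one code free for a fill value
    offset = (vmax + vmin) / 2.0
    packed = np.round((np.asarray(values) - offset) / scale).astype(np.int16)
    return packed, scale, offset


def unpack(packed, scale, offset):
    """Invert the packing: value = packed * scale_factor + add_offset."""
    return packed.astype(np.float64) * scale + offset


lon = np.array([70.0, 104.123456, 139.999, 88.5])    # synthetic longitudes, degrees
packed, scale, offset = pack_int16(lon, vmin=60.0, vmax=150.0)
restored = unpack(packed, scale, offset)
max_err = float(np.max(np.abs(restored - lon)))
```

Each value shrinks from 8 bytes to 2, and the round-trip error is bounded by half of `scale`, which is below 0.001° for this range and therefore small next to the ~15 km residual discussed in the manuscript. In xarray, the same parameters can be supplied per variable through the `encoding` argument of `to_netcdf`, together with `zlib` compression.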

Citation: https://doi.org/10.5194/essd-2026-70-RC2

Data sets

Corrected event dataset of FY-4A LMI in Northern Hemisphere (2019-2023, March-September) Yuansheng Zhang, Xiushu Qie, Rubin Jiang, Dongjie Cao, Jing Yang, Dongfang Wang, Mingyuan Liu, Dongxia Liu, Zhuling Sun, Hongbo Zhang, and Shanfeng Yuan https://doi.org/10.11888/Atmos.tpdc.303312

Viewed

Total article views: 317 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
224	72	21	317	18	26

Views and downloads (calculated since 20 Mar 2026)

Month	HTML	PDF	XML	Total
Mar 2026	68	21	5	94
Apr 2026	129	39	12	180
May 2026	27	12	4	43

Viewed (geographical distribution)

Total article views: 327 (including HTML, PDF, and XML) Thereof 327 with geography defined and 0 with unknown origin.

Latest update: 27 May 2026

Short summary

This study presents a corrected 2019–2023 dataset for the FengYun-4A Lightning Mapping Imager, achieving ~15 km geolocation accuracy through correction referenced to the World Wide Lightning Location Network. This open-access resource enhances lightning monitoring and atmospheric research.