Three-Dimensional Biomass Burning Emission Inventory for Southeast and East Asia Based on Multi-Source Data Fusion and Machine Learning

Jin, Yinbao; Huang, Heng; Liu, Jian; Liu, Yiming; Chen, Xiaoyang; Chen, Yongqiang; Li, Licheng; Fan, Qi

doi:10.5194/essd-2025-515

21 Oct 2025

Status: this preprint is currently under review for the journal ESSD.

Abstract. Biomass burning (BB) is a major source of atmospheric pollutants in Southeast and East Asia (SEA), yet most existing emission inventories lack accurate diurnal cycles and vertical injection profiles, limiting the accuracy of air quality and climate simulations. This study develops the Southeast and East Asia Fire (SEAF) inventory, an hourly 3 km three-dimensional (3D) emission dataset for 2023, by fusing fire radiative power (FRP) from Himawari-8/9 AHI and VIIRS through cloud correction, cross-calibration, and a region–vegetation-specific Gaussian diurnal reconstruction with dynamic gap filling. Vertical profiles are further constrained using a random forest (RF) – Shapley Additive Explanations (SHAP) framework trained with Multi-angle Imaging SpectroRadiometer (MISR) smoke plume heights (SPH) and ERA5 meteorology. The SEAF inventory exhibited strong consistency with TROPOMI CO, showing a correlation of R = 0.97 in monthly columns and differing by only 7.81 % during a representative event on 9 March 2023. Annual PM_2.5 emissions in SEAF are approximately 2362 Gg y^-1, which is 67 % lower than the Fire INventory from NCAR (FINN) but aligns well with the Fire Energetics and Emissions Research (FEER) and the Quick Fire Emissions Dataset (QFED) estimates. The RF–SHAP framework successfully predicted SPH, with over 90 % of estimates within ± 500 m. This approach corrects the near-surface overweighting of conventional schemes by reducing emissions below 0.3 km and enhancing injection between 2.7–5.5 km during the spring burning peak, yielding vertical profiles that closely align with satellite observations. SHAP analysis identified temperature- and radiation-related factors, particularly the vertical integral of temperature (Vit) and terrain elevation, as the primary drivers of SPH, with additional contributions from FRP, planetary boundary layer height, and seasonal–meteorological interactions. 
These advances in both diurnal timing and vertical injection are anticipated to provide an observation-driven, hourly 3D BB emission dataset for SEA that can improve the reliability of air quality, climate, and policy assessment models.

Received: 21 Aug 2025 – Discussion started: 21 Oct 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: open (until 06 Jan 2026)

RC1: 'Comment on essd-2025-515', Anonymous Referee #1, 03 Dec 2025
Referee Comment
Title: Three-Dimensional Biomass Burning Emission Inventory for Southeast and East Asia Based on Multi-Source Data Fusion and Machine Learning

Author(s): Yinbao Jin et al.

MS No.: essd-2025-515

MS type: Data description paper
General Comments:
Jin et al. clearly define the issue at hand, how they have contributed to the solution, and what the resulting benefits are to science and society. Namely, common biomass burning emission inventories often omit diurnal information and vertical injection heights of fires; by incorporating both, the authors have produced an emissions product for Southeast and East Asia (SEAF) that ultimately improves the accuracy of models used for reporting and assessment of air quality, climate and public policy. They have done careful work to compare their results with measurements from TROPOMI, MISR and CALIPSO, and they present a comparison of their emissions dataset with common established BB emission datasets.
Of particular note is that, with the help of machine learning, they have been able to generate SEAF in such a way as to closely match not only the 2D structure of emissions observed from satellite but the 3D structure as well, although it remains difficult for them to capture short-lived fires and emissions from low-lofted plumes. This is an encouraging contribution to the biomass burning emissions research community, and furthers the community’s desire to see uncertainty in regional and global emissions decrease significantly.
The main concern of this reviewer involves their method for generating and applying the diurnal FRP cycle. The authors need to include more discussion and review of the community’s efforts regarding this much-needed step in emission inventory development, and to show how their efforts are either similar to these previously published methods or are an improvement upon them.
Overall, I recommend this article for publication with minor revisions.
Specific Comments:
Lines 74-76: I believe this statement is a bit misleading. The authors are claiming here that the “integrated inventories” (i.e. the multi-source emissions datasets that incorporate diurnal cycles) generally have a large uncertainty when compared to the standard emission inventories. There may be two issues: 1) the authors are making the case that the standard inventories have uncertainty themselves and thus that integrated inventories are needed, so although comparing against these standard inventories is important, we need to know the areas where these standard inventories are weak and whether the integrated inventories improve upon them, and 2) Xu et al. 2023b is given as a paper that has assessed these integrated inventories and found large uncertainty (the numbers given in this sentence), but it seems as if the paper is only assessing their CHOBE inventory, which is not mentioned in their list of examples previously, and moreover, it does not assess any of the more well-known integrated inventories that the authors here present to us. It would be good to better present the issues with the integrated inventories that already exist, and to show how they improve upon the standard inventories but still underperform in areas that a new inventory can address.

Lines 76-79: It is mentioned that using a Gaussian scheme to reconstruct the diurnal cycle is a problem, but the authors use a Gaussian scheme themselves, so it should be specified better how the authors use the Gaussian scheme differently than the other integrated inventories. These works should be mentioned and discussed: Ellicott et al. 2009, Vermote et al. 2009, Andela et al. 2015, Kaiser et al. 2012, Li et al. 2019/2022, and Zheng 2021. Ellicott and Kaiser are not mentioned in the paper at all. Kaiser in particular provides an alternative method for cloud correction using a Kalman filter temporal prediction.

Lines 208-209: Using a beta value of 0.95 seems really high to me. Does this mean that if you have only 5% non-cloud, you still estimate the undetected FRP for the 95% cloud-filled portion of a grid cell from the 5%? Have you tried a lower threshold to see how the results change, or not?
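
The arithmetic behind this concern can be sketched as follows. The snippet assumes a simple clear-sky scaling, in which the observed FRP of a grid cell is divided by its clear fraction whenever the cloud fraction does not exceed beta. This functional form and the names `cloud_corrected_frp`, `frp_obs` and `cloud_fraction` are illustrative assumptions and are not taken from the manuscript.

```python
def cloud_corrected_frp(frp_obs, cloud_fraction, beta=0.95):
    """Scale observed FRP by the inverse clear-sky fraction (assumed form).

    Cells whose cloud fraction exceeds beta, or that are fully cloudy,
    are returned uncorrected.
    """
    if not 0.0 <= cloud_fraction <= 1.0:
        raise ValueError("cloud_fraction must lie in [0, 1]")
    if cloud_fraction > beta or cloud_fraction >= 1.0:
        return frp_obs
    return frp_obs / (1.0 - cloud_fraction)


# Hypothetical cell with 10 MW observed in the clear part of the cell.
for f_cloud in (0.50, 0.80, 0.95):
    scaled = cloud_corrected_frp(10.0, f_cloud)
    print(f"cloud fraction {f_cloud:.2f}: corrected FRP = {scaled:.1f} MW")
```

Under this assumed form, a cell that is 95 % cloudy has its observed FRP multiplied by roughly 20, which is why a sensitivity test with a lower threshold (passed here through `beta`) would be informative.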

Eqs. 3 & 4: If you combine the equations, the 1 cancels out and you are simply left with multiplying the AHI FRP by the ratio of VIIRS to AHI FRP for the coincident data. Perhaps it would be simpler to define r as simply the fraction, unless it is beneficial to center the ratios around zero.
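
The cancellation can be written out explicitly. The forms below are only inferred from this comment (a ratio centered on zero, then applied as one plus the ratio); the manuscript's actual Eqs. 3 and 4 and its notation may differ, and the superscript "co" for coincident observations is a label introduced here.

```latex
% Assumed form of Eq. 3: relative difference from coincident observations
r = \frac{\mathrm{FRP}_{\mathrm{VIIRS}}^{\mathrm{co}}}{\mathrm{FRP}_{\mathrm{AHI}}^{\mathrm{co}}} - 1

% Assumed form of Eq. 4: calibration of the AHI FRP
\mathrm{FRP}_{\mathrm{AHI}}^{\mathrm{cal}}
  = \mathrm{FRP}_{\mathrm{AHI}} \, (1 + r)
  = \mathrm{FRP}_{\mathrm{AHI}} \,
    \frac{\mathrm{FRP}_{\mathrm{VIIRS}}^{\mathrm{co}}}{\mathrm{FRP}_{\mathrm{AHI}}^{\mathrm{co}}}
```

If the equations do take this form, defining r directly as the ratio gives the same calibrated value with one fewer step.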

Eq. 6: I would be very interested to know if using ri instead of rml makes much of a difference in general. My idea would have been to prioritize the general calibration (rml) over the instantaneous calibrations (ri) in developing the diurnal cycle, but here you prioritize the instantaneous calibrations.

Eq. 7: Did you mean to say “VIIRS/AHI FRP = 0” instead of “… < 0”?

Line 263: I may not be understanding correctly, but if I do, I think the term “T_gap” conveys the opposite of what is being described. “T_gap” makes me think it is the period between high intensity burn periods, not how long the high intensity burn lasts.

Lines 268-269: Please explain how these Gaussian curves were constructed, e.g. were the peaks adjusted before averaging between days to account for any daily differences?
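
To illustrate why the peak adjustment matters, the following minimal sketch averages Gaussian diurnal curves from three days with and without first aligning their peaks. The peak hours, the width, and the helper names `gaussian_cycle` and `mean_curve` are hypothetical and do not describe the authors' procedure.

```python
import math


def gaussian_cycle(hours, peak_hour, sigma, amplitude=1.0):
    """Return a Gaussian diurnal curve sampled at the given hours."""
    return [amplitude * math.exp(-0.5 * ((h - peak_hour) / sigma) ** 2)
            for h in hours]


def mean_curve(curves):
    """Average several curves point by point."""
    return [sum(values) / len(values) for values in zip(*curves)]


hours = [0.5 * i for i in range(48)]  # half-hourly grid, 0:00 to 23:30
daily_peaks = [12.0, 14.0, 16.0]      # hypothetical peak hours on three days
sigma = 1.5                           # hypothetical width in hours

# Option 1: average the daily curves as they are.
unaligned = mean_curve([gaussian_cycle(hours, p, sigma) for p in daily_peaks])

# Option 2: shift every day to the mean peak hour before averaging.
mean_peak = sum(daily_peaks) / len(daily_peaks)
aligned = mean_curve([gaussian_cycle(hours, mean_peak, sigma)
                      for _ in daily_peaks])

print(f"peak of unaligned mean: {max(unaligned):.2f}")
print(f"peak of aligned mean:   {max(aligned):.2f}")
```

In this toy case the unaligned average is flatter and broader than any single day's curve, so the choice affects both the amplitude and the apparent duration of the reconstructed cycle.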

Lines 340-342: citation needed

Lines 342-355: There do not seem to be any citations or analysis showing how the increased estimates of FRP in cloud-filled regions compare to reality. Clearly, there must be an increase, but without any general idea of how much is missing due to cloud cover, the correction does not seem to decrease the uncertainty in the results.

Line 370 / Fig. S4: Please list the units for time in the caption – I can’t seem to make the “9:00-16:00 local time” for NE China (region 4) cropland (column 1) correspond to the figure. The x-axis shows this to peak at ~6:00, which if in UTC would correspond to a local CST time of 14:00, which is not the center of 9:00-16:00. Also, it seems as if for Region 1 the Gaussian doesn’t work too well because of the prolonged right tail – perhaps a skew term should be introduced.
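
The time conversion in this comment can be checked with a short sketch. It assumes the x-axis of Fig. S4 is in UTC and that NE China uses China Standard Time (UTC+8); the helper name `utc_hour_to_local` is introduced here for illustration.

```python
from datetime import datetime, timedelta, timezone

CST = timezone(timedelta(hours=8), name="CST")  # China Standard Time, UTC+8


def utc_hour_to_local(hour_utc, tz=CST):
    """Convert an hour of day in UTC to the hour of day in a fixed-offset zone."""
    reference = datetime(2023, 1, 1, tzinfo=timezone.utc) + timedelta(hours=hour_utc)
    local = reference.astimezone(tz)
    return local.hour + local.minute / 60.0


peak_local = utc_hour_to_local(6.0)   # peak read from the figure, if in UTC
window_center = (9.0 + 16.0) / 2.0    # midpoint of the stated 9:00-16:00 window

print(f"peak in local time: {peak_local:.1f} h")
print(f"center of stated window: {window_center:.1f} h")
```

Under these assumptions the peak falls at 14:00 local time while the stated window is centered on 12:30, which is the mismatch described above.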

Lines 378-380 / Fig. 5: Would you please briefly discuss what is going on in panels g and l? How are the filled data points so far off the GLS in g, and why was the peak not shifted right in l? Of all the panels, only a, m, p, q and t seem convincing and substantial; the rest of the panels have peaks that are difficult to corroborate with the presented data.

Lines 419-440 / Fig. 7: The differences between the original and filled mean FRP values are so low compared with their standard deviations that it’s hard to argue for the significance of these changes. Please address this. Also, do I understand correctly that the conventional Gaussian fits result in lower FRP for most of the regions? How did you define the conventional fit? If simply by not using GLS and GVM, then would you mind mentioning how, e.g., Ellicott et al. 2009, Zheng et al. 2021, and Andela et al. 2015 all have dynamic diurnal Gaussian fits that change amplitude and/or duration with each day’s data, and how your approach is similar or different?

Lines 556-557: Can you remind the reader to reference Figure 12a for this claim about the comparisons to MISR?

Fig. 4: Please explain how the cloud-corrected AHI FRP data can be less than the non-cloud-corrected data.

Figs. 4, 7 & S7b: The resolution of the images seems to be too coarse. There appears to be a gridded/striped pattern in the images; it appears to be an artifact in the data (particularly in Fig. 7), but it could just be the poor image resolution. Please update the resolution, as I would like to be able to see more detail when I zoom in. Please also confirm what the striped pattern is if it is indeed an artifact.

Fig. 11: I think the images would be easier to interpret if you kept the SEAF panel as is and converted the rest of the panels to difference maps.

Fig. 14: It appears to me that the SEAF emissions in panel d are greater than those in panel e, but in panel f, they are reported as lower. Is this a mistake, or is there a dynamic that is not visually observable with panels d and e?

Fig. S7: The color scale is a bit unhelpful since the values saturate too quickly to be able to do any useful visual comparisons. Either stretch the scale, or as suggested with Fig. 11, convert the images to difference maps. Also, the units need to be changed to do proper comparisons since the spatial resolutions are different between the panels. Please convert them to Mg/yr/km^2.
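
The requested unit change amounts to dividing each cell total by the cell area. A minimal sketch follows, assuming the inventories are on regular latitude-longitude grids and a spherical Earth; the function names and example values are hypothetical, and a projected grid would need its own area calculation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def cell_area_km2(lat_south, lat_north, dlon_deg):
    """Area of a latitude-longitude cell on a sphere, in km^2."""
    dlon = math.radians(dlon_deg)
    band = math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))
    return EARTH_RADIUS_KM ** 2 * dlon * band


def emission_density(emission_mg_per_yr, lat_south, lat_north, dlon_deg):
    """Convert a per-cell total (Mg/yr) to an areal density (Mg/yr/km^2)."""
    return emission_mg_per_yr / cell_area_km2(lat_south, lat_north, dlon_deg)


# Hypothetical cells at 20 degrees N with the same per-cell total
# but different grid spacing (0.1 degree versus 0.25 degree).
fine = emission_density(50.0, 20.0, 20.1, 0.1)
coarse = emission_density(50.0, 20.0, 20.25, 0.25)
print(f"0.10 degree cell: {fine:.3f} Mg/yr/km^2")
print(f"0.25 degree cell: {coarse:.3f} Mg/yr/km^2")
```

The same per-cell total corresponds to very different areal densities at the two resolutions, which is why per-cell units cannot be compared directly across panels.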

Technical Corrections:
Line 72: missing period

Line 77: “understate” might not be the best word, maybe underestimate, limit, etc.

Line 120: missing space

Line 332: delete first comma?

Citation: https://doi.org/10.5194/essd-2025-515-RC1
RC2: 'Comment on essd-2025-515', Anonymous Referee #2, 10 Dec 2025

The comment was uploaded in the form of a supplement: https://essd.copernicus.org/preprints/essd-2025-515/essd-2025-515-RC2-supplement.pdf
Citation: https://doi.org/10.5194/essd-2025-515-RC2

Supplement

https://doi.org/10.5194/essd-2025-515-supplement

Data sets

Three-Dimensional Biomass Burning Emission Inventory for Southeast and East Asia Based on Multi-Source Data Fusion and Machine Learning, Yinbao Jin, https://doi.org/10.5281/zenodo.16793129

Viewed

Total article views: 390 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
296	70	24	390	33	21	28

Views and downloads (calculated since 21 Oct 2025)

Month	HTML	PDF	XML	Total
Oct 2025	135	16	6	157
Nov 2025	88	7	9	104
Dec 2025	73	47	9	129

Latest update: 22 Dec 2025

Short summary

Fires in Southeast and East Asia release large amounts of smoke that harm air quality, weather, and climate. Existing datasets often miss night-time burning and how smoke rises in the atmosphere. We created an open dataset for 2023 that records fire emissions every hour in three dimensions at high resolution. By combining satellite data and machine learning, it improves understanding of when and where smoke is released and supports better forecasts and policy decisions.

