the Creative Commons Attribution 4.0 License.
WetCH4: A Machine Learning-based Upscaling of Methane Fluxes of Northern Wetlands during 2016–2022
Abstract. Wetlands are the largest natural source of methane (CH4) emissions globally. Northern wetlands (>45° N), accounting for 42 % of global wetland area, are increasingly vulnerable to carbon loss, especially as CH4 emissions may accelerate under intensified high-latitude warming. However, the magnitude and spatial patterns of high-latitude CH4 emissions remain relatively uncertain. Here we present estimates of daily CH4 fluxes obtained using a new machine learning-based wetland CH4 upscaling framework (WetCH4) that applies the most complete database of eddy covariance (EC) observations available to date, and satellite remote sensing informed observations of environmental conditions at 10-km resolution. The most important predictor variables included near-surface soil temperatures (top 40 cm), vegetation reflectance, and soil moisture. Our results, modeled from 138 site-years across 26 sites, had relatively strong predictive skill with a mean R² of 0.46 and 0.62 and a mean absolute error (MAE) of 23 nmol m⁻² s⁻¹ and 21 nmol m⁻² s⁻¹ for daily and monthly fluxes, respectively. Based on the model results, we estimated an annual average of 20.8 ± 2.1 Tg CH4 yr⁻¹ for the northern wetland region (2016–2022) and total budgets ranged from 13.7–44.1 Tg CH4 yr⁻¹, depending on wetland map extents. Although 86 % of the estimated CH4 budget occurred during the May–October period, a considerable amount (1.4 ± 0.2 Tg CH4) occurred during winter. Regionally, the West Siberian wetlands accounted for a majority (51 %) of the interannual variation in domain CH4 emissions. Significant issues with data coverage remain, with only 23 % of the sites observing year-round and most of the data from 11 wetland sites in Alaska and 10 bog/fen sites in Canada and Fennoscandia, and in general, Western Siberian Lowlands are underrepresented by EC CH4 sites.
Our results provide high spatiotemporal information on the wetland emissions in the high-latitude carbon cycle and possible responses to climate change. Continued, all-season tower observations and improved soil moisture products are needed for future improvement of CH4 upscaling. The dataset can be found at https://doi.org/10.5281/zenodo.10802154 (Ying et al., 2024).
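As a rough check on the units in the abstract, the sketch below converts a mean flux density in nmol m⁻² s⁻¹ into an annual total in Tg CH4 yr⁻¹. The flux value and wetland area are hypothetical placeholders, not values taken from the WetCH4 dataset; only the molar mass of CH4 and the unit factors are fixed.

```python
# Convert a mean CH4 flux density to an annual regional total.
# The flux and area used in the example are hypothetical, not WetCH4 values.

CH4_MOLAR_MASS_G_PER_MOL = 16.04      # molar mass of CH4
SECONDS_PER_YEAR = 365 * 24 * 3600    # non-leap year
NMOL_TO_MOL = 1e-9
G_TO_TG = 1e-12                       # 1 Tg = 1e12 g
KM2_TO_M2 = 1e6


def annual_total_tg(flux_nmol_m2_s: float, area_km2: float) -> float:
    """Annual CH4 emission (Tg CH4 yr-1) for a constant mean flux over an area."""
    mol_per_m2_per_year = flux_nmol_m2_s * NMOL_TO_MOL * SECONDS_PER_YEAR
    grams_per_m2_per_year = mol_per_m2_per_year * CH4_MOLAR_MASS_G_PER_MOL
    grams_per_year = grams_per_m2_per_year * area_km2 * KM2_TO_M2
    return grams_per_year * G_TO_TG


# Hypothetical example: 20 nmol m-2 s-1 sustained all year over 1e6 km2 of wetland.
example_total = annual_total_tg(20.0, 1.0e6)
print(f"{example_total:.2f} Tg CH4 yr-1")
```

Under these placeholder inputs the total comes out near 10 Tg CH4 yr⁻¹. This shows why the regional budget is so sensitive to the wetland area supplied by each map.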
Status: final response (author comments only)
CC1: 'Comment on essd-2024-84', Tyler Herrington, 10 Apr 2024
Publisher’s note: the content of this comment was removed on 11 April 2024 since the comment was posted by mistake.
Citation: https://doi.org/10.5194/essd-2024-84-CC1
RC1: 'Comment on essd-2024-84', Anonymous Referee #1, 27 May 2024
The study by Ying et al. sets out to upscale methane fluxes across the northern high latitudes (>45° N) with the use of machine learning (i.e., random forest). While there have been recent studies with similar approaches, this study is a useful addition to those existing ones, exploring some new directions. There is a lot of detail here, and I appreciate that the authors try to evaluate their results using different wetland maps.
That being said, the paper still needs quite some improvement. The writing is sometimes hard to follow, or imprecise, and this should be improved. I have suggested a large number of fixes down below, but it would be good if the language of the whole paper is checked thoroughly.
I was also surprised to see DEM coming on top as the most important variable in the LOOCV scheme (Fig. 6), which doesn’t make sense to me. If I understand correctly, then DEM refers here only to elevation (since slope, spi and cti are defined separately). It is not explained properly how elevation would influence methane emissions. Temperatures become lower and precipitation increases with altitude, but temperature and wetness are already included as variables and score lower in this scheme. Is DEM simply a good predictor because most wetlands are found at low elevations rather than that it’s a driver of emissions? I would like to see a better explanation for this result, and evidence that it’s not an artificial signal.
Other than that, I have some comments about definitions. First of all, the paper mentions a few times that it aims to be a study of the Arctic-Boreal region, while in fact it looks at the whole region north of 45° N and includes two sites from northern Germany, which are clearly outside of the Arctic-Boreal region. I see from Table S2 that multiple sites in Canada and the USA are also classified as temperate. Either restrict your domain to the Arctic-Boreal region or rephrase in the document that you are looking at northern high latitudes. In that case, please add information on methane emissions in temperate biomes to the introduction.
Finally, I understand the use of WAD2M, and this manuscript covers some of its limitations, but this is a missed opportunity to improve the applicability of this product for high latitude wetlands. Correct me if I’m wrong but WAD2M shows a seasonal cycle, going towards zero where soils are frozen and underestimating ecosystems such as bogs where methane emissions occur also when the soil is not inundated. Also, northern wetlands are rather stable, in contrast to wetlands at lower latitudes, and observations show that methane can still be emitted in winter. So, a seasonal cycle in wetland extent is not that useful for these northern environments.
The authors have taken the average seasonal cycle for WAD2M, but this does not solve this problem. In fact, it may introduce new problems since you use SMAP soil wetness, which will correlate with the inundation dynamics in WAD2M. So why not keep wetland extent from WAD2M fixed throughout each year, for example by taking the maximum annual extent, and then model methane emissions according to your observations of soil moisture and other variables? Other solutions may be possible, and the authors acknowledge that WAD2M is not perfect (given the comparison to CALU on the North Slope). Since WAD2M is being used by many people in the community, I would have liked to have seen a discussion on how to improve its usefulness for cold climates from this paper.
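The difference between a seasonal and a fixed wetland extent can be illustrated with a minimal sketch. It multiplies a monthly flux density by either a seasonally varying wetland fraction or the annual-maximum fraction for one grid cell. All numbers are hypothetical and chosen only to mimic an inundation-style seasonal cycle; they are not taken from WAD2M or from the manuscript.

```python
# Compare grid-cell emissions under a seasonal wetland fraction and under a
# fixed (annual-maximum) fraction. All numbers are hypothetical.

# Inundation-style seasonal cycle of wetland fraction, January to December.
monthly_fraction = [0.00, 0.00, 0.02, 0.10, 0.25, 0.30,
                    0.30, 0.28, 0.20, 0.10, 0.02, 0.00]
# Flux density per unit wetland area (nmol m-2 s-1), January to December.
monthly_flux = [2.0, 2.0, 3.0, 8.0, 20.0, 40.0,
                55.0, 45.0, 25.0, 10.0, 4.0, 2.0]


def area_weighted_flux(fluxes, fractions):
    """Monthly grid-cell flux: wetland flux density times wetland fraction."""
    return [f * w for f, w in zip(fluxes, fractions)]


fixed_fraction = [max(monthly_fraction)] * 12

seasonal = area_weighted_flux(monthly_flux, monthly_fraction)
fixed = area_weighted_flux(monthly_flux, fixed_fraction)

winter = [0, 1, 10, 11]  # Jan, Feb, Nov, Dec (zero-based indices)
winter_seasonal = sum(seasonal[m] for m in winter)
winter_fixed = sum(fixed[m] for m in winter)

print(f"winter sum, seasonal extent: {winter_seasonal:.2f}")
print(f"winter sum, fixed extent:    {winter_fixed:.2f}")
```

With a seasonal extent the area term suppresses almost all winter emission in this toy case, whereas with a fixed extent the seasonality comes only from the flux model.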
Detailed comments:
Line 79-80: please reference Thornton et al. (2016) who originally raised this issue of double counting:
Thornton, B. F., Wik, M., & Crill, P. M. (2016). Double counting challenges the accuracy of high latitude methane inventories. Geophysical Research Letters, 43(24), 12,569-12,577. https://doi.org/10.1002/2016GL071772
Line 81-89: This paragraph feels more like a list of bullet points rather than text. Please rewrite this to make it more readable.
Line 82: “to wet tundra” should be “as wet tundra”
Line 83: which exceptions?
Line 91-92: The wording “recent increase” suggests systematic change (i.e. a trend), but you talk only about the difference between 2019 and 2020. That’s interannual variability. Please rephrase.
Line 104: “half hourly” should be “at half-hourly intervals”.
Line 107: “outside the network”: please change to “outside of the network”.
Line 113: Independent to what?
Line 132: how would this approach lead to a bias? And in which direction?
Line 141: “for the computation efficiency”: change to “for computation efficiency”
Line 163: The word “freshwater” can be removed here.
Line 174: “ensembled” should be “ensemble”
Line 187-188: please check the structure of this sentence, right now it’s rather confusing.
Line 193: “model RF”: do you mean “model with RF”?
Line 201: Don’t you mean latitudes? Longitudinal width of grid cells becomes smaller the further North you get, but they stay the same width along latitudes.
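The geometry behind this comment can be checked with a short sketch, assuming a spherical Earth and the 0.098° cell size quoted in RC2. The east-west width of a regular latitude-longitude cell scales with the cosine of latitude, while the north-south height is constant.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def cell_dimensions_km(lat_center_deg: float, cell_deg: float):
    """Return (east-west width km, north-south height km, area km2) of a lat/lon cell."""
    d = math.radians(cell_deg)
    lat = math.radians(lat_center_deg)
    lat_south = lat - d / 2
    lat_north = lat + d / 2
    width = EARTH_RADIUS_KM * math.cos(lat) * d   # shrinks with cos(latitude)
    height = EARTH_RADIUS_KM * d                   # identical at every latitude
    area = EARTH_RADIUS_KM ** 2 * d * (math.sin(lat_north) - math.sin(lat_south))
    return width, height, area


for lat in (45.0, 60.0, 75.0):
    w, h, a = cell_dimensions_km(lat, 0.098)
    print(f"{lat:4.1f} N: width {w:5.2f} km, height {h:5.2f} km, area {a:6.2f} km2")
```

At 60° N the cell is half as wide as it is tall, so a nominal "10-km" grid describes the north-south spacing only.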
Fig 2: It’s very difficult to see from this map how many EC-towers you used. I count 19 circles, but the text mentions 26 sites? Also, please change the color for wet tundra. Very hard to distinguish this from the wetland fraction.
Line 219: which 8 sites? Please mention the Fluxnet codes here.
Line 220: which 4 sites? Again, please mention with Fluxnet codes.
Line 22: “largest high latitude EC-data compilation”: please add “for methane”. There are larger syntheses for CO2.
Line 238-239: Only 2.5% of sites had winter data? So only one in 40? Of your 26 sites?
Line 248-249: This is a missed opportunity! Water table depth is a much better predictor than soil moisture. Why not run the site-level model for the subset of sites with water level data?
Line 263: “modeling”: Grid-level or site-level?
Line 264: how was this interpolation done? Does it use spatial data with a higher resolution?
Line 273: Did you check with MERRA that temperatures were below freezing for these gaps? That you’re relatively confident that soils were indeed frozen? Might not be true for near-coastal areas.
Line 284: This is a very deep root zone for the Arctic! Especially in wetlands that are dominated by sedges, rushes and grasses or in areas where the active layer never becomes deeper than half a meter or less. A root zone of 15 or 20 cm makes much more sense for high latitude wetlands. Maybe SMAP only gives this for the top meter? How does this affect your results?
Line 293: “existing upscaling models”: which?
Line 305: “a multiple” should be “the multiple”.
Line 308: “based” should be “based on”
Line 331: “Rice paddies”? Or “a rice paddy”?
Line 411: Resampled how?
Line 429-430: What is the motivation to say that GWLD is the maximum potential emission surface? There are other wetland maps out there. Does GWLD have the highest extent of them all?
Line 446: “thought to be”: by whom? And if GWLD underestimates on the North Slope, then it is clearly not the maximum potential emission surface. Perhaps it underestimates elsewhere also? Can you show a comparison of CALU to WAD2M and GWLD for the North Slope?
Line 521-525: if these mismatches are due to scale discrepancies, then why would this differ among ecotypes? In particular, why is it so large for fens?
Line 543-544: So, this site cannot be well represented by WAD2M since it assumes inundation?
Figure 4: Please add that these are grid-level model simulations.
Table 2: What is the value of showing this in the main document? Can it be moved to the supplemental?
Line 580 and Figure 6: I don’t understand why DEM comes out as a good predictor. Elevation does not control methane emissions. It influences precipitation and temperature, but you already have those variables included.
Line 605-606: It’s no surprise that these spatial patterns are similar, since the models from the GCP use the same wetland map as you do!
Line 612: Why use monthly inundation data? Wetland extent at high latitudes does not vary much over the year (unlike tropical wetlands). Aren’t you enforcing a seasonal cycle that could be better simulated by using temperature?
Line 667-668: And large bursts of methane, see the paper by Mastepanov et al. that you cite later on.
Line 689: Why would emissions double due to permafrost thaw? Very speculative, also because permafrost thaw changes the hydrology of the landscape. If this leads to more drainage, then it can lower emissions!
Line 708: remove “despite”
Line 710-712: Not sure I agree. If I understand correctly, then WAD2M shows low wetland extent when soils are frozen. If you are using the mean over several years, then it is not possible to have higher emissions in years where spring thaw comes early and wetland extent in WAD2M would have been higher as a result.
Line 725 and Fig S11: the colors in S11 are very hard to distinguish from each other. Please use a more distinct palette.
Line 731: Which domain?
Line 732: “the percentage of a variation to the period mean of a subregion”. Very unclear. What does this mean?
Figure 10: I don’t understand the numbers behind this graph. If the interannual variability is with respect to a mean, then how can the average be non-zero? Or are these boxplots showing medians? With so few years it’s better to replot this in a similar style as Fig 9.
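The point about means and medians can be made concrete with a small sketch. It assumes that interannual variability is expressed as the percentage deviation of each year from the period mean; that is one reading of the definition, not confirmed by the manuscript. The annual totals are hypothetical.

```python
from statistics import mean, median

# Hypothetical annual emissions (Tg CH4 yr-1) for one subregion, 2016-2022.
annual = [4.1, 4.3, 4.0, 4.2, 5.6, 4.1, 4.4]

period_mean = mean(annual)
# Percentage anomaly of each year relative to the period mean.
anomalies_pct = [100.0 * (x - period_mean) / period_mean for x in annual]

mean_anomaly = mean(anomalies_pct)
median_anomaly = median(anomalies_pct)

print(f"mean of anomalies:   {mean_anomaly:.3f} %")
print(f"median of anomalies: {median_anomaly:.3f} %")
```

Anomalies taken about the period mean average to zero by construction, but one high-emission year shifts the median below zero. A boxplot centre line would then be non-zero only if it shows the median.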
Line 761: it’s correct that soil temperature is more variable than air temperature, but I don’t think that the coarse scale from MERRA can help there since it doesn’t model snow cover at the resolution that you need. This goes against your argument in line 765. Perhaps the soil temperatures work better because they have a different amplitude and also perhaps showing a lag to air temperature?
Line 792: Which performance requirements?
Line 800: Is RF even useful for hysteresis effects?
Line 825: “Data deficiency in winter” is this the flux data? Or the limited applicability of WAD2M when soils are frozen?
Line 880: How are they comparable? What’s the difference on an annual basis?
Supplemental Text 2, line 16-27: please move this to the main document, because it answers a lot of questions that I had on why sites were missing from your analysis. Also, what was the quality control mentioned in line 26?
Citation: https://doi.org/10.5194/essd-2024-84-RC1
RC2: 'Comment on essd-2024-84', Anonymous Referee #2, 05 Jun 2024
Wetlands are the largest natural source of global methane (CH4) emissions, but also the source with the largest uncertainty. Ying et al. generated a machine learning-based regional (>45° N) wetland CH4 upscaling dataset. In general, their work is very important, and they provide a new data-driven benchmark dataset that is constrained by the largest set of eddy covariance observations and has the highest spatial and temporal resolution among ML-based wetland CH4 upscaling products to date. However, many parts remain unclear or insufficiently rigorous. Detailed comments follow:
Major comments:
- For the feature selection part, why did you choose the first 10 variables? Did you test other numbers of input features? In section 3.1.1, you mentioned that using all the variables and using selected 10 variables showed no significant difference in wetland CH4. Is this strategy still reasonable? Maybe the strategy in Peltola et al., 2019 could be helpful. They calculated the feature importance of all the variables, but finally chose four variables, because that group achieved the best performance. (Peltola, O., Vesala, T., Gao, Y., Räty, O., Alekseychik, P., Aurela, M., ... & Aalto, T. (2019). Monthly gridded data product of northern wetland methane emissions based on upscaling eddy covariance observations. Earth System Science Data, 11(3), 1263-1289.)
- The workflow seems a little bit confusing to me. Please feel free to correct me if I misunderstood. It seems that the feature selection only included the variables you get from the MERRA2 dataset. Why are the variables from the remote sensing datasets excluded from the feature selection step, but directly added into the final RF model? Is that fair to all the variables?
- The final dataset has a resolution of 0.098° × 0.098°, but the spatial resolution of the input datasets (e.g., MERRA2) is much lower. Similarly, the wetland extent datasets (WAD2M, GIEMS) also have lower spatial and temporal resolution. Will that lead to uncertainties in your final estimation? At least, some discussion of this issue should be added to the manuscript.
- L362-364: I think it is not surprising to see that groups (2) and (3) have lower accuracy, because they only contain features from soil wetness or NBAR, but missed the most important information from the features provided by MERRA2 (which you revealed in the feature selection part). Thus, if my understanding is correct, would it be more reasonable to set the input feature as MERRA2, MERRA2 + NBAR, MERRA2+SMAP, and compare them to MERRA2+ all RS data?
- Did you test uncertainties from MERRA2? Will the estimation and key findings be the same if using different reanalysis datasets?
- Figure 6: I am curious why DEM is the most important feature. You mentioned it highly correlated with air pressure, but the importance of air pressure is very low. Please share more explanation of the mechanisms of how DEM affects wetland CH4.
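The DEM question raised in Figure 6 can be explored with a minimal sketch. It uses the isothermal barometric approximation to show how closely mean surface pressure tracks elevation. The scale height and the elevation range are assumptions made for illustration, and the sketch ignores weather-driven pressure variability, which a daily reanalysis variable would contain.

```python
import math

SEA_LEVEL_PRESSURE_HPA = 1013.25
SCALE_HEIGHT_M = 8400.0   # approximate isothermal scale height; an assumption


def pressure_hpa(elevation_m: float) -> float:
    """Isothermal barometric approximation of mean surface pressure."""
    return SEA_LEVEL_PRESSURE_HPA * math.exp(-elevation_m / SCALE_HEIGHT_M)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


elevations = [float(z) for z in range(0, 1501, 50)]   # hypothetical site elevations, m
pressures = [pressure_hpa(z) for z in elevations]

r = pearson_r(elevations, pressures)
print(f"Pearson r(elevation, mean pressure) = {r:.4f}")
```

When two predictors carry nearly the same information, a random forest can credit either of them. A high rank for DEM next to a low rank for air pressure therefore does not by itself show that elevation controls emissions. Permutation importance computed on held-out sites is one possible way to test this.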
Minor comments:
- L183: The boundary of the Arctic-boreal region is not exactly the same as ‘>45 degree’. If your final dataset covers the area north of 45° N, you cannot call it the Arctic-boreal region. Similar problems appear several times in the manuscript. Please go through the whole paper and correct them.
- Are the important features the same at different sites? Or are they the same across different wetland types? Did you build separate models for different wetland types? Or use one model for all types?
- Vegetation activity showed significant impacts on wetland CH4 emissions in many previous studies, especially in the northern wetlands. Why not include proxies of vegetation (such as, LAI, GPP, …) into your feature selection?
- Figure 4: Why is monthly prediction much better than that of daily prediction, especially in terms of R2? Please add more explanation to the manuscript.
- Figure 7: For carbon-tracker, why did you use natural microbial emissions instead of wetland emissions? It seems that carbon-tracker also has an output layer of wetland CH4.
- Which GCP models did you include in the comparison? All the top-down and bottom-up models in Saunois et al., 2019? It would be better to give more information in the supplement on which models you used. Or at least, add the citations for the GCP models.
- Figure 10: Why exclude WetCH4-GIEMS?
- Figure 8d: Please give more description of land and CALU data, and explain how you generate wetCH4-land and wetCH4-CALU, and why did you use them.
- Add citations: L78-80, L99-104.
- L921-928: Font style.
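One possible contributor to the gap between daily and monthly R² (Figure 4 comment above) is that temporal averaging removes random day-to-day noise. The sketch below illustrates that effect with synthetic data only. The seasonal amplitude and noise level are arbitrary assumptions, and 30-day blocks stand in for calendar months. It is not a reconstruction of the authors' evaluation.

```python
import math
import random

random.seed(0)


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def block_means(values, size):
    """Average consecutive blocks of `size` values (a stand-in for monthly means)."""
    return [sum(values[i:i + size]) / size for i in range(0, len(values), size)]


days = 360
# Synthetic seasonal "model" flux and noisy "observed" flux (nmol m-2 s-1).
pred_daily = [30.0 + 25.0 * math.sin(2 * math.pi * (d - 90) / days) for d in range(days)]
obs_daily = [p + random.gauss(0.0, 20.0) for p in pred_daily]

r2_daily = r_squared(obs_daily, pred_daily)
r2_monthly = r_squared(block_means(obs_daily, 30), block_means(pred_daily, 30))

print(f"daily R2:   {r2_daily:.2f}")
print(f"monthly R2: {r2_monthly:.2f}")
```

Here the model is "perfect" apart from uncorrelated noise, so averaging raises R² sharply. In real data, systematic model errors would not average out in the same way.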
Citation: https://doi.org/10.5194/essd-2024-84-RC2
Data sets
WetCH4: A Machine Learning-based Upscaling of Methane Fluxes of Northern Wetlands during 2016-2022 Qing Ying, Benjamin Poulter, Jennifer D. Watts, Kyle A. Arndt, Anna-Maria Virkkala, Lori Bruhwiler, Youmi Oh, Brendan M. Rogers, Susan M. Natali, Hilary Sullivan, Luke D. Schiferl, Clayton Elder, Olli Peltola, Annett Bartsch, Amanda Armstrong, Ankur R. Desai, Eugénie Euskirchen, Mathias Göckede, Bernhard Lehner, Mats B. Nilsson, Matthias Peichl, Oliver Sonnentag, Eeva-Stiina Tuittila, Torsten Sachs, Aram Kalhori, Masahito Ueyama, and Zhen Zhang https://doi.org/10.5281/zenodo.10802154
Model code and software
WetCH4 Qing Ying https://github.com/qlearwater/WetCH4.git
Viewed
- HTML: 727
- PDF: 187
- XML: 48
- Total: 962
- Supplement: 56
- BibTeX: 40
- EndNote: 44