the Creative Commons Attribution 4.0 License.
Imputation of missing IPCC AR6 data on land carbon sequestration
Abstract. The AR6 Scenario Database is a vital repository of climate change mitigation pathways used in the latest IPCC assessment cycle. In its current version, several scenarios in the database lack information about the level of gross carbon removal on land, as net and gross removals on land are not always separated and consistently reported across models. This makes scenario analyses focusing on carbon removals challenging. We test and compare the performance of different regression models to impute missing data on land carbon sequestration from available data on net CO2 emissions in agriculture, forestry, and other land use. We find that a gradient boosting regression performs best among the tested regression models and provide a publicly available imputation dataset [https://doi.org/10.5281/zenodo.10696654] (Prütz et al., 2024) on carbon removal on land for 404 incomplete scenarios in the AR6 Scenario Database. We discuss the limitations of our approach, its use cases, and how this approach compares to other recent AR6 data re-analyses.
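To make the idea of regression-based imputation concrete, the following is a minimal, standard-library sketch of gradient boosting regression with depth-1 trees (stumps) and squared-error loss. It is a toy illustration only: the function names, the hyperparameters, and all numbers are assumptions, and it is not the authors' model, feature set, or code. A single predictor (net AFOLU CO2 emissions) is mapped to a target (land carbon sequestration), and the fitted model is then queried for a value where the target is missing.

```python
# Toy gradient boosting regression with stumps, standard library only.
# All names, hyperparameters, and data values are hypothetical.

def fit_stump(x, residuals):
    """Return (threshold, left_mean, right_mean) of the best single split on x."""
    pairs = sorted(zip(x, residuals))
    n = len(pairs)
    total = sum(r for _, r in pairs)
    best = None
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical x values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Maximising this gain is equivalent to minimising squared error.
        gain = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or gain > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (gain, threshold, left_mean, right_mean)
    if best is None:  # all x identical: fall back to a constant
        mean = total / n
        return float("inf"), mean, mean
    return best[1:]

def fit_boosting(x, y, n_rounds=200, learning_rate=0.1):
    """Fit stumps sequentially to the residuals of the current prediction."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        thr, left, right = fit_stump(x, residuals)
        stumps.append((thr, left, right))
        pred = [p + learning_rate * (left if xi <= thr else right)
                for xi, p in zip(x, pred)]
    return base, learning_rate, stumps

def predict(model, xi):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, lr, stumps = model
    return base + lr * sum(left if xi <= thr else right
                           for thr, left, right in stumps)

# Hypothetical training data from "complete" scenarios (Mt CO2 per year).
net_afolu = [-3000.0, -2000.0, -1000.0, 0.0, 1000.0, 2000.0, 3000.0]
land_seq = [4000.0, 3200.0, 2500.0, 1900.0, 1500.0, 1200.0, 1000.0]

model = fit_boosting(net_afolu, land_seq)
# Impute the target for a scenario that reports only net AFOLU emissions.
print(round(predict(model, -1500.0)))
```

In practice one would more likely rely on an established library implementation and validate it against held-out complete scenarios; the sketch only shows the mechanics of fitting to residuals step by step.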
Status: open (until 27 May 2024)
RC1: 'Comment on essd-2024-68', Thomas Bossy, 08 Apr 2024
Review of Prütz et al:
The manuscript presents a new dataset that imputes carbon removal on land for 404 incomplete scenarios in the AR6 Scenario Database. The authors identify a gap in the existing literature for overcoming missing data in the AR6 Scenario Database and propose a solution to fill it. Although their method is imperfect, the dataset is clearly presented and easy to use. Therefore, it deserves to be published.
General comments on the dataset:
The dataset uses the same model and scenario names as in the AR6 database, making it very easy to use as a supplement for someone willing to fill in the missing data. However, as the authors themselves acknowledge, the Gidden et al. re-analysis is "considered more useful in terms of consistency and accuracy". Therefore, I see the interest of their dataset mainly for someone who needs land carbon sequestration for as many scenarios as possible. In that case, why not provide a dataset with all scenarios, and not just those where 'Carbon sequestration|Land use' is missing? This would make it even more convenient for everyone to work with all scenarios in the same file, and it would provide an adjusted version that also corrects inconsistencies found in the original database, with net removals being greater than gross removals.
Since the Gidden et al. reanalysis seems to be of better quality, would it be possible to add a paragraph discussing the feasibility/relevance (or not) of applying the same reanalysis to the imputed scenarios?
Specific comments:
l35. “While the AR6 re-analysis dataset by Gidden et al. manages to resolve several of the data issues linked to carbon removal on land, it still combines gross and net CO2 emissions on land in their land-based CDR variable, resulting in both positive and negative CDR values, which conflicts with the concept and clean definition of gross CDR.”
Is it related to the fact that it is already inconsistent in the original Database or is it due to the re-analysis itself?
Figure 1.
Would it be possible to add one sentence explaining why the re-analysis looks so different from the other two curves, in particular the mismatch between the points in 2020?
l71-74.
After reading carefully, I think I understood how the metrics were calculated. However, it was quite fuzzy at first. Would it be possible to add an illustrative figure showing what the four metrics correspond to?
l107-108. “Figure 3 suggests some variance in performance across these categories – for C8 scenarios, the drop in resemblance of the actual variable is most visible”
Isn’t it because the independent variable is mostly null or near zero for C8 (and C7) scenarios that the prediction is bad?
l120-121. “our imputed dataset does not account for perceived land sequestration related data issues in the AR6 Scenario Database beyond data availability”
I find the phrasing unclear.
Citation: https://doi.org/10.5194/essd-2024-68-RC1
AC1: 'Reply on RC1', Ruben Prütz, 22 Apr 2024
Thanks a lot for taking the time to provide these thoughtful and constructive comments; they are much appreciated! Below, we respond to the general and specific comments point by point.
General comments:
Regarding a merged dataset of imputed and original scenarios:
We agree that our imputation dataset is especially useful for researchers who aim to include the largest possible set of scenarios in their analyses. We also agree that it would be very convenient for researchers to have all scenarios (original and imputed) together to keep the required data preprocessing as simple as possible. However, we are reluctant to add the original scenarios to our dataset, as they would merely be copied from the AR6 Scenario Database (Byers et al. 2022), which is publicly accessible but whose copyright is held by the respective institutions and modeling teams. Nevertheless, we are eager to make our imputation dataset as easily accessible and usable as possible. Therefore, we plan to add a script to our revised manuscript that allows researchers to easily merge our imputation dataset with the AR6 Scenario Database to get the largest set of scenarios.
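As a rough illustration of what such a merge could look like, the sketch below combines original and imputed records keyed by model, scenario, and year, using only the Python standard library. It is not the script announced above: the function name, the field names ("model", "scenario", "year", "value"), and all values are hypothetical, and the rule that original values take precedence over imputed ones is an assumption made for the example.

```python
# Sketch: combine original AR6 land sequestration values with imputed ones.
# Field names and all numbers are hypothetical placeholders.

def merge_scenarios(original_rows, imputed_rows):
    """Return one row per (model, scenario, year); original values take precedence."""
    merged = {}
    # Imputed rows are inserted first so that original rows overwrite them.
    for source, rows in (("imputed", imputed_rows), ("original", original_rows)):
        for row in rows:
            key = (row["model"], row["scenario"], row["year"])
            merged[key] = {**row, "source": source}
    return [merged[key] for key in sorted(merged)]

original = [
    {"model": "ModelA", "scenario": "S1", "year": 2050, "value": 2100.0},
]
imputed = [
    {"model": "ModelA", "scenario": "S1", "year": 2050, "value": 1900.0},
    {"model": "ModelB", "scenario": "S2", "year": 2050, "value": 3300.0},
]

combined = merge_scenarios(original, imputed)
for row in combined:
    print(row["model"], row["scenario"], row["year"], row["value"], row["source"])
```

Keeping a "source" field lets downstream analyses separate reported from imputed values, which matters when results are sensitive to the imputation.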
Regarding the reproducibility of the approach by Gidden et al.:
To our understanding, the OSCAR-based approach by Gidden et al. requires more comprehensive land use change data per scenario (compare Gidden et al. 2023) and is therefore restricted to the 914 scenarios that meet the data requirements. Our alternative approach does not have these requirements and can, therefore, be applied to a larger set of scenarios (n=783+404). Beyond the number of imputed scenarios, our approach does not allow for both positive and negative CDR values, which keeps it closely aligned with a clean conceptual definition of gross CDR. This differs from the OSCAR-based approach, as shown in Figure 1. We have already flagged this in the current version of the manuscript but aim to make it even clearer as part of the revisions.
Specific comments:
Regarding the positive and negative CDR values in Gidden et al.:
As this phenomenon is only shown by some models and scenarios in the reanalysis by Gidden et al., we suspect that this is at least partly driven by the properties of the original database. However, from the information at hand, we cannot say this with certainty. Gidden et al. would be better equipped to explain the underlying dynamics of the OSCAR model that might explain the conceptually unintuitive coexistence of both positive and negative “gross” CDR values.
Regarding Figure 1:
The mismatch between the points in 2020 is due to different emission baselines, which have been aligned across scenarios in some datasets but not in others. This is already partly described in the manuscript (see lines 35-42 and 124). We will try to make this clearer as part of the revisions.
Regarding the evaluation metrics:
Thanks for sharing this reflection. We will work on this and provide either a table or an illustration to highlight what the evaluation metrics correspond to.
Regarding Figure 3:
Generally, our regression approach also seems to impute values reasonably when net-negative AFOLU CO2 emissions are near zero. However, we agree that the very low levels of net-negative AFOLU in C7 and especially C8 may at least partly explain the drop in prediction performance. We emphasise that our prediction still resembles the actual variable substantially more closely in scale and shape than the formerly used net-negative AFOLU proxy. We will add a note on this in the revised manuscript.
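A small sketch can illustrate why a net-negative AFOLU proxy carries little information in scenarios where net AFOLU CO2 emissions stay positive or near zero. It assumes that the proxy simply counts the net-negative part of net AFOLU emissions as removal, which is an interpretation made for this example; the function name and all values are hypothetical.

```python
# Sketch of a net-negative AFOLU proxy: only the net-negative part of
# net AFOLU CO2 emissions counts as removal. Values are hypothetical (Mt CO2/yr).

def net_negative_proxy(net_afolu_emissions):
    """Return max(0, -e) per value; the proxy is zero while net emissions are positive."""
    return [max(0.0, -e) for e in net_afolu_emissions]

pathway = [3000.0, 1200.0, 100.0, -50.0, -400.0]
proxy = net_negative_proxy(pathway)
print(proxy)  # [0.0, 0.0, 0.0, 50.0, 400.0]
```

Under this assumption the proxy stays at zero for as long as gross emissions on land outweigh gross removals, even if gross removals are substantial.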
Regarding the unclear paragraph:
Lines 120-121 are meant to highlight that our approach cannot resolve underlying inconsistencies in the original dataset and mainly addresses the issue of lacking data availability. We will rephrase this as part of the revisions to avoid ambiguity.
Citation: https://doi.org/10.5194/essd-2024-68-AC1
Data sets
Imputation of missing IPCC AR6 data on land carbon sequestration Ruben Prütz, Sabine Fuss, and Joeri Rogelj https://zenodo.org/doi/10.5281/zenodo.10696653
Model code and software
Imputation of missing IPCC AR6 data on land carbon sequestration Ruben Prütz, Sabine Fuss, and Joeri Rogelj https://zenodo.org/doi/10.5281/zenodo.10696653