Global Scenario Reference Datasets for Climate Change Integrated Assessment with Machine Learning
Abstract. The deepening of global climate change research and increasingly complex integrated assessment methods generate large amounts of heterogeneous data. The rapid development of artificial intelligence (AI) models, particularly large language models (LLMs) and deep learning techniques, has enhanced the ability to handle vast amounts of data, providing new approaches and perspectives for climate analysis. To address the demand for multi-dimensional and comparable scenario design in climate change prediction and policy simulation, this study employs hybrid machine learning techniques to collect and process scenario data from the existing literature, developing the Global Climate Scenario Reference datasets (GCSR). The GCSR incorporates data from approximately 90,000 articles across multiple temporal and spatial scales and extracts 53,185 scenarios. With its large scale, extensive coverage, and detailed classification, the GCSR provides a robust foundation for climate change prediction, risk assessment, mitigation policy, and adaptation strategy planning, supporting scenario design in related fields.
Withdrawal notice
This preprint has been withdrawn.
Interactive discussion
Status: closed
RC1: 'Comment on essd-2025-299', Yang Ou, 15 Jun 2025
Wei et al present a valuable effort to develop a Global Climate Scenario Reference (GCSR) dataset using hybrid machine learning and large language model techniques. The authors have done an impressive job in collecting a vast amount of literature, extracting scenario-relevant information, and building a searchable database that could support the climate modeling and policy communities. The technical description of methods—from scenario extraction to semantic cleaning, keyword recognition, and topic classification—is detailed and appears rigorous. However, several aspects, especially related to the practical value and interpretability of the dataset, would benefit from further clarification.
One major concern relates to the nature of the keywords extracted by the ML models. As shown in Figure 4, many of the top keywords, such as “carbon,” “emissions,” “future,” “scenario,” “represents,” or even “high,” appear overly generic or linguistic in nature rather than offering deep insight into the content of a scenario. While these extracted terms may reflect high-frequency usage, they do not always help differentiate scenario narratives in a policy-relevant or disciplinary sense. Compared to author-provided keywords, which may be more targeted (e.g., “carbon pricing,” “renewable deployment,” “bioenergy with CCS”), the ML-extracted terms risk being semantically shallow. It would strengthen the contribution of the paper to more clearly demonstrate how the ML-based keyword extraction adds value beyond the original metadata—perhaps through examples where the system uncovers meaningful connections that conventional indexing would miss.
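To make this concrete, here is the kind of probe I have in mind: keyphrase extraction that scores candidate n-grams against a document embedding tends to surface multi-word, policy-relevant terms rather than single generic tokens. A minimal sketch using the KeyBERT package (not the authors' pipeline; the abstract text is a hypothetical stand-in for a GCSR entry):

```python
# Minimal sketch: extract multi-word keyphrases rather than single generic
# tokens. Uses the KeyBERT package, not the authors' pipeline; the abstract
# below is a hypothetical stand-in for a GCSR entry.
from keybert import KeyBERT

abstract = (
    "We explore mitigation scenarios combining carbon pricing, rapid "
    "renewable deployment, and bioenergy with CCS to reach net zero by 2050."
)

kw_model = KeyBERT()  # defaults to a small sentence-transformer embedding model
keyphrases = kw_model.extract_keywords(
    abstract,
    keyphrase_ngram_range=(1, 3),  # allow phrases such as "carbon pricing"
    stop_words="english",
    top_n=5,
)
print(keyphrases)  # list of (phrase, similarity score) pairs
```

Showing side by side what the authors' extractor and an embedding-based keyphrase extractor return for the same abstracts would make the added value (or its absence) immediately visible.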
The results section, while methodologically rich, could be significantly enhanced by including concrete use cases. For example, it would be useful to see how a researcher interested in "carbon tax" scenarios could use the GCSR to find the top 10 most relevant articles or narratives. At present, the paper primarily demonstrates labeling and classification capabilities, but it falls short in showing how this system operates in practice. A few scenario search walkthroughs would help illustrate how the system assists users in navigating complex and fragmented literature. These applications are important, especially if the database aims to support scenario design or policy planning, as claimed.
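To illustrate, such a walkthrough could be as simple as ranking the stored narratives with BM25. A minimal sketch, assuming the GCSR narratives are exported as plain strings (the corpus below is a hypothetical stand-in for the 53,185 scenarios) and using the rank_bm25 package rather than the authors' own retrieval code:

```python
# Minimal sketch of a "carbon tax" scenario search over GCSR narratives.
# The corpus is hypothetical; a real query would run over the full dataset.
from rank_bm25 import BM25Okapi

corpus = [
    "a carbon tax of 50 USD per tonne accelerates power sector decarbonisation",
    "sea level rise scenarios for coastal adaptation planning",
    "emissions trading and carbon pricing under the Paris Agreement",
    "bioenergy with CCS deployment in 1.5C-consistent pathways",
    # ... in practice, every GCSR scenario narrative
]

# BM25 operates on tokenised documents; a whitespace split suffices here.
tokenised = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenised)

query = "carbon tax".lower().split()
# Rank every narrative by BM25 relevance and keep the ten best matches.
for rank, narrative in enumerate(bm25.get_top_n(query, corpus, n=10), start=1):
    print(rank, narrative)
```

A worked example of this kind, with real GCSR output, would demonstrate the claimed practical value far better than classification statistics alone.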
Additionally, the spatial scale of the collected scenarios is not explicitly discussed. Since climate policy and scenario relevance often depend on spatial context—global, national, or subnational—it is important to clarify whether the GCSR provides regional tags or metadata. Can users search for mitigation strategies in China versus Sub-Saharan Africa? Do scenarios specify context-sensitive assumptions or are they abstracted from place? These questions are critical for users who work at the interface of regional policy and global climate modeling.
The positioning of the GCSR relative to established IAM scenario databases like the IPCC AR6 Scenario Explorer could also be articulated more clearly. While the paper mentions that existing datasets emphasize quantitative results and that GCSR focuses on narratives, the potential complementarity is not fully developed. The IPCC, for example, provides structured scenario classification systems like C1–C9 (based on climate outcomes) and P1–P4 (based on mitigation strategies), and it would be valuable to discuss whether similar crosswalks could be created between GCSR classifications and those IPCC categories. Doing so would not only help validate the extracted scenario dimensions but also offer users a richer, multidimensional perspective on scenario content that bridges qualitative and quantitative insights.
Lastly, the abstract and conclusions hint at broader applications, such as supporting prediction, risk assessment, and policy development, but these remain vague. Beyond indexing literature, what can the GCSR enable in terms of scenario co-design, participatory workshops, or identifying blind spots in existing scenario narratives? There is potential for this work to support innovative scenario generation or narrative-based model input creation, but that vision should be more clearly described. A fuller articulation of future use cases would help readers grasp the transformative potential of the GCSR and distinguish it from existing bibliometric or scenario archives.
A minor point: the technical sections (especially on BM25 and BERTopic) are sound but could be lightened with intuitive explanation for general ESSD readers.
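For instance, the intuition behind BM25 fits in a single standard formula: a document $D$ scores highly for a query $Q$ when it contains the query terms frequently, those terms are rare across the corpus, and the document is not unusually long:

$$\mathrm{score}(D, Q) = \sum_{i=1}^{n} \mathrm{IDF}(q_i) \cdot \frac{f(q_i, D)\,(k_1 + 1)}{f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}$$

where $f(q_i, D)$ is the frequency of term $q_i$ in $D$, $|D|$ is the document length, $\mathrm{avgdl}$ is the average document length in the corpus, and $k_1 \approx 1.2$ and $b \approx 0.75$ are the usual defaults. BERTopic can be summarised just as briefly: embed documents with a sentence transformer, cluster the embeddings, and label each cluster by its most distinctive terms via a class-based tf-idf. A short passage of this kind would go a long way for general ESSD readers.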
Citation: https://doi.org/10.5194/essd-2025-299-RC1
RC2: 'Comment on essd-2025-299', Anonymous Referee #2, 05 Aug 2025
Review of “Global Scenario Reference Datasets for Climate Change Integrated Assessment with Machine Learning”
Summary and recommendation: In this paper, the authors generate a dataset of climate change integrated assessment studies using a machine learning based approach. The approach combines an LLM with clustering methods and quality control to generate a dataset that classifies studies across different characteristics. While the paper is well written and I am generally supportive of using LLMs and other machine learning (ML) methods to understand datasets, I found the paper generally lacking a strong justification for publication in ESSD. I therefore recommend rejection of the paper in its current state. I have added detailed comments below that hopefully explain my decision. My major concerns are as follows:
- Novelty and utility relative to current studies: Firstly, the authors have used existing ML methods to “classify” rather than analyze existing papers on climate change integrated assessment. While this is somewhat useful, I believe it is no more than a classification exercise rather than actual data development, and I think the utility of such a dataset to the community is rather overstated. There have been several papers that have used ML methods to understand drivers of climate effects (e.g., https://www.nature.com/articles/s44168-025-00251-4) or that have used LLMs to evaluate claims related to climate change (e.g., https://www.nature.com/articles/s44168-025-00215-8). Relative to this existing body of research, the classification exercise the authors have performed, while interesting, does not justify publication of a dataset-style paper.
- Treatment of the outcome variable as a discrete variable: One aspect of this paper that is especially problematic is that scenarios are classified as discrete, i.e., they can belong to one group or another. This largely ignores the highly multi-disciplinary efforts that go into integrated assessment modelling studies. For example, there are probably several studies on integrated assessment that address causes and impacts across several dimensions; in fact, I would expect ML techniques to capture such heterogeneity inherent in the scenarios. Why would the results from the methods presented here be any different from a classification and regression tree (CART)?
- Comparison to other methods: Building on point 2, how would this method compare to a simple classification algorithm, given that the end product is a classified dataset? If text classification is the most important part of this analysis, then a simple tf-idf vectorizer would have provided the results the authors were looking for; in fact, a tf-idf vectorizer provides a continuous “score” as opposed to a simple classification (see the sketch after this list). Features such as “duplication removal”, “text cleaning”, and “high frequency word statistics” (all mentioned in the paper) are likewise available in the Python packages surrounding the tf-idf vectorizer. This is an important point to address: if the same dataset can be constructed using simple classification, it calls into question the need for such complexity.
- Utilization of an existing LLM: A key part of this paper is the usage of an LLM (DeepSeek) to analyze the current body of scenarios. While this is not a problem by itself, since the usage of an existing tool is such a large and prominent part of the paper, I do not see much value added beyond that. I acknowledge that the authors have tried to describe DeepSeek’s usage in detail, but there is no way of evaluating the effect of the current weights in DeepSeek’s model on the search results, which makes the results presented here questionable or at the very least unreproducible.
- Evaluation of results (lack of out-of-sample testing): One very important part of the analysis in any paper that uses ML-based methods is out-of-sample testing, to ensure that there is no overfitting involved. I could not find any mention of out-of-sample testing to evaluate these methods. An example of how this could be conducted is to give the method a sample it was not trained on and see whether it reproduces the classification (the sketch after this list includes such a held-out check).
- Lack of emphasis on the BERTopic values: One part of the manuscript I did find intriguing was the BERTopic values. On examination of the final dataset, these appear to be some kind of continuous value. The interpretation of this variable should be explained in more detail. If the authors ever consider resubmitting this paper, they should focus on this variable rather than the simple text-based classification shown here.
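For concreteness, here is a minimal sketch of the baseline I have in mind for points 3 and 5 above, using scikit-learn only; the texts and four-way labels are toy stand-ins for the GCSR corpus:

```python
# Minimal sketch: tf-idf features, a simple linear classifier, and a
# held-out split to check out-of-sample performance. Texts and labels
# are hypothetical stand-ins for the GCSR corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

texts = [
    "carbon tax raises energy prices and cuts emissions",
    "sea level rise damages coastal infrastructure",
    "temperature projections under SSP5-8.5 by 2100",
    "international climate governance and policy coordination",
] * 25  # repeated so each class has enough samples for a stratified split
labels = ["causes", "impact", "predictions", "governance"] * 25

# tf-idf assigns each document a vector of continuous term weights
# ("scores"), with text cleaning (lowercasing, stop words) built in.
X = TfidfVectorizer(stop_words="english").fit_transform(texts)

# Hold out a test set the classifier never sees during fitting, then
# check whether the learned classification transfers to it.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.25, random_state=0, stratify=labels)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("out-of-sample accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```

Relatedly, BERTopic can return per-document topic probabilities (e.g., when instantiated with calculate_probabilities=True), which may be what the continuous values in the final dataset represent; if so, that should be stated explicitly.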
Citation: https://doi.org/10.5194/essd-2025-299-RC2
RC3: 'Comment on essd-2025-299', Alaa Al Khourdajie, 27 Aug 2025
Global Scenario Reference Datasets for Climate Change Integrated Assessment with Machine Learning
Overview
This paper presents the ‘Global Climate Scenario Reference’ (GCSR) dataset, which was developed by applying machine learning techniques to automatically extract and categorise scenarios from approximately 90,000 scientific articles.
My review concludes that while the technical endeavour is considerable, the manuscript suffers from fundamental flaws at the conceptual, methodological, and scientific levels.
Major issues:
- Fundamental conceptual and terminological flaws
- Misunderstanding of scenarios: The authors repeatedly conflate projections with predictions (e.g., L27, L32, L52). This is a critical misunderstanding. Scenarios in this context are explicitly not predictions but are explorations of plausible, internally consistent futures, contingent on specific assumptions.
- Imprecise scope: The dataset is labelled as containing “climate scenarios” when it appears to be primarily composed of climate change mitigation scenarios from IAMs and related literature (L29, L56). This misrepresents the scope and utility of the dataset.
- Mischaracterisation of uncertainty and objectivity: The authors incorrectly frame scenario ensembles as a tool to capture “climate and economic systems uncertainties” (L62-63) and claim their larger dataset provides a “more objective” view (L72-76). Scenario ensembles explore parametric uncertainty within a specific framing, but they are not exhaustive and are deeply influenced by normative modelling choices. Creating a larger, un-curated collection does not confer objectivity.
- Weak and arbitrary methodological framework
- Unjustified classification scheme: The central organising principle of the work, classifying scenarios into “causes, impact, predictions, and governance”, is not conceptually motivated (L102, L117-129). The reference to the IPCC WGII report is insufficient justification for this specific four-part structure for classifying mitigation scenarios. The framework appears arbitrary and post-hoc.
- Contradictory logic: The authors claim their tool supports the “exploration of uncertainties” (L85-98) but then describe a filtering function that allows users to select specific scenarios. This function does the exact opposite: it narrows the view and strips away the ensemble context that is essential for understanding uncertainty.
- Lack of reproducibility and subjectivity: The workflow for literature collection and refinement appears to involve subjective choices that may not be fully reproducible (L130-137), undermining the claims of creating an objective reference dataset.
- Overstated contribution and lack of scientific vetting
- Absence of quality control: The most significant flaw is that the dataset appears to be a large-scale aggregation without the necessary scientific vetting, harmonisation, or quality control that makes established databases (like the IPCC AR6 scenario database) scientifically robust. Scenarios are not of equal quality, plausibility, or relevance.
- Unsubstantiated claims of significance: The paper makes unsupported claims about its utility. The assertion that it will “Enhanc[e] Scientific Decision-Making” and reduce “blindness and uncertainty” (L323-end) is conceptually flawed. Providing unvetted scenarios without context is more likely to increase confusion than reduce uncertainty. The claim that it will drive the “development of scenario design methods” is also asserted without evidence.
Detailed comments
- L27: the scenario literature presents projections of various futures rather than predictions. The same applies to L32. This is fundamental to the understanding of the nature of these scenarios. L52 as well. Please apply throughout the manuscript.
- L29: the data generated from IAMs is for mitigation scenarios, rather than generic “climate scenarios”. Therefore, it should be a GMSR dataset. The label “climate scenarios” is used again in L67. These are mitigation scenarios or climate change mitigation scenarios. Reading lines 100-105, it seems the focus is beyond IAMs.
- L56: IAMs simply generate mitigation scenarios; climate governance is a different domain. Also, the way IAMs are explained in this paragraph misses many details and nuances. It is not about giving an extensive intro to IAMs, but they certainly go beyond economic systems only (same for L63), and as for climate, the resulting scenarios are run through climate emulators as post-processing, so the climate modules are not typically integral to the modelling/simulation process.
- L62-63: to be more specific, scenario ensembles, rather than scenarios in their own right, are used to explore the parametric uncertainty of mitigation scenarios generated using IAMs. Again, it is problematic to label these scenarios (as with the underlying models) as attempting to capture climate and economic systems uncertainties. Please check the quoted O’Neill et al. (2020) for an accurate understanding.
- L67: “core hypotheses”? This is an inaccurate description, and the term is not used in the quoted reference. Please elaborate: what do you mean by blind spots and creative solutions? Looking at the quoted reference (Finch et al., 2024), what they meant by blind spots is the exact opposite of what the authors of this manuscript imply.
- L72-76: “significant variations in research results” followed by “more objective”: I’m not sure I can follow here. The hallmark of the ensemble approach to scenario synthesis is allowing for variability in future projections, without imposing any normative judgement as to what “objective” means. All of these scenarios are subject to biases in their underlying designs (choices by modellers) and tools (IAM structures and calibrations). My understanding from the remainder of this paragraph is that the authors build a larger scenario database than AR6, a laudable endeavour, but this still does not imply objectivity in any meaningful way.
- L85-98: the authors expand the scenario space (beyond, for instance, what is captured in the AR6 database) to a wider range of scenarios. This in principle expands the uncertainty spaces being explored. They then allow users to filter down to specific scenarios, “guiding their exploration of uncertainties”. However, the filtering function does the exact opposite of what they claim: by filtering to one scenario (for instance), we strip away the range of parametric uncertainties enabled in the ensemble. Thus far, the work seems to be 1) data collection and curation of a large scenario ensemble, 2) labelling, classifying and categorising, and 3) filtering to specific scenarios. It would be more accurate to describe the work in such terms. Should the final steps follow a different strategy that truly allows for exploring the uncertainty space, the description could be different.
- The classification according to causes, impact, predictions and governance is not well motivated conceptually. The reference to IPCC, 2022 (L102) does not justify such a choice, as it refers to the WGII AR6 report.
- The factors identified in L117-129 seem arbitrary. What is the conceptual underpinning here? Could the authors simply identify the key features from the data?
- Is the workflow in L130-137 reproducible without undertaking any subjective choices? Subjective choices seem to be involved here.
- Again, the categorisation and word frequencies (L299-308) are not conceptually well motivated. The most frequent words are not surprising, but their attribution to such categories is not clear.
- L323-end: the significance of the dataset lies in collecting and curating scenarios and allowing the user to filter for specific ones. A major flaw here is that these scenarios are not vetted, harmonised, or infilled, and therefore cannot be claimed to be equally robust scientifically. As for the 3rd point on significance, “Enhancing Scientific Decision-Making”: this cannot be supported, for the various reasons I list above. The claimed advantages of reducing “blindness and uncertainty” are conceptually flawed at best. As for the 4th: how does the availability of the database improve the development of scenario design methods? Through the narratives, storylines, modelling tools, parameters?
Citation: https://doi.org/10.5194/essd-2025-299-RC3
Viewed
- HTML: 591
- PDF: 72
- XML: 26
- Total: 689
- BibTeX: 14
- EndNote: 19