Global anthropogenic CO2 emissions and uncertainties as prior for Earth system modelling and data assimilation

Margarita Choulga, Greet Janssens-Maenhout, Ingrid Super, Anna Agusti-Panareda, Gianpaolo Balsamo, Nicolas Bousserez, Monica Crippa, Hugo Denier van der Gon, Richard Engelen, Diego Guizzardi, Jeroen Kuenen, Joe McNorton, Gabriel Oreggioni, Efisio Solazzo, and Antoon Visschedijk
Research Department, ECMWF, Reading, RG2 9AX, United Kingdom
Joint Research Centre of the European Commission, EC-JRC, Ispra, 21027, Italy
TNO, Department of Climate, Air and Sustainability, Utrecht, 3584 CB, The Netherlands
Correspondence to: Margarita Choulga (margarita.choulga@ecmwf.int)


S.1 Power industry emissions
Uncertainties calculated in this study are being used in the CO2 Human Emissions (CHE) project to produce an ensemble of simulations with perturbed emissions for emission sensitivity studies (McNorton et al., 2020), and also as prior uncertainties in the future carbon dioxide (CO2) Monitoring and Verification Support (MVS) system (CHE, 2020; Janssens-Maenhout et al., 2020). To get the most out of perturbation (e.g. using random noise) and inverse system techniques, correct allocation of emission activity is needed. The main source of CO2 emission information in this study is the Emission Database for Global Atmospheric Research (EDGAR) version 4.3.2_FT2015 (Olivier et al., 2016b; Janssens-Maenhout et al., 2019). Based on the comparison with regional data from the Netherlands Organisation for Applied Scientific Research (TNO), namely the first version of their greenhouse gas (GHG) and co-emitted species emission database (TNO_GHGco_v1.1), EDGARv4.3.2_FT2015 energy sector (ENE) emissions were divided into autoproducers (energy generated specially for industry) and the rest using [...] cannot be more than +3.0 %. In contrast, small plants operate based on day-to-day needs and their upper bound of uncertainty can reach +15.0 %. Bearing this in mind, it was decided to separate the modified ENE (after relocation of autoproducer emissions) into two sub-sectors: (i) energy generated by the super power plants, i.e. the most emitting single plants or average emitting multiple plants located close together (falling into one grid-cell in the gridded ENE field; 30 grid-cells in total), and (ii) energy generated by the remaining (non-super) power plants, i.e. average emitting single plants or a few plants located close together.
First, all grid-cells of the yearly ENE gridded field were ranked by energy flux, from the highest to the lowest value. Next, all values higher than 7.9·10⁻⁶ kg·m⁻²·s⁻¹ were treated as fluxes generated by super power plants, and all the rest as fluxes generated by average power plants.
Finally, two new energy gridded fields were generated, ENE-SUP and ENE-OTH respectively.
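The ranking and threshold split described above can be illustrated with a minimal sketch. It assumes the gridded ENE field is held as a plain list of rows; the function names `split_ene` and `ranked_cells` and the toy values are hypothetical and are not taken from the study's code or data.

```python
# Threshold quoted in the text, in kg m-2 s-1.
SUPER_THRESHOLD = 7.9e-6


def ranked_cells(ene):
    """Return (flux, row, col) tuples sorted from highest to lowest flux."""
    cells = [(v, i, j) for i, row in enumerate(ene) for j, v in enumerate(row)]
    return sorted(cells, reverse=True)


def split_ene(ene, threshold=SUPER_THRESHOLD):
    """Split a gridded ENE flux field into (ENE-SUP, ENE-OTH).

    Cells strictly above the threshold go to ENE-SUP, all others to ENE-OTH,
    so the two output fields always sum back to the input field.
    """
    ene_sup = [[v if v > threshold else 0.0 for v in row] for row in ene]
    ene_oth = [[0.0 if v > threshold else v for v in row] for row in ene]
    return ene_sup, ene_oth


# Hypothetical 2x3 toy field in kg m-2 s-1 (illustrative values only).
toy = [[1.2e-5, 3.0e-7, 0.0],
       [7.9e-6, 9.5e-6, 4.1e-8]]
ene_sup, ene_oth = split_ene(toy)
ranking = ranked_cells(toy)
```

Keeping the two outputs on the same grid as the input means each sub-sector can later receive its own uncertainty while the total ENE flux is conserved.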
Currently 30 grid-cells from 13 different geographical entities (i.e. 12 countries) of the initial ENE sector were moved to ENE-SUP, representing 7.1 % (896.7 Mton) of the total ENE sector (12705.5 Mton). The top 3 countries that produce energy using super power plants are China, Russia and India. Usually the share of energy generated by super power plants for a country is ~15.0 %; exceptions are China, where this share is 4.0 %, and Kuwait, where it is 72.4 %. Table S1 shows the 30 grid-cell flux values, their ranks and geographical locations. Figure S1 shows a graphical representation of these ranked 30 grid-cell fluxes; it also shows a possible extension of the number of grid-cells used, based on the step change in the grid-cell values.

[Figure S1 caption, partially recovered: ... Table S1); red colour represents grid-cells where energy is generated by super power plants (new ENE-SUP field); blue and green colours show a possible extension of the new field based on the step change in the grid-cell values.]

S.2 Coal production emissions
Generation of electricity and heat worldwide relies heavily on coal, the most carbon-intensive fossil fuel. IPCC (2006) suggests neglecting CO2 emissions from coal production if prescribed emission factors (EFs) and activity data (AD) are used (Tier 1 approach), because mainly methane (CH4) is emitted during this process. IPCC-TFI (2019) suggests taking CO2 emissions from underground mines into account, as they are already known from the mine filtering equipment. In order to use prescribed EF and AD uncertainties we had to generate a coal production emission map (COL). Global grid-maps at 0.1°×0.1° horizontal resolution of CH4 emissions from hard coal and brown coal production in 2012, provided by the Joint Research Centre of the European [...] (2019)) is that CO2 is emitted only during underground mining; CO2 emissions from surface mining are neglected.
First, hard and brown coal CH4 emission global fields had to be separated into underground and surface mining emissions. Surface mines are usually represented by a large area (several touching grid-cells on a grid-map), underground mines only by the mine entrance (one or at most two touching grid-cells on a grid-map). For underground mining only values from grid-cells with 6 or more (up to 8) empty neighbouring grid-cells were used. Next, values from the hard and brown coal fields were summed and finally translated from CH4 into CO2 emissions by multiplying by the factor (5.9/18.0), with the result in kg·m⁻²·s⁻¹. According to the newly generated CO2 emissions from COL map (Figure S2 [...]

Main source of emission data in CHE_EDGAR-ECMWF_2015 is EDGARv4.3.2_FT2015; Table S3 shows the full list of differences between EDGARv4.3.2_FT2015 and CHE_EDGAR-ECMWF_2015.

[Table residue, partially recovered: Monthly emissions for 2010: (i) monthly scaling factor grid-maps derived from monthly EDGARv4.3.2; (ii) monthly scaling factor applied to yearly CHE_EDGAR-ECMWF_2015 emissions.]
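The underground-mine selection and the CH4-to-CO2 conversion described in this section can be sketched as follows. This is a minimal illustration, not the study's implementation: the function names and toy fields are hypothetical, fields are plain lists of rows, and cells outside the grid are assumed to count as empty (a global grid would also need longitude wrap-around, which is omitted here).

```python
# CH4-to-CO2 conversion factor quoted in the text.
CH4_TO_CO2 = 5.9 / 18.0


def empty_neighbours(field, i, j):
    """Count empty (zero) cells among the 8 neighbours of cell (i, j).

    Assumption: positions outside the grid count as empty.
    """
    n_rows, n_cols = len(field), len(field[0])
    count = 0
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == 0 and dj == 0:
                continue
            ni, nj = i + di, j + dj
            inside = 0 <= ni < n_rows and 0 <= nj < n_cols
            if not inside or field[ni][nj] == 0.0:
                count += 1
    return count


def underground_only(ch4, min_empty=6):
    """Keep only emitting cells with at least `min_empty` empty neighbours."""
    return [[v if v > 0.0 and empty_neighbours(ch4, i, j) >= min_empty else 0.0
             for j, v in enumerate(row)]
            for i, row in enumerate(ch4)]


def coal_co2(hard_ch4, brown_ch4):
    """Filter each coal field, sum them, and convert CH4 to CO2."""
    hard_ug = underground_only(hard_ch4)
    brown_ug = underground_only(brown_ch4)
    return [[(h + b) * CH4_TO_CO2 for h, b in zip(h_row, b_row)]
            for h_row, b_row in zip(hard_ug, brown_ug)]


# Hypothetical toy fields: one isolated cell (treated as an underground mine
# entrance) and one 2x2 block (treated as a surface mine and dropped).
hard = [[0.0, 0.0, 0.0, 0.0, 0.0],
        [0.0, 18.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 2.0, 3.0],
        [0.0, 0.0, 0.0, 4.0, 5.0]]
brown = [[0.0] * 5 for _ in range(4)]
co2 = coal_co2(hard, brown)
```

In this toy case the isolated cell has 8 empty neighbours and is kept, while every cell of the 2x2 block has only 5 empty neighbours and is discarded.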

S.3 Details on the parameterisation of the lognormal distribution
A lognormal distribution is typically an accurate assumption for the model output form when the uncertainty range is not symmetric with respect to the mean, even though the variance for the total inventory may be correctly estimated from [...] where [...] anthropogenic CO2 emissions per sector j; corr denotes the corrected uncertainty (i.e. corrected for the systematic underestimation of uncertainty calculated by the error propagation approach used in this study compared with uncertainties calculated using the Monte Carlo approach); UCEDGARj is in %.
In this study all calculations were performed for upper and lower uncertainty limits separately, to preserve as much of the information provided by IPCC (2006) [...] (3) and Eq. (4) (see Figure S3 for a visual representation of these equations): [...] where ln corresponds to logarithmic transformation of the distribution; the resulting values are not absolute, they carry signs. It should be noted that according to this methodology (with constants for the 2.5th and 97.5th percentiles, -1.96 and +1.96 respectively, from the Z-table) the lower uncertainty half-range [...] will always be less than 100.0 %.
[...] is approximately symmetric relative to 0 (Gaussian distribution) up to ~20.0 %, then grows rather rapidly until ~500.0 % (which with logarithmic transformation results in ~486.0 %), reaches a maximum at ~1350.0 % (which with logarithmic transformation results in ~582.6 %) and then gradually decreases. Next, these calculated uncertainty bounds were combined into 7 ECMWF groups.
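The behaviour described above can be checked numerically with a short sketch. It assumes that the transformation follows the lognormal formulation of the IPCC (2006) uncertainty guidance (geometric mean and geometric standard deviation derived from the symmetric 95 % half-range, combined with z = ±1.96); the equations themselves are not reproduced in this text, so this correspondence is an assumption, although it reproduces the ~486.0 % and ~582.6 % values quoted above. The function name `lognormal_half_ranges` is hypothetical.

```python
import math

Z_975 = 1.96  # z-value for the 97.5th percentile (-1.96 for the 2.5th)


def lognormal_half_ranges(u_percent):
    """Return signed (lower, upper) uncertainty half-ranges in % of the mean.

    `u_percent` is the symmetric 95 % uncertainty half-range in % of the mean.
    Assumed formulation (IPCC 2006 style), with the arithmetic mean set to 1:
        sigma_ln = sqrt(ln(1 + (U / 200)^2))
        mu_ln    = -0.5 * sigma_ln^2
    """
    sigma_ln = math.sqrt(math.log(1.0 + (u_percent / 200.0) ** 2))
    mu_ln = -0.5 * sigma_ln ** 2
    lower = (math.exp(mu_ln - Z_975 * sigma_ln) - 1.0) * 100.0
    upper = (math.exp(mu_ln + Z_975 * sigma_ln) - 1.0) * 100.0
    return lower, upper


# Signed half-ranges for a few input uncertainties (in %).
for u in (10.0, 20.0, 500.0, 1350.0, 3000.0):
    low, up = lognormal_half_ranges(u)
    print(f"U = {u:7.1f} %  lower = {low:8.1f} %  upper = {up:8.1f} %")
```

Under this assumption the lower half-range can never reach -100.0 %, because the exponential term stays positive, and the upper half-range peaks where sigma_ln equals 1.96, which corresponds to an input uncertainty of about 1350 %.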

S.4 Geographical treatment
The whole world in this study is represented by 242 geographical entities (i.e. 232 countries) over land and 1 residual entity over the ocean (including seas). Each geographical entity represents either part of a country (e.g. Isle of Man, Bermuda and Cayman Islands are different parts of the United Kingdom) or several countries merged together (e.g. Sudan and South Sudan, or Netherlands Antilles and Bonaire, Sint Eustatius, Saba and Curacao).
Each entity reports its annual GHG inventory with anthropogenic emission budgets, uncertainties and trends. Residual entity emissions are calculated from any activity (e.g. aviation, shipping, etc.) that took place over the ocean based on the global country mask (international aviation and international shipping are explicitly assigned to the residual entity emissions, not to any specific country). The accuracy of these reported values strongly depends on the statistical system development level of the entity. Following IPCC (2006) suggestions, all entities are divided into two groups: those with well-developed statistical systems (WDS) and those with less well-developed statistical systems (LDS), which can be related to Annex I and Non-Annex I countries respectively; see Figure S4 for a schematic representation of the grouping of all world countries. We made certain exceptions to this grouping: (i) remote territories of Annex I countries are treated as LDS countries (e.g. the United Kingdom is an Annex I country, meaning WDS, while Bermuda is part of it yet treated as an LDS country because of its remote geographical location from the main part of the United Kingdom); (ii) China is treated as a WDS country, because the quality of its GHG inventories has recently increased; (iii) India is treated as a WDS country, because of its well-developed statistical infrastructure; (iv) the Russian Federation is currently treated as an LDS country, because the completeness of its GHG inventory has recently decreased. Table S3 shows all geographical entities involved in this study with their statistical system development level and the main part of their country.

[...] It should be noted that for the uncertainty aggregation of several geographical entities (e.g. Europe (28 members until the end of 2019)) emissions are considered to be fully uncorrelated, following the suggestion from IPCC (2006).

[Table caption, partially recovered: ... (2006)) used in EDGAR-JRC. Uncertainties are specified for countries with well-developed (WDS) and less well-developed (LDS) statistical infrastructures. Upper and lower ranges refer to the 95 % confidence interval of the mean. No specification means that the process or fuel type uncertainty was applied to all sectors.]

Uncertainties from the EDGAR-JRC dataset aggregated to the ECMWF group level were compared with the ones from CHE_EDGAR-ECMWF_2015; see Table S6 for selected countries. The comparison showed that the uncertainties derived in this study are an upper bound of the uncertainty estimate obtained with more detailed information. Even though the differences are sometimes quite high in %, they are usually quite small in Mton.

Competing interests. The authors declare that they have no conflict of interest.