Global soil NO emissions for Atmospheric Chemical Transport Modelling: CAMS-GLOB-SOIL v2.2

We present a dataset of global soil NO emissions comprising gridded monthly data and the corresponding 3-hourly weight factors, suitable for atmospheric chemistry modelling. Data are provided globally at 0.5°×0.5° horizontal resolution, and with monthly time resolution over the period 2000-2018. Emissions are provided as total values and also with separate data for soil NO emissions from background biome values, and those induced by fertilizers/manure, pulsing effects, and atmospheric deposition, so that users can include, exclude or modify each component as required. This paper presents the emission algorithms and their data-sources, some comments on the availability of soil NO emissions in other inventories (and how to avoid double-counting), and finally some preliminary modelling results and comparison with observed data. This dataset was constructed as part of the Copernicus Atmosphere Monitoring Service (CAMS), and is referred to as CAMS-GLOB-SOIL v2.2. These data are available through the Copernicus Atmosphere Data Store (ADS) system (https://doi.org/10.24380/kz2r-fe18, last access June 2021, Simpson 2021a) or through the Emissions of atmospheric Compounds and Compilation of Ancillary Data (ECCAD) system (https://eccad.aeris-data.fr/, last access June 2021). For review purposes, ECCAD has set up an anonymous repository where a subset of the CAMS-GLOB-SOIL v2.2 data can be accessed directly (https://eccad.aeris-data.fr/essd-surf-emis-cams-soil/, last access July 2021, Simpson 2021b).

The EMEP model makes use of the so-called soil moisture index (SMI), which is available from the IFS model (ECMWF, 2021). Taking the minimum and maximum soil water amounts to be the permanent wilting point (PWP) and field capacity (FC), SMI is defined as:

\[ \mathrm{SMI} = \frac{\mathrm{SM} - \mathrm{PWP}}{\mathrm{FC} - \mathrm{PWP}} \]

where SM is volumetric soil moisture, PWP is the permanent wilting point, and FC is the field capacity, all in m³ m⁻³.
SMI can be calculated in this way for each soil type in the grid, and then averaged to get a grid-average value which is more physically meaningful than a simple average over absolute volumetric soil moisture values. The SMI values used here ('SMI1') are from the upper 7 cm of the soil.
Although it is impossible to take into account all the variability associated with heterogeneous vegetation and soil types within a grid-cell, this index should capture the main episodes of soil drying and their effects on vegetation.
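The SMI definition and the per-soil-type averaging described above can be sketched as follows. This is a minimal illustration rather than the EMEP or IFS implementation: the function names, the tuple layout, the use of area fractions as averaging weights, and the numerical values are all assumptions made for the example.

```python
def soil_moisture_index(sm, pwp, fc):
    """SMI = (SM - PWP) / (FC - PWP), all inputs in m3 m-3."""
    if fc <= pwp:
        raise ValueError("field capacity must exceed the permanent wilting point")
    return (sm - pwp) / (fc - pwp)


def grid_average_smi(soil_types):
    """Average SMI over the soil types present in one grid cell.

    soil_types: list of (area_fraction, sm, pwp, fc) tuples. Weighting by
    area fraction is an assumption; the text only says 'averaged'.
    """
    total_area = sum(frac for frac, _, _, _ in soil_types)
    weighted = sum(frac * soil_moisture_index(sm, pwp, fc)
                   for frac, sm, pwp, fc in soil_types)
    return weighted / total_area


# Illustrative values only (not taken from the dataset): two soil types with
# different absolute water contents but the same relative wetness.
cell = [(0.6, 0.20, 0.10, 0.30), (0.4, 0.25, 0.15, 0.35)]
print(grid_average_smi(cell))
```

In this example both soil types sit midway between PWP and FC even though their volumetric soil moisture differs, which is the sense in which averaging SMI is more meaningful than averaging absolute soil moisture.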

Soil temperatures
Although the IFS model does provide soil temperatures, we have simply used 2 m air temperature for the current calculations.

There are two main reasons for this: (a) most importantly, this variable was easily available from the EMEP model system we were using, and (b) it is anyway difficult to interpret soil temperatures from a numerical weather prediction model in terms of ecosystem-specific values.
The latter point is important, as the relation between air and soil temperatures is complex, and depends upon the vegetation cover and its physiological state (e.g. LAI) over the year. Soil temperatures may be higher or lower than air temperatures, and the many parameters required may depend on topography, soil texture, and soil water content, all of which may vary over short distances and even between different types of crops (e.g. Zheng et al., 1993; Brown et al., 2000; Kang et al., 2000; Plauborg, 2002; Tsilingiridis and Papakostas, 2014).

N-inputs, Fertilizer
Nitrogen inputs to ecosystems are a main driver for most N-related emissions. In agricultural areas, fertilizer application is the main source of N (and sometimes nitrogen fixation). For semi-natural areas atmospheric N-deposition is a key input.
Maps of global fertilizer and manure inputs were estimated by Potter et al. (2010, 2011) for the period of around 2000-2007. These data were converted to maps of N availability with 0.5°×0.5° spatial resolution and monthly time resolution for the HEMCO system (Keller et al., 2014, http://wiki.seas.harvard.edu/geos-chem/index.php/HEMCO, last access June 2021).
These data were derived from N-inputs spanning the years 2000-2007, but with most emissions for the latter year (Potter et al., 2010). Hence we assigned these data a nominal year of 2005.
Scaling factors to get to other years were made by combining national year-to-year variations from the CEDS database (Hoesly et al., 2018) with global NH3 emissions from the ECLIPSEv5a database (https://iiasa.ac.at/web/home/research/researchPrograms/air/Global_emissions.html, last access June 2021), with the latter needed to allocate country codes to grids. For this first emission estimate, where we only attempt monthly resolution of emissions, we adopted the simple procedure of allowing emission rates to follow these monthly N-inputs.

N-inputs, atmospheric deposition
Estimates of atmospheric N-deposition are readily available from CTMs (Dentener et al., 2006; Simpson et al., 2014; Kanakidou et al., 2016; Schwede et al., 2018), though often for limited time periods or with coarser spatial resolutions than are used in CAMS81. For this work, estimates of atmospheric N-deposition were taken from the EMEP chemical transport model (Simpson et al., 2012, 2020b), as run for the Arctic Monitoring and Assessment Programme (AMAP) project (Whaley, 2021).
It can be noted that there are large uncertainties in deposition estimates from all CTMs, and indeed from observation-based estimates (Flechard et al., 2011; Schwede et al., 2011; Simpson et al., 2014; Vet et al., 2014; Theobald et al., 2019; Walker et al., 2020), but simple mass-balance should ensure that over the large scale the amounts deposited are constrained by emissions.

Methods
The basic methodology merges methods from Yienger and Levy (1995)
We have aimed at monthly resolution for this study. One important reason is that many of the underlying data-sets have monthly resolution, and even this has substantial uncertainties. Secondly, the most dramatic short-term variation in soil NO emissions is associated with pulses, and for reasons given in Sect. 4.4, the timing of such events cannot reliably be estimated at this stage.

Calculation of F biome
The basic emissions algorithm for F biome is given by:

\[ F_{\mathrm{biome}} = A_{\mathrm{biome}} \times f(T, \mathrm{SMI}) \times \mathrm{CRF} \]

where F biome is the background biome-based soil NOx flux (ng(N) m⁻² s⁻¹), A biome is a function of the biome-type, f(T, SMI) is a function of temperature and soil moisture index, and CRF is the canopy reduction factor accounting for NOx capture by the vegetation canopy above the soil.
The biome emissions, F biome, are driven by the underlying land-cover data, biome factors (A biome), and meteorological drivers. Following YL95 and SL11, biome factors are given for dry and wet soils, with a different temperature function (f(T)) used for each. With the updated landcover used in v2.1, values of the emission factors are now taken directly from SL11, as tabulated in Table 2.
As seen from Table 2, we need to distinguish 'dry' from 'wet' soils. YL95 defined soils as being dry when the accumulated precipitation over the last 2 weeks was less than 1 cm, but subsequent authors have made use of NWP soil moisture data.

SL11 defined the threshold between wet and dry soils at 15% volumetric soil moisture, which for an average soil was said to correspond to midway between PWP and FC, i.e. to SMI=0.5. Figure 2 illustrates the fraction of time that grid-cells are defined as wet with this SMI=0.5 threshold. Although not identical to the results shown in Fig. 7 of Steinkamp and Lawrence (2011), the results are similar. We therefore define soils with SMI>0.5 as wet, otherwise dry.
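The two wet/dry definitions discussed above (the YL95 rainfall criterion and the SMI threshold adopted here) can be sketched as below. The function names and the example series are hypothetical, and the sketch assumes daily precipitation in mm and one SMI value per time step.

```python
SMI_WET_THRESHOLD = 0.5  # soils with SMI > 0.5 are treated as wet


def is_wet_smi(smi):
    """SMI-based classification adopted in this work."""
    return smi > SMI_WET_THRESHOLD


def is_dry_yl95(daily_precip_mm):
    """YL95 criterion: dry if the last 2 weeks accumulated less than 1 cm."""
    return sum(daily_precip_mm[-14:]) < 10.0


def wet_fraction(smi_series):
    """Fraction of time steps classified as wet (cf. the quantity in Fig. 2)."""
    return sum(1 for s in smi_series if is_wet_smi(s)) / len(smi_series)


# Illustrative series only.
print(wet_fraction([0.2, 0.6, 0.5, 0.9]))
print(is_dry_yl95([0.5] * 14))
```

Note that a value of exactly 0.5 counts as dry under the strict inequality used in the text.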
As with YL95 and SL11, crops are assumed to be irrigated, and so the wet-soil (A w) rates are applied at all times through the growing season. Defining this growing season is difficult for a number of reasons, though. These include the wide variety of species, with different planting and phenological developments, and the possibility of multiple harvests in the same fields (e.g. Sacks et al., 2010; Mills et al., 2018). For this study we have made the simple assumption that the months in which fertilizer application rates (Sect. 4.2) are above the median value for any particular grid cell are those when crops are likely to be growing.
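The median-based growing-season assumption can be illustrated with a short sketch. The function name and the monthly application values are hypothetical; only the rule itself (months above the grid cell's median fertilizer application) comes from the text.

```python
from statistics import median


def growing_season_months(monthly_fert):
    """Return the 1-based months whose fertilizer application exceeds
    the median of the 12 monthly values for this grid cell."""
    if len(monthly_fert) != 12:
        raise ValueError("expected 12 monthly values")
    threshold = median(monthly_fert)
    return [m + 1 for m, value in enumerate(monthly_fert) if value > threshold]


# Illustrative application rates for one grid cell (arbitrary units).
fert = [0, 0, 1, 5, 8, 6, 3, 2, 1, 0, 0, 0]
print(growing_season_months(fert))
```

Because the comparison is strict, a grid cell with identical application in every month has no month above the median; how such cells are treated is not stated in the text and would need a separate rule.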

Estimates of NO emissions from fertilizer N-inputs are commonly defined in terms of fertilizer induced emissions (FIE), the percentage of the applied N which is released as NO. Steinkamp and Lawrence (2011) used an FIE of 1%, but such estimates vary widely. YL95 used 2.5%, Bouwman et al. (2002) estimated 0.7%, and in an update of that work Stehfest and Bouwman (2006) used 0.55% for agriculture and grassland (excluding legumes). For v2.2 we have used an FIE value of 0.7%. The final F Fert emissions for CAMS-GLOB-SOIL are then generated by applying this 0.7% factor to the annual global maps of N-inputs due to fertilizers and manures (Sect. 3.4). It should also be noted that these F Fert emissions are sometimes, but not always, included in the agricultural sector of other emission data sets. The obvious risk of double-counting is addressed in Sect. 6.

Calculation of F Ndep
As discussed in Sect. 3.1, N-inputs to soils from atmospheric deposition are estimated from monthly model results from the EMEP MSC-W chemical transport model. Emissions of NO are then estimated using the same re-release factor (0.7%) as used for fertilizer N-inputs. Given the large uncertainties in N-deposition estimates (e.g. Simpson et al., 2014) and the relatively small contribution of the F Ndep term, this approach seemed acceptable for the current soil emissions calculation.
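Since both the fertilizer-induced and deposition-induced terms apply the same 0.7% factor to an N input, the arithmetic can be sketched in a few lines. The function names and the input value are illustrative assumptions; the literature factors are those quoted in the text, with the Stehfest and Bouwman (2006) value read as a percentage.

```python
FIE = 0.007  # 0.7 % of the N input released as NO (value used in v2.2)

# Factors quoted in the text, as fractions (0.55 is read here as 0.55 %).
LITERATURE_FIE = {
    "YL95": 0.025,
    "SL11": 0.01,
    "Bouwman et al. (2002)": 0.007,
    "Stehfest and Bouwman (2006)": 0.0055,
}


def induced_no_emission(n_input, factor=FIE):
    """NO-N released for a given N input; output units follow the input."""
    return factor * n_input


def monthly_induced_emissions(monthly_n_inputs, factor=FIE):
    """Emissions simply follow the monthly N inputs, as in the text."""
    return [induced_no_emission(n, factor) for n in monthly_n_inputs]


# Hypothetical N input of 100 units: spread of results across the factors.
for source, factor in LITERATURE_FIE.items():
    print(source, induced_no_emission(100.0, factor))
```

The spread between the lowest and highest factors is more than a factor of four, which gives a feel for the uncertainty attached to this term.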

Calculation of F pulse
Pulsing is the term used for the sudden emission of NO when soils that have been dry for some time are wetted. This release of NO is often of short duration. Both YL95 and SL11 used rainfall estimates in their approach to pulsing. In SL11 for example, if the accumulated precipitation in a gridcell was less than 10 mm during the last 14 days, and the precipitation then exceeded 1 mm ("sprinkle"), 5 mm ("shower") or 15 mm ("heavy rain") during one day, pulses of increasing magnitude and duration (3-14 days) were triggered. Using this methodology, SL11 found pulsing fractions to be between 12 and 20% across all the landcovers (with a mean value of 17%). The BDSNP model of Hudman et al. (2012) used soil water changes to initiate pulsing, but they also acknowledged that the soil moisture variable used (θ) had not yet been validated.
Although many studies suggest that pulsing is important, there is little evidence that such pulses can be accurately timed or quantified in global or even European scale CTMs. Indeed, Yan et al. (2005) noted that large scale NWP models have trouble predicting the conditions needed for pulsing, commenting that the ECMWF model's data never reached a value low enough to trigger a pulse in tropical savanna regions. Tests conducted for v1.1 showed that the timing of pulses varies greatly from one method to another (e.g. precipitation or SMI-based, and for different definitions of 'dry' versus 'wet'), so for v1.1 the pulsing emissions were omitted.
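The SL11 rainfall trigger described above can be written as a small classifier. This is a sketch of the trigger logic only, under the thresholds quoted in the text; the function name and data layout are hypothetical, and the pulse magnitudes and durations (which the text does not specify beyond "3-14 days") are deliberately left out.

```python
def sl11_pulse_class(daily_precip_mm, day):
    """Classify a potential pulse on index `day` of a daily rainfall series.

    Returns 'sprinkle', 'shower', 'heavy rain', or None if no pulse is
    triggered. Requires 14 days of antecedent data.
    """
    if day < 14:
        return None  # not enough history to evaluate the dry-period test
    antecedent = sum(daily_precip_mm[day - 14:day])
    if antecedent >= 10.0:
        return None  # soil not considered dry over the last 14 days
    rain = daily_precip_mm[day]
    if rain > 15.0:
        return "heavy rain"
    if rain > 5.0:
        return "shower"
    if rain > 1.0:
        return "sprinkle"
    return None


# Illustrative series: 14 dry days followed by a 6 mm event.
series = [0.0] * 14 + [6.0]
print(sl11_pulse_class(series, 14))
```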

As parameters such as volumetric soil water or the SMI used here cannot be verified, we have also explored some of the simpler rainfall-based approaches suggested in the literature. A very pragmatic methodology was devised for F pulse in v2.2. The occurrence of potential pulse events was counted using (i) a 14-day rainfall criterion (dry days were days with less than 1 mm rain per day, as long as SMI remained below 0.5), or (ii) changes in SMI of 0.01 after 3 days of SMI < 0.5. These criteria in themselves often suggested quite different monthly distributions of possible pulsing events. Instead of choosing between them, both counts were simply summed, smoothed in time, and used as a normalising factor for the pulsing emissions. The magnitude of the annual emission was simply set to be 15% of the biome emissions calculated in Sect. 4.1 for each grid square where pulses were detected, loosely consistent with estimates by SL11.
Further work will be needed, for example based upon use of satellite soil moisture data and/or comparison to TROPOMI NO2 data (Veefkind et al., 2012), to find an algorithm which could be used with some confidence with regard to pulsing.
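One possible reading of the v2.2 pulsing procedure is sketched below: the two monthly event counts are summed, smoothed, normalised, and used to distribute an annual pulse total equal to 15% of the annual biome emission. The three-point cyclic running mean, the function names, and the example counts are assumptions for illustration; the text does not specify the smoother actually used.

```python
def smooth_cyclic(values):
    """Three-point running mean with wrap-around (an assumed smoother)."""
    n = len(values)
    return [(values[i - 1] + values[i] + values[(i + 1) % n]) / 3.0
            for i in range(n)]


def monthly_pulse_emissions(rain_counts, smi_counts, annual_biome,
                            pulse_fraction=0.15):
    """Distribute the annual pulse emission over 12 months.

    rain_counts, smi_counts: monthly counts of potential pulse events from
    the rainfall-based and SMI-based criteria. annual_biome: annual biome
    emission for the grid cell (any consistent unit).
    """
    combined = [r + s for r, s in zip(rain_counts, smi_counts)]
    if sum(combined) == 0:
        return [0.0] * len(combined)  # no pulses detected in this cell
    weights = smooth_cyclic(combined)
    norm = sum(weights)
    annual_pulse = pulse_fraction * annual_biome
    return [annual_pulse * w / norm for w in weights]


# Illustrative counts for one grid cell.
rain = [0, 0, 1, 2, 0, 0, 0, 0, 3, 1, 0, 0]
smi = [0, 1, 1, 0, 0, 0, 0, 2, 2, 0, 0, 0]
pulse = monthly_pulse_emissions(rain, smi, annual_biome=100.0)
print(pulse)
```

By construction the monthly values sum to 15% of the annual biome emission, so the two criteria influence only the timing and not the annual magnitude.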

Calculation of CRF
It is well established that some of the NO emitted from soils can react quickly with ozone, forming NO2. Some of this NO2 is deposited within the canopy, reducing the emission of reactive N. YL95 used canopy reduction factors (CRFs) ranging from 0.25 for rain forests to 0.77 for tundra, giving a global average of 0.53. These CRFs are very uncertain however, with Yan et al.  Table 2.

Tropical rainforests
The new land-cover data contain the category 'evergreen broadleaf forest' in Köppen-Geiger climates A&B, which was identified as 'rainforest' in SL11. As suggested by YL95, Steinkamp et al. (2009) and SL11, this tropical rainforest category receives special treatment, in that the temperature functions are not applied, and instead dry/wet emissions are a function of season and not meteorology. Combined with the low CRF applied to rainforest, the v2.1 and v2.2 emissions are then significantly reduced compared to v1.1 estimates. (We can note however that YL95 and SL11 differed greatly in the emission factors suggested for rain forests: YL95 suggested 8.6 and 2.6 ng(N) m⁻² s⁻¹ for dry and wet soils respectively, whereas SL11 suggest
Although this definition may suffice for annual calculations, this procedure leads to a large step change in emissions at the equator. For this work we have calculated the five driest months from a 5-year climatology of gridded rainfall. This procedure produces a much smoother transition in emissions near the equator. Having applied the dry and wet season emission factors to this biome, we further apply a simple temporal smoothing to allow for the great uncertainty in the climatological shifts in emissions behaviour.
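The selection of the five driest months from a multi-year rainfall climatology can be sketched as follows. The function names and rainfall values are hypothetical, and the tie-breaking rule (earlier calendar month first) is an assumption, since the text does not say how ties are handled.

```python
def monthly_climatology(yearly_rain):
    """Mean rainfall per calendar month over several years.

    yearly_rain: list of 12-element lists, one list per year.
    """
    n_years = len(yearly_rain)
    return [sum(year[m] for year in yearly_rain) / n_years for m in range(12)]


def driest_months(clim, n=5):
    """Return the n driest months (1-based), in calendar order.

    Python's sort is stable, so ties go to the earlier calendar month.
    """
    order = sorted(range(12), key=lambda m: clim[m])
    return sorted(m + 1 for m in order[:n])


# Illustrative rainfall (mm per month) for one grid cell, repeated for 5 years.
one_year = [300, 280, 250, 200, 120, 60, 40, 50, 90, 160, 230, 290]
clim = monthly_climatology([one_year] * 5)
print(driest_months(clim))
```

Months returned by `driest_months` would receive the dry-season emission factor and the remaining months the wet-season factor, before any temporal smoothing is applied.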

Temperatures
In S18, soil temperatures (T s) were estimated from air temperatures (T a) using simple empirical relationships, T s (°C) = T a (°C) + 5 for dry soils (following YL95) and T s (K) = 0.72 T a (K) + 82.28 for wet soils (algorithm from the code base of the MEGAN system, Guenther et al. 2012). However, closer examination of these equations, and of the alternatives used by YL95, shows some worrying features. For example, the MEGAN equation predicts higher soil than air temperatures up to ca. 20 °C, but in many situations this cannot happen, and indeed T a should often be higher than T s. At an air temperature of 30 °C the MEGAN system predicts T s of 27.4 °C, whereas the Williams et al. (1992) equations used by YL95 would predict 28.9 °C for grasslands and 28.8 °C for forests, both close to air temperature. The ideal solution here would be to take T s from the ECMWF model for each type of landcover, but this solution was not readily available for the current calculations. As an interim solution we simply assume that T s = T a, recognising that this needs to be improved in future methodologies.
These plots, covering the years 2000-2018, illustrate the strong spatial variations in soil NO emissions, and also that the drivers vary markedly from region to region. For example, western European emissions are estimated to be strongly affected by the fertilizer-induced emissions, whereas in southern Africa or South America it is the biome component that strongly dominates. Atmospheric deposition is seen to be a relatively small contributor, but of course the relative contribution will increase away from agricultural source areas. Overall, year-to-year variations are not especially large, and trends are rather small.
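As a numerical check on the empirical air-to-soil temperature relationships quoted in the Temperatures discussion (T s = T a + 5 for dry soils, T s (K) = 0.72 T a (K) + 82.28 for wet soils), the following sketch evaluates the wet-soil relation at 30 °C and finds the air temperature at which it gives T s = T a. The function names are hypothetical; the coefficients are those given in the text.

```python
KELVIN_OFFSET = 273.15


def ts_dry_c(ta_c):
    """Dry-soil relation (following YL95): Ts = Ta + 5, in degrees C."""
    return ta_c + 5.0


def ts_wet_c(ta_c):
    """Wet-soil relation (MEGAN code base): Ts(K) = 0.72 Ta(K) + 82.28."""
    ta_k = ta_c + KELVIN_OFFSET
    return 0.72 * ta_k + 82.28 - KELVIN_OFFSET


def wet_crossover_c():
    """Air temperature (degrees C) at which the wet relation gives Ts = Ta.

    Setting Ts = Ta gives Ta(K) = 82.28 / (1 - 0.72).
    """
    return 82.28 / (1.0 - 0.72) - KELVIN_OFFSET


print(round(ts_wet_c(30.0), 1))      # soil temperature at 30 C air temperature
print(round(wet_crossover_c(), 1))   # below this, the relation gives Ts > Ta
```

The wet-soil relation reproduces the 27.4 °C quoted in the text at an air temperature of 30 °C, and it yields soil warmer than air for air temperatures below roughly 20.7 °C, consistent with the "ca. 20 °C" statement.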

Month-to-month variations in emissions are much more prominent, as illustrated in Fig. 5. Seasonal cycles are driven largely by temperature and associated wet/dry changes. The large contribution of F Fert to Western/Eastern European (EUR) emissions is also very evident, with the largest F Fert emissions near the start of the growing season.
are 40% higher in v2.2 than in Weng et al. (2020), 53% for Russia, and 27% for East Asia. v2.2 emission estimates are substantially lower than Weng et al. (2020) for North Africa (21%) and South Asia (99%). The larger discrepancies for these regions probably reflect increasing difficulties with land-cover characteristics (e.g. savanna or sparsely vegetated areas) and with the increasing frequency and importance of dry conditions.

The global satellite-based (OMI) estimate of Vinken et al. (2014) suggests somewhat larger global emissions than v2.2 or SL11, but the uncertainty range (±3.9 Tg(N)/yr) cited in that study is likely low since the analysis also depends on the use of a chemical transport model (GEOS-Chem).
component similar to our F Fert, we hereby introduce F nonFert, such that the total soil NO emission is the sum of F Fert and F nonFert, where F nonFert is then the sum of the F biome, F Ndep, and F pulse terms.
Estimates of 'anthropogenic' soil NO are also provided by a number of emission inventories used by models from the CAMS system, including both the IFS model and EMEP MSC-W, and there are risks of both double-counting and omission of emissions when mixing CAMS-GLOB-SOIL with these other data sets. As will be shown below, many of these inventories derive their methods from the EMEP/EEA Emission Inventory Guidebook chapter on crop production and agricultural soils, so we present this first (Sect. 6.1), then for each emission data-set we present the status of soil-NO emissions, and a recommendation on how these data should be used with CAMS-GLOB-SOIL.

Soil NO emissions in the EMEP/EEA Guidebook
Within the Convention on Long-range Transboundary Air Pollution (LRTAP Convention), most countries mainly report NOx emissions due to agricultural activities using the EMEP/EEA Emissions Inventory Guidebook (Hutchings et al., 2019). The Guidebook provides methods for calculating soil-NO data from fertilizer and other inputs. Table S1 presents the main sources for which soil NO emissions are covered by the Guidebook, and Table S2 presents
for France, and also provides emissions for a few countries where soil NO emissions are lacking in WebDab (TR, UA, BY), but on the whole the agreement is good. It therefore seems reasonable to equate the 'Fert' emissions of CAMS-GLOB-SOIL with these GNFR L emissions as provided to EMEP MSC-W. This also suggests however that we need to add the nonFert emissions from CAMS-GLOB-SOIL to provide the best soil-NO estimate for modelling.
Recommendation for EMEP/WebDab emissions: When using EMEP emissions derived from officially reported data (with soil NO emissions as given in GNFR L), for example in EMEP MSC-W reporting runs, retain the official GNFR L data, but add biome, N-dep and pulse emissions from CAMS-GLOB-SOIL.
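The double-counting logic behind this recommendation can be sketched as a small helper. The dictionary keys and function name are hypothetical; the rule is simply that an inventory's own fertilizer-related soil NO (e.g. GNFR L) replaces the CAMS-GLOB-SOIL 'Fert' component, while the nonFert components are always added.

```python
def soil_no_for_model(glob_soil, inventory_fert_soil_no=None):
    """Combine CAMS-GLOB-SOIL components with an anthropogenic inventory.

    glob_soil: dict with keys 'fert', 'biome', 'ndep', 'pulse' (one grid
    cell, consistent units). inventory_fert_soil_no: fertilizer-related
    soil NO already present in the inventory, or None if the inventory
    excludes soil NO altogether.
    """
    non_fert = glob_soil["biome"] + glob_soil["ndep"] + glob_soil["pulse"]
    if inventory_fert_soil_no is None:
        # Inventory has no soil NO: use all CAMS-GLOB-SOIL components.
        return glob_soil["fert"] + non_fert
    # Inventory already covers the fertilizer term: keep it, add nonFert only.
    return inventory_fert_soil_no + non_fert


# Illustrative values for one grid cell.
cell = {"fert": 2.0, "biome": 3.0, "ndep": 0.5, "pulse": 0.5}
print(soil_no_for_model(cell))        # inventory without soil NO
print(soil_no_for_model(cell, 1.5))   # inventory reporting its own soil NO
```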

CAMS-REG
The anthropogenic European emissions provided by CAMS-REG (Kuenen et al., 2021; Granier et al., 2019) deliberately exclude soil-NO emissions, so as to avoid the risk of double-counting when used with CAMS-GLOB-SOIL (J. Kuenen, pers. comm., 2021). Thus, our recommendation is straightforward:
Recommendation for CAMS-REG: Use GLOB-SOIL-NO directly when used with CAMS-REG.
Method [ii] should of course be the most consistent data-set, and both monthly and diurnal time-profiles are provided with the data-set, but more work to investigate the differences between the data-sets would be worthwhile.

ECLIPSE
The ECLIPSE inventories provided by IIASA (e.g. https://iiasa.ac.at/web/home/research/researchPrograms/air/Global_emissions.
is to make use of satellite data (e.g. OMI, TROPOMI) to look for, evaluate, and calibrate the CAMS-GLOB-SOIL emissions, though this task is challenging for many reasons. As a first step towards emissions evaluation, and to get a better idea of the importance of soil NO emissions, we can however compare model runs with and without soil-NO emissions to measurements from well-established surface networks.
In this section we present some preliminary calculations of the impacts of soil-NO. The EMEP model (v4.42) has been run
decrease, but these changes are very small (usually less than 0.5 ppt), and presumably reflect increased NO2 loss in the more chemically active troposphere induced by the soil NO emissions.
(Tørseth et al., 2012), and comprise stations in rural areas, suitable for evaluation of the EMEP CTM. Inclusion of soil NO emissions is seen to improve almost all statistics, with bias for N-compounds reduced significantly, and with correlation and IOA metrics also improved. Table 5 summarises the evaluation statistics for ozone from the global run at a number of stations. Observations are from the GAW network, and also comprise stations in rural and remote areas, suitable for evaluation of global CTMs. Here the results of adding soil NO emissions are seen to be more mixed. At the European stations we find similar responses to those discussed above, and especially improved R values at most sites (especially Payerne in Switzerland). For
values and also with separate data for soil NO emissions induced by fertilizers/manure, pulsing effects, and atmospheric deposition, so that users can include, exclude or modify each component if wanted.
It should be emphasised that all estimates of soil NO emissions are notoriously uncertain, since the emissions are driven by complex under-soil processes (microbial activity, pH, organic-C content, nutrients) rather than the simple meteorological and air quality variables which CTMs usually deal with, and there are very few data which can be used to evaluate such estimates.

For example, Davidson et al. (2000) suggested that although their review of data (covering many tropical ecosystems) clearly supported the assertion that nitrogen oxide emissions are related to rates of nitrogen cycling in ecosystems, a model based on these regression parameters will have only order-of-magnitude prediction accuracy. Further, the emissions can vary markedly with vegetation type, fertilizer type and agricultural management systems, and prior occurrence of biomass-burning (e.g. Skiba et al., 1997;Bouwman et al., 2002;Steinkamp and Lawrence, 2011).

For the CAMS-GLOB-SOIL datasets, we have here aimed at pragmatic solutions rather than sophistication, in order to set up a transparent initial framework, and to avoid over-parameterising a model in which many of the underlying datasets (e.g.
reflects difficulties in even identifying the timing of pulses, let alone the magnitude. There are also some puzzling differences in the emission rates assigned to different land-covers by SL11, e.g. that the rates for mixed forest are lower than those of any deciduous or coniferous forest (cf. Table 2). These differences presumably reflect a lack of measurement data, and this is a fundamental problem.
Future revisions to this data-set will hopefully include improved estimation of soil temperatures, inclusion of the impact of forest-fires, and generally more use of field data and satellite products to evaluate and constrain the estimated emissions.
Author contributions. DS developed the soil NO emissions algorithms and generated the emission data, as well as writing most of this paper.
SD helped with conversion of MODIS land-cover data and with provision of the anthropogenic emissions used in the modelling, as well as with many technical issues associated with the ECCAD database.
Acknowledgements. The presented work was supported by project CAMS_81: Global and Regional Emissions funded within the Copernicus