A cultivated planet in 2010: 2. the global gridded agricultural production maps

Data on global agricultural production are usually available as statistics at the level of administrative units, which do not reveal spatial diversity and patterns and are therefore less informative for subsequent spatially explicit agricultural and environmental analyses. In the second part of this two-paper series, we introduce SPAM2010, the latest global spatially explicit dataset on agricultural production circa 2010, and elaborate on the improvement of the SPAM (Spatial Production Allocation Model) dataset family since 2000. SPAM2010 adds further methodological and data enhancements to the available crop downscaling modeling, mainly the update of the base year, the extension of the crop list and the expansion of sub-national administrative unit coverage. Specifically, it not only applies the latest global synergy cropland layer (see Lu et al., submitted to the current journal) and other relevant data, but also expands the estimates of crop area, yield and production from 20 to 42 major crops under four farming systems across a global 5 arc-minute grid. All the SPAM maps are freely available at the MapSPAM website (http://mapspam.info/), which not only acts as a tool for validating and improving the performance of the SPAM maps by collecting feedback from users, but also serves as a platform providing archived global agricultural production maps for better targeting the Sustainable Development Goals. In particular, SPAM2010 can be downloaded via an open-data repository (DOI: https://doi.org/10.7910/DVN/PRFF8V; IFPRI, 2019).

Changes in agricultural lands over time are as important as those over space, especially given that changes in cropping patterns and crop yields are more frequent than changes at the land cover level (Verburg et al., 2011). While four spatially explicit datasets on global agricultural production are available for around the year 2000 (Anderson et al., 2015), three of them, i.e. M3, MIRCA and GAEZ, are not available for years after 2000. Agricultural production systems are constantly changing, and these changes are not trivial. However, many recent agricultural and environmental assessments were still based on those maps produced decades ago (Deutsch et al., 2018; Nanni et al., 2019; Estes et al., 2018; Prestele et al., 2018; Erb et al., 2018; Porwollik et al., 2019; Yu et al., 2017b), suggesting that an update of existing global agricultural production maps is very desirable for subsequent analysis.

SPAM has committed to updating its maps every five years (You et al., 2014; Wood-Sichra et al., 2016), which substantially fills the data gap and extends the work on global agricultural production mapping by operating a global gridscape at the confluence between earth and farming systems in multiple time stages. The SPAM model has become a critical tool for many initiatives within and beyond the Consultative Group for International Agricultural Research (CGIAR). Moreover, SPAM data are frequently downloaded and widely used by researchers and analysts from international organizations, academia and government agencies all over the world. The global spatially explicit datasets in multiple time stages enable scientists as well as policymakers to better address global change challenges within the anthroposphere and beyond, such as targeting agricultural and rural development policies and investments, increasing food security and growth with minimal

The SPAM methodology was first developed in a trial project for six major crops in Latin America and the Caribbean by combining satellite imagery and crop statistics. Later on, it was used to derive regional estimates of spatially disaggregated crop production in Brazil and sub-Saharan Africa (You and Wood, 2006; You et al., 2009). Over the years the model has evolved, adding more crops, using additional data and increasingly complicated optimization equations, and expanding to global coverage (You et al., 2014). The SPAM methodology differs from its counterparts. For example, M3 makes no distinction across farming systems and allocates production proportionally (within each crop) to each grid cell within each sub-national unit, hence the M3 dataset provides interpolated estimates of output by crop at the resolution of the satellite data (Figure 1).
SPAM not only considers the crop yield variation across farming systems but also assigns production to grid cells weighted by price rather than by pure proportionality (Donaldson and Storeygard, 2016). Moreover, MIRCA and GAEZ focus more on the biophysical aspects of agricultural production, while SPAM uses a triangulation of any and all relevant background and partial information, which includes not only national or sub-national crop production statistics and satellite data on land cover, but also maps of irrigated areas, crop potential suitability, and secondary data on population density, market accessibility, cropping intensity and crop prices (Figure 2).

The SPAM model produces global gridded maps of agricultural production at a 5 arc-minute spatial resolution. The first SPAM maps, known as SPAM2000, represent global agricultural production circa 2000 for 20 crops, with the exception of a few small island states and conflict zones (You et al., 2014). Subsequently, the SPAM maps have been updated every 5 years. SPAM2005 acts as an intermediate update which expands the coverage of crops from SPAM2000. The 42 crop categories are further adopted in SPAM2010 (Figure 1).

Model improvement for SPAM2010
There are three sub-modules in a standardized SPAM model: disaggregation, optimization and allocation. We conceptualize SPAM2010 based on this general setting while adding further methodological and data improvements, which mainly include the update of the base year, the expansion of sub-national administrative unit coverage, the extension of the crop list, and the substitution of the latest hybrid cropland map as the basic allocation layer. Considering the huge amount of input data and the multi-year effort involved, such an update is not trivial and will be critical for the user community. In this section, we briefly introduce the model structure and how these sub-modules are processed and connected.

Disaggregation
The first step for SPAM is to disaggregate crop statistics of agricultural production (e.g. the yield, harvested area, and total production) by administrative unit level (k), crop type (j), and farming system (l) from a coarser scale to a finer scale (illustrated by orange shapes in Figure 2). For example, the national-level statistics are disaggregated into sub-national levels, statistics for crop aggregates are divided into individual crop types, and the crop statistics are further separated by rainfed and irrigated conditions. Disaggregation is a non-spatial module. For the administrative unit (ADM), we consider three levels: k = 0 (national level), 1 (sub-national level 1), or 2 (sub-national level 2), and refer to the country-specific administrative level as the statistical reporting unit (SRU; SRU = k0, k1 or k2). In general, the SPAM model performs better if crop statistics are more disaggregated by ADM. Therefore, we prefer to collect crop statistics for ADM1 and ADM2, although statistics are mostly available at the ADM0 level and the sub-national coverage is always less complete. Compared with the previous SPAM products, the sub-national coverage percentage has increased markedly for SPAM2010, which is described in detail in Section 4.1.
We improve the model capacity in SPAM2010 as well: we simultaneously allocate 42 crops and crop aggregates (versus the 20 crops and crop aggregates in SPAM2000) and consider four farming systems for each crop (Figure 2). In SPAM2010, we keep the farming systems conceptualized for SPAM2000, which have proved useful for representing the different crop performances under different management systems, e.g. the irrigated yields of a particular crop are likely to be substantially different from the corresponding rainfed yields. The four farming systems are irrigated (I), rainfed-high (H), rainfed-low (L) and subsistence (S).

The four conceptualized farming systems are mainly delineated by the water supply system and the inputs used by farmers, although global data on farming system shares for each crop are largely absent. For a small number of large countries, e.g. Brazil, China, India, Russia and the United States (see more details in Section 4.1), we have data on farming system shares at the ADM1 level. For the other countries we first assign the national farming system shares to each ADM1 unit, and then adjust individual ADM1 farming system shares in light of the supporting evidence. For example, if the national share for irrigation of wheat is 30%, we assign that to all ADM1 units. Then we look at individual units, and if supporting evidence (e.g., the Global Map of Irrigation Areas (GMIA) data) indicates that there is no irrigated area present in a particular ADM1 unit, we set the irrigation share of wheat to zero in that administrative unit. Finally, the farming system shares at the national level are recalculated as the weighted average of the adjusted ADM1 estimates. For a few countries with very limited data accessibility, we rely on expert opinion. For example, it was often necessary to use farming system shares from one crop as proxies for similar crops (e.g., farming system shares for beans are used for all pulses) or to apply shares from one country to similar countries (e.g., the geographically smaller countries in the Middle East, including Kuwait, Oman and Qatar, are assigned the same farming system shares).
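The share-adjustment procedure described above can be illustrated with a minimal sketch. The function name, the ADM1 unit labels and the areas are hypothetical, and weighting the recalculated national share by harvested area is an assumption, since the text only specifies a "weighted average".

```python
def adjust_irrigation_shares(national_share, adm1_area, adm1_has_irrigation):
    """Assign the national irrigated share to every ADM1 unit, set it to zero
    where no irrigated area is observed, and recompute the national share as
    an area-weighted average (the weighting variable is an assumption)."""
    shares = {
        unit: (national_share if adm1_has_irrigation[unit] else 0.0)
        for unit in adm1_area
    }
    total_area = sum(adm1_area.values())
    national = sum(shares[u] * adm1_area[u] for u in adm1_area) / total_area
    return shares, national


# Hypothetical wheat harvested areas (ha) for three ADM1 units.
area = {"A": 100.0, "B": 300.0, "C": 600.0}
# Hypothetical evidence (e.g. from an irrigation map): unit B has no irrigation.
has_irrigation = {"A": True, "B": False, "C": True}

shares, national = adjust_irrigation_shares(0.30, area, has_irrigation)
```

In this toy case the national irrigated share drops below the initial 30% because unit B, which holds 30% of the area, is set to zero.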
For irrigated farming systems, the crop-specific shares are derived by dividing the harvested area cultivated under full control irrigation obtained from AQUASTAT, MIRCA, and country-level statistics by the overall harvested area. For rainfed farming systems, crop-specific shares are primarily estimated based on generalized assumptions for individual countries and crops. For example, all cereals in Western Europe are produced with high inputs, whereas 20% of cereals in Sub-Saharan Africa are grown under a subsistence farming system. We also assume fertilization as a proxy for high-input use, so if irrigated crop areas and overall fertilized and non-fertilized areas of a crop are known, it is possible to deduce rainfed-high shares by subtracting the irrigated areas from fertilized areas. The remainder of the fertilized area is then classified as rainfed-high

Optimization
The optimization module minimizes the cross entropy between the allocated area shares s and their prior shares π:

$$\min \; CE(s_{ijl}, \pi_{ijl}) = \sum_i \sum_j \sum_l s_{ijl} \ln s_{ijl} - \sum_i \sum_j \sum_l s_{ijl} \ln \pi_{ijl}$$

where CE is the abbreviation for cross entropy, which is defined as the log function of probability. The difference between {s ln s} and {s ln π}, i.e. between the estimated probability s and its prior probability π, is minimized subject to the following constraints:

(i) Constraint specifying the range of allocated physical area shares;
(ii) Constraint specifying the sum of allocated physical area shares within a grid;
(iii) Constraint specifying that the sum of allocated physical area over all crops and farming systems within a grid should not exceed the actual cropland within the same grid;
(iv) Constraint specifying that the allocated physical area by grid, crop and farming system should not exceed the suitable area within the grid for the corresponding crop and farming system;
(v) Constraint specifying that the sum of allocated physical area over all farming systems within a sub-national unit should be equal to the sum of statistical physical area over all farming systems within the corresponding sub-national unit;

where P is the set of commodities for which sub-national statistics exist.
(vi) Constraint specifying that the sum of allocated physical area under an irrigated farming system within the grid should not exceed the area equipped for irrigation in the grid;

where Q is the set of commodities which are fully or partly irrigated within grid i.
Shares sijl are probability values between 0 and 1:

$$s_{ijl} = \frac{\mathrm{AllocA}_{ijl}}{\mathrm{AdjCropA}_{jl}}$$

where AdjCropAjl is the total physical area of a given SRU for crop j at input level l to be allocated, and AllocAijl is the area allocated to grid i for crop j at input level l.
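To make the role of the cross-entropy measure concrete, the following minimal sketch evaluates CE(s, π) = Σ s ln s − Σ s ln π for one crop and farming system over three hypothetical grid cells. It only illustrates the objective: the constraints described above are not imposed, and the share values are invented for illustration.

```python
import math


def cross_entropy(s, prior):
    """CE(s, prior) = sum(s * ln s) - sum(s * ln prior).
    Cells with s == 0 contribute nothing (limit of s * ln s as s -> 0)."""
    return sum(si * math.log(si / pi) for si, pi in zip(s, prior) if si > 0)


# Hypothetical prior shares of one crop/farming system over three grid cells.
prior = [0.5, 0.3, 0.2]

same = cross_entropy(prior, prior)               # allocation equal to the prior
shifted = cross_entropy([0.6, 0.3, 0.1], prior)  # allocation departing from it
```

The measure is zero when the allocation reproduces the prior and positive otherwise, so the optimization moves shares away from the priors only as far as the constraints require.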
The prior πijl indicates the decision to produce a particular crop under a specific production system, which normally depends on both biological and economic factors. However, subsistence farmers mainly grow crops for their own consumption, largely uncoupled from price, market access or crop potential suitability conditions. Therefore, we first assume that the prior allocation of subsistence physical area in grid i by crop j depends simply on rural population density:

where AdjCropAjkS is the generated physical area for crop j under the subsistence farming system for the given SRU k, and AggRurPopi is the rural population density at grid i (see the detailed description in the following Section 4).

Then for the three remaining farming systems, we assume that the potential unit revenue of planting a certain crop (Revijl) affects farmers' crop choices:

where AdjCroplandi, Pricej, Accessij and PotYieldijl are the adjusted cropland area, market price, accessibility parameter and potential yield values for crop j in farming system l and grid i (see the detailed description in the following Section 4).
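The two prior ingredients described above can be sketched as follows. Both functional forms are assumptions made for illustration: the subsistence area is split in proportion to rural population density, and the unit revenue is taken as the product of price, accessibility and potential yield. The function names and numbers are hypothetical.

```python
def subsistence_prior(sru_subsistence_area, rural_pop_density):
    """Split an SRU's subsistence physical area across its grid cells in
    proportion to rural population density (assumed proportional rule)."""
    total = sum(rural_pop_density.values())
    if total == 0:
        # No rural population anywhere in the SRU: nothing to distribute.
        return {cell: 0.0 for cell in rural_pop_density}
    return {
        cell: sru_subsistence_area * density / total
        for cell, density in rural_pop_density.items()
    }


def unit_revenue(price, access, pot_yield):
    """Potential unit revenue, assumed here to be price x access x yield."""
    return price * access * pot_yield


# Hypothetical SRU with 50 ha of subsistence area and three grid cells.
prior = subsistence_prior(50.0, {"g1": 10.0, "g2": 30.0, "g3": 60.0})
rev = unit_revenue(price=200.0, access=0.5, pot_yield=3.0)
```

Under these assumptions the cell with 60% of the rural population density receives 60% of the subsistence area.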
Then we assume the priors for the remaining three farming systems are mainly influenced by the estimated revenue, cropland area and irrigated area.
For an irrigated farming system (I):

For rainfed-high (H) and/or rainfed-low (L) farming systems:

where AdjCropLandi and AdjIrrAreai are the cropland area and irrigated area at grid i (see the detailed description in the following Section 4).

Finally, the main inputs for the optimization procedure are converted to shares and written as:

The optimization module in SPAM2010 is almost the same as that in previous versions. We apply the cross-entropy process in the General Algebraic Modeling System (GAMS), which iterates the optimization procedure until a solution is found. Once the allocation is successful, meaning that an optimal or locally optimal solution has been found, the routine immediately returns the allocated physical area (AllocAijl) by grid i, crop j and farming system l, and the program continues with post-processing automatically (Figure 2). If the solution is infeasible or non-optimal, the program stops, allowing for manual scrutiny, adjustment and re-run (see data harmonization in the following section).

Allocation
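Using the results of the optimization, the allocation module produces maps of harvested area (AllocHijl), yield (AllocYijl), and production quantity (AllocPijl) for each grid i by crop j and farming system l (Figure 2). For harvested area, we convert the allocated physical area (AllocAijl) to allocated harvested area (AllocHijl) by multiplying by the cropping intensity (CropIntensityjlk):

$$\mathrm{AllocH}_{ijl} = \mathrm{AllocA}_{ijl} \times \mathrm{CropIntensity}_{jlk}$$

For yield, we first calculate an average potential yield ($\overline{\mathrm{PotYield}}_{jlk}$) within an SRU using the allocated harvested area as weight, then the allocated yield (AllocYijl) is estimated as:

$$\mathrm{AllocY}_{ijl} = \mathrm{PotYield}_{ijl} \times \frac{\mathrm{AdjCropY}_{jlk}}{\overline{\mathrm{PotYield}}_{jlk}}$$

where the average potential yield is calculated as:

$$\overline{\mathrm{PotYield}}_{jlk} = \frac{\sum_{i \in k} \mathrm{PotYield}_{ijl} \times \mathrm{AllocH}_{ijl}}{\sum_{i \in k} \mathrm{AllocH}_{ijl}}$$

We finally estimate the production quantity (AllocPijl) as:

$$\mathrm{AllocP}_{ijl} = \mathrm{AllocH}_{ijl} \times \mathrm{AllocY}_{ijl}$$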
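The allocation steps above can be traced with a small numerical sketch for one crop and farming system within a single SRU. The yield rule used here (potential yield rescaled so that its area-weighted mean matches the reported SRU yield) is an assumption based on the description in the text, and all names and values are hypothetical.

```python
def allocate_outputs(alloc_area, crop_intensity, pot_yield, sru_yield):
    """Derive harvested area, yield and production per grid cell from the
    allocated physical area, for one crop and farming system in one SRU."""
    # Harvested area = physical area x cropping intensity.
    harvested = {c: a * crop_intensity for c, a in alloc_area.items()}
    total_h = sum(harvested.values())
    # Average potential yield, weighted by allocated harvested area.
    avg_pot = sum(pot_yield[c] * harvested[c] for c in harvested) / total_h
    # Grid yield follows the spatial pattern of potential yield (assumed rule).
    yields = {c: pot_yield[c] * sru_yield / avg_pot for c in harvested}
    # Production = harvested area x yield.
    production = {c: harvested[c] * yields[c] for c in harvested}
    return harvested, yields, production


harvested, yields, production = allocate_outputs(
    alloc_area={"g1": 10.0, "g2": 30.0},   # ha of physical area
    crop_intensity=1.5,                    # harvests per year
    pot_yield={"g1": 2.0, "g2": 4.0},      # t/ha, potential
    sru_yield=3.0,                         # t/ha, reported for the SRU
)
```

Cell g2, with twice the potential yield of g1, receives twice the allocated yield, while the area-weighted mean yield over both cells stays at the reported 3.0 t/ha.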

Data preparation for SPAM2010
The largest amount of effort in creating a SPAM map is spent on identifying, collecting and harmonizing data. For the production of SPAM2010, we collect raw data from two major sources: we first collect non-spatial crop statistics for the data disaggregation process; we then collect and/or create multiple spatially explicit constraint maps at a 5 arc-minute resolution, covering both biophysical and socioeconomic aspects, for the spatial optimization and allocation processes. Afterwards, we introduce how these multi-sourced data are harmonized and how the data are adjusted.

Crop statistics disaggregated by administrative units
We start with the administrative units (k) for which we have been able to obtain crop production statistics (Figure 2). We primarily use FAO's Global Administrative Unit Layers (GAUL) at both the national and sub-national levels to relate the tabulated crop statistics to gridded data during the allocation process. GAUL contains shapefiles for three administrative levels: ADM0 (national level 0), ADM1 (sub-national level 1) and ADM2 (sub-national level 2). Shapefiles from the Database of Global Administrative Areas (GADM) are used for ADM1 and ADM2 in China, since they proved easier to match to the statistics.
We collect crop statistics from FAOSTAT, EUROSTAT, CountrySTAT, ReSAKSS, national statistical offices, ministries of agriculture or planning bureaus of individual countries, household surveys and a variety of ad hoc reports related to a particular crop within a particular country (Figure 2). SPAM estimates depend most on the degree of disaggregation of the underlying national and sub-national production statistics, so it is important to identify and collect as many sub-national statistics as possible (Joglekar et al., 2019). Although we prefer to collect crop statistics for ADM1 and ADM2 and run the model at the ADM1 level for all countries, crop statistics are mostly available at the ADM0 level, with sub-national coverage being less complete. Therefore, for most countries we run SPAM at the ADM0 level, except for some (geographically) large countries that are modeled at the ADM1 level. We summarize the sub-national data coverage by region in Table 1. We present the detailed procedure for collecting crop statistics in the Supplementary Information (SI, Section S1), which further contains a table listing all countries that are modeled at the ADM1 level (Table S1) and a table listing the sources of crop statistics by country and sub-national coverage (Table S2) for all countries.

(insert Table 1 here)
We collect data in all the ADM1 units in the United States, Russia and Canada, and in at least 80% of the ADM1 units in the remaining regions worldwide. By contrast, Europe, the Middle East, Oceania, Russia and Sub-Saharan Africa have data collected on the full set of crops in fewer than 80% of their ADM2 units. This coverage is substantially better for SPAM2010 than for SPAM2005, for which it was only 66.2% and 43.2% for ADM1 and ADM2 respectively (Table 1).

Monfreda et al. (2008) reported that 81% of the year 2000 global harvested area data in their M3 came from sub-national sources, but they did not distinguish coverage by sub-national levels 1 and 2. SPAM often has higher levels of sub-national coverage than M3, especially in Africa and the former states of the Soviet Union. This can be seen in SPAM2005, where 93.4% of global data came from ADM1 sources and 54.6% from ADM2 sources (Wood-Sichra et al., 2016). In SPAM2010, these coverage rates are further increased to 96.1% and 68.0% respectively (Table 1).

Crop statistics disaggregated by crop types
We simultaneously allocate 42 crops and crop aggregates (j) for SPAM2010 (Figure 2). The crop categories follow the definitions of FAO's Statistical Database (FAOSTAT). Comprising 33 individual crops (e.g., wheat, rice, maize, barley, potato, bean, cotton) and 9 crop aggregates (e.g., other cereals, vegetables), the SPAM2010 crop list covers all crops reported by FAO, except for explicit fodder crops (mostly grasses), which are not modeled. When multiple FAO crops fall into a single SPAM2010 crop category (e.g., vegetables), FAO's corresponding area and production data are summed and yields are calculated as a weighted average. We present the detailed procedure for aligning the crop types in the SI (Section S2), which further contains a full list of crops and their respective FAO codes (Table S3).
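The aggregation of several FAO crops into one SPAM2010 category can be sketched as below. The function name and figures are hypothetical, and the weighted-average yield is assumed to be weighted by harvested area, which makes it equal to total production divided by total area.

```python
def aggregate_crops(records):
    """Combine (harvested_area_ha, production_t) pairs of several FAO crops
    into one category: sum area and production, then derive the yield."""
    area = sum(h for h, _ in records)
    production = sum(p for _, p in records)
    # Area-weighted average yield equals total production / total area.
    return area, production, production / area


# Two hypothetical vegetable crops: (harvested area in ha, production in t).
area, production, avg_yield = aggregate_crops([(100.0, 2000.0), (50.0, 500.0)])
```

Here the individual yields are 20 and 10 t/ha; the aggregate yield lies closer to 20 t/ha because that crop occupies two thirds of the area.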
We collect statistics on harvested area (H), production (P) and yield (Y) (CropHPY) for each crop j in each administrative unit k for data disaggregation (Figure 2). We prepare data for the model based on the 2009-2011 average of the crop production statistics (AvgCropHPYjk). If data are missing for this period, we use the average of the available data from the closest years between 2005 and 2015. We correct discrepancies in statistical reporting units, crop names, and units of measurement during the initial cleaning phase of the data. For example, we adjust all national and sub-national statistics (AdjCropHPYjk) using the national 2009-2011 average from FAO in order to improve the comparability of the crop production statistics across countries. We also explicitly distinguish between crops not grown in an area (coded as zero) and crop data that are not available for an area (coded as a missing value). Despite the possible uncertainties in FAO data, they have been chosen as the baseline for the adjustment of country statistics mainly because: (1) FAO data are the most widely acknowledged global agricultural statistics, hence they are the most appropriate source for the purpose; (2) SPAM products have been used by many global models such as IFPRI's IMPACT (https://www.ifpri.org/project/ifpri-impact-model) and IIASA's GLOBIOM (https://iiasa.ac.at/web/home/research/GLOBIOM/GLOBIOM.html). These models use FAO country data for cross-country comparisons and need our maps to be consistent with FAO data. In fact, the idea behind SPAM is to spatially allocate statistics from administrative units to spatial grids, and the maps could easily be adjusted to any other country data. We present the detailed procedure for adjusting the crop statistics in the SI (Section S3).
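A minimal sketch of the base-period averaging rule is given below. The fallback rule is only one possible reading of "the closest years between 2005 and 2015" (here: the available years nearest to 2010), and the function name, argument names and data are hypothetical. Missing values are represented by `None`, so that a reported zero (crop not grown) is kept distinct from missing data.

```python
def base_period_average(series, base_years=(2009, 2010, 2011), window=(2005, 2015)):
    """Average a yearly series over the base years; if none is available,
    fall back to the available year(s) closest to the centre of the base
    period within the window. Returns None when no data exist at all."""
    base = [series[y] for y in base_years if series.get(y) is not None]
    if base:
        return sum(base) / len(base)
    candidates = [
        y for y in range(window[0], window[1] + 1) if series.get(y) is not None
    ]
    if not candidates:
        return None  # missing value, not zero
    centre = base_years[len(base_years) // 2]
    nearest = min(abs(y - centre) for y in candidates)
    closest = [series[y] for y in candidates if abs(y - centre) == nearest]
    return sum(closest) / len(closest)


partial = base_period_average({2009: 10.0, 2010: None, 2011: 14.0})
fallback = base_period_average({2007: 8.0, 2013: 12.0})
not_grown = base_period_average({2009: 0.0, 2010: 0.0, 2011: 0.0})
missing = base_period_average({})
```

The last two calls show the distinction drawn in the text: a crop reported as zero yields an average of zero, whereas an empty series yields a missing value.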

Crop statistics disaggregated by farming systems
We elaborated on the disaggregation module for obtaining the farming system shares by crop j and administrative unit k (Percentjlk) in Section 3.1. In some countries statistics are available; in others we rely on expert opinion, or assume that some crops are grown in a similar way to other crops. To supplement Section 3.1, we present more details on the procedure for obtaining the farming system shares in the SI (Section S4), which further contains a table listing the sources of sub-national farming systems data (Table S4) and a table listing the farming system shares by crop groups and selected countries (Table S5). For example, shares of irrigated farming systems were taken directly from country statistics at ADM1 for countries such as Brazil, China and the United States. For some countries these figures were found in MIRCA, and for the rest of the countries AQUASTAT provides information on irrigated areas per crop at the national level. We are able to source data on farming system shares at the ADM1 level for a limited number of large countries (Table S4). Based on this list we showcase the shares of production under irrigated and rainfed systems for selected crop groups and countries (Table S5). We choose Brazil, China, Ethiopia, France, India, Indonesia, Nigeria, Turkey and the United States, because they vary in agro-ecology, region, income level and geographical size. For cereal crops, the three Asian countries (China, India and Indonesia) have the highest shares of irrigated area, whereas the two Sub-Saharan countries (Ethiopia and Nigeria) have the lowest shares of irrigated area. For roots, tubers and pulses production, the United States and both European countries have the highest shares of irrigated areas, while the Sub-Saharan countries again have less than one percent each. Aggregating across all crops, the three Asian countries rank highest in terms of irrigated area shares while the two Sub-Saharan countries rank lowest.
We disaggregate the adjusted statistics on harvested area and yield (AdjCropHPYjk) for each of the four farming systems (Figure 2). Harvested area by farming system l (AdjCropHjlk) is calculated directly by multiplying the total harvested area by the farming system shares (Percentjlk), while the yields by farming system l (AdjCropYjlk) are more complicated to calculate. Here we consider not only the farming system shares but also the yield conversion factors (determined by expert judgement) to distinguish the yield variations for irrigated versus rainfed systems and rainfed-high versus rainfed-low systems. We present the detailed procedure for disaggregating the crop statistics by farming systems in the SI (Section S5), which further contains a list of the yield conversion factors, i.e. both the ratio of crop yield under irrigated conditions to crop yield under rainfed conditions (marked with an "I") and the ratio of yield under rainfed high input to yield under rainfed low input (marked with an "R"), for selected crops and countries (Table S6).
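The following sketch illustrates how a share and a yield conversion factor can split an average yield between two farming systems. It is a simplified, assumed two-system version (irrigated versus rainfed) and not the four-system procedure documented in the SI; the requirement that the area-weighted mean of the two yields reproduces the reported average is also an assumption, and all names and numbers are hypothetical.

```python
def split_by_system(total_area, avg_yield, irrigated_share, irr_to_rainfed):
    """Split harvested area and yield into irrigated and rainfed parts so
    that the area-weighted mean yield equals the reported average yield."""
    area_irr = total_area * irrigated_share
    area_rf = total_area - area_irr
    # avg = p * f * y_rf + (1 - p) * y_rf  =>  solve for the rainfed yield.
    yield_rf = avg_yield / (irrigated_share * irr_to_rainfed + (1.0 - irrigated_share))
    yield_irr = irr_to_rainfed * yield_rf
    return (area_irr, yield_irr), (area_rf, yield_rf)


# Hypothetical crop: 1000 ha at 3.0 t/ha, 25% irrigated, irrigated yield
# assumed to be twice the rainfed yield.
(area_irr, yield_irr), (area_rf, yield_rf) = split_by_system(1000.0, 3.0, 0.25, 2.0)
```

With these inputs the rainfed yield is 2.4 t/ha and the irrigated yield 4.8 t/ha, and total production remains 3000 t.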

Physical area
We create a new variable for the model, physical area (AdjCropA, i.e., the area footprint of the crop irrespective of the number of times per year the same area was planted and harvested), recognizing that crop production may take place over several seasons within a year. SPAM does not have a direct mechanism for modeling sequential cropping or intercropping processes, and thus we use harvested area and cropping intensity (CropIntensity) per crop as a proxy for these processes:

$$\mathrm{AdjCropA}_{jlk} = \frac{\mathrm{AdjCropH}_{jlk}}{\mathrm{CropIntensity}_{jlk}}$$

where AdjCropAjlk indicates the generated physical area by crop j, farming system l and administrative unit k.
Implementing the crop allocation calculations by farming system allows more flexibility when accounting for variation in these cropping intensity practices. However, such data are still scarce. Only some country statistics include such figures, e.g., those of Bangladesh and India, so we rely primarily on expert judgment for information on the number of cropping seasons by crop, farming system and country. We present the detailed procedure for generating physical area in the SI (Section S6), which further contains a table listing CropIntensityjlk by crop groups and selected countries (Table S7).

Cropland extent
We apply an already classified land-cover image in which cropland has been identified (CropLand) to determine the places where production statistics can be allocated. Compared with SPAM2000 and SPAM2005, SPAM2010 updates not only the statistics but also the cropland distribution: it uses the global cropland synergy map with a spatial resolution of 500 m circa 2010, jointly produced by CAAS and IFPRI (Figure 1). The CAAS-IFPRI cropland dataset fuses national and sub-national statistics with multiple existing global land cover maps including GlobeLand30, CCI-LC, GlobCover 2009, MODIS C5 and Unified Cropland. It reports three major parameters by grid around the year 2010: the median and maximum cropland percentage (MedCropLandi and MaxCropLandi) and a confidence score between 0 and 1 for the cropland estimation (ProbCropLandi). Although the synergy dataset does not delineate the geography of specific crops, it designates the total cropland extent with a higher accuracy than the input datasets and tries to be consistent with administrative cropland statistics. The detailed description of the CAAS-IFPRI cropland dataset is submitted as a parallel paper, see Lu et al. (2020).
Before using the cropland extent in SPAM2010, we aggregate the cropland synergy map from 500 m grid cells to 5 arc-minute grid cells for the three major parameters. We present the cropland data preparation in the SI (Section S7), which further contains the resampled maps of median cropland (AggMedCropLandi), maximum cropland (AggMaxCropLandi) and cropland confidence (AggProbCropLandi) (Figure S1).
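The idea of aggregating a fine cropland-percentage grid to coarser cells can be sketched with a simple block mean. This is only an illustration under strong assumptions: it treats both grids as regular and aligned with equal-area cells, whereas an actual 500 m to 5 arc-minute aggregation involves reprojection and area weighting as documented in the SI. The function name and the toy grid are hypothetical.

```python
def aggregate_mean(fine, factor):
    """Aggregate a 2-D grid (list of rows) by averaging non-overlapping
    factor x factor blocks; edge blocks may be smaller."""
    rows, cols = len(fine), len(fine[0])
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [
                fine[i][j]
                for i in range(r, min(r + factor, rows))
                for j in range(c, min(c + factor, cols))
            ]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Toy 4 x 4 grid of cropland percentages aggregated to 2 x 2.
fine = [
    [0, 100, 50, 50],
    [100, 0, 50, 50],
    [0, 0, 100, 100],
    [0, 0, 100, 100],
]
coarse = aggregate_mean(fine, 2)
```

The same block logic could be applied separately to each of the three parameters, although whether a mean is the appropriate statistic for the maximum-cropland and confidence layers is not stated in the text.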

Crop potential suitability
We estimate the crop suitable area (SuitArea) from GAEZv3.0 as an allocation parameter, in order to consider the spatially varied potential suitability for different crops in terms of their thermal, moisture and soil requirements. GAEZv3.0 produces a 5 arc-minute gridded suitability index for 49 major crops, four input levels (i.e., high, intermediate, low or mixed) and two main water regimes (i.e., irrigated or rainfed). The major crops surveyed by GAEZ include most of the SPAM2010 crops; those not included are assigned values from similar GAEZ crops. We present the detailed procedure for estimating the suitable area (SuitAreaijl) for grid i, crop j and input l in the SI (Section S8), which further contains a table illustrating the concordance between GAEZ crops and SPAM2010 crops (Table S8), and maps of suitable areas for maize under irrigated, rainfed-high and rainfed-low farming systems (Figure S2).

Irrigated area
We adopt the irrigated area (IrrArea) from the Global Map of Irrigation Areas (GMIA) as an allocation parameter, in order to consider the share of irrigated area within a grid. GMIAv5.0 is the only irrigated area dataset with global coverage; it estimates the amount of area equipped for irrigation at a 5 arc-minute resolution for the period around 2005 (Siebert et al., 2013). GMIAv5.0 does not include information on the functionality or quality of irrigation equipment and makes no distinction between different types of irrigation, which may introduce errors and inconsistencies into the allocation. We present a map of area equipped for irrigation at the grid level (IrrAreai) in Figure S3 in the SI (Section S9).

Protected area
We select the protected area (Protect) from the World Database on Protected Areas (WDPA), released by the International Union for Conservation of Nature (Deguignet et al., 2014), as an allocation parameter to indicate the locations where crop production is least likely to take place. Notionally, crop production does not occur within protected areas (such as national parks, wilderness areas and nature reserves), but in reality it does. To reflect this, SPAM permits crop allocation in protected areas during the initial allocation process. If the model does not solve, one option is to increase the area designated as cropland, suitable land or irrigated land, but that expansion is not allowed into protected areas. The data are originally in a polygon format. We convert them to 5 arc-minute grids (Protecti) and map them in Figure S4 in the SI (Section S10).

Accessibility
We adopt the population count from the Gridded Population of the World (GPWv4.0) as a proxy to consider the influence of market accessibility (Access) on farmers' crop choices (Equation 10). GPWv4.0 provides a gridded representation of human populations across the globe at a 30 arc-second resolution (CIESIN, 2016). For SPAM2010, we aggregate the population count grid to a 5 arc-minute resolution and re-calculate the population density. Then we derive rural population density (AggRurPopi) based on the assumption that if there is cropland within a 5 arc-minute grid, the population residing within that grid is rural. We do not aim to distinguish rural areas from urban areas. Instead, the variable AggRurPopi is introduced to estimate market accessibility and to account for subsistence production. Therefore, it does not represent accessibility in the sense of access to food. As crop-specific revenue is divided by the total revenue within a pixel (Equations 11 and 12), the prior is not affected by market accessibility if the latter is not crop-specific. In other words, crop-specific market accessibility is preferable for the current SPAM model, but such data do not currently exist. We create a measure of market accessibility (Accessi) from the grid-level estimates of rural population by considering the relationship between AggRurPopi and the maximum and minimum rural population densities within a country. Population in grids with no cropland is not used in further calculation. We present the detailed procedure for measuring Accessi in the SI (Section S11), which further contains a map of AggRurPopi (Figure S5) and a table of minimum and maximum rural population densities in selected countries (Table S9).
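The derivation of rural population density and an accessibility measure can be sketched as follows. The masking step follows the cropland rule stated above; the min-max scaling between the national minimum and maximum is purely an assumed functional form, since the actual relationship is given in the SI (Section S11). Function names and values are hypothetical.

```python
def rural_population(pop_density, cropland):
    """Keep population density only in cells that contain cropland;
    cells without cropland are set to None and ignored later."""
    return [p if c > 0 else None for p, c in zip(pop_density, cropland)]


def access_index(rural_pop):
    """Scale rural population density to [0, 1] between the national minimum
    and maximum (assumed min-max form, for illustration only)."""
    valid = [p for p in rural_pop if p is not None]
    lo, hi = min(valid), max(valid)
    if hi == lo:
        return [None if p is None else 1.0 for p in rural_pop]
    return [None if p is None else (p - lo) / (hi - lo) for p in rural_pop]


# Four hypothetical grid cells of one country.
pop = [10.0, 200.0, 50.0, 400.0]   # persons per km2
crop = [5.0, 0.0, 2.0, 1.0]        # cropland area; cell 2 has none
rural = rural_population(pop, crop)
access = access_index(rural)
```

Note that with this assumed scaling the least populated cropland cell receives an index of zero, which may or may not match the behaviour of the actual measure.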

Crop revenue
We measure the crop potential revenue (Rev), determined by market accessibility (Access), crop prices (Price) and crop potential yield (PotYield), as an allocation parameter that captures the economic influences on farmers' crop choices. We adopt the crop-specific prices (Pricej) from FAO's Gross Production Value. Prices for crop aggregates (e.g., tropical fruit) are calculated as a weighted average from FAO world totals. It is important to note that these are not spatially specific prices, and they likely misrepresent the local economic realities and associated cropping choices faced by farmers. We list the crop prices in Table S10 in the SI (Section S12). We estimate the crop-specific potential yield (PotYieldijl) as a composite measure of potential harvested yield (PotHarvYieldijl) based on GAEZ. We present the detailed procedure for estimating PotYieldijl in the SI (Section S12), which further contains a table listing the dry matter yield conversion factors (Table S11). Finally, we calculate the grid-level potential unit revenue of planting a certain crop according to Formula 1.

Adjusting input data
We list the main input variables for SPAM2010 in Table 2. Because we collect data from various sources, information inconsistencies are inevitable. We therefore set rules to harmonize these data. First, we adjust all the area-related parameters (e.g., cropland area, irrigated area and suitable area) to satisfy the constraints at the administrative unit level before calculating the priors of physical area. When the model runs, it might be unable to find the optimal allocation solution for a particular country, administrative unit or crop. Under these circumstances, we use several options to "force" a solution, including adjusting the entropy conditions and adjusting the data harmonization rules. We elaborate on the details for adjusting areas (Section S13), entropy conditions (Section S14) and harmonization rules (Section S15) in the SI. (insert Table 2 here)

Adjusting allocation results
The model produces the allocated harvested area (AllocHijl), the allocated yield (AllocYijl), and the production quantity (AllocPijl) for SPAM2010. As a final step, we adjust the allocation results to keep the grid-level results consistent with the statistics. In each step of estimation, we scale the results to the national 2009-2011 FAO average (AvgCropHPYjk0) by crop j and country k0 to even out potential inaccuracies introduced by the allocation adjustments. This means that all the allocated results in this subsection may be adjusted (if necessary) before being applied in the next phase.
We first scale the allocated harvested area (AllocHijl) to the national FAO average to even out potential inaccuracies introduced by the allocation adjustments. The total harvested area of each crop in the grid is then calculated by summing estimates across the four farming systems. For yield, we begin with the potential harvested yields (PotHarvYieldijl) developed earlier (see Section S12 in the SI). Missing values are filled in sequentially using the following values in order of availability: (i) the potential yield from potential suitability surfaces; (ii) the average potential yield in the SRU. Then we modify the allocated yield (AllocYijl) according to the minimum and maximum yields in the administrative unit. For production quantity, we scale AllocPijl to the national FAO average, and then calculate the total production in the grid by summing over all production levels. Finally, we recalculate the allocated yield from the allocated harvested area and allocated production to effectively scale yields to the national FAO average. To simplify, the grid-cell yield is calculated from the reported yield at the statistical reporting unit (SRU), the allocated area from the model results and the potential yield (at grid-cell level) from GAEZ (Global Agro-Ecological Zones), as expressed in Equations 15 and 16. The spatial variation of yield within a statistical reporting unit follows the spatial variation of the potential yield of that crop. In other words, the more suitable cells (those with higher potential yield) have a relatively higher yield, while the average yield of all the grid cells equals the statistically reported yield of the administrative unit.
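The two adjustments summarized above can be illustrated with a short Python sketch. It is an assumption-laden simplification rather than a transcription of Equations 15 and 16: the function names are hypothetical, and the yield rule simply rescales potential yield so that the area-weighted mean over a reporting unit equals the reported yield, which is the property stated in the text.

```python
import numpy as np


def scale_to_national_total(allocated, national_total):
    """Rescale allocated values so that they sum to a national figure."""
    allocated = np.asarray(allocated, dtype=float)
    current = allocated.sum()
    if current <= 0:
        raise ValueError("allocated values must have a positive sum")
    return allocated * (national_total / current)


def yield_from_potential(alloc_area, pot_yield, reported_yield):
    """Grid-cell yields that follow potential yield but average to the report.

    The spatial pattern comes from pot_yield; the level is set so that the
    area-weighted mean over the reporting unit equals reported_yield.
    """
    alloc_area = np.asarray(alloc_area, dtype=float)
    pot_yield = np.asarray(pot_yield, dtype=float)
    weighted_pot = np.sum(alloc_area * pot_yield)
    if weighted_pot <= 0:
        raise ValueError("need a positive area-weighted potential yield")
    return pot_yield * reported_yield * alloc_area.sum() / weighted_pot
```

For instance, with allocated areas of 10, 30 and 60 ha, potential yields of 2,000, 4,000 and 5,000 kg/ha and a reported yield of 3,000 kg/ha, the resulting cell yields keep the ordering of the potential yields while their area-weighted mean returns the reported value.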

Results
In this section, we briefly showcase some of the main SPAM2010 results, focusing on the staple crops, to illustrate how SPAM2010 has been produced.

Disaggregated crop statistics
Disaggregation of crop statistics is the first step in running the SPAM model. Table 3 summarizes the disaggregated rice harvested area and yield (area-weighted) for global rice production by four farming systems in SPAM2010. At the global level, about 160 million ha of rice were harvested around the year 2010. The majority of the rice production area is irrigated, i.e., about 98 million ha, which accounts for 61.2% of the total rice harvested area. This share is followed by the high input rainfed farming system (17.3%, approximately 27 million ha), the subsistence farming system (16.0%, approximately 26 million ha) and the low input rainfed farming system (5.5%, approximately 9 million ha). The global average rice yield is 4,374 kg/ha, which lies between the average yields of the irrigated farming system (5,528 kg/ha) and the high input rainfed farming system (3,663 kg/ha) and is much higher than the average yields of the low input rainfed farming system (1,810 kg/ha) and the subsistence farming system (1,604 kg/ha). At the regional level, Asia (South Asia, South East Asia and East Asia together) is the largest rice-producing region, with approximately 142 million ha of rice harvested around the year 2010. The majority of the Asian rice production area is also irrigated, and its share, 63.7%, is close to the global share of the irrigated rice farming system. South Asia has more rice area harvested (approximately 60 million ha) than South East Asia (approximately 49 million ha) and East Asia (approximately 33 million ha). However, the average rice yield in South Asia (3,553 kg/ha) is lower than in South East Asia (4,125 kg/ha) and East Asia (6,566 kg/ha). Consequently, total rice production in these three regions is very similar. Rice production in North America is completely irrigated, and the average yield is relatively high in this region. Subsistence rice production is mainly found in Sub-Saharan Africa (SSA) and South Asia, and the rice yield under subsistence conditions is the lowest among the four farming systems. (insert Table 3 here)

Allocated harvested area and yield
After applying the optimization model in GAMS, the disaggregated crop statistics are spatially allocated to produce the SPAM maps. Figure 3 and Figure 4 present the maps of harvested area and yield (after adjustment) for maize, respectively.
For all farming systems, as shown in Figure 3(e), maize area is highly concentrated in Northern China and North America.
However, maize production in North America is mainly rainfed with high input, while in China the rainfed farming system is mainly located in the north-east (Figure 3e) and the irrigated farming system is mainly found in the central-north (Figure 3a). The rainfed low input farming system (Figure 3c) and the subsistence farming system (Figure 3d) for maize production are mainly located in South America and SSA, while the rainfed high input maize farming system is also widely distributed outside China and North America, including Central America, Europe and other regions (Figure 3b). As shown in

Value of Production
Finally, we use the average 2009-2010 base year price (international dollar, I$) to compute the value of production in each grid cell, for each crop and farming system. Table 4 shows the value of production for all crops, food crops and non-food crops in all regions, as well as the percentage of each category's value in relation to the total value. Asia (South Asia, South East Asia and East Asia together) accounts for nearly half (49.2%) of the total value of crop production in 2010, while the Middle East and North Africa, Central America, Russia and Oceania account for less than 5% each. Globally, food crops account for 86.2% of the total crop production value, with minor regional differences (the classification of crops into food and non-food is detailed in Table S3). (insert Table 4 here)

Data availability
SPAM2010 provides four essential output indicators:
(a) PHYSICAL AREA: measured in hectares, it represents the actual area where a crop is grown, not counting how often production was harvested from it. Physical area is calculated for each production system and crop, and the sum of the physical areas of the four production systems constitutes the total physical area for that crop. The sum of the physical areas of all crops in a pixel may not be larger than the pixel size.
(b) HARVESTED AREA: also measured in hectares, harvested area is at least as large as physical area, and sometimes larger, since it also accounts for multiple harvests of a crop on the same plot. As for physical area, harvested area is calculated for each production system, and the sum of all harvested areas of all production systems in a pixel amounts to the total harvested area of the pixel. The sum of the harvested areas of all crops in a pixel can be larger than the pixel size.
(c) PRODUCTION: for each production system and crop, production is calculated by multiplying harvested area by the corresponding yield. It is measured in metric tons. The total production of a crop includes the production of all production systems of that crop.
(d) YIELD: a measure of productivity, i.e., the amount of production per harvested area, measured in kilograms per hectare. The total yield of a crop, when considering all production systems, is not the sum of the individual yields but the weighted average of the four yields.
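The relationship between these indicators can be checked with a small Python sketch. The function names are hypothetical. The numbers are the global rice shares and yields quoted in the Results section, with harvested-area shares used as weights, which is an assumption consistent with the "area-weighted" yield reported there.

```python
def production_tonnes(harvested_ha, yield_kg_per_ha):
    """Production in metric tons from harvested area (ha) and yield (kg/ha)."""
    return harvested_ha * yield_kg_per_ha / 1000.0


def total_yield(harvested_ha, yield_kg_per_ha):
    """Area-weighted average yield across production systems (kg/ha)."""
    total_area = sum(harvested_ha)
    if total_area <= 0:
        raise ValueError("total harvested area must be positive")
    weighted = sum(a * y for a, y in zip(harvested_ha, yield_kg_per_ha))
    return weighted / total_area


# Global rice: irrigated, rainfed high input, subsistence, rainfed low input.
area_shares = [61.2, 17.3, 16.0, 5.5]   # percent of harvested area
yields = [5528, 3663, 1604, 1810]       # kg/ha
global_rice_yield = total_yield(area_shares, yields)
```

The weighted average comes out close to the reported 4,374 kg/ha; the small remaining difference is what one would expect from rounding of the published shares. Simply adding or averaging the four yields without weights would give a very different figure.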

Model uncertainty and validation
The first SPAM product was a set of regional-level agricultural production maps produced for Brazil circa the year 1994 (You et al. 2006); since then, the SPAM model and products have been continually improved and updated. Besides the evolution of the method (see Section 3), the evaluation of SPAM model performance has also improved. In one of our early works, the uncertainty of the model, i.e. the variance explained by the cross-entropy approach, was evaluated by comparing it with the performance of simplified proportional approaches, which were used by Monfreda et al. (2008) to produce the M3 dataset. The cross-entropy approach proved more successful in estimating crop areas than the proportional approaches, whether allocation was in proportion to the total land area, to the cropland area, or to the amount of (biophysically) suitable land for the production of each crop (You et al. 2006). Moreover, many researchers believed that the inclusion of economic factors, i.e. markets, would increase the performance of crop disaggregation models (see the discussion in You et al. 2014), though this does not automatically guarantee better model outputs. This partly explained the considerable discrepancies between SPAM2000 and M3 (Anderson et al. 2015), and partly confirmed that using more sophisticated approaches for production allocation would reduce uncertainty (Donaldson and Storeygard, 2016). In one of our recent works, the sensitivity of the standard SPAM model output to a few methodological and data choices was evaluated. These include the spatial allocation method, the crop coverage, the treatment of a "rest-of-crops" aggregate, the incorporation of a "crop potential suitability" data layer, the inclusion of rudimentary economic elements, and the administrative unit detail of the primary source statistics. It showed that the standard SPAM estimates are insensitive to the inclusion of crude economic elements, moderately sensitive to the set of crops or crop aggregates being modelled, but mostly dependent on the degree of disaggregation of the underlying national and subnational production statistics (Joglekar et al. 2019). This implies that improvements in the methodological aspects of SPAM have a limited effect on reducing uncertainty. By contrast, the quality and accuracy of the underlying statistics used to prime the model are particularly pertinent (Joglekar et al. 2019).
SPAM products are estimates with varying uncertainty. Inaccuracy surely exists, and it varies from region to region and even from crop to crop. Although efforts have been made to evaluate model conceptualization and performance, those previous validations should not be taken for granted for the latest updates. Therefore, we carried out extensive validation work to assess the accuracy of the output maps of SPAM2010. Firstly, we relied on a system through which we are able to send the crop maps to collaborators and users alike for comments or assessment. For example, the CGIAR is a global partnership that unites 15 centers engaged in agricultural research. Each center has its own mandate crops, e.g., IRRI (International Rice Research Institute) for rice and CIMMYT (International Maize and Wheat Improvement Center) for maize and wheat. We took advantage of their vast network of field offices and local expertise to help us validate the SPAM results. Many researchers from these institutes have been involved in the production of SPAM2010, which increases the reliability of the results. The Chinese Academy of Agricultural Sciences (CAAS) undertook the regional-level validation for SPAM2010, following the approaches they applied in evaluating previous SPAM products (Liu et al., 2013; Li et al., 2016; Chen et al., 2016). Moreover, field-level validation information has been collected either by crowdsourcing tools such as Geo-Wiki (Fritz et al., 2012) and eFarm (Yu et al., 2017a), or through field trips and workshops, onsite or online, where local experts were asked to confirm or validate the crop production maps by providing hand-written comments or posting comments online at our MapSPAM website. Most of these reports were collected crop by crop and country by country. An example of a detailed validation process is provided in the SI (see Section S16). The complete validation process can take a great deal of effort and time, but this user feedback is important and valuable. We took this feedback, re-ran the SPAM model and released updated versions of SPAM. The previous SPAM products have been updated substantially with the help of those comments. For example, SPAM2000 and SPAM2005 are at version 3.07 and version 3.20, respectively. The current product, i.e., SPAM2010v1.10, has been released after extensive validation, but it remains open to further comments. Such an iterative process enables continued updates that improve product quality.
Secondly, we qualitatively evaluated the uncertainty of the input data. As with any model, the results depend on the input data and the modelling process. For SPAM, the most important input is the sub-national crop data, which has a large impact on the accuracy of the final product, as mentioned before. We built our SPAM uncertainty rating mainly on the availability of, and our confidence in, the sub-national data. In addition, we took into account the parameters and constraints that we had to adjust to solve the SPAM model. For example, we sometimes have to abandon some crop potential suitability constraints in order to solve a country. For some countries, we may have to allow cropland per pixel to exceed the original input by 5% or even 10% to make the model run. In addition, we collected feedback and comments from users, local experts and collaborators, as discussed above. They are sporadic but very useful. We combine all this information to give a subjective rating of how confident we, the SPAM team, are in our final crop maps (both area and yield), based on our judgment of the reliability of the input data. Figure 5 shows the country-level uncertainty rating with 5 categories (1 represents the lowest uncertainty, 5 the highest). The complete rating list is presented in Section S17 in the SI. Not surprisingly, the uncertainty in Africa and Southeast Asia is higher than in countries in Europe and America. Although such a validation process is not rigorous, the result is convincing, and such a rating is in high demand and has been explicitly requested by users.

Thirdly, we quantitatively evaluated the results by cross-comparing them with statistics at another administrative level that had not been used in running the model. We ran SPAM with complete statistics (ADM0, ADM1 and ADM2), and then ran it with only ADM0 and ADM1 statistics, to see how the results aggregated to ADM2 compare with the original statistics at ADM2, or at least with the aggregated original results at ADM2. The runs were all done at ADM1 and then combined to give results for the whole country. We then calculated the coefficient of determination (R²) between the values allocated by the model and those obtained from statistics to assess the model performance. In general, a higher R² indicates better performance. This approach has already been used for evaluating the performance of SPAM2000 (You et al. 2014).
The upper part of Figure 6 shows the results of this approach applied to Brazil in SPAM2000 for its main food crops, while the bottom part of Figure 6 shows the results of the same approach applied to the same country for the same crops in SPAM2010. The model performance improved greatly from SPAM2000 to SPAM2010, especially for soybean and potato. We further selected a few smaller countries in Asia and Africa, which are believed to have relatively higher uncertainty in terms of input data (Figure 5), to undertake the same assessment. Bangladesh, Benin, Senegal and Tanzania were selected as they have good statistical data coverage in SPAM2010. Figure 7 shows that the R² for the selected crops (i.e. maize, rice and cotton) ranged between 0.66 and 0.94, suggesting that the overall performance of SPAM2010 is good in these countries for those crops.
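The hold-out comparison described above amounts to summing pixel-level allocations within each ADM2 unit and comparing the sums with the withheld ADM2 statistics. The Python sketch below illustrates this under two assumptions that the text does not specify: ADM2 units are encoded as integer ids from 0 to n_units - 1, and R² is taken as the squared Pearson correlation. The function names are hypothetical.

```python
import numpy as np


def aggregate_to_units(pixel_values, unit_ids, n_units):
    """Sum pixel-level allocations within each administrative unit."""
    pixel_values = np.ravel(np.asarray(pixel_values, dtype=float))
    unit_ids = np.ravel(np.asarray(unit_ids, dtype=int))
    return np.bincount(unit_ids, weights=pixel_values, minlength=n_units)


def r_squared(allocated, reported):
    """Squared Pearson correlation between allocated and reported values."""
    r = np.corrcoef(allocated, reported)[0, 1]
    return float(r ** 2)
```

In use, `aggregate_to_units` would be applied to the crop area grid from the run that used only ADM0 and ADM1 statistics, and `r_squared` would then compare the resulting unit totals with the ADM2 statistics that were held back.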

Finally, we carried out regional-level quantitative validations where third-party independent crop maps are available, given that it is impossible for us to collect the true spatial distribution of crops (both area and yield) for 2010 on a global scale. Among the limited third-party, independent spatial crop distribution data, the Cropland Data Layer (CDL, https://nassgeodata.gmu.edu/CropScape/) is a crop-specific land cover dataset created for the continental United States using moderate-resolution satellite imagery and extensive agricultural ground truth. We used it to validate our SPAM2010 product at the regional scale by correlating the grid-level crop area. We focus on the three most popular staple crops in the United States, i.e. maize, wheat and soybean, and obtain the crop area maps of 2009, 2010 and 2011 from CDL.
We calculate the 2009-2011 average crop areas at a 5 arc-minute resolution for CDL according to the scheme of SPAM2010, and further calculate the coefficient of determination (R²) and the root mean square error (RMSE) between the grid-level values derived from the two datasets (Figure 8). The values of R² are between 0.71 and 0.91 and the values of RMSE are between 231 and 307 ha, indicating relatively high reliability. In particular, the higher R² and lower RMSE suggest that our maize and soybean maps are more reliable than the wheat map. If we treat CDL as the truth, many factors could potentially explain the different results, for example, differences in the accuracy or availability of input data, suitability layers, and parameters for the area shares and yield ratios. Another possible reason is that we did not distinguish spring wheat from winter wheat in SPAM, which partly explains why the agreement for wheat is lower than that for maize and soybean. [...] rather than land use and is in a relatively coarse spatial resolution. Moreover, the R² is substantially higher than for its predecessors. For example, the R² was assessed as 0.42 for SPAM2005 using the same approach, according to Liu et al. (2013). In addition, there are regional-level crop distribution maps produced by independent efforts to interpret remotely sensed images. For example, Zhang et al. (2017) provided annual paddy area time series from 2000 to 2010 based on satellite remote sensing for China and India. We compared these remote-sensing-derived paddy maps with the rice area estimated by SPAM for the year 2010. The R² values are 0.36 and 0.34 for China and India, respectively (Figure 10). We could expand this quantitative evaluation when more third-party independent crop maps become available. However, it should be noted that errors might exist in the third-party independent crop maps as well; hence this quantitative evaluation approach is itself subject to uncertainty.
Our results show that the uncertainty gradually increases when applying CDL, NLCD and Zhang et al. (2017).
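The grid-level comparison against an independent map can be sketched as follows. This is a hedged illustration with hypothetical function names: it assumes that the reference maps have already been converted to crop area (ha) on the same 5 arc-minute grid as SPAM, and it ignores cells where either dataset has no valid value.

```python
import numpy as np


def multi_year_mean(*yearly_grids):
    """Cell-by-cell mean of several co-registered annual crop area grids."""
    stack = np.stack([np.asarray(g, dtype=float) for g in yearly_grids])
    return stack.mean(axis=0)


def rmse(reference, estimate):
    """Root mean square error over cells where both grids are finite."""
    reference = np.asarray(reference, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    valid = np.isfinite(reference) & np.isfinite(estimate)
    diff = reference[valid] - estimate[valid]
    return float(np.sqrt(np.mean(diff ** 2)))
```

Here the three annual reference grids would be passed to `multi_year_mean` to mimic the 2009-2011 averaging, and the result compared with the corresponding SPAM2010 crop area grid using `rmse`, in hectares per cell.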

Data comparison
A few reports compare SPAM with M3, MIRCA and GAEZ, especially their output maps circa 2000 (Anderson et al., 2015; Donaldson and Storeygard, 2016). Although it is difficult to state which one is better, several features distinguish SPAM products from the M3, MIRCA and GAEZ data. First, the estimates from SPAM can be customized using user-provided data for one or more of the input variables, with results returned to the provider within a short turnaround period. Second, although SPAM runs mainly at a 5 arc-minute resolution, it can be run at higher resolutions provided that at least some of the rasterized inputs are also available at higher resolution to support such an exercise. Third, considerable effort is made to compile sub-national crop statistics at administrative level two (e.g., district or county) for all possible countries. Fourth, if there is knowledge of crop existence in any area, for any crop, this can be incorporated into the model to make a more accurate crop allocation. Moreover, SPAM does not have a large coverage of crops (compared to M3) and does not include detailed biophysical parameters (compared to MIRCA and GAEZ); instead, it focuses more on agricultural production by providing data on crop harvested area and yield disaggregated by farming system. Finally, SPAM results are readily available on the internet in several formats (including tabular) for all interested users. We are currently building a cloud-based SPAM model that lets any user supply their own input data and run SPAM on their own via the GitHub platform. This cloud-based SPAM will be published and communicated to the SPAM user community once it is ready.
Anderson et al. (2015) conclude that substantial discrepancies exist across these four global spatially explicit crop production datasets circa 2000, and that the disagreement between models serves as a reminder of the ongoing challenges in creating spatially explicit estimates of harvested area and yield based on crop statistics. However, it is more challenging to assess the disaggregated farming system results, such as irrigated rice vs rainfed rice or subsistence maize vs high input rainfed rice, which were not systematically explored in Anderson et al. (2015). We collected additional global datasets that are relevant to agricultural production mapping, e.g. the average irrigated and rainfed yields (ca. year 2000) from Siebert and Doll (2010), and the harvested area and yield for 4 crops (ca. year 2005) from http://www.earthstat.org/. We compared these datasets with our SPAM products for the corresponding periods. We found that the results differ from crop to crop and from farming system to farming system. In general, the yield estimates for maize and wheat agree better than those for the other crops, and the irrigated yields agree better than the rainfed yields (Figure 11 and Figure 12).

However, as M3, MIRCA and GAEZ do not provide global spatially explicit crop production data after ca. year 2000, it is impossible to compare the current SPAM2010 with other data products. To illustrate the continuity of SPAM products, we present a grid-by-grid comparison between SPAM2010 and SPAM2005. Figure 13 shows that rice production in 2010 increased notably in East Europe, Africa, Northeast China, Northwest India, South Australia, etc., while it decreased notably in Central Asia and South America. Maize production displays an overall increase across the globe between 2005 and 2010, except for some places in Central Asia that show a decreasing trend. It is also noticeable that maize production in the US and Europe has remained relatively stable. This result accords with the "maize boom" that has taken place around the globe (Herrmann, 2013), especially in developing countries (Cairns et al., 2013; Ornetsmüller et al., 2019). It should be noted that this type of comparison is not perfect (because differences exist in the methodologies and data inputs applied in SPAM2005 and SPAM2010), and that the comparison only shows the rate of change; thus a higher value does not necessarily indicate a large change in absolute crop production. In addition, we compared the changes in crop area between SPAM products and the above-mentioned regional-level independent crop maps where these are available as time series. We calculated the area changes in maize, wheat and soybean by overlaying CDL2005 and CDL2010, and followed the same procedure for SPAM. We then plot these changes (i.e. ∆CDL and ∆SPAM) in Figure 14. Likewise, we compared the changes in SPAM rice area with the changes in paddy rice area obtained from Zhang et al. (2017) (Figure 15).
Figure 14 and Figure 15 both show that the coefficient of determination between the changes derived from different data products is extremely low, which is a further reminder that it is inappropriate to directly compare SPAM products over time, although we are confident in the spatial accuracy of SPAM products at each time stage. This is mainly because SPAM requires a large amount of input data, yet the sources of these multiple data inputs cannot be guaranteed to be the same across different time stages. Therefore, the changes reflected by SPAM products over time not only include real changes on the ground but also largely depend on the input data. For example, the cropland layers (one of the most important data inputs) are obtained from different sources to make sure that the cropland data and the statistical data refer to the same year. We did not evaluate the continuity of these input data, which is almost impossible and is beyond the purpose of SPAM. Consequently, we suggest using the SPAM products with, at least, acknowledgement of the corresponding cropland layer, e.g. Lu et al. (2020) for SPAM2010. Moreover, we do not recommend that users cross-compare the SPAM products over time, because the differences may reflect input data errors and inaccuracies more than real change on the ground.

Limitations
As stated previously, the SPAM estimates, like those of most models, depend on the extent and veracity of the primary input data (Joglekar et al. 2019). SPAM2010 requires data on 42 crops in over 200 countries for the production season. Ideally, these data should be collected at the ADM2 level; however, this is not always possible. It is particularly difficult for a few countries, such as Somalia and Nigeria, where reliable data are not available or different input data conflict with each other. For example, the area of a single crop (i.e. millet) in a district may already be larger than the total cropland area, yet we know that five more crops are grown in this district. In these cases, we have to adjust the conflicting data, using expert judgment, to make the model solvable. Since most cropping statistics are not delineated by farming system, estimates of the shares of production under each of the four systems in question are required. To convert harvested area statistics to the physical area statistics used in the model, additional data on cropping intensities by crop and farming system must be collected. We have made every effort to collect official or published data, and we rely on expert judgment as the last resort when we simply cannot find other sources. For example, no country publishes official statistics on the crop yield ratio (yield conversion factor) between irrigated and rainfed crops. We surveyed published papers, personal communications with FAO's Agriculture to 2030 team, and gray literature to collect such data. While a series of expert judgments is indeed used, their scope (e.g. crops and regions) is quite limited in the overall input data. Once the data on disaggregated cropping practices are compiled, several variables at a gridded scale are needed to disaggregate these cropping statistics into the desired spatial units. These data include estimates of cropland, irrigated land, suitable area and yield, population density and protected areas.
The variety and sheer volume of data required to run the SPAM (and related) models raise questions about the reliability and comprehensiveness of estimates across different cropping statistics, geographic areas and countries.
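The kind of data conflict described above, where crop areas exceed the available cropland, can be detected with a simple consistency check before the model is run. The Python sketch below is purely illustrative: the function names, the dictionary layout and the district labels are hypothetical, and the conversion assumes that physical area equals harvested area divided by a cropping intensity of at least 1, in line with the indicator definitions.

```python
def physical_area(harvested_ha, cropping_intensity):
    """Convert harvested area to physical area using cropping intensity."""
    if cropping_intensity < 1:
        raise ValueError("cropping intensity must be at least 1")
    return harvested_ha / cropping_intensity


def find_conflicts(crop_area_ha, cropland_ha):
    """Return the excess area (ha) for units whose crops exceed cropland.

    crop_area_ha maps unit -> {crop: physical area in ha};
    cropland_ha maps unit -> total cropland in ha.
    """
    conflicts = {}
    for unit, crops in crop_area_ha.items():
        excess = sum(crops.values()) - cropland_ha[unit]
        if excess > 0:
            conflicts[unit] = excess
    return conflicts


# Hypothetical districts: millet alone exceeds cropland in district_a.
crop_areas = {
    "district_a": {"millet": 1200.0, "maize": 100.0},
    "district_b": {"rice": physical_area(600.0, 1.5)},
}
cropland = {"district_a": 1000.0, "district_b": 500.0}
flagged = find_conflicts(crop_areas, cropland)
```

Units returned by such a check are the ones where the statistics or the cropland layer would need adjusting, for example by expert judgment, before an allocation can be solved.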

In terms of reliability, different sources of information may lead to inconsistent and even incompatible information. For example, the data on the estimated cropland extent within a grid cell are compiled from several sources, which in turn deploy different methods to generate their estimates. The extent of cropland within a grid cell is crucial information for the allocation model, but the confidence regarding its actual location varies regionally (see Lu et al., 2020). Crop statistics on area harvested and yield may not have been consistently collected and processed across different countries, so these major data may be unreliable to begin with. Additionally, two of the major conversion factors used, farming system shares and cropping intensities, are often not available for each crop and farming system within a country. Where raw data on these statistics were lacking for a particular crop-country combination, the values were simply assigned from a similar crop or country or created using expert judgment. Neither the data on cropping intensities nor those on farming system shares have been validated for reliability. In terms of comprehensiveness, notably less sub-national coverage exists in developing countries, and only global average commodity price data were used to account for the economic influences on crop production.
The wide range of data sources, coverage and regional nuances of crop production has methodological implications. First, there are possible trade-offs between data consistency and data reliability. For example, some model requirements (i.e., cropping intensities and farming system shares) are not consistently available within a country at the administrative level needed. Often, these numbers are taken from national-level values, even though they may not reflect the reality at the administrative level. Second, multi- and inter-cropping is not handled in a sophisticated manner within SPAM. These types of cropping patterns are only accounted for using a single cropping intensity value per crop, farming system, and (possibly) sub-national unit. Finally, we rely on population density as an indirect representation of market proximity in SPAM, which might cause confusion. We use rural population to calculate a prior for the subsistence portion of a crop (i.e. subsistence among the four farming systems). Subsistence farmers, by definition, mostly consume what they produce, so we assume that their production is closely correlated with rural population size. For all other farmers, we assume that they produce for the market (local, regional or even international). In fact, the market is important for both subsistence and commercial farmers. Even in poor countries where self-consumption is high, a large majority of households still purchase food products produced by others (Losch et al., 2012). This is the assumption behind the revenue calculation used to break down the total cropland into individual crop areas in the prior calculation. We did not, explicitly or implicitly, assume that the rural population (even subsistence farmers) is entirely fed by local agriculture. The confusion comes from our use of rural population as a proxy for market access in the revenue calculation.
As this crop-specific revenue is divided by the total revenue within a pixel in Equations 9 and 10, the prior is not affected by market accessibility unless that accessibility is crop-specific. In other words, crop-specific market accessibility is preferable for the current SPAM model, yet such accessibility data do not currently exist at the global scale. We would consider including a more direct representation of market proximity, based on actual travel times to markets or on road networks, as global road and railway databases are becoming available (Kok et al. 2019). Several trade-offs were made to ensure that the complex allocation method was tractable, and it is important to recognize that these trade-offs likely affect the plausibility of the results.

Last but not least, we acknowledge that it is inappropriate to compare SPAM products directly across time stages (Figures 13-15), although we have made every effort to ensure the spatial accuracy of SPAM products at each time stage. This is largely because systematic errors exist across the various data products. In a recent publication, Iizumi and Sakai (2020) released a time-series product of global gridded crop yields. Although their approach (i.e. spatial adjustment) is conceptually different from the spatial disaggregation approach applied in SPAM, it has important implications for further integrating and standardizing SPAM and similar gridded Earth system datasets for broader applications. There is an ongoing consortium called the Land Use Change Knowledge Integration Network (LUCKiNet, www.luckinet.org). The SPAM team is part of this consortium, which aims to integrate tools and standardize approaches across various ongoing projects that develop gridded information on land-use dynamics for applications in food security, climate change, biodiversity, and other related issue areas. LUCKiNet aims not only to create crop maps that are comparable over time, but also to make these maps consistent across land uses such as cropland, grassland and forest. The modelling techniques would consider the spatiotemporal dynamics of different land-use forms in an integrative framework.

Concluding remarks
In this paper, we present SPAM2010, the latest global gridded agricultural production dataset, for circa 2010. SPAM2010 uses an updated cross-entropy approach to make plausible estimates of crop distribution for 42 crops and four farming systems within disaggregated units, which is a substantial improvement over its predecessors, SPAM2000 and SPAM2005. For example, the expanded crop list enables analysis not only of staple food crops but also of cash crops. A recent study analyzed the global beer supply using SPAM2000 (Xie et al., 2018). The latest dataset offers a promising basis for analyzing the global coffee and tea supply, as these crops, newly included in SPAM2010, are in increasing demand, have high economic value, and are highly sensitive to climate change (Bunn et al., 2015).
SPAM2010 substantially extends the SPAM family and fills a gap in global agricultural production mapping by creating a global gridscape at the confluence of earth and farming systems. In particular, it helps to better understand land management practices characterized by concomitant data and knowledge gaps (e.g. crop selection and elements of crop harvest) (Erb et al., 2017). It allows analysts and policymakers to better target agricultural and rural development policies and investments, increasing food security and growth with minimal environmental impacts. It also enables scientists to better address global change challenges within the anthroposphere and beyond, because it provides the only possibility to update global agricultural and environmental assessments from year 2000 (when M3, MIRCA, GAEZ, and SPAM2000 are available) to years 2005 and 2010 (when SPAM2005 and SPAM2010 are available as well). All the SPAM maps and tabular data for multiple time stages are freely available on the MapSPAM website (http://mapspam.info/), which also acts as a platform for validating and improving the performance of the SPAM maps by collecting feedback from users.

Supplement.
The supplementary information related to this article is available online.

Author contributions.

Each dataset is plotted in a coordinate system with the x-axis representing the timespan and the y-axis representing the number of crops included. For each dataset, the first row indicates the major measurement(s) of agricultural production, the second row indicates the cropland cover layer, and the third row indicates the main approach for allocating production. The dashed line within the chart indicates the evolution of a dataset family. The rhombuses indicate spatial data inputs/outputs, while the other shapes indicate non-spatial data inputs (see the detailed data description in the following section).

Figure and tables
The orange color indicates how crop statistics are disaggregated by administrative unit (k), crop type (j), and farming system (l). The green color indicates how the spatial parameters are collected and prepared at a unified spatial resolution (i) and in a harmonized manner. The yellow color indicates the spatial allocation inputs/outputs.
The darker colors, in either orange or green, highlight the essential elements of SPAM: the former indicates the farming system disaggregation scheme, while the latter (i.e., priors of physical area) indicates a key parameter through which the spatial and non-spatial data are connected and the iterative spatial allocation is able to take place.