Refined burned-area mapping protocol using Sentinel-2 data 1 increases estimate of 2019 Indonesian burning 2

14 Like many tropical forestMany nations, Indonesia is are challenged by landscape fires. Given the significant 15 impacts that burning episodes have on the global carbon cycle and on human health across South-East Asia, A aA 16 confident understanding knowledge of the area and distribution of burning is crucial to understanding quantify 17 monitor the implications impacts of these fires and to assess how they might best be reduced. Given uncertainties 18 surrounding different burned-area estimates, and the substantial differences that arise using different detection 19 approaches, and the uncertainties and debates that surrounding such resultsburned-area estimates, their relativethe 20 accuracy, and merits of such estimates require formal examinationevaluation. Here we propose, illustrate, and 21 examine one promising approach for Indonesia. Despite investment in fire mitigation measures since the severe El-Niño 2015 fire season, severe burning struck 23 Indonesia again in late 2019. DHere, drawing on Sentinel-2 satellite time-series analysis, we present and validate 24 new 2019 burned-area estimates for Indonesia. The corresponding burned-area map is available at: 25 https://doi.org/10.5281/zenodo.4551243. We show that >3.11 million hectares (Mha) burned in 2019, 31% of 26 which on peatlands. This burned-area extent is double the Landsat-derived Official estimate of 1.64 Mha from the 27 Indonesian Ministry of Environment and Forestry, and 50% more that the MODIS MCD64A1 burned-area 28 estimate of 2.03 Mha. Though we observed proportionally less peatland burning (31% versus 39% and 40% for 29 the Official and MCD64A1 products, respectively), in absolute terms we still observed more peatland burning 30 absolutelya greater area of peatland affected (0.96 Mha) than the official estimate (0.64 Mha). This new burned- 31 area It dataset has greater reliability as these alternatives, attaining a user’s accuracy of 97.9% (CI: 97.1%-98.8%) 32 compared to 95.1% (CI: 93.5%-96.7%) and 76% (CI: 73.3%-78.7%), respectively. It omits fewer burned areas, 33 particularly smaller- (<100 ha) to intermediate-sized (100 ha -1000 ha) burns scars, attaining a producer’s 34 accuracy of 75.6% (CI: 68.3%-83.0%) compared to 49.5% (CI: 42.5%-56.6%) and 53.1% (CI: 45.8%-60.5%), 35 respectively. The frequency–area distribution of the Sentinel-2 burns scars follows the apparent fractal-like power- 36 law or “pareto” pattern often reported in other extensive fire studies, suggesting good detection over several 37 magnitudes of scale. Our relatively accurate estimates have important implications for carbon-emission 38 calculations from forest and peatland fires in Indonesia. Our approach is amenable to the ongoing production of 39 accurate annual burned-area maps for environmental monitoring and policy in South-East Asia.


Introduction
particular they often spread beyond the intended areas (Gaveau et al., 2017), largely over degraded lands 98 (Miettinen et al., 2017;Lohberger et al., 2018) but can also penetrate into logged-over and intact forests 99 (Nikonovas et al., 2020). Intact rainforests don't burn without the prolonged droughts that favor the accumulation 100 of sufficient dry fuel, and while many live trees often remain (van Nieuwstadt and Sheil, 2005) the resulting 101 changes to forest structure increase the likelihood of further fires (Nikonovas et al., 2020;Cochrane, 2003). Careful

106
In Palangkaraya, the capital city of Central Kalimantan province, daily average particulate matter (PM10) 107 concentrations often reached 1000 to 3000 µg m -3 amongst the worst sustained air quality ever recorded worldwide 108 (Wooster et al., 2018). For reference, 50 µg m -3 is a short-term (24-h) exposure limit set by the World Health Indonesia. These include the FireCCI41 product derived from Envisat-MERIS (Alonso-Canas and Chuvieco,

158
with omission and commission errors of 40% and 22% globally for the 'burned' class (Giglio et al., 2018). This 159 validation is based on independent globally distributed visually interpreted reference satellite data, however none 160 over Indonesia. These coarse-resolution datasets generally omit small-scale fires and, thus, the reported burned 161 area is underestimated (Ramo et al., 2021). This has motivated research in the use of medium-resolution satellites  validation is based on independent globally distributed visually interpreted reference satellite data, however none activity may be underestimated, considering the challenges of their visual interpretation and delineation. Visual 182 interpretation entails a manual delineation of burn scars perimeters, which yields accurate results for large burn 183 scar mapping at local scales, but is very time consuming at large spatial scales, particularly when mapping small 184 fires. A thorough accuracy assessment is also not available for the official burned-area products. Given the 185 unknown errors around burned-area estimates, and the differences between them, the accuracy, and merits of the 186 different mapping approaches over Indonesia require formal examination. Here, we present new and validated 2019 burned-area estimates for Indonesia using a time-series of the 192 atmospherically corrected surface reflectance multispectral images (level 2A product) taken by the Sentinel-2 A and B satellites. With higher spatial resolution (20-m) and more frequent observations (5-day revisit time), the et al., 2017), in turn allowing for its reproduction thus permitting wide applicationfor ongoing burned-area 197 monitoring. We also developed an independent reference dataset to compare the accuracy of our estimate against 198 the Official and MCD64A1 burned-area maps. Given the lack of randomly objectively distributed ground 199 verifications of 'burned' and 'unburned' locationstruthing, we sought an efficient ways to extract reference sites 200 by visually detecting either a smoke plume, a burn scar, or a heat source (flaming front, or hotspot) from the 201 archive of original Sentinel-2 time series Sentinel-2 images. Finally, we examined differences in terms of 202 scarburn-size frequency distributions among these three burned-area estimates to examine spatial patterns.

206
A burned area is an area of land characterizedidentified by alteration of vegetation cover and structure by along 207 with deposits of char and ash, and by alteration of vegetation cover and structure. We mapped burned such areas 208 using a change--detection approach, i.e. by comparing Sentinel-2 infrared signals recorded before and after a 209 burning event (Liu et al., 2020). We analyzed a time-series of the Normalized Burned Area Ratio (see section 2.2) 210 to assembled two national composite images depicting the spectral condition of vegetation shortly condition 211 before and shortly after damagea disturbance (Figure 1). These composites represent a convenient way to capture 212 the entire burned landscape stored in just two image files. 2019 burning (Figure 1)Although we refer to these 213 images as "preand post-fire composites", they also capture vegetation damage caused by automatically extracting 214 pairs of nominally 'burned' and 'unburned' pixels from 47,220 original Sentinel-2 images acquired between 01 7 and post-fire images spans the entire 2019 burning season. It is a convenient way to capture the entire burned 219 landscape stored in just two image files. Subsequent toAfter the production of this image pairthe pre-and post-220 fire composites, we used a "Random Forest" classification model (see section 2.3) trained on visually identified reference dataset by visually interpretating identifying burns scars in the original time-series (5-day repeat pass)

225
Sentinel-2 images. Fourth and finally, we assessed our burned-area map, as well as the Official and MCD64A1 226 burned-area maps, against our reference dataset to gauge the reliability and accuracy of the three burned-areas 227 products. Finally, we tested whether, and how, the three burned-area estimates differed in their tendencies to 228 incorporate burn scars of larger or smallerdifferent sizes.   Figure 2). The NBR of burned damaged vegetation typically declines abruptlyeclines towards 0 (or 246 ≤ 0 for severe damage) because the NIR reflectance declines due to chlorophyl and leaf destruction, while the 247 reflectance of short-wave-infrared spectrum (SWIR; wavelength = 1.610 µm or 2.190 µm; Band 11 or Band 12) 248 increases due to dead or charred material and exposed ground cover. , such that NBR values ≤ 0 of ≤ 0 are often 249 apparent for a severalfew weeks after a fire severe burning or clear-cutting., while the reflectance of short-wave-250 infrared spectrum (SWIR; wavelength = 1.610 µm or 2.190 µm; Band 11 or Band 12) increases due to charred 251 material and exposed ground cover. We analyzed a NBR time series for approximately 94.5 billion 400 m 2 pixels 252 pairs (Indonesia's landmass =198 Mha) to detect the day when a pixel's vegetation was disturbed by fire. . We 253 describe the procedure to detect drops in the NBR time series in the following paragraph.

254
We detected breaks drops in NBR time series with a moving-window approach. Every two days, aA moving 255 window scanned NBR values three months prior and one month after the central day of the window. The output 256 value of the moving window (blue dots in Figure 2) is the difference between average NBR values observed 257 before and after the central day. The NBR average after the central day also included the value ofat the central regions, but generally 5 days when satellite passes do not overlap. The day of the year when thise dNBR difference 264 reached a maximum corresponded to the moment NBR dropped most markedly in each pixel over a two-day 265 period, flagging a disturbance to the pixel's vegetation potentially caused by fire. At this date, we created a pair 266 of pre-and post-fire pixels by selecting the median Red, NIR and SWIR spectral values acquired three months 267 before and one month after the disturbancethe potential burning event. We selected a one-month window rather 268 than a three-month window to compute the post-fire image to maximize our chances to detect a fresh recent burns 269 scars, given that burned areas on degraded lands and savanna tend to re-green rapidly. We repeated this procedure   samples that result from a splitting node at the Decision Tree. We found that a minimum leaf size equal to 1 293 performed the best on average and, thus, we used this value. We selected a conservative number of trees, 50, to 294 ensure the good performance of the Random Forest. We did not set any limit to the maximum nodes in each tree 295 and the variable to split in the random forest was set to the square root of the number of variables, which is the 296 common practice among machine learning practitioners and the default configuration in Google Earth Engine. point coordinates labelled as either 'burned' (317 points) or as 'unburned' (671 points). The selection of these 305 pixels was were selected realized by visual interpretation of the pre-and post-fire image composites. Burned 306 areas show a distinctive dark (low albedo) brown/red color in the SWIR-NIR-Red composite image when 307 displayed as Red-Green-Blue channels ( Figure 1). The training pixels were collected in a variety ofacross 308 landcover types (Supplementary Table S1 for landcover types) to ensure the representativeness of the training 309 dataset and the satisfactory generalization of the classification model across Indonesia. We selected training pixels 310 focused explicitly on medium-to-high burn severity, i.e. areas where the distinctive red color in the SWIR-NIR-

311
Red composite image looked the darkest, indicating that all or most of the vegetation/soil burned. This aspect of 312 the methodology hedged against over-estimation of total burned area by minimizing so-called ed "false positives".

313
It may however but may exclude areas with implied low-burn severity or low-visibility impacts, such as 314 understorey fires (below an intact forest canopy, see e.g., van Nieuwstadt and Sheil, 2005) and even some 315 agricultural and grassland fires. By prioritizing confident identification of fires over absolute burned-area 316 coverage, as well as by duly validating our estimates, this conservative approach has the advantage of assuaging 317 sensitivities concerningavoids the problems caused by frequent false positives (Rochmyaningsih, 2020).

318
We assessed burn severity during algorithm training based on visual interpretation. RGB composites with bands  simple random sampling or a stratified random sampling in the generation of the reference dataset. In our case 356 study, we employed a stratified-random sampling approach to ensure an acceptable sample of 'burned' reference 357 sites. Our stratified approach was necessary given that the 'burned' class was rare over the study area: the area 358 of seven provinces of interest is 87.6 Mha and the combined area detected as burned by all three datasets 359 represented only 3.1% of this area.

360
For the generation of the 1298 reference sites (see Supplementary Table S4 for associated landcover types one 361 year before fire), we first randomly sampled (i) 419 sites across from the areas classified 'burned' by the three 362 datasets (red area in Figure 3a; Supplementary Table S21), and (ii) 879 sites in areas classified as 'unburned' by 363 all three datasets hereafter denoted U (grey area in Figure 3a). This sample size is deemed sufficient and 364 comparable to other map assessments at larger scale (Stehman et al., 2003;Olofsson et al., 2014).

365
This initial sample of 1298 total sites present a shortcoming for direct pair-wise comparisons of between the 366 reference dataset and each of the three burned-area maps individually. Specifically, sampling densities in the 367 reference dataset were far greater in areas classified 'burned' by the three datasets (red area in Figure 3a) compared 368 to the area deemed 'unburned' by all three datasets, hereafter denoted U (grey area in Figure 3a). Consequently,

369
for the validation of a given burned-area dataset, its total number of 'unburned' reference sites would be over-370 sampled upon defining 'unburned' reference sites with reference to U as well as areas classified as burned uniquely 371 by one of the other two maps (cyan areas in Figure 3b, c, d, hereafter denoted as U'). Such over-sampling of 372 reference sites in the realm of U' would violate the stratified-sampling approach described in Olofsson et al.

373
(2014) and would lead to an erroneous accuracy assessment. In order toTo achieve a balanced stratified sampling 374 of reference sites across 'burned' and 'unburned' areas of each dataset, we generated three subsamples from the 375 initial 1298 reference sites (red areas in Figures 4f3e,gf,hg) and used these subsamples to validate each dataset. reference sites used for validation is given in Table 1.

398
Interpreters looked for three distinct signs of burning in these images to confirm them as burned: (i) smoke plumes;

399
(ii) flaming frontsthat is, a line a moving fire where the combustion is primarily flaming; and (iii) rapid changes 400 in color from 'green' to 'dark red', characteristic of a transition to charred vegetation (Figure 4). If rapid changes 401 in color were observed over the reference site, with at least one direct feature (smoke or flame) in its vicinity, this 402 indicated a fresh burn scar, and the reference site was declared 'burned'. If rapid changes in color were observed 403 over the reference site, with at least one direct feature (smoke or flame) in its vicinity, this indicated a fresh burn 404 scar, and the reference site was declared 'burned'. If rapid changes in color from 'green' to 'dark red' were 405 observed without smoke or flame, the reference site was also declared 'burned'. If no change in color was 406 observed, with at least one direct feature (smoke or flame) in its vicinity, the reference site was declared 407 'unburned'. If none of these three features were observed, the reference site was declared 'unburned'.

430
We tested whether, and how, the three burned-area estimates differed in their tendencies to incorporate burn scars 431 of larger or smaller sizes. Specifically, we compared the frequency distributions of burn-scar size areas (or 432 "scars") amongst the three estimates to test for similarity and qualify any distinguishing differences on the part of 433 our Sentinel-based estimate. Differences amongst burn-scar size frequency distributions implies that a given 434 burned-area estimate is more or less inclusiveinclusive of burn scars of a given size, regardless of absolute 435 differences to total burned area between the estimates. Inter-estimate comparisons of burn-scar size frequency is 436 analogous to tests of whether the 'samples' of burn scars defined by each estimate describe the same, ultimately 437 partially-observed universe of fire activity. Significant inter-estimate differences imply greater or lesser inclusion 438 of a given realm of fire activitye.g., small-scale agricultural burning, plantation fires, extreme wildfiresthus 439 indicating bias (or lack thereof) without defining such realms explicitly.

440
For all three estimates, we employed the Kruskal-Wallis H test of differences with respect to the 'location' of 441 frequency distributions along a continuum of burn-scar sizes. Given significant inter-estimate differences 442 according to this three-way test, we tested for two-way differences in the shape and location of the scarburn-size 443 frequency distribution (Kolmogorov-Smirnov test), as well as two-way differences in medians (Mann-Whitney U 444 test), between our Sentinel estimate and either the Official or MODIS estimate individually. We performed all 445 comparisons for scarburn-size cohorts > 6.25 ha, > 20 ha, > 100 ha, > 1000 ha, and > 5000 ha, without Bonferonni 446 correction given the nested nature of these cohorts. Testing for similarity over increasingly large scar-size cohorts 447 clarified the degree to which significant inter-estimate differences were attributable to the inclusion or omission 448 of a given cohort.

450
We excluded scars burns <6.25 ha because this is the minimum observable burn-n scar size according to MODIS 451 data, given pixel resolution, and it is already evident that our Sentinel estimates are distinguished by their ability 452 to detect burn scars below this threshold. The of the Landsat-8 Official estimates similarly have few scars < 6.25 453 ha due to the challenging nature of visual interpretations at such fine scales. We note that the minimum scar size

501
The Sentinel, Official and MCD64A1 estimates captured significantly distinct realms of fire activity, as 502 represented by their relative burn size frequencies of scar sizes ( Figure S62). The three estimates differ from one 503 another decreasingly over increasingly larger minimum scar-size thresholdsmost notably for small burns, 504 however, and they are statistically indistinguishable for scars burns > 5000 ha indicative of extreme fire activity 505 (Table 3). In other words, all three estimates capture very large scars burns (>5000 ha) equally well, and 506 distinctions amongst the estimates concentrate amongst small (<100 ha), intermediate (100-1000 ha) and larger 507 burns (1000-5000 ha) scars, in decreasing order of degree as indicated by the magnitude of the test statistics in 508

512
Compared to Official or MCD64A1 estimates, the Sentinel estimate has a significantly greater relative frequency 513 of small scars burned areas (< 100 ha), especially amongst the smallest of these scars (Table 4). This is indicative 514 of a greater detection of the realm of fire activity small fires presumably characterized by small-scale agriculture 515 fires and similar, small-scale controlled burning. The Sentinel estimate similarly has a greater relative frequency 516 of intermediate scars sized burns (100-1000 ha), but less acutely so, with inter-estimate differences being more 517 moderate for the Official estimate than the MCD64A1 estimate (Table 4, Figure 6, Figure S62). For scars burns 518 >1000 ha, the Sentinel estimate differs only relative to the official estimate (Table 3), seemingly due to the latter's 519 lesser estimationunderestimation of large and very large scars ( Figure 6). Note for instance the increasingly large 520 divergence between the cumulative burned-area curves for the Sentinel-2 and the Official estimates in Figure 6 521 for scars burn areas > 1000 ha. For very large scars burns (> 5000 ha), two-way comparisons in Table 4 again 522 report no significant statistical differences in burn-scar detection rates between the Sentinel and alternative 523 estimates. However, given the small sample of patches > 5000 ha, it is noteworthy that the Sentinel estimate

528
In summary, the greater overall burned-area estimate of our Sentinel data compared to the Official and MCD64A1 529 alternatives is largely attributable toreflects differences in the inclusion of smaller and intermediately sized scars.

530
Indeed, tThe aerial sum of all Sentinel burn scars areas that are individually <~860 ha equals the entirety of the 531 official burned-area estimate ( Figure 6). The . We note that the Sentinel-2 estimate data exhibits a size-frequency 532 pattern that approximates the linear expectation of a near scale-free power-law ( Figure 6).While the finer spatial 533 resolution of Sentinel data must account for some of the inter-estimate discrepancies, particularly relative to the with our estimate'sare dominated by greater different sensitivity detection ofto otherwise overlooked smaller-536 scale burningsmaller burns. Hence, the inter-estimate differences qualify our Sentinel estimates not simply as 537 more extensive but also as qualitatively distinct in terms of the degree to which different realms of fire activity 538 are captured. The near linear log-log frequency-area distribution over several orders of scar-size of our Sentinel We developed a method that generates two national composite Sentinel-2 images depicting vegetation condition 542 before and after burning in 2019 (Figure 1), and then classified this pair to extract burned areas using a Random 543 Forest supervised classification algorithm. We developed a comprehensive validation protocol to strictly assess

559
The burn-scar frequency distribution of the Sentinel-2 estimate is characteristic of robust power-law relation 560 ( Figure 6), a pattern typical of large scale fire studies (Malamud et al., 1998). Modern studies suggest that these 561 fractal-like patterns are often subtly more complex and can arise through a range of phenomena (Karsai et al., 562 2020;Falk et al., 2007). We note that the Sentinel-2 estimate data exhibits a size-frequency pattern that 563 approximates closer to the linear expectation of a near scale-free power-law, or pareto distribution (Karsai et al., 564 2020;Falk et al., 2007). These, patterns are typical of large-scale fire studies (Malamud et al., 1998). compared 565 to either of the alternative burned-area estimates, bothBoth other methods yield of which show a clearlyan S-566 shaped curve with less area at smaller and larger sizes than captured in the Sentinel-2, indicating the likely bias 567 by omission over the entire range of scales and are not determined by image resolution alone ( Figure 6). These 568 results, with different frequency patterns arising from burns from the same regions in the same period, also 569 highlight the danger in interpreting apparent burned-area patterns without careful consideration of the limitations 570 and biases that arise from the methods used to map them-an issue that may not have always been sufficiently 571 recognized in past assessments or policy. more burns of the greater frequency of its coverage (five-versus sixteen-day revisit time). Also, our method 575 avails of the massive computational capabilities and automation of the Google Earth Engine, allowing us to 576 analyze more images and thus map more and smaller burn scars and associated details than could even the most 577 well-equipped team of visual interpreters. method suffered a 24.4% omission error rate (burned areas that remained undetected). These rates reflect 580 necessary tradeoffs between commission and omission error in a context where conservative estimates are much 581 preferred for environmental policy and monitoring. We prioritized a low commission error rate (i.e. high user's 582 accuracy) over absolute burned-area coverage to address sensitivities (Rochmyaningsih, 2020). By hedging 583 against commission errors, our approach omitted hard-to-detect events, including low-intensity burns, such as 584 those that occur beneath the forest canopy on mineral soils (van Nieuwstadt and Sheil, 2005) or on savanna 585 grasslands, which tend to re-green rapidly. While further work is required to clarify and refine the optimal levels 586 of inclusivity and reliability, we emphasize that the production of before and after fire annual composite images 587 is relatively straightforward for the user community, given the availability of both the necessary imagery and our 588 Google Earth Scripts.

589
We stress that wWhile Tthe accuracy assessment proved that our training dataset is valid for the classification of

601
In the past considerable emphasis was placed on the necessity of ground-checks to validate and calibrate remote-602 sensing-based estimates . DSometimes commentators raise doubts about may persist concerning our ability to 603 confidently estimates of burn scars areas without extensive and costly on-the-ground ground-truthingground-604 checks. Modern high-resolution remote sensing makes such on-the-ground checks less essential than in the past 605 as burned areas are readily identified with good accuracy in modern high-resolution imagery such as that we used 606 for our validation. The protocol developed here to generate a reference dataset based on visual inspection of dense 607 (5-day revisit time) satellite imagery is better suited than ground verifications of 'burned' and 'unburned' 608 locations, because it allows the generation of extensive randomly-distributedrandomly distributed well 609 characterised reference sites, a process too time-consuming and costly with field visits. The identification and 610 quantification of less-readily-detected burned areas, such as those under a closed forest canopy, remain a challenge better estimate to calculate carbon emissions from the 2019 fires in Indonesia. Combined with daily fire hotspots 620 detected using thermal remote sensing, our detailed burned-area map can help identify ignition sites and estimate 621 fire duration more precisely, and therefore contribute to forensic analyses of burning across landholdings (e.g.

624
The Indonesian government has shown some success in reducing fires (Sloan et al., in review2021). Apparent 625 reductions to fire activity would however ideally be qualified using our more inclusive and accurate burned-area

919
(de)) were randomly excluded until the sampling density in the area U' equaled that of the larger unburned area U (area in 920 gray). Panels (e), (f) and, (g) and (h) show the three final, adjusted, stratified subsamples of reference points derived from the 921 initial sample of 1298 reference points. Note that the relative areas and number of sites per class in Figure 3 do not correspond 922 to the actual datasets being evaluated.
11=1.610 µm to detect the intense heat from flaming fronts. On these image pairs, one can see flaming fronts traveling towards 928 the reference sites (red dot) from the north on the pre fire images (left), and sharp changes in color from 'green' to 'dark red' 929 characteristic of charred remains with continuing flaming on the post-fire images (right). Layout built using © Google Earth