the Creative Commons Attribution 4.0 License.
Spatiotemporal mapping of invasive yellow sweetclover blooms using Sentinel-2 and high-resolution drone imagery
Abstract. Yellow sweetclover (Melilotus officinalis (L.) Lam.; MEOF) is an invasive forb pervasive across the Northern Great Plains, a success often linked to traits such as wide adaptability, strong stress tolerance, and high productivity. Despite MEOF's widespread ecological and economic impacts and importance, knowledge of its spatial distribution and temporal dynamics is extremely limited. Here, we aim to develop a spatial database of annual MEOF abundance (2016–2023) across western South Dakota (SD) at 10 m spatial resolution by applying a generalized prediction model to Sentinel-2 imagery. We collected in situ quadrat-based estimates of total vegetation cover and MEOF percent cover across western SD from 2021 through 2023 and synthesized them with other available percent cover estimates (2016–2022) from several federal, state, and non-governmental sources. We conducted drone overflights at 14 sites across Butte County, SD, in 2023 and applied a random forest (RF) classification model to produce accurate MEOF cover maps at very high spatial resolution (4–6 cm). The field-measured and uncrewed aerial system (UAS)-derived MEOF percent cover estimates were used to train, test, and validate an RF regression model. The predicted MEOF percent cover dataset was validated with UAS-derived percent cover in 2023 at four of the 14 sites. We found that the variation in Tasseled Cap Greenness and the Normalized Difference Yellowness Index were among the top predictors of MEOF abundance. Our predictive model performed well, with an R² of 0.76, RMSE of 15.11%, MAE of 10.95%, and MAPE of 1.06%. We validated our 2023 predicted maps using 3 m resolution PlanetScope imagery for regions where field samples could not be collected in 2023. The database of MEOF abundance showed that consecutive years of average or above-average precipitation yielded higher MEOF abundance across the study region.
The database could help local land managers and government officials pinpoint locations requiring timely management to control the rapid spread of MEOF in the Northern Great Plains. The developed invasive MEOF percent cover datasets are freely available from the figshare repository (https://doi.org/10.6084/m9.figshare.29270759.v1).
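The abstract links MEOF abundance to consecutive years of average or above-average precipitation, and the response to RC13 defines the annual and biennial climate predictors used to test this: MAP as the sum of the 12 monthly precipitation values, MAT as the average of the monthly mean temperatures, and MAP2/MAT2 as the same quantities combined over the sample year and the preceding year. The sketch below illustrates only those definitions. It assumes a hypothetical per-pixel dictionary of monthly precipitation totals and monthly mean temperatures (the authors' actual Daymet processing is not shown here), and treating seasonal composites as a precipitation sum and a temperature mean is also an assumption.

```python
from statistics import mean

# Hypothetical data layout (illustration only):
# monthly[(year, month)] -> (precip_mm, mean_temp_c) for one pixel.


def annual_climate(monthly, year):
    """MAP (sum of monthly precipitation) and MAT (mean of monthly mean temperatures)."""
    months = [monthly[(year, m)] for m in range(1, 13)]
    return sum(p for p, _ in months), mean(t for _, t in months)


def biennial_climate(monthly, year):
    """MAP2 and MAT2: total precipitation and average temperature over the
    sample year and the preceding year combined."""
    map_cur, _ = annual_climate(monthly, year)
    map_prev, _ = annual_climate(monthly, year - 1)
    temps = [monthly[(y, m)][1] for y in (year - 1, year) for m in range(1, 13)]
    return map_cur + map_prev, mean(temps)


def seasonal_climate(monthly, year, months):
    """Seasonal composite for one year, e.g. months (3, 4, 5) for P_MAM/T_MAM.
    Assumption: precipitation is summed and temperature is averaged."""
    vals = [monthly[(year, m)] for m in months]
    return sum(p for p, _ in vals), mean(t for _, t in vals)


# Synthetic values: every 2022 month at 30 mm and 5 deg C,
# every 2023 month at 40 mm and 7 deg C.
monthly = {}
for m in range(1, 13):
    monthly[(2022, m)] = (30.0, 5.0)
    monthly[(2023, m)] = (40.0, 7.0)

print(annual_climate(monthly, 2023))               # (MAP, MAT)
print(biennial_climate(monthly, 2023))             # (MAP2, MAT2)
print(seasonal_climate(monthly, 2023, (6, 7, 8)))  # (P_JJA, T_JJA)
```

Because MAT2 averages all 24 monthly values, it equals the mean of the two annual MATs; MAP2 is simply the sum of the two annual totals, which is why it is strongly correlated with MAP, as noted in the response to RC20.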
Status: final response (author comments only)
RC1: 'Comment on essd-2025-353', Anonymous Referee #1, 08 Aug 2025
AC1: 'Response to Referee 1', Sakshi Saraf, 11 Oct 2025
We appreciate the referee's thoughtful suggestions and comments on the manuscript, and we sincerely thank the referee for the suggestions on language improvement. We have revised the manuscript for clarity and polished the text throughout, incorporating the referee's suggestions, which we believe have greatly improved the quality of the manuscript.
Note: The page and line number references are based on the track-changes (colored) document provided in the supplement. RC: Referee's comment; AC: Author's comment/response.
RC1: Saraf et al. produce a time series of yellow sweetclover maps in western SD using Sentinel imagery. The methods are solid, and the data appear to be useful to managers. However, there are many occurrences of text that is confusing or out of context. I provide detailed suggestions below. Also, the language could be a bit more polished. For example, on line 70: “One such case we have for an invasive plant named yellow sweetclover (Melilotus officinalis….” should be rephrased to something like “A common invasive biennial in the NGP, yellow sweetclover (Melilotus officinalis….” And on line 72: “of such plant species till the previous decade” should be “of such plant species until the 2010s”.
AC1: We thank the referee for the suggestion. We have revised the sentences in lines 74-79 as follows: “Yellow sweetclover (Melilotus officinalis (L.) Lam., MEOF), a common invasive legume in the NGP, exemplifies this biennial phenology. There has been little to no literature on mapping blooms of such plant species until the 2010s. In recent years, MEOF has attracted attention from land managers in South Dakota (SD) as it is becoming a prominent invasive species in the NGP region. We refer to years with MEOF super blooms (Preston et al., 2023) in the Dakota region as "sweetclover years".”
Specific Comments
RC2: Line 32: Why did the authors choose an RF approach as opposed to a DNN/CNN?
AC2: We appreciate the referee's concern. This study builds upon our previous work (Saraf et al., 2023), in which we compared four machine learning algorithms, i.e., Cubist, extreme gradient boosting (XGBoost), Generalized Additive Models (GAM), and Random Forest (RF), to evaluate their performance in predicting yellow sweetclover percent cover. RF was one of the primary modeling frameworks we selected, owing to its strong performance, efficiency, ease of interpretation, and practicality under limited data conditions. RF consistently outperformed the other approaches, demonstrating its robustness and suitability for ecological mapping tasks. Building on these findings, we extended the analysis to multi-year mapping of yellow sweetclover percent cover across our study region. Although deep learning models such as DNNs or CNNs can perform well with very large datasets, they present several challenges: a high risk of overfitting; limited interpretability regarding variable importance, owing primarily to their “black box” nature; and reduced generalization when spatial or temporal coverage is limited. Shiferaw et al. (2019) found that RF outperformed deep neural networks for mapping the fractional cover of an invasive plant species in dryland ecosystems, achieving the highest accuracy and robust sensitivity and specificity. Similarly, UAV-based studies comparing RF with deep learning classifiers (FCN and patch-CNN) showed that while deep learning models can excel with very large training datasets, RF performs competitively, and often better, when sample sizes are moderate or spatial context is limited (Liu et al., 2018). RF therefore remains a robust and efficient choice, balancing strong predictive performance with interpretability and lower computational demands. Additionally, RF is well suited to our dataset, as it handles unbalanced year-wise samples effectively, ensuring that predictions remain reliable across years with differing sample sizes.
RC3: Lines 81-82: “Invasive forbs such as MEOF develop inflorescences with yellow flowers that are prominent during flowering time.” Suggest “Invasive forbs such as MEOF develop yellow inflorescences that are prominent during flowering time”.
AC3: As suggested, we have modified the sentence in lines 87-90 to “Invasive forbs such as MEOF develop yellow inflorescences that are prominent during flowering time and can be detected using 10 m resolution Sentinel-2 derived reflectance and quantitative indices (Saraf et al., 2023).”
RC4: Line 85: Suggest adding “For example,” before “Sentinel-2 imagery with 10 m spatial resolution” to better tie it to the prior sentence. I would also recommend adding some additional citations to this section.
AC4: As suggested, we have modified the sentences in lines 90-93 to “Previous studies have shown that multi-temporal analysis using remote sensing data can be a powerful tool for addressing challenges in monitoring invasive species dynamics (Bradley, 2014; Mouta et al., 2023). For example, Sentinel-2 imagery with 10 m spatial resolution has sufficed for mapping a range of invasive plant species (Kattenborn et al., 2019).”
RC5: Line 91: “Moreover, its yellow flowers can be easily mistaken for other” suggest “Moreover, its yellow flowers can be easily mistaken in remote sensing imagery for other”.
AC5: We have modified the sentence in lines 98-100 to “Moreover, its yellow flowers can be easily mistaken in remote sensing imagery for other yellow-flowered forbs such as yellow salsify, black-eyed Susan, western wallflower, annual sunflower, or leafy spurge.”
RC6: Lines 133-134: “with poor representation in the field data” In what field data? In the data you’ve collected?
AC6: We thank the referee for pointing this out. We have revised this sentence in lines 143-145 to “Developing a generalized model that can be applied across space and time allows for efficient mapping of irruptive invasive plant species, which often bloom episodically and occur in clustered patches.”
RC7: Line 139: Is objective 1 to map the extent (i.e., presence/absence) or the fractional cover of MEOF? This isn’t clear.
AC7: We thank the referee for pointing this out. We have revised the first objective in lines 151-153 to “(1) to develop a generalized prediction model using field-collected and UAS-derived percent cover samples along with Sentinel-2 imagery to map the fractional cover of invasive MEOF across western SD;”
RC8: Line 156: Replace rainfall with precipitation.
AC8: As suggested, we have replaced the word ‘rainfall’ with ‘precipitation’. The revised sentence in lines 171-172 is “About three-fourths of the precipitation occurs during summer, and snowfall ranges from 650 mm to 5000 mm throughout western SD (Paul et al., 2016).”
RC9: Line 160: “mosaic of mixed-grass prairie interspersed with shrubs…” I would say, more accurately from a landscape perspective, “mosaic of mixed-grass prairie interspersed with cultivated lands”.
AC9: As suggested, we have revised the sentence in lines 174-175 to “The landscape of western SD is a mosaic of mixed-grass prairie interspersed with cultivated lands.”
RC10: Lines 166-168: “Dryland sedges (Carex spp. L.), prairie threeawn (Aristida oligantha Michx.), and fringed sagewort (Artemisia frigida Willd.) increase with disturbance.” What is the source of this information?
AC10: We thank the referee for pointing this out. We have added citations to the sentence in lines 181-184: “Dryland sedges (Carex spp. L.), prairie threeawn (Aristida oligantha Michx.), and fringed sagewort (Artemisia frigida Willd.) increase with disturbance (Owensby and Launchbaugh, 1977; Reinhart et al., 2019; Sanderson et al., 2015).”
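AC3 notes that MEOF's yellow inflorescences can be detected with Sentinel-2 reflectance and quantitative indices, and the abstract names the Normalized Difference Yellowness Index (NDYI) among the top predictors. NDYI is commonly defined as (Green − Blue) / (Green + Blue); for Sentinel-2 MSI that corresponds to bands B3 (green, about 560 nm) and B2 (blue, about 490 nm), both at 10 m. The sketch below computes that standard form per pixel. It is not the authors' implementation, and the `ndyi_range` helper is a hypothetical temporal summary (max minus min across dates), since the exact way "variation in NDYI" was aggregated is not stated here.

```python
def ndyi(green, blue):
    """Normalized Difference Yellowness Index for one pixel's reflectance.

    Uses the common form (green - blue) / (green + blue); for Sentinel-2,
    green = B3 and blue = B2. Returns None if the denominator is zero.
    """
    denom = green + blue
    if denom == 0:
        return None
    return (green - blue) / denom


def ndyi_range(series):
    """Hypothetical temporal summary: max minus min NDYI over a sequence of
    (green, blue) observations for one pixel, skipping undefined values."""
    values = [v for v in (ndyi(g, b) for g, b in series) if v is not None]
    if not values:
        return None
    return max(values) - min(values)


# Illustrative (not measured) surface reflectances for one pixel:
# early season, peak bloom, late season.
pixel_series = [(0.08, 0.06), (0.15, 0.05), (0.09, 0.06)]
print([round(ndyi(g, b), 3) for g, b in pixel_series])
print(round(ndyi_range(pixel_series), 3))
```

Yellow flowers reflect strongly in green relative to blue, so NDYI rises at peak bloom; a temporal summary such as the range captures that bloom signal relative to the pre- and post-bloom background.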
RC11: Lines 178-180: Wouldn’t you also want samples of non-MEOF sites?
AC11: We appreciate the referee’s concern. We have revised the sentence in lines 196-198 to “The flight locations were randomly selected across Butte County in western South Dakota to capture large, continuous patches of MEOF, ensuring that the imagery encompassed the full range of percent cover within each site, including areas without MEOF.”
RC12: Lines 203-212: What attributes were collected in the field samples? Just MEOF cover, or something else also? Please include this information.
AC12: We thank the referee for the question. We have now included details on all the attributes collected during the fieldwork and revised the paragraph in lines 239-252 to “Our field surveys recorded plant species composition, including dominant species and the percent cover of all species present, using the conventional plot-based quadrat method. Within each 30 m × 30 m plot, a minimum of three 0.5 m × 0.5 m quadrats were sampled. Percent cover for each plot was calculated as the average of the quadrat measurements, with each quadrat considered representative of its portion of the plot. Within each quadrat, we estimated the percent cover of MEOF by averaging the grids it occupied, allowing fine-resolution observations to be scaled up to the plot level while capturing spatial variability (John et al., 2018). We recorded flowering and non-flowering MEOF individuals separately to document phenological variability and population structure, which can be useful for understanding interannual flowering dynamics in future analyses. However, only the flowering MEOF percent cover was used for remote sensing–based mapping, as flowering individuals exhibit a distinct spectral signal that can be consistently detected in aerial and satellite imagery.
This approach ensured that the satellite-derived cover estimates corresponded specifically to the detectable, flowering component of MEOF.”
RC13: Line 268: What years are included in the climate means?
AC13: We appreciate the referee’s concern. We have revised the sentence in lines 307-314 to provide clear context: “For climate predictors, we utilized the Daymet monthly and annual dataset (Version 4R1) available at 1 km spatial resolution (Thornton et al., 2022). From the monthly data, we calculated mean annual precipitation (MAP) as the sum of monthly precipitation values and mean annual temperature (MAT) as the average of the monthly mean temperatures for each year corresponding to the MEOF cover samples. To account for potential biennial effects, we also calculated biennial precipitation (MAP2) and biennial temperature (MAT2) by combining the values from the sample year with those of the preceding year (i.e., total precipitation across both years and average temperature across both years).”
RC14: Line 270: Are the seasonal composites also means across multiple years?
AC14: We appreciate the referee’s concern. We have revised the sentence in lines 314-317 to provide clear context: “We also computed seasonal composites of precipitation and mean temperature separately for each year corresponding to the MEOF cover samples, including spring (March–May; P_MAM and T_MAM) and summer (June–August; P_JJA and T_JJA).”
RC15: Line 280: “such as” implies there are other terrain derivatives calculated from NED. Is this the case? If so, please list them.
AC15: We thank the referee for the suggestion. We have revised the sentence in lines 325-327 to “We used the National Elevation Dataset from the NASA Earthdata portal, available at 10 m resolution, to derive elevation, slope, aspect, hillshade, terrain wetness index, and terrain roughness index.”
RC16: Line 285: Proximity to roads could not be calculated from NLCD.
You could derive distance to urban areas, though, which may be synonymous with roads in some cases.
AC16: We thank the referee for pointing this out. We have revised the sentence in lines 329-332 to “The land cover/use data were derived at 30 m resolution from the 2019 National Land Cover Database (NLCD 2019, Dewitz, 2021). We also derived the distance to developed/urban areas, which include non-primary roads, as a proxy for proximity to roads.”
RC17: Line 295: “identically” distributed is odd. Do you mean “randomly”?
AC17: As suggested, we have replaced “identically” with “randomly”. The corrected statement in lines 344-345 is “Most machine learning models such as RFs work on the assumption that the samples are independent and randomly distributed.”
RC18: Lines 295-308: How is this discussion of Moran’s I/spatial autocorrelation different from the one above (lines 250-256)? The final n (11,235) is the same in both. Is the first describing the UAS site predictions, and the second describing the regional model? The heading of Section 2.6 should be revised to clarify this.
AC18: We thank the referee for pointing this out. As suggested, we have removed the sentences in lines 250-256 (original document) to avoid duplication and replaced them in lines 292-295 with “We then combined field and UAS-derived samples from 2016-2023, resulting in a total of 22,972 MEOF percent cover samples for the regional-scale regression analysis described in Section 2.6 and shown in Figure 2.” We revised the heading of Section 2.6 (line 341) to “Regional MEOF cover regression model” to clarify that this section focuses on the regional-scale regression analysis. We also moved the statement from lines 250-256 of the original document to lines 342-344 to avoid redundancy and improve clarity and logical flow: “We compiled a total of 22,972 MEOF percent cover samples for the regional-scale regression analysis.
After removing duplicate records (samples from different sources falling within the same pixel and year), 20,275 unique samples remained.”
RC19: Lines 333-336: This description of Fig. 3 is practically a caption; there is no need for it in the text.
AC19: As suggested, we have removed these lines to avoid redundancy and now state in lines 383-384: “The RF classification accuracies can be visually validated at three representative UAS sites with MEOF blooms (Figure 3).”
RC20: Lines 377-378: “However, climate variables like annual precipitation or snow depth, did not rank among the top predicting variables” Or could there be a lag effect (e.g., last year’s precipitation vs. current-year MEOF)?
AC20: We agree with the referee that there could be a lag effect. We have added the following sentences to address this: “Despite this, climate variables such as annual precipitation or snow depth did not rank among the top predicting variables. This may be due to MEOF’s biennial life cycle, where precipitation from the previous year can influence current-year cover (Klebesadel, 1992; Van Riper and Larson, 2009). We tested this by including biennial precipitation (MAP2). However, due to its high correlation with annual precipitation (MAP) and the higher relative importance of MAP, neither variable alone, at the coarser 1 km resolution, adequately captured the biennial dynamics.”
RC21: Lines 381-382: The spatial resolution difference described in the prior sentence is not the reason you “created a MEOF percent cover map series for 2016 through 2023”, as currently implied by “therefore”. Rather, you compared the “MEOF percent cover map series for 2016 through 2023” to “precipitation anomaly maps”.
AC21: We agree with the referee.
We have replaced the sentence in lines 427-442 with “We created a MEOF percent cover map series for 2016–2023 and compared it with precipitation anomaly maps to assess the potential relationship between MEOF cover and interannual climatic variability.”
RC22: Line 374: The first couple of paragraphs of Section 4.1 read almost as results, since no references are cited at all.
AC22: We thank the referee for this helpful comment. In response, we have split the first couple of paragraphs of Section 4.1 into two parts. The portion describing observed MEOF patterns has been moved to the Results section, while the remaining portion stays in the Discussion. We also added relevant citations to support interpretation and maintain a logical flow. The two paragraphs added to the Results section in lines 427-459 are: “We created a MEOF percent cover map series for 2016–2023 and compared it with precipitation anomaly maps to assess the potential relationship between MEOF cover and interannual climatic variability. These precipitation anomaly maps showed that western SD received above-average precipitation in a few regions in 2018 and 2023 and across most of western SD in 2019 (Figure S4). The central and eastern counties in 2019 and the central and southern counties in 2023 showed a greater range of MEOF cover, indicating a consistent pattern of MEOF resurgence with the return of wet conditions. Although 2016 was a relatively normal or slightly dry year, sweetclover cover remained moderate with less spatial variability, indicating less widespread establishment. Widespread establishment of MEOF increased in 2018, with a high coefficient of variation (CV) of 0.5, and percent cover peaked in the subsequent year, 2019. In 2020, 2021, and 2022, most regions experienced average to below-average rainfall conditions.
During these years, MEOF percent cover reached up to 50%, with a sharp drop in 2021, when the maximum cover was only 43%. This suggests that drought conditions likely limit growth and establishment. The years 2020 and 2022 acted as transitional years, possibly due to a lagged ecological response. In dry years, predicted cover was below 50% across most of western SD. Overall, we found a high percent cover range in the western counties of western SD, including Butte, Meade, Pennington, Custer, Fall River, Jackson, Bennett, and Oglala Lakota counties. Central South Dakota counties showed fluctuating trends, with moderate to high cover in some years (e.g., 2018, 2019, 2023) and relatively low cover in others (e.g., 2020, 2021), whereas the eastern counties (i.e., Corson, Dewey, and Stanley) consistently exhibited relatively low percent cover (<20%) in most years. In the eastern region, MEOF appeared more scattered and patchy, with fewer patches of higher percent cover near floodplains, which lie at lower elevations and benefit from high moisture availability, especially in 2018 and 2019. During the summer fieldwork of 2022, we observed MEOF predominantly in the first year of its life cycle. In the following year, we observed ample MEOF blooms in Butte County, SD, forming patches large enough to be captured by the drones. This temporal pattern arises from the biennial growth cycle of MEOF. Additionally, we predicted MEOF percent cover for 2024 using our trained model (Figure S5). However, this 2024 prediction has not yet been validated because field data are unavailable.
Validation of model performance for 2024 and subsequent years remains a key focus for future work.” The revised paragraphs in the Discussion section (lines 495-536) are: “The occurrence of sweetclover years is predominantly associated with wetter conditions, suggesting that precipitation plays a key role in the resurgence of MEOF (Gucker, 2009). Despite this, climate variables such as annual precipitation or snow depth did not rank among the top predicting variables. This may be due to MEOF’s biennial life cycle, where precipitation from the previous year can influence current-year cover (Klebesadel, 1992; Van Riper and Larson, 2009). We tested this by including biennial precipitation (MAP2). However, due to its high correlation with annual precipitation (MAP) and the higher relative importance of MAP, neither variable alone, at the coarser 1 km resolution, adequately captured the biennial dynamics. This unexpected result may be due to the large disparity in spatial resolution between the 10 m Sentinel-derived variables and the 1 km climate variables, which likely contributed to an underestimation of precipitation’s importance in the model (Latimer et al., 2006). MEOF blooms could also be influenced not just by precipitation but by local groundwater availability or soil moisture, particularly in areas near floodplains. While we observed higher cover near floodplains in certain years, the pattern was not consistent across all years. Future analyses focusing on watersheds and hydrological variables could help clarify the environmental drivers of bloom events. Overall, our findings suggest that climate contributes to interannual variation in MEOF cover, while previous studies suggest that spatial heterogeneity and local environmental conditions further modulate vegetation dynamics across the Northern Great Plains (Fore, 2024).
Despite ample moisture in some areas in 2016 and 2018, the ‘sweetclover year’ super bloom events were limited to 2019. This may be attributed to MEOF’s biennial life cycle, which plays a significant role and acts as a lag effect provided that average or above-average conditions persist (Van Riper and Larson, 2009). A distinct drop in cover is seen in 2020 and 2021 across the south, with a recovery in 2022–2023. Moreover, MEOF with >40% cover was found mostly in regions that received above-average precipitation during both dry and wet years, highlighting the importance of moisture in regulating dominance. This aligns with previous studies showing that sweetclover cover can fluctuate substantially from year to year, driven by its biennial growth habit and strong germination response in years with high precipitation (Turkington et al., 1978). Although the RF model did not identify precipitation as the top predictor, our predicted MEOF cover maps showed that years of high cover (e.g., 2018 and 2019) coincided with favorable moisture conditions, whereas lower cover in 2020–2021 corresponded with drier years. This pattern supports the hypothesis that ‘sweetclover years’ of high MEOF abundance occur when favorable moisture conditions are maintained, allowing successful establishment and dominance despite losses from evapotranspiration. These favorable moisture conditions likely facilitate the successful establishment and dominance of MEOF across Northern Great Plains rangelands, consistent with broader patterns observed for invasive species in semi-arid rangelands (Brooks et al., 2004; D’Antonio and Vitousek, 1992). Similar patterns have been observed for exotic annual grasses such as cheatgrass (Bromus tectorum L.), red brome (Bromus rubens L.), and medusahead (Taeniatherum caput-medusae (L.) Nevski), which often increase during periods of favorable precipitation (Chen and Weber, 2014; Dahal et al., 2023).”
RC23: Line 425: “displaying huge appearances” is odd.
AC23: We appreciate the referee’s suggestion. We have replaced “displaying huge appearances” with “exhibiting huge blooms” in lines 480-483. The revised sentence is “Our study offers a workflow for different plant species of annuals, biennials, or geophytes that share dominance during bloom events, exhibiting huge blooms in specific years with differences of 4 to 10 weeks in the length and peak of the flowering period (Vidiella et al., 1999).”
RC24: Line 450: NDMI was already defined in line 369.
AC24: We thank the referee for pointing this out. We have revised the sentence in lines 555-557 to “The variable importance results for MEOF reveal that NDMI is the most influential predictor, indicating that soil and vegetation moisture play a crucial role in supporting its invasion and growth (Figure S2).”
RC25: Line 454: Or could the importance of proximity to roads be driven by the sampling distribution? Are there simply more samples near roads due to accessibility, so that the model has learned a false association?
AC25: We agree with the referee. To address this, we added the following sentences in lines 563-567: “This pattern is also consistent with our field surveys, where a higher percent cover of MEOF was observed closer to roads than in the interior of plots. Nevertheless, the importance of road proximity should be interpreted cautiously, as greater sampling accessibility near roads may have partially inflated its role in the model.”
RC26: Line 471: “though it also increased RMSE” Could this simply be down to a higher mean in the new maps? This seems to be the case based on lines 489-494. Higher cover relating to higher RMSE suggests using a different statistic, such as nRMSE, to account for this.
AC26: We thank the referee for this observation.
To clarify the relationship between RMSE and cover distribution, we added a supplementary table of year-wise normalized RMSE (nRMSE) (Table S9). We also included a paragraph in the Results section addressing year-wise variation in model predictions. This paragraph explains that while some years (e.g., 2023) had high RMSE despite a large sample size, this was likely due to spatial clustering and the reduced ability of the model to predict extreme cover values rather than insufficient data. Conversely, years with broader variability in observed cover (e.g., 2018) showed relatively low nRMSE, indicating that the model effectively captured patterns across the cover range. The added paragraph in lines 461-476 is as follows: “Year-wise evaluation of model performance revealed considerable variation in normalized RMSE (nRMSE), which ranged from 0.12 in 2022 to 0.65 in 2023 (Table S9). The year-wise sample distribution of observed MEOF cover could partly explain these differences. In 2018, the observed cover exhibited the greatest variability (CV = 0.51) and reached a maximum of 81%. However, the nRMSE remained low (0.19), indicating that the model effectively captured patterns in years with a broader range of values. Conversely, 2023 exhibited the highest error (nRMSE = 0.657) despite having a maximum cover of 100% and the lowest variability (CV = 0.25). This high error occurred despite a relatively large sample size, likely due to spatial clustering and the reduced ability of the model to predict extreme cover values. Consequently, the model's capacity to generalize to high-cover conditions was restricted. Similarly, 2020 had a moderate maximum cover (56%) but relatively high error (nRMSE = 0.55), which may reflect imbalances in sample distribution across cover classes.
In contrast, the best overall performance was achieved in 2022 (max = 57%, CV = 0.38, nRMSE = 0.124), which implies that predictive accuracy is enhanced by balanced sampling across cover ranges. These results emphasize that the distribution and variability of cover values across years have a significant impact on predictive performance, although increasing the sample size improves model stability.” We also revised the discussion in Section 4.3 (lines 581–599) to explicitly link the increase in RMSE to the inclusion of a wider range of percent cover values, consistent with the year-wise nRMSE results. The revised lines are as follows: “Thus, while temporal imbalance in samples (e.g., more samples from bloom years such as 2019 and 2023) influenced the overall distribution of training data, spatial balance and adequate coverage across the full percent cover range were the most critical factors for model accuracy. We found that increasing the sample size and ensuring a more balanced distribution significantly improved model performance, raising R² from 0.55 (Saraf et al., 2023) to 0.76. RMSE increased from 7% to 15%, reflecting the inclusion of a wider range of percent cover values rather than insufficient sample size or overall imbalance. Saraf et al. (2023) reported that their model underestimated high percent cover due to a limited sample size (n = 1,612). In contrast, our model used a larger and more evenly distributed sample (n = 11,235) across years, improving predictive accuracy and the representation of extreme cover values. These findings suggest that balanced sample sizes enhance both the predictive range and accuracy of RF models, although temporal imbalance in certain years may still influence RMSE and require further investigation.
Moreover, it is worth noting that it is difficult to fully stratify samples temporally for a biennial species like MEOF, which remains dormant during certain seasons and blooms only under specific environmental conditions.”
RC27 (a): Lines 469-481: The increase in sample size in the current map vs. the older one is described as improving results. Is this not in opposition to line 357, “We noticed that the reduction in sample size had little-to-no effect”?
AC27 (a): We understand the referee’s concern. We therefore added the following sentences at the beginning of Section 4.3 (lines 581-583): “It is important to note that reducing the sample size from 22,972 to 11,235 due to high spatial correlation did not substantially affect model performance. However, in comparison to Saraf et al. (2023), a much larger overall sample size was required to improve predictive accuracy.”
RC27 (b): Regarding the temporal imbalance in samples: is there just one MEOF RF model built through time, in which case the temporal imbalance should be largely irrelevant? Or is a unique MEOF model built for each year, in which case temporal and spatial balance would be key?
AC27 (b): We acknowledge the referee’s concern and have added the following statement in lines 583-588 of Section 4.3 to clarify the methodology: “We developed a single generalized RF model across all years (2016–2023) and applied it to predict MEOF cover annually. Thus, while temporal imbalance in samples (e.g., more samples from bloom years such as 2019 and 2023) influenced the overall distribution of training data, spatial balance and adequate coverage across the full percent cover range were the most critical factors for model accuracy.”
RC28: Line 497: “We manually delineated polygons of invasive MEOF presence, which were then used to train the RF classifier.” But this was done on UAS imagery, not directly for the RF classification as implied here.
AC28: We thank the referee for pointing this out.
We have revised the sentence in lines 615-617 as “We manually delineated MEOF presence and absence polygons on the UAS imagery, which were used to train and validate the RF classification model. The resulting classified image was then used to derive continuous, wall-to-wall fractional cover estimates across the UAS sites.” RC29: Lines 497-516: Your approach of scaling observations with UAS imagery to serve as training over a broader landscape is similar in some respects to Rigge et al. 2020 (https://www.mdpi.com/2072-4292/12/3/412). AC29: We agree with the referee’s point. To acknowledge this similarity in the methodology, we have added the following sentence in lines 639-642 at the end of Section 4.4: “Our approach of scaling UAS-derived observations to develop percent cover estimates at broader spatial scales is conceptually similar to Rigge et al. (2020), who demonstrated the utility of integrating high-resolution reference data to improve landscape-scale predictions of rangeland vegetation cover.” RC30: Line 526: “The prediction map for 2023….” This is the 3rd time this pattern has been discussed. AC30: We agree with the referee’s comment and have removed this statement to avoid redundancy. RC31: Lines 529-537: This entire section has nothing to do with validation (the section title). And much of it is introduction-type content. AC31: We agree with the referee’s comment. We have revised this section in lines 645-657 to remove introductory material and ensure the section focuses solely on validation results. The complete revised section is given below: “We validated the predicted MEOF cover maps using four independent UAS-validation sites. Predictions showed strong correlation with observed MEOF cover derived from UAS imagery, with an R² of 0.71, RMSE of 17.81%, MAE of 13.17%, and MAPE of 4.89% (Figure 6, Figure S6).
The visual comparison of the predicted maps with UAS imagery at the four validation sites showed that the model generally captured the spatial patterns of MEOF cover. We found that the prediction model underestimated the high percent cover range and overestimated the low to no percent cover regions. In 2023, only 0.76% (621.4 km²) of the total rangeland area (81,442 km²) showed cover exceeding 50%, supporting field observations of widespread MEOF blooms in specific regions. The prominent yellow blooms of MEOF are readily visible in UAS and satellite imagery when found in sufficiently large clusters, which supports the reliability of the model predictions. The validation results demonstrate that the Random Forest model effectively captures spatial variation in MEOF cover throughout the study area, providing a solid basis for assessing invasion intensity on a landscape scale.” RC32: Line 553: “Our model does not explain the variation in the MEOF cover that has biennial life cycle.” What? Isn’t that precisely the point of making time-series maps for 2016-2023? AC32: We agree with the referee’s point. We have revised the sentence in lines 671-675 as “Our model does not explicitly incorporate the biennial life cycle of MEOF; rather, we capture this variation indirectly by generating annual time-series maps (2016–2023) that reflect differences in cover between bloom and non-bloom years. Most of the observed MEOF cover samples were collected during the second year of its life cycle to enable capture of its flowering stage.” RC33: Line 561: I don’t understand the point on HLS. Was HLS data actually used in modelling? And how is the sentence related to line 563 “We resolved this issue”? I don’t see the connection. AC33: We thank the referee for pointing this out. To avoid confusion, we have removed these lines. RC34: Line 849: Your group collected all of the ~23K observations? Are you using any BLM AIM data? Looking at Table S1, data from AIM, NEON, etc. was used.
This should be described in the methods. Also, the “land cover map” should be more clearly defined as NLCD, date x. AC34: As suggested, we have revised the caption for Figure 1 in lines 1016-1021 as “The top panel shows field observations used in this study (n = 22,972) collected from 2016 to 2023 across the Northern Great Plains, including our own surveys as well as publicly available datasets such as BLM AIM and NEON (© Esri, Maxar, Earthstar Geographics, and the GIS User Community). The bottom panel shows the UAS training and validation sites overlaid on the National Land Cover Database (NLCD, 2019) land cover map with county boundaries of western South Dakota.” We also included a statement in Section 2.3 acknowledging all the sources listed in Tables S1 and S2. The revised sentences are “We retrieved 17,689 MEOF cover samples from multiple federal, state, and non-governmental sources for 2016–2022 across four states: South Dakota, North Dakota, Montana, and Wyoming (Figure 1a; Table S1). These sources included RCMAP data from the USGS Center for Earth Resources Observation & Science, USGS Northern Rocky Mountain Science Center (Montana), the Bureau of Land Management (BLM) database, the Northern Great Plains Inventory & Monitoring Network, the National Ecological Observatory Network (NEON), and the Montana Natural Heritage Program. The source, year-wise distribution, and frequency of the samples are summarized in Tables S2 and S3.” RC35: Line 857 (Figure 3): Should the range of percent cover be 0-100, not 0-1? AC35: We thank the referee for pointing out the mistake. We have corrected the percent cover range in the revised figure on page 29. RC36: Line 867: “Yellow sweetclover percent cover estimates in the high yellow sweetclover probability” is quite awkward. Rephrase. AC36: We understand the referee’s concern. We have rephrased the caption for Figure 5 as “Comparison of yellow sweetclover (Melilotus officinalis) cover in western South Dakota rangelands for 2019.
(a) Percent cover estimates from Saraf et al. (2023) based on 1,612 samples, showing areas with high probability of yellow sweetclover occurrence. (b) Predicted percent cover from the current study using 11,235 samples, highlighting the updated yellow sweetclover cover estimates relative to Saraf et al. (2023).” RC37: Figure 5. The point of this figure is to contrast the older Saraf 2023 map with the new one from the current paper? This needs to be made much more clear in the caption. AC37: As suggested, we have rephrased the caption in lines 1036-1040, given above, for more clarity. RC38: All figures, figures 3, 4, and 5 all have different color ramps (symbology). Pick one and stick with it for all figures. AC38: As suggested, we have revised Figures 3 and 4 (pages 29 and 30) to match the color ramp of the remaining figures (Figures 5, 6, and 7). RC39: Line 876 (Figure 7 caption): “Predicted percent cover estimates for invasive yellow sweetclover (MEOF) in panel (a) at four different sites represented with numbers and each site is compared with the PlanetScope imagery available at 3 m resolution shown in green, green, and blue band combination to highlight yellow sweetclover blooms in panel (b). (PlanetScope imagery © Planet Labs PBC).” This is all super confusing. I would add a label “2019” over the left set of panels and “2023” over the right set, and change the text to: “Predicted percent cover estimates for invasive yellow sweetclover (MEOF) at four different sites represented with numbers for 2019 (left) and 2023 (right). In each site, a) 3 m resolution PlanetScope imagery shown in green, green, and blue band combination to highlight yellow sweetclover blooms, b) fractional cover of MEOF. (PlanetScope imagery © Planet Labs PBC).” AC39: We appreciate the referee’s suggestion. We have rearranged the panels in Figure 7 on page 33 and revised the caption (lines 1046-1050) as suggested by the referee. Figure 7.
Predicted percent cover estimates for invasive yellow sweetclover (MEOF) at four different sites represented with numbers for 2019 (left) and 2023 (right). In each site, (a) 3 m resolution PlanetScope imagery shown in green, green, and blue band combination to highlight yellow sweetclover blooms, and (b) fractional cover of MEOF. (PlanetScope imagery © Planet Labs PBC). 
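
The year-wise evaluation described in this response (nRMSE and the coefficient of variation of observed cover for each year) can be sketched in a few lines of Python. The responses do not state how nRMSE was normalized, so this is a minimal sketch under an assumption: it supports normalization by either the observed range or the observed mean, and takes CV as the sample standard deviation divided by the mean. The record layout `(year, observed, predicted)` and all function names are hypothetical, not the authors' code.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def rmse(obs, pred):
    """Root mean square error between observed and predicted percent cover."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def nrmse(obs, pred, norm="range"):
    """RMSE divided by the observed range or mean (the normalization is an assumption)."""
    if norm == "range":
        denom = max(obs) - min(obs)
    elif norm == "mean":
        denom = mean(obs)
    else:
        raise ValueError(f"unknown normalization: {norm!r}")
    return rmse(obs, pred) / denom if denom else float("nan")


def cv(values):
    """Coefficient of variation: sample standard deviation over the mean."""
    if len(values) < 2:
        return float("nan")
    m = mean(values)
    return stdev(values) / m if m else float("nan")


def yearwise_metrics(records, norm="range"):
    """Group hypothetical (year, observed, predicted) records by year and score each year."""
    grouped = defaultdict(lambda: ([], []))
    for year, observed, predicted in records:
        obs, pred = grouped[year]
        obs.append(observed)
        pred.append(predicted)
    results = {}
    for year in sorted(grouped):
        obs, pred = grouped[year]
        results[year] = {
            "n": len(obs),
            "max": max(obs),
            "cv": cv(obs),
            "rmse": rmse(obs, pred),
            "mae": mae(obs, pred),
            "nrmse": nrmse(obs, pred, norm),
        }
    return results
```

Computing both normalizations for the same years is a quick way to check which definition matches the values reported in Table S9 before comparing years.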
 
AC1: 'Response to Referee 1', Sakshi Saraf, 11 Oct 2025

RC2: 'Comment on essd-2025-353', Anonymous Referee #2, 09 Sep 2025

Remote sensing of invasive species is an important research topic. Here, the authors combine Sentinel-2 data with drone imagery to map invasive yellow sweetclover blooms across western South Dakota in a multiannual approach using machine learning. This study could be an interesting example of how remote sensing can be applied in monitoring plant invasions. Further details about data analysis need to be added to the methods. Some results need to be supported by data. Some parts of the discussion need to be moved to the results section, and the discussion would benefit from further interpretation of the results which includes citing external literature. I have a couple of questions/remarks: L63: “However, previous studies often miss important data…”: Data on what? L83: The spatial resolution depends on the size of the target species and/or age of the plant individual. Reformulate. L132: What do you mean by a generalized model? L141: From the logic of the introduction it is not clear why this validation step using PS imagery is required. Refine. L199: How were the ten sites selected? L204ff: So the cover of smaller plots were regarded as representative for the larger plots? Were they averaged? Please elaborate. L209ff: How do these samples compare to the samples described in L204ff? Were they sampled in a similar way? L215ff: Was there any non-flowering MEOF in your plots, and was the cover of non-flowering MEOF estimated? L225: Hyperparameter tuning was performed to optimize which accuracy parameter? L228: Which threshold was chosen for the binary classification? L237ff, L256, L318ff: If data based on an RF model is used to perform another analysis using RF (or any other kind of model), can this lead to error propagation? L239: Why did you decide to use linear regression and not any other type of approach? L250: I think the methods need a workflow diagram to visualize how which data was used for what purpose.
L289: How were the data resampled? L309: Write RF out and cite it earlier in the manuscript. L309: Overlaid or extracted? L330: Distinguishing flowering MEOF pixel? L345: RF predictions of what? L348: Name the predictors or at least the most important groups. L362ff: Support this result statement with data. L377ff: Could mass blooming be affected rather by ground water parameters than precipitation? Were any patterns observed regarding closeness to floodplains? Maybe further analysis focussing on watersheds could also help to understand the mass blooming. L378-383: “This unexpected result may be due to the large disparity in spatial resolution between Sentinel-derived variables at 10 m and the 1 km climate variables, with the 10,000-fold difference in spatial resolution contributing to an underestimation of precipitation as a significant variable. Therefore, we created a MEOF percent cover map series for 2016 through 2023 and compared it with precipitation anomaly maps during the same period computed using the Daymet dataset product.”: This sounds like results. And I don’t really understand the latter part. Please reformulate. L390: What does CV stand for? L377-396: This sounds like a results section. Please reformulate. L415ff: How did the time-series maps support the hypothesis? I don’t see this in your line of argumentation. L398-423: This section lacks external references. Support your interpretation with references to existing literature. L427: Why does the bloom in particular trigger changes in soil nitrogen content? Is it not generally an N-fixing species? L443-446: The ecological consequences are an interesting aspect to be discussed. I think this aspect could be elaborated further. L465: Which data show that local moisture dynamics and human disturbance play a critical role? Explore this further. L478: Is there any way to deal with unbalanced data sets? Can you really relate the increased RMSE to the imbalanced data set?
L497: Why did you not use the manually delineated polygons for modelling instead of the modelled cover values to avoid problems of error propagation? L539ff: The whole section sounds like results. Reformulate and/or remove. L548-550: Support this statement with data. It would rather belong to results. In the discussion, further interpretation of the results is needed. L569: Do you mean PlanetScope data when referring to high-resolution mapping? What could be limitations of PlanetScope data? Citation: https://doi.org/10.5194/essd-2025-353-RC2
                                        
AC2: 'Response to Referee 2', Sakshi Saraf, 11 Oct 2025

Remote sensing of invasive species is an important research topic. Here, the authors combine Sentinel-2 data with drone imagery to map invasive yellow sweetclover blooms across western South Dakota in a multiannual approach using machine learning. This study could be an interesting example of how remote sensing can be applied in monitoring plant invasions. Further details about data analysis need to be added to the methods. Some results need to be supported by data. Some parts of the discussion need to be moved to the results section, and the discussion would benefit from further interpretation of the results which includes citing external literature. We sincerely thank the referee for their constructive feedback. We have added further details on data analysis in the Methods, moved results-related content from the Discussion to the Results section, and provided data support for key findings. The suggested revisions have improved clarity, strengthened data support, and placed the study within the broader context of remote sensing applications in plant invasion monitoring.

Note: The page and line number references are based on the tracked-changes (colored) document provided in the supplement. RC: Referee’s comment; AC: Author’s comment/response.

Specific comments: RC1: L63: “However, previous studies often miss important data…”: Data on what? AC1: We thank the referee for pointing this out. We have clarified in the revised manuscript that the missing data refer specifically to spatiotemporal information on invasion dynamics (e.g., species cover, spread rates, and environmental drivers).
The revised statement in lines 64-67 is “However, previous studies often lack important spatiotemporal data on invasion dynamics, such as changes in species cover, spread rates, and environmental drivers, making it difficult to fully understand invasion processes that unfold continuously across space and time (Larson et al., 2020).” RC2: L83: The spatial resolution depends on the size of the target species and/or age of the plant individual. Reformulate. AC2: We thank the referee for suggesting this correction. We have revised the statement in lines 87-90 as “Invasive forbs such as MEOF develop yellow inflorescences that are prominent during flowering time and can be detected using 10 m resolution Sentinel-2-derived reflectance and quantitative indices, provided the plants meet the optimal size or developmental stage for detection (Saraf et al., 2023).” RC3: L132: What do you mean by a generalized model? AC3: We thank the referee for the comment. We have revised the sentence in lines 143-145 as “Developing a generalized model that can be implemented across space and time allows for efficient mapping of irruptive invasive plant species that bloom episodically and form clustered patches.” RC4: L141: From the logic of the introduction it is not clear why this validation step using PS imagery is required. Refine. AC4: We thank the referee for the suggestion. We have revised the third objective in lines 154-157 as: “(3) to further validate the predicted yellow sweetclover maps using PlanetScope imagery, which provides higher temporal resolution and independent data for cross-sensor validation, and to assess MEOF cover in regions lacking UAS coverage.” RC5: L199: How were the ten sites selected?
AC5: We appreciate the referee’s concern and have added the following clarification to the manuscript in lines 217-222: “All 14 sites captured the observed range of MEOF percent cover, but they differed in total area covered by MEOF presence and the number of samples derived from each site. To ensure a balanced split, the 10 smaller sites were randomly selected for training the RF model, while the remaining four larger sites were reserved for validation. This approach ensured that both the training and validation sets contained approximately equal numbers of samples, providing an unbiased assessment of model performance.” RC6: L204ff: So the cover of smaller plots were regarded as representative for the larger plots? Were they averaged? Please elaborate. AC6: We appreciate the referee’s comment and have revised the sentence in lines 241-246 as “Within each 30 m × 30 m plot, a minimum of three 0.5 m × 0.5 m quadrats were sampled. Percent cover for each plot was calculated as the average of the quadrat measurements, with each quadrat considered representative of its portion of the plot. Within each quadrat, we estimated percent cover of MEOF by averaging the grids it occupied, allowing fine-resolution observations to be scaled up to the plot level while capturing spatial variability (John et al., 2018).” RC7: L209ff: How do these samples compare to the samples described in L204ff? Were they sampled in a similar way? AC7: We thank the referee for this comment. We have clarified in the revised manuscript that the historical samples were obtained using different field protocols but were integrated with our field-collected data to increase spatial and temporal coverage. The revised paragraph in lines 225-254 is as follows: “We used a total of 22,972 MEOF percent cover samples collected across western South Dakota rangelands and surrounding regions during 2016-2023 (Table S1).
This included 5,283 samples derived from UAS imagery collected during the peak blooming months (June–August) in 2023 (details in Sections 2.2 and 2.4) across western South Dakota rangelands. In addition, 17,689 MEOF cover samples were retrieved and synthesized from multiple federal, state, and non-governmental sources for 2016–2022 across four states: South Dakota, North Dakota, Montana, and Wyoming (Figure 1a; Table S1). Although the historical samples were obtained using different field protocols, they were integrated with our field-collected data to increase spatial and temporal coverage. These sources included RCMAP data from the USGS Center for Earth Resources Observation & Science, USGS Northern Rocky Mountain Science Center (Montana), the Bureau of Land Management (BLM) database, the Northern Great Plains Inventory & Monitoring Network, the National Ecological Observatory Network (NEON), and the Montana Natural Heritage Program. The source, year-wise distribution, and frequency of the samples are summarized in Tables S2 and S3. At the 10 m mapping scale, this compilation provided a suitable reference for model training and validation. Our field-collected surveys recorded the plant species composition, including dominant species and percent cover of all species present, using the conventional plot-based quadrat method. Within each 30 m × 30 m plot, a minimum of three 0.5 m × 0.5 m quadrats were sampled. Percent cover for each plot was calculated as the average of the quadrat measurements, with each quadrat considered representative of its portion of the plot. Within each quadrat, we estimated percent cover of MEOF by averaging the grids it occupied, allowing fine-resolution observations to be scaled up to the plot level while capturing spatial variability (John et al., 2018). We recorded flowering and non-flowering MEOF individuals separately. 
The separation was done to document phenological variability and population structure, which can be useful for understanding interannual flowering dynamics in future analyses. However, only the flowering MEOF percent cover was used for remote sensing–based mapping, as flowering individuals exhibit a distinct spectral signal that can be consistently detected in aerial and satellite imagery. This approach ensured that the satellite-derived cover estimates corresponded specifically to the detectable, flowering component of MEOF. For 2023, the GPS locations of the field-collected quadrat samples were utilized as the ground control points for enhancing the processing of drone imagery to derive percent cover samples.” RC8: L215ff: Was there any non-flowering MEOF in your plots, and was the cover of non-flowering MEOF estimated? AC8: Thank you for the comment. In our field surveys, we recorded flowering and non-flowering MEOF individuals separately. The separation was done to document phenological variability and population structure, which can be useful for understanding interannual flowering dynamics in future analyses. However, only the flowering MEOF percent cover was used for remote sensing–based mapping, as flowering individuals exhibit a distinct spectral signal that can be consistently detected in aerial and satellite imagery. This approach ensured that the satellite-derived cover estimates corresponded specifically to the detectable, flowering component of MEOF. We have clarified this in section 2.3 in lines 246-252 (paragraph added in the previous comment) to the revised manuscript. RC9: L225: Hyperparameter tuning was performed to optimize which accuracy parameter? AC9: We appreciate the referee’s comment. 
We have revised the statement in lines 267-269 as “We tuned the Random Forest hyperparameters (mtry = 4, ntrees = 1500) to optimize model predictive performance, specifically by minimizing the Root Mean Square Error (RMSE) using 10-fold, 5-repeat cross-validation.” RC10: L228: Which threshold was chosen for the binary classification? AC10: We thank the referee for the question. We have revised the statement in lines 271-274 as “We converted the continuous Random Forest predictions to binary presence/absence using a threshold of 0.5, assigning pixels with predicted probability ≥ 0.5 as MEOF presence (assigned as 1) and pixels < 0.5 as absence (assigned as 0) (Josso et al., 2023; Steen et al., 2021).” RC11: L237ff, L256, L318ff: If data based on an RF model is used to perform another analysis using RF (or any other kind of model), can this lead to error propagation? AC11: Thank you for the comment. We acknowledge that using Random Forest (RF)–derived percent cover estimates as input for further analyses could introduce some degree of error propagation. To minimize this, we calibrated the RF-derived values against independent field observations using a leave-one-out jackknife procedure. We used linear regression for calibration because it provides a simple and transparent way to correct systematic biases in the RF predictions. This approach ensures that each predicted value is validated independently of the data used for model training, reducing overfitting and mitigating bias. RC12: L239: Why did you decide to use linear regression and not any other type of approach? AC12: We appreciate the referee’s comment. We have clarified the calibration procedure and the rationale for using linear regression in Section 2.4 of the revised manuscript. We have revised the statement in lines 288-292 as “We used linear regression to calibrate RF-derived percent cover estimates because it provides a simple and transparent way to correct systematic biases.
To ensure unbiased predictions and minimize overfitting, we applied a leave-one-out jackknife procedure, where each observation was predicted independently of the data used to fit the model (Wolter, 2007).” RC13: L250: I think the methods need a workflow diagram to visualize how which data was used for what purpose. AC13: We thank the referee for the suggestion. We have added a reference to the workflow diagram (Figure 2) in lines 292-295 and cited it accordingly throughout the Methods section. RC14: L289: How were the data resampled? AC14: We thank the referee for the comment. We have added the sentence in lines 334-337 to clarify the procedure: “All variables were resampled to 10 m resolution and projected to the Albers Equal Area projection with the WGS 84 datum. We used bilinear interpolation for predictor variables to preserve data integrity during resampling.” RC15: L309: Write RF out and cite it earlier in the manuscript. AC15: We thank the referee for pointing this out. We now spell out Random Forest (RF) at its first mention in the Introduction and cite Breiman (1984) there. In the Methods and later sections, we use the abbreviation RF consistently. RC16: L309: Overlaid or extracted? AC16: As suggested, we have replaced ‘overlaid’ with ‘extracted’ in the sentence in lines 360-361. The revised sentence is “We constructed a predictor variable database by extracting observed sample points from the satellite-derived predictor variables (rasters) for training the RF model.” RC17: L330: Distinguishing flowering MEOF pixel? AC17: We thank the referee for the comment. We have corrected the sentence in lines 380-382 as “The developed RF classification model exhibited an overall accuracy of 98.76% and kappa of 0.97 in distinguishing flowering MEOF pixels.” RC18: L345: RF predictions of what? AC18: We thank the referee for the suggestion. The title of Section 3.2 has been revised to “Regional-scale Random Forest predictions of MEOF cover” in the manuscript.
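
The classification-to-calibration steps described in AC10 to AC12 (thresholding RF probabilities at 0.5, scaling the classified UAS image up to 10 m fractional cover, and leave-one-out linear calibration against field observations) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation: it assumes the fine UAS grid is aligned with the 10 m grid and that each 10 m cell spans an integer number of UAS pixels per side (for example, 200 pixels at 5 cm), and all function names are hypothetical.

```python
def binarize(prob_grid, threshold=0.5):
    """Map RF class probabilities to presence (1) / absence (0); p >= threshold is presence."""
    return [[1 if p >= threshold else 0 for p in row] for row in prob_grid]


def fractional_cover(binary_grid, factor):
    """Aggregate a fine binary grid to coarse percent cover.

    Hypothetical simplification: the fine grid is aligned with the coarse grid
    and each coarse cell spans exactly factor x factor fine pixels.
    """
    rows, cols = len(binary_grid), len(binary_grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be multiples of factor")
    cover = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            present = sum(
                binary_grid[r][c]
                for r in range(r0, r0 + factor)
                for c in range(c0, c0 + factor)
            )
            out_row.append(100.0 * present / (factor * factor))
        cover.append(out_row)
    return cover


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has zero variance; slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def jackknife_calibrate(rf_cover, field_cover):
    """Leave-one-out calibration of RF-derived cover (lists) against field cover.

    Each returned value is predicted from a line fitted to all other pairs,
    so observation i never influences its own calibrated estimate.
    """
    if len(rf_cover) != len(field_cover) or len(rf_cover) < 3:
        raise ValueError("need at least three paired observations")
    calibrated = []
    for i in range(len(rf_cover)):
        x = rf_cover[:i] + rf_cover[i + 1:]
        y = field_cover[:i] + field_cover[i + 1:]
        a, b = fit_line(x, y)
        calibrated.append(a + b * rf_cover[i])
    return calibrated
```

For example, `fractional_cover(binarize(prob), 200)` would turn an aligned 5 cm probability grid into 10 m percent cover, and `jackknife_calibrate(uas_cover, field_cover)` returns one calibrated value per observation, each predicted by a line fitted without that observation. A production workflow would more likely use raster tooling for the aggregation step; the plain loops here only make the arithmetic explicit.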
RC19: L348: Name the predictors or at least the most important groups. AC19: As suggested, we have included the names of the predictors and added the following statement to the manuscript in lines 396-404: “The top 13 predictor variables included climatic variables: mean annual precipitation (MAP), coefficient of variation of MAP (MAPcv), mean annual temperature (MAT), coefficient of variation of MAT (MATcv), snow depth (SnowDepth), and coefficient of variation of snow depth (SnowDepth_cv); topographic variables: elevation (Elevation) and slope (Slope); proximity to roads (Dist_Roads); and remote sensing indices capturing moisture and vegetation properties: Normalized Difference Moisture Index (NDMI), coefficient of variation of Normalized Difference Water Index (NDWIcv), coefficient of variation of Land Surface Water Index (LSWIcv), and coefficient of variation of Tasseled Cap Wetness (TCWcv; Table 2).” RC20: L362ff: Support this result statement with data. AC20: We thank the referee for the comment. We have removed the statement regarding MEOF cover following moisture gradients, as the available data do not consistently support this pattern. RC21: L377ff: Could mass blooming be affected rather by ground water parameters than precipitation? Were any patterns observed regarding closeness to floodplains? Maybe further analysis focusing on watersheds could also help to understand the mass blooming. AC21: We thank the referee for these insightful suggestions. We agree that local groundwater availability and soil moisture may influence MEOF blooms in addition to precipitation. While we observed higher cover near floodplain regions in certain years, this pattern was not consistent across all years. We have added a statement in the manuscript noting that future analyses incorporating watershed and hydrological variables could help clarify the environmental drivers of mass blooming events.
We have added the following statements in lines 505-513 in Section 4.1 of the Discussion: “There is a possibility that MEOF blooms could be influenced not just by precipitation but also by local groundwater availability or soil moisture, particularly in areas near floodplains. While we observed some higher cover near floodplain regions in certain years, the pattern was not consistent across all years. Future analyses focusing on watersheds and hydrological variables could help clarify the environmental drivers of bloom events. Overall, our findings suggest that climate contributes to interannual variation in MEOF cover, while previous studies suggest that spatial heterogeneity and local environmental conditions further modulate vegetation dynamics across the Northern Great Plains (Fore, 2024).” RC22: L378-383: “This unexpected result may be due to the large disparity in spatial resolution between Sentinel-derived variables at 10 m and the 1 km climate variables, with the 10,000-fold difference in spatial resolution contributing to an underestimation of precipitation as a significant variable. Therefore, we created a MEOF percent cover map series for 2016 through 2023 and compared it with precipitation anomaly maps during the same period computed using the Daymet dataset product.”: This sounds like results. And I don’t really understand the latter part. Please reformulate. AC22: We thank the referee for this helpful comment. In response, we have split the first couple of paragraphs of Section 4.1 into Results and Discussion. Observed MEOF patterns are now fully described in the Results section, while the Discussion focuses on interpretation. We also explicitly acknowledge that mass blooming may be influenced not only by precipitation but also by local groundwater availability or soil moisture, particularly near floodplains, and we suggest future analyses incorporating hydrological variables.
These changes clarify the role of climate versus local environmental factors and improve the logical flow of the manuscript. The three paragraphs added to the Results section (lines 427-476) are: “We created a MEOF percent cover map series for 2016–2023 and compared it with precipitation anomaly maps to assess the potential relationship between MEOF cover and interannual climatic variability. These precipitation anomaly maps showed that western SD received above-average precipitation in a few regions in 2018 and 2023 and across most of western SD in 2019 (Figure S4). The central and eastern counties in 2019 and the central and southern counties in 2023 showed a greater range of MEOF cover, consistent with a pattern of MEOF resurgence upon the return of wet conditions. Despite 2016 being a relatively normal or slightly dry year, sweetclover cover remained moderate with less spatial variability, indicating less widespread establishment. The widespread establishment of MEOF increased in 2018, with a high coefficient of variation (CV) of 0.5, and the percent cover peaked in the subsequent year, 2019. For the years 2020, 2021, and 2022, most regions experienced average to below-average rainfall conditions. During these years, MEOF percent cover reached up to 50%, with a sharp drop in 2021, when the maximum cover was only 43%. This suggests that drought conditions likely limit growth and establishment. The years 2020 and 2022 acted as transitional years, possibly due to a lagged ecological response. In dry years, predicted cover was below 50% across the majority of western SD. Overall, we found a high percent cover range in the western counties of western SD, including Butte, Meade, Pennington, Custer, Fall River, Jackson, Bennett, and Oglala Lakota counties.
Central South Dakota counties showed fluctuating trends, with moderate to high cover in some years (e.g., 2018, 2019, 2023) and relatively low cover in others (e.g., 2020, 2021), whereas the eastern counties (i.e., Corson, Dewey, and Stanley) consistently exhibited relatively low percent cover (<20%) in most years. In the eastern region, MEOF appeared more scattered and patchy, with fewer patches of higher percent cover, located near floodplains, which lie at lower elevations and benefit from greater moisture availability, especially in 2018 and 2019. During the summer fieldwork of 2022, we observed MEOF predominantly in the first year of its life cycle. In the following year, we observed ample MEOF blooms in Butte County, SD, forming patches large enough to be captured by the drones. This temporal pattern is consistent with the biennial life cycle of MEOF. Additionally, we predicted MEOF percent cover for 2024 using our trained model (Figure S5). However, this 2024 prediction has not yet been validated because field data are unavailable. Validating model performance for 2024 and subsequent years remains a key focus for future work. Year-wise evaluation of model performance revealed considerable variation in normalized RMSE (nRMSE), which ranged from 0.12 in 2022 to 0.65 in 2023 (Table S9). The year-wise distribution of observed MEOF cover samples could partly explain these differences. In 2018, the observed cover exhibited the greatest variability (CV = 0.51) and reached a maximum of 81%. However, the nRMSE remained low (0.19), indicating that the model effectively captured patterns in years with a broader range of values. Conversely, 2023 exhibited the highest error (nRMSE = 0.657) despite having a maximum cover of 100% and the lowest variability (CV = 0.25). 
This high error occurred despite a relatively large sample size, likely due to spatial clustering and the reduced ability of the model to predict extreme cover values. Consequently, the model's capacity to generalize to high-cover conditions was restricted. Similarly, 2020 had a moderate maximum cover (56%) but relatively high error (nRMSE = 0.55), which may reflect imbalances in sample distribution across cover classes. In contrast, the best overall performance was achieved in 2022 (max = 57%, CV = 0.38; nRMSE = 0.124), which implies that predictive accuracy is enhanced by balanced sampling across cover ranges. These results emphasize that the distribution and variability of cover values across years have a strong impact on predictive performance, although increasing the sample size improves model stability.” The revised paragraphs in the Discussion section (lines 495-536) read: “The occurrence of sweetclover years is predominantly associated with wetter conditions, suggesting that precipitation plays a key role in the resurgence of MEOF (Gucker, 2009). Despite this, climate variables such as annual precipitation or snow depth did not rank among the top predicting variables. This may be due to MEOF’s biennial life cycle, in which precipitation from the previous year can influence current-year cover (Klebesadel, 1992; Van Riper and Larson, 2009). We tested this by including biennial precipitation (MAP2). However, because MAP2 was highly correlated with annual precipitation (MAP) and MAP had the higher relative importance, neither variable, at the coarser 1 km resolution, adequately captured the biennial dynamics. This unexpected result may be due to the large disparity in spatial resolution between the 10 m Sentinel-derived variables and the 1 km climate variables, which likely contributed to an underestimation of precipitation’s importance in the model (Latimer et al., 2006). 
There is a possibility that MEOF blooms are influenced not just by precipitation but also by local groundwater availability or soil moisture, particularly in areas near floodplains. While we observed higher cover near floodplains in certain years, the pattern was not consistent across all years. Future analyses focusing on watersheds and hydrological variables could help clarify the environmental drivers of bloom events. Overall, our findings suggest that climate contributes to interannual variation in MEOF cover, while previous studies suggest that spatial heterogeneity and local environmental conditions further modulate vegetation dynamics across the Northern Great Plains (Fore, 2024). Despite ample moisture in some areas in 2016 and 2018, the ‘sweetclover year’ super blooms were limited to 2019. This may be attributed to MEOF’s biennial life cycle, which can produce a lagged response provided that average or above-average conditions persist (Van Riper and Larson, 2009). A distinct drop in cover is seen in 2020 and 2021 across the south, with a recovery in 2022–2023. Moreover, MEOF with >40% cover was found mostly in regions that received above-average precipitation in both dry and wet years, highlighting the importance of moisture in regulating dominance. This aligns with previous studies showing that sweetclover cover can fluctuate substantially from year to year, driven by its biennial growth habit and strong germination response in years with high precipitation (Turkington et al., 1978). Although the RF model did not identify precipitation as the top predictor, our predicted MEOF cover maps showed that years of high cover (e.g., 2018 and 2019) coincided with favorable moisture conditions, whereas lower cover in 2020–2021 corresponded with drier years. 
This pattern supports the hypothesis that ‘sweetclover years’ of high MEOF abundance occur when favorable moisture conditions are maintained, allowing successful establishment and dominance despite losses from evapotranspiration. These favorable moisture conditions likely facilitate the successful establishment and dominance of MEOF across Northern Great Plains rangelands, consistent with broader patterns observed for invasive species in semi-arid rangelands (Brooks et al., 2004; D’Antonio and Vitousek, 1992). Similar patterns have been observed for exotic annual grasses such as cheatgrass (Bromus tectorum L.), red brome (Bromus rubens L.), and medusahead (Taeniatherum caput-medusae (L.) Nevski), which often increase during periods of favorable precipitation (Chen and Weber, 2014; Dahal et al., 2023).” RC23: L390: What does CV stand for? AC23: We have spelled out CV and revised the statement as “The widespread establishment of MEOF could be seen increasing in 2018, with a high Coefficient of Variation (CV) of 0.5, and its percent cover then reached a peak in the subsequent year of 2019.” RC24: L377-396: This sounds like a results section. Please reformulate. AC24: As suggested, we have split the first couple of paragraphs of Section 4.1 into two parts. The portion describing observed MEOF patterns has been moved to the Results section, while the remaining portion stays in the Discussion. We also added relevant citations to support interpretation and maintain a logical flow. The added paragraphs in the Results section and the revised Discussion paragraphs are provided in our response to a previous comment. RC25: L415ff: How did the time-series maps support the hypothesis? I don’t see this in your line of argumentation. AC25: We thank the referee for this comment. 
To clarify how the time-series maps support our hypothesis, we have revised the statement in lines 524-530 as follows: “Although the RF model did not identify precipitation as the top predictor, our predicted MEOF cover maps showed that years of high cover (e.g., 2018 and 2019) coincided with favorable moisture conditions, whereas lower cover in 2020–2021 corresponded with drier years. This pattern supports the hypothesis that ‘sweetclover years’ of high MEOF abundance occur when favorable moisture conditions are maintained, allowing successful establishment and dominance despite losses from evapotranspiration.” RC26: L398-423: This section lacks external references. Support your interpretation with references to existing literature. AC26: We thank the referee for this suggestion. As recommended, we have moved this section to the Results. The remaining portion in the Discussion now includes references to maintain logical flow and situate our findings within the existing literature. RC27: L427: Why does particularly the bloom trigger changes in soil nitrogen content? Is it not generally an N-fixing species? AC27: We thank the referee for this helpful suggestion. We have revised the statement in lines 483-487 as follows: “These blooms cause a sudden increase in annual net primary production, triggering relevant changes in the ecosystem such as increases in soil nitrogen content due to N-fixation, temporary plant composition modifications, attraction of predators, etc. (Jaksic, 2001), as well as changes in the local climate: an increase in evapotranspiration and a decrease in albedo (He et al., 2017).” RC28: L443-446: The ecological consequences are an interesting aspect to be discussed. I think this aspect could be elaborated further. AC28: We thank the referee for this suggestion. 
To elaborate on the ecological consequences of MEOF invasion, we have added the following statements: “Furthermore, the database supports investigation of the ecological consequences of MEOF invasion. For example, MEOF’s nitrogen-fixing ability may alter soil nutrient dynamics, potentially facilitating its own dominance while affecting native plant communities. Increased MEOF cover could lead to declines in native species richness, shifts in plant community composition, and changes in ecosystem processes such as nutrient cycling and primary productivity, particularly in nitrogen-limited prairie ecosystems. Understanding these impacts is critical for predicting long-term vegetation changes and developing targeted management strategies.” RC29: L465: Which data show that local moisture dynamics and human disturbance play a critical role? Explore this further. AC29: We thank the referee for this comment. To clarify which data support the role of local moisture dynamics and human disturbance, we have revised the statement as follows: “Overall, our results suggest that local moisture dynamics, captured by NDMI and NDWIcv, and human disturbances, reflected by proximity to roads, are stronger determinants of MEOF distribution at fine spatial scales than coarser-resolution climatic variables (snow depth, MAP, MAT, and their variability). Although climate may establish broad-scale suitability, our data indicate that MEOF invasion patterns in western South Dakota are primarily influenced by local hydrological conditions and human-mediated dispersal.” RC30: L478: Is there any way to deal with unbalanced data sets? Can you really relate the increased RMSE to the imbalanced data set? AC30: We thank the referee for this comment.
To address concerns regarding unbalanced datasets, we added statements to the manuscript and revised this paragraph (lines 581-599) as follows: “It is important to note that reducing the sample size from 22,972 to 11,235 due to high spatial correlation did not substantially affect model performance. However, compared with Saraf et al. (2023), a much larger overall sample size was required to improve predictive accuracy. We developed a single generalized RF model across all years (2016–2023) and applied it to predict MEOF cover annually. Thus, while temporal imbalance in samples (e.g., more samples from bloom years such as 2019 and 2023) influenced the overall distribution of training data, spatial balance and adequate coverage across the full percent cover range were the most critical factors for model accuracy. We found that increasing the sample size and ensuring a more balanced distribution substantially improved model performance, raising R² from 0.55 (Saraf et al., 2023) to 0.76. RMSE increased from 7% to 15%, reflecting the inclusion of a wider range of percent cover values rather than insufficient sample size or overall imbalance. Saraf et al. (2023) reported that their model underestimated high percent cover due to a limited sample size (n = 1,612). In contrast, our model utilized a larger and more evenly distributed sample (n = 11,235) across years, improving predictive accuracy and the representation of extreme cover values. These findings suggest that balanced sample sizes enhance both the predictive range and accuracy of RF models, although temporal imbalance in certain years may still influence RMSE and require further investigation. 
Moreover, it is worth noting that fully stratifying samples temporally is difficult for a biennial species like MEOF, which remains dormant during certain seasons and blooms only under specific environmental conditions.” RC31: L497: Why did you not use the manually delineated polygons for modelling instead of the modelled cover values to avoid problems of error propagation? AC31: We thank the referee for this comment. To clarify, we have added an explanation in lines 615-620 as follows: “We manually delineated MEOF presence and absence polygons on the UAS imagery, which were used to train and validate the RF classification model. The resulting classified image was then used to derive continuous, wall-to-wall fractional cover estimates across the UAS sites. We used these model-derived continuous MEOF cover values, rather than the manual polygons, for regression analyses in order to generate numerous spatially explicit cover samples and to capture gradients of invasion across the landscape.” RC32: L539ff: The whole section sounds like results. Reformulate and/or remove. AC32: We thank the referee for this suggestion. In response, we have removed “Section 4.6 Validation with Planet Imagery” and incorporated the content in lines 659-668 into Section 4.5 “Validation for 2023 estimates.” The revised paragraph reads as follows: “In addition to UAS validation, we used four-band (visible and near-infrared), 3 m resolution Dove Classic and SuperDove PlanetScope (PS) imagery for 2019 and 2023 through the NASA CSDA program (Planet Labs PBC, 2023) to further assess model predictions (Figure 7). PS scenes were selected for locations with predicted high MEOF cover, and false-color combinations (green-green-blue) were applied to enhance visualization of MEOF blooms. 
These images offered an independent and freely available means to complement the UAS-based validation by visually verifying the spatial patterns of predicted MEOF cover across sites where field data were unavailable. In general, the validation results indicate that the RF model effectively depicts spatial variation in MEOF cover throughout the study area, thereby providing a reliable foundation for evaluating invasion intensity on a landscape scale.” RC33: L548-550: Support this statement with data. It would rather belong to results. In the discussion, further interpretation of the results is needed. AC33: We thank the referee for this comment. The statement in lines 548–550 has been removed, and the paragraph has been rephrased in lines 659-668 for inclusion in the Discussion section (Section 4.5), focusing on interpretation rather than presenting results. The revised paragraph, which now places the PlanetScope validation within the broader discussion of model reliability and spatial variation, is provided in the previous comment. RC34: L569: Do you mean PlanetScope data when referring to high-resolution mapping? What could be limitations of PlanetScope data? AC34: We thank the referee for this comment. We have revised the sentence in lines 684-689 to clarify that high-resolution mapping refers to both Sentinel-2 and PlanetScope data, and also highlighted the challenges with uneven predictor spatial resolutions: “High-resolution mapping, even at Sentinel-2 (10 m) or PlanetScope (3 m) resolution, is complicated by the uneven spatial resolution of independent variables, making it more difficult to understand their relative roles in characterizing the niche of invasive species. Mapping at very high resolution, such as 3 m PlanetScope imagery, has its own limitations, including fewer spectral bands, less consistent radiometric calibration, and higher noise levels in vegetation indices, which can affect the accuracy of species-specific detection.”
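The 10,000-fold difference quoted in RC22 is an area ratio: 1 km / 10 m = 100 pixels per axis, so 100 × 100 = 10,000 Sentinel-2 pixels fall inside one climate cell. The minimal sketch below, using a hypothetical 2 × 2 precipitation grid, shows that after nearest-neighbour resampling to 10 m each climate value is copied across a 10,000-pixel block, which is one plausible reason a 1 km predictor carries little information about fine-scale cover. The function name and values are illustrative only.

```python
import numpy as np

def upsample_nearest(coarse: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour resampling: copy each coarse cell into a factor x factor block."""
    return np.repeat(np.repeat(coarse, factor, axis=0), factor, axis=1)

# Hypothetical 2 x 2 grid of annual precipitation (mm) at 1 km resolution.
map_1km = np.array([[420.0, 455.0],
                    [390.0, 510.0]])

factor = 1000 // 10                  # 1 km / 10 m = 100 pixels per axis
map_10m = upsample_nearest(map_1km, factor)
pixels_per_cell = factor * factor    # 10,000 Sentinel-2 pixels share each climate value

# Every 10 m pixel inside a climate cell carries the same value, so the
# predictor cannot vary at the scale of individual MEOF patches.
print(map_10m.shape, np.unique(map_10m).size, pixels_per_cell)
```

Bilinear or other smooth resampling would add gradients between cells, but it cannot recover sub-kilometre variation that the 1 km product never measured.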
 
                                     AC2:  'Response to Referee 2', Sakshi Saraf, 11 Oct 2025
                                        
                                                
                                        
                            
                                        
                            
                                            
                                    
                            
                            
                            
                                        
Data sets
Spatiotemporal mapping of invasive yellow sweetclover blooms using Sentinel-2 and high-resolution drone imagery Sakshi Saraf, Ranjeet John, Venkatesh Kolluru, Khushboo Jain, Geoffrey Henebry, Jiquan Chen, Raffaele Lafortezza https://doi.org/10.6084/m9.figshare.29270759.v1
Model code and software
Spatiotemporal mapping of invasive yellow sweetclover blooms using Sentinel-2 and high-resolution drone imagery Sakshi Saraf, Ranjeet John, Venkatesh Kolluru, Khushboo Jain, Geoffrey Henebry, Jiquan Chen, Raffaele Lafortezza https://doi.org/10.6084/m9.figshare.29270759.v1
Saraf et al. produce a time series of yellow sweetclover maps for western SD using Sentinel-2 imagery. The methods are solid and the data appear to be useful to managers. However, there are many passages of text that are confusing or out of context. I provide detailed suggestions below. The language could also be more polished. For example, on line 70, “One such case we have for an invasive plant named yellow sweetclover (Melilotus officinalis….” should be rephrased to something like “A common invasive biennial in the NGP, yellow sweetclover (Melilotus officinalis….”. On line 72, “of such plant species till the previous decade” should be “of such plant species until the 2010s”.
Specific Comments
Line 32: Why did the authors choose an RF approach as opposed to a DNN/CNN?
Line 81-82: “Invasive forbs such as MEOF develop inflorescences with yellow flowers that are prominent during flowering time.” Suggest “Invasive forbs such as MEOF develop yellow inflorescences that are prominent during flowering time”
Line 85: Suggest adding “For example,” before “Sentinel-2 imagery with 10 m spatial resolution” to better tie with prior sentence. I would also recommend adding some additional citations to this section.
Line 91 “Moreover, its yellow flowers can be easily mistaken for other” suggest “Moreover, its yellow flowers can be easily mistaken in remote sensing imagery for other”
Line 133-134: “with poor representation in the field data” in what field data? In the data you’ve collected?
Line 139: Is objective 1 to map the extent (i.e. presence/absence) or fractional cover of MEOF? This isn’t clear.
Line 156: Replace rainfall with precipitation.
Line 160: “mosaic of mixed-grass prairie interspersed with shrubs…” From a landscape perspective, I would say more accurately “mosaic of mixed-grass prairie interspersed with cultivated lands”.
Lines 166-168: “Dryland sedges (Carex spp. L.), prairie threeawn (Aristida oligantha Michx.), and fringed sagewort (Artemisia frigida Willd.) increase with disturbance.” What is the source of this information?
Lines 178-180: Wouldn’t you also want samples of non MEOF sites?
Lines 203-212: What attributes were collected in the field sample? Just MEOF cover? Or something else also? Please include this information.
Line 268: What years are included in the climate means?
Lines 270: The seasonal composites are also means from across multiple years?
Line 280: “such as” implies there are other terrain derivatives calculated from NED. Is this the case? If so, please list them.
Line 285: Proximity to roads could not be calculated from NLCD. You could derive distance to urban though, which may be synonymous with roads in some cases.
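A proximity-to-roads predictor is usually derived from a road network layer rather than from land cover classes. A minimal sketch, assuming a hypothetical binary road raster already aligned to the 10 m Sentinel-2 grid (1 = road pixel), computes the Euclidean distance in metres from each pixel to the nearest road pixel. The function name and toy grid are illustrative, not the authors' workflow.

```python
import numpy as np

def distance_to_roads(road_mask: np.ndarray, pixel_size: float = 10.0) -> np.ndarray:
    """Euclidean distance (map units) from each pixel to the nearest road pixel.

    Brute-force pairwise version, suitable only for small illustrative grids.
    """
    rows, cols = np.indices(road_mask.shape)
    pixels = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    roads = np.argwhere(road_mask.astype(bool)).astype(float)
    if roads.size == 0:
        raise ValueError("road_mask contains no road pixels")
    # Pairwise pixel-to-road offsets in pixel units; keep the nearest road.
    diff = pixels[:, None, :] - roads[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    return (nearest * pixel_size).reshape(road_mask.shape)

# Hypothetical 4 x 5 grid with a single north-south road in column 0.
roads = np.zeros((4, 5), dtype=int)
roads[:, 0] = 1
dist = distance_to_roads(roads)
print(dist[0])  # 0, 10, 20, 30, 40 m from the road
```

For full-size rasters, `scipy.ndimage.distance_transform_edt(road_mask == 0, sampling=10)` computes the same distances without the quadratic memory cost of the pairwise approach.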
Line 295: “identically” distributed is odd. Do you mean “randomly”?
Lines 295-308: How is this discussion of Moran’s I/spatial autocorrelation different from above (lines 250-256)? The final n (11,235) is the same in both. Is the first describing the UAS site predictions, and the 2nd describing the regional model? The heading of 2.6 should be revised to clarify this.
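Both passages rely on Moran's I to decide how strongly samples must be thinned. Global Moran's I compares each value's deviation from the mean with the deviations of its neighbours, weighted by proximity; values near the null expectation of -1/(n-1) indicate little spatial autocorrelation. The sketch below assumes inverse-distance weights with an optional cutoff; the weighting scheme actually used in the manuscript is not stated here, and the six-point transect is hypothetical.

```python
import numpy as np

def morans_i(values, coords, max_dist=None) -> float:
    """Global Moran's I with inverse-distance weights and w_ii = 0.

    I = (n / W) * sum_ij w_ij z_i z_j / sum_i z_i**2, where z = x - mean(x)
    and W is the sum of all weights.
    values: (n,) attribute, e.g. percent cover at sample points.
    coords: (n, 2) projected x/y coordinates in metres.
    max_dist: optional cutoff; pairs farther apart get zero weight.
    """
    x = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    with np.errstate(divide="ignore"):
        w = np.where(d > 0, 1.0 / d, 0.0)  # zero weight on the diagonal
    if max_dist is not None:
        w[d > max_dist] = 0.0
    z = x - x.mean()
    n, w_sum = x.size, w.sum()
    return float((n / w_sum) * (z @ w @ z) / (z @ z))

# Six hypothetical sample points 10 m apart along a transect.
transect = np.column_stack([np.arange(6) * 10.0, np.zeros(6)])
clustered = morans_i([0, 0, 0, 100, 100, 100], transect)    # similar values sit together
alternating = morans_i([0, 100, 0, 100, 0, 100], transect)  # neighbours differ
```

Here `clustered` comes out positive and `alternating` negative; repeating the calculation on progressively thinned subsets is one way to choose a sampling distance at which autocorrelation becomes negligible.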
Lines 333-336: This description of fig 3 is practically a caption, no need for this in text.
Lines 377-378: “However, climate variables like annual precipitation or snow depth, did not rank among the top predicting variables.” Or could there be a lag effect (e.g., last year's precipitation vs. current-year MEOF cover)?
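One way to test such a lag effect is to pair each year's cover with the previous year's precipitation as well as with a two-year total. A minimal sketch, assuming a stack of annual precipitation grids ordered oldest to newest; treating MAP2 as the total of years t and t-1 is an assumption, and the values below are hypothetical.

```python
import numpy as np

def lagged_precip_features(annual_precip):
    """Current-year, previous-year, and two-year precipitation aligned by year.

    annual_precip: array shaped (n_years, ...) ordered oldest to newest.
    The first year is dropped because it has no previous-year value.
    """
    p = np.asarray(annual_precip, dtype=float)
    current = p[1:]
    previous = p[:-1]
    two_year = current + previous  # assumed MAP2-style definition: total of years t and t-1
    return current, previous, two_year

# Hypothetical annual totals (mm) for 2016-2020, repeated over a 2 x 2 pixel grid.
precip = np.array([400.0, 450.0, 520.0, 600.0, 380.0])[:, None, None] * np.ones((1, 2, 2))
cur, prev, two = lagged_precip_features(precip)
# cur, prev, and two now refer to 2017-2020 and can be stacked as separate predictors
```

Supplying `prev` as its own predictor, rather than only the two-year sum, lets the model weight the previous year independently; strong collinearity between these features can still split importance between them in an RF.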
Lines 381-382: The spatial resolution difference described in the prior sentence is not the reason for “created a MEOF percent cover map series for 2016 through 2023” as currently implied by “therefore”. Rather, you compared the “MEOF percent cover map series for 2016 through 2023” to “precipitation anomaly maps”.
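The comparison of the cover map series with precipitation anomaly maps can be sketched as two steps: express each year's precipitation as a percent departure from a multi-year mean, and summarise each year's predicted cover map by its mean, maximum, and coefficient of variation (CV = standard deviation / mean). The baseline period, the population standard deviation (ddof = 0), and the toy arrays are assumptions; the Daymet-based anomaly definition used in the manuscript may differ, for example by using a longer climate normal.

```python
import numpy as np

def precip_anomaly_pct(annual_precip):
    """Percent departure of each year from the multi-year mean, per pixel.

    annual_precip: (n_years, ...) array; the baseline is the mean over all
    supplied years (an assumption; a longer climate normal could be used instead).
    """
    p = np.asarray(annual_precip, dtype=float)
    baseline = p.mean(axis=0)
    return 100.0 * (p - baseline) / baseline

def cover_summary(cover_map):
    """Mean, maximum, and coefficient of variation of one year's predicted cover."""
    c = np.asarray(cover_map, dtype=float)
    c = c[np.isfinite(c)]  # drop masked (NaN) pixels
    mean = c.mean()
    return {"mean": mean, "max": c.max(), "cv": c.std() / mean}  # population std (ddof=0)

# Hypothetical inputs: three years of precipitation at one pixel and one cover map.
anom = precip_anomaly_pct(np.array([300.0, 400.0, 500.0]))  # -25 %, 0 %, +25 %
summary = cover_summary(np.array([[10.0, 20.0], [30.0, np.nan]]))
```

Tabulating each year's summary next to both the same-year and previous-year anomaly would make the biennial argument in the Discussion easier to check.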
Line 374: The first couple of paragraphs of 4.1 read almost as results, since there are no references cited at all.
Line 425: “displaying huge appearances” is odd
Line 450: NDMI was already defined in line 369.
Line 454: Or could the importance of proximity to roads be driven by the sampling distribution? Are there simply more samples near roads due to accessibility, so that the model has learned a spurious association?
Line 471: “though it also increased RMSE”: could this simply be down to a higher mean cover in the new maps? This seems to be the case based on lines 489-494. Since higher cover is associated with higher RMSE, a relative statistic such as nRMSE would account for this.
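A minimal sketch of year-wise nRMSE, assuming normalisation by the mean observed cover in each year, with normalisation by the observed range as an option. Which normaliser underlies Table S9 is an assumption here, and the observations and predictions are hypothetical.

```python
import numpy as np

def rmse(obs, pred) -> float:
    """Root-mean-square error between observed and predicted percent cover."""
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def nrmse_by_year(years, obs, pred, norm: str = "mean") -> dict:
    """RMSE divided by the mean (norm="mean") or range (norm="range") of observed cover, per year."""
    years = np.asarray(years)
    obs, pred = np.asarray(obs, dtype=float), np.asarray(pred, dtype=float)
    result = {}
    for year in np.unique(years):
        in_year = years == year
        if norm == "mean":
            scale = obs[in_year].mean()
        elif norm == "range":
            scale = np.ptp(obs[in_year])
        else:
            raise ValueError("norm must be 'mean' or 'range'")
        result[int(year)] = rmse(obs[in_year], pred[in_year]) / scale  # a zero scale yields inf or nan
    return result

# Hypothetical validation samples from two years.
years = [2022, 2022, 2023, 2023]
obs = [10, 30, 50, 90]
pred = [12, 28, 40, 100]
print(nrmse_by_year(years, obs, pred))
```

Mean normalisation directly addresses the concern that a higher average cover inflates absolute RMSE; range normalisation instead penalises years whose observed values span only a narrow interval.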
Lines 469-481: 1) The increase in sample size in the current map vs. the older one is described as improving results. Does this not contradict line 357, “We noticed that the reduction in sample size had little-to-no effect”?
2) Regarding the temporal imbalance in samples: is there just one MEOF RF model built across all years, in which case the temporal imbalance should be largely irrelevant? Or is a unique MEOF model built for each year, in which case temporal and spatial balance would be key?
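For a single pooled model, one common remedy for an unbalanced training set is to cap the number of samples drawn from each cover class (and, if desired, from each year). The sketch below performs that random undersampling; the class boundaries, cap, and synthetic cover values are illustrative assumptions, not the authors' procedure.

```python
import numpy as np

def balanced_sample(cover, bin_edges, per_bin: int, seed: int = 0) -> np.ndarray:
    """Indices of a training subset with at most `per_bin` samples per cover class.

    cover: (n,) percent cover of candidate samples.
    bin_edges: increasing class boundaries, e.g. [0, 10, 25, 50, 75, 100];
    values at or above the last interior edge fall in the top class.
    """
    rng = np.random.default_rng(seed)
    cover = np.asarray(cover, dtype=float)
    classes = np.digitize(cover, np.asarray(bin_edges, dtype=float)[1:-1])
    chosen = []
    for c in np.unique(classes):
        members = np.flatnonzero(classes == c)
        take = min(per_bin, members.size)
        chosen.append(rng.choice(members, size=take, replace=False))  # random undersampling
    return np.sort(np.concatenate(chosen))

# Synthetic, deliberately skewed cover values: many low, few high.
cover = np.concatenate([np.full(80, 5.0), np.full(15, 30.0), np.full(5, 90.0)])
idx = balanced_sample(cover, [0, 10, 25, 50, 75, 100], per_bin=10)
print(idx.size)  # 10 low + 10 mid + 5 high samples
```

Undersampling discards data; weighting samples inversely to class frequency is an alternative that keeps every observation.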
Line 497: “We manually delineated polygons of invasive MEOF presence, which were then used to train the RF classifier.” But this was done on UAS imagery, not directly for the RF classification as implied here.
Lines 497-516: Your approach of scaling observations with UAS imagery to serve as training over a broader landscape is similar in some respects to Rigge et al. 2020 (https://www.mdpi.com/2072-4292/12/3/412).
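The UAS-to-landscape scaling step amounts to aggregating a fine-resolution binary classification into coarse-cell percent cover. A minimal sketch, assuming the classified UAS raster (1 = MEOF, 0 = other) is already co-registered and snapped to the 10 m grid, so that each Sentinel-2 cell contains an exact block of UAS pixels (about 200 × 200 at 5 cm). The function name and toy array are hypothetical.

```python
import numpy as np

def fractional_cover(classified: np.ndarray, factor: int) -> np.ndarray:
    """Percent MEOF cover per coarse cell from a binary fine-resolution map.

    classified: 2-D array, 1 = MEOF, 0 = other, aligned to the coarse grid.
    factor: fine pixels per coarse cell along each axis
            (e.g. 10 m / 0.05 m = 200 for 5 cm UAS pixels).
    Partial coarse cells at the right and bottom edges are dropped.
    """
    rows = (classified.shape[0] // factor) * factor
    cols = (classified.shape[1] // factor) * factor
    blocks = classified[:rows, :cols].reshape(rows // factor, factor, cols // factor, factor)
    return 100.0 * blocks.mean(axis=(1, 3))

# Toy 4 x 4 classification aggregated with factor 2 (a stand-in for 200).
uas = np.array([[1, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 1, 1],
                [0, 0, 1, 1]])
pct = fractional_cover(uas, factor=2)  # [[75, 0], [0, 100]]
```

Because each coarse value is the mean of thousands of fine pixels, isolated misclassified pixels shift the cell's cover only slightly, which partly tempers the error propagation raised in the Line 497 comment.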
Line 526: “The prediction map for 2023….” This is the 3rd time this pattern has been discussed.
Lines 529-537: This entire section has nothing to do with validation (the section title). And much of it is introduction-type content.
Line 553: “Our model does not explain the variation in the MEOF cover that has biennial life cycle.” What? Isn’t that precisely the point of making time-series maps for 2016-2023?
Line 561: I don’t understand the point on HLS. Was HLS data actually used in modelling? And how is the sentence related to line 563 “We resolved this issue”? I don’t see the connection.
Line 849: Your group collected all of the ~23K observations? Are you using any BLM AIM data? Looking at Table S1, data from AIM, NEON, etc. was used. This should be described in the methods. Also, the “land cover map” should be more clearly defined as NLCD, date x.
Line 857 (Figure 3): should the range of percent cover be 0-100, not 0-1?
Line 867: “Yellow sweetclover percent cover estimates in the high yellow sweetclover probability” is quite awkward. Rephrase.
Figure 5. The point of this figure is to contrast the older Saraf 2023 map with the new one from the current paper? This needs to be made much more clear in the caption.
All figures: Figures 3, 4, and 5 have different color ramps (symbology). Pick one and use it consistently for all figures.
Line 876 (Figure 7 caption): “Predicted percent cover estimates for invasive yellow sweetclover (MEOF) in panel (a) at four different sites represented with numbers and each site is compared with the PlanetScope imagery available at 3 m resolution shown in green, green, and blue band combination to highlight yellow sweetclover blooms in panel (b). (PlanetScope imagery © Planet Labs PBC).” This is all super confusing.
I would add a label “2019” over the left set of panels and “2023” over the right set, and change the text to:
“Predicted percent cover estimates for invasive yellow sweetclover (MEOF) at four different sites represented with numbers for 2019 (left) and 2023 (right). For each site, a) 3 m resolution PlanetScope imagery shown in green, green, and blue band combination to highlight yellow sweetclover blooms, b) fractional cover of MEOF. (PlanetScope imagery © Planet Labs PBC).”
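The green-green-blue display used in Figure 7 sends the green band to both the red and green display channels and the blue band to the blue channel, so pixels that are bright in green and dark in blue render as yellow. A minimal sketch, assuming a 4-band PlanetScope array in blue, green, red, NIR order (confirm against the scene metadata) and a 2-98 percentile stretch; the random scene is only a placeholder.

```python
import numpy as np

def stretch(band: np.ndarray, pct=(2, 98)) -> np.ndarray:
    """Linear percentile stretch of one band to the 0-1 display range."""
    lo, hi = np.percentile(band, pct)
    if hi <= lo:
        return np.zeros_like(band, dtype=float)
    return np.clip((band - lo) / (hi - lo), 0.0, 1.0)

def ggb_composite(scene: np.ndarray, blue_idx: int = 0, green_idx: int = 1) -> np.ndarray:
    """Green-green-blue false-colour image of shape (rows, cols, 3).

    scene: (bands, rows, cols) array. The default indices assume a
    blue, green, red, NIR band order.
    """
    green = stretch(scene[green_idx].astype(float))
    blue = stretch(scene[blue_idx].astype(float))
    # Green drives both the red and green display channels.
    return np.dstack([green, green, blue])

scene = np.random.default_rng(42).random((4, 5, 5))  # placeholder for a real 4-band scene
img = ggb_composite(scene)
```

Stretching each band separately maximises contrast within a scene but makes colours less comparable between the 2019 and 2023 panels; a fixed stretch shared across dates would avoid that.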