Global long-term hourly 9 km terrestrial water-energy-carbon fluxes (FluxHourly)
Abstract. Land surface energy, water and carbon fluxes are key to understanding Earth's climate system, yet globally continuous, high-resolution flux datasets remain scarce. In this study, we present a global, long-term (2000–2020), hourly dataset of terrestrial water, energy and carbon fluxes. It was generated by integrating model simulations, in-situ measurements, and machine learning with remote sensing and meteorological data. First, the integrated STEMMUS-SCOPE model was deployed to simulate land surface fluxes at 170 sites with in-situ measurements. The modeled variables include net radiation (Rn), latent heat flux (LE), sensible heat flux (H), soil heat flux (G), gross primary productivity (GPP), and solar-induced chlorophyll fluorescence at 685 nm and 740 nm (SIF685, SIF740). Next, optimal interpolation was applied to merge the STEMMUS-SCOPE simulations of Rn, LE, and H with eddy covariance observations. The optimally interpolated Rn, LE, and H, together with the STEMMUS-SCOPE-simulated G, GPP, SIF685, and SIF740, were then used as training data pairs to develop an emulator based on a multivariate Random Forest (RF) regression algorithm. This emulator, referred to as Random Forest with Optimal Interpolation (RF_OI), predicts fluxes from global gridded remote sensing and meteorological data. The results demonstrate that RF_OI estimates land surface fluxes with Pearson correlation coefficients (r) of at least 0.88 for all variables except GPP (Rn 0.99, LE 0.88, H 0.92, G 0.92, GPP 0.8, SIF685 0.99, SIF740 0.99). Tests on independent stations (not used to develop the emulator) yield r values above 0.8. Feature importance analysis indicates that incoming shortwave radiation, surface soil moisture, and leaf area index are the top predictor variables determining prediction performance. This terrestrial flux dataset provides a valuable resource for understanding ecosystem responses to climate extremes at the global scale.
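The optimal interpolation step that merges simulated and observed Rn, LE and H can be illustrated with the textbook scalar form of the analysis update, x_a = x_b + K(y − x_b) with K = σ_b² / (σ_b² + σ_o²). Here x_b is the model (background) value, y is the observation, and σ_b², σ_o² are their error variances. The Python sketch below is a minimal per-time-step version. It assumes unbiased, mutually uncorrelated errors with known variances. The function name `optimal_interpolation` and the variance values are hypothetical. The abstract does not state how the study estimated its error statistics or whether spatial or temporal covariances were used.

```python
from typing import List, Optional, Sequence


def optimal_interpolation(
    background: Sequence[float],
    observation: Sequence[Optional[float]],
    var_background: float,
    var_observation: float,
) -> List[float]:
    """Merge a model series with observations, one time step at a time.

    Hypothetical helper: assumes unbiased, uncorrelated errors with known,
    constant variances. Where an observation is missing (None), the model
    value is kept unchanged.
    """
    if var_background < 0 or var_observation <= 0:
        raise ValueError("var_background must be >= 0 and var_observation > 0")
    # Minimum-variance weight applied to the innovation (y - x_b).
    gain = var_background / (var_background + var_observation)
    analysis = []
    for x_b, y in zip(background, observation):
        if y is None:
            analysis.append(x_b)  # gap in the eddy covariance record
        else:
            analysis.append(x_b + gain * (y - x_b))
    return analysis


if __name__ == "__main__":
    # Illustrative numbers only: gain = 400 / (400 + 100) = 0.8
    merged = optimal_interpolation([100.0, 200.0, 300.0], [110.0, None, 280.0], 400.0, 100.0)
    print(merged)
```

Because the gain depends only on the ratio of the two error variances, the merged value moves toward whichever source is assumed more reliable. Where observations are missing, the model value passes through unchanged.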
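The RF_OI emulator step can be sketched with scikit-learn's `RandomForestRegressor`. When given a two-dimensional target array, it fits a single forest that predicts all seven fluxes jointly. This is a minimal sketch under stated assumptions, not the study's implementation. The predictor names in `FEATURES`, the hyperparameters, the random 70/30 split, and the synthetic training data are all hypothetical placeholders standing in for the site-level training pairs. The sketch scores each output with the Pearson correlation coefficient and ranks predictors by the forest's impurity-based importances.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Output fluxes named in the abstract.
TARGETS = ["Rn", "LE", "H", "G", "GPP", "SIF685", "SIF740"]
# Hypothetical predictor set (placeholder names, not the study's exact inputs).
FEATURES = ["SW_in", "SM_surface", "LAI", "T_air", "VPD"]


def pearson_r(obs: np.ndarray, pred: np.ndarray) -> float:
    """Pearson correlation coefficient between two 1-D arrays."""
    return float(np.corrcoef(obs, pred)[0, 1])


def fit_emulator(X: np.ndarray, Y: np.ndarray, seed: int = 0):
    """Fit one multi-output random forest and score each target on held-out rows."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=seed)
    model = RandomForestRegressor(
        n_estimators=100, min_samples_leaf=2, n_jobs=-1, random_state=seed
    )
    model.fit(X_tr, Y_tr)  # Y_tr has shape (n_samples, n_targets)
    Y_hat = model.predict(X_te)
    scores = {name: pearson_r(Y_te[:, j], Y_hat[:, j]) for j, name in enumerate(TARGETS)}
    return model, scores


def demo(n: int = 2000, seed: int = 0):
    """Run the sketch on synthetic data (not FluxHourly data)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, len(FEATURES)))
    # Toy targets driven mainly by the first three predictors, plus noise.
    signal = 3.0 * X[:, 0] + 2.0 * X[:, 1] * X[:, 2]
    Y = np.column_stack(
        [(k + 1) * signal + rng.normal(0.0, 0.1, n) for k in range(len(TARGETS))]
    )
    model, scores = fit_emulator(X, Y, seed)
    # Sort predictors from most to least important.
    ranking = sorted(zip(FEATURES, model.feature_importances_), key=lambda p: -p[1])
    return model, scores, ranking


if __name__ == "__main__":
    _, scores, ranking = demo()
    for name, r in scores.items():
        print(f"{name}: r = {r:.3f}")
    print("Impurity-based importance ranking:", [f for f, _ in ranking])
```

Impurity-based importances can favour some predictors over others for reasons unrelated to predictive skill. `sklearn.inspection.permutation_importance` is a common cross-check. A random row split also lets samples from the same site appear in both training and test sets. A site-held-out split (for example with `GroupKFold`) is closer in spirit to the independent-station test described in the abstract.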