EGO: a global 0.05° hourly GPP dataset for monitoring diurnal photosynthesis dynamics

Liu, Xi; Li, Xing; Hao, Dalei; Xiao, Jingfeng; Zhou, Yanan; Zhao, Cenliang; Diao, Zikang; Qu, Fuqiang; Lin, Shangrong; Liu, Xiangzhuo; Zhang, Zhaoying; Liu, Xinjie; Zhang, Helin

doi:10.5194/essd-2026-40

Preprints

https://doi.org/10.5194/essd-2026-40

15 Apr 2026

Status: this preprint is currently under review for the journal ESSD.

Abstract. Vegetation photosynthesis, quantified as gross primary productivity (GPP), regulates the terrestrial carbon sink and land–atmosphere exchanges. At sub-daily scales, diurnal GPP dynamics reveal rapid adjustments to changing light, temperature and water conditions that are largely obscured in daily-to-annual aggregates, underscoring the need for developing global hourly GPP products. However, existing hourly products mostly rely on traditional machine-learning schemes that lack explicit biophysical constraints and an adequate representation of water limitation, leading to large uncertainties, especially in arid regions. Besides, the added value of hourly products for resolving diurnal behavior and responses to environmental stress remains poorly quantified. Here, we develop a causal knowledge-driven upscaling framework that couples the Peter and Clark Momentary Conditional Independence guided causal weights with ensemble learning strategies. Based on eddy-covariance measurements and multi-source meteorological variables, vegetation properties, and land-cover fields, we generated a global 0.05° hourly GPP product from 2000 to 2022, named EGO (Eddy covariance site-based Global hOurly) GPP, and then evaluated how well EGO reproduces observed diurnal cycles and their responses to extreme events. EGO GPP achieves an R² of 0.76 and an RMSE of 4.17 μmol CO₂ m⁻² s⁻¹ on independent test sites, and outperforms two recent hourly upscaling products (FLUXCOM and X-BASE; R² ≈ 0.60 and RMSE ≈ 5.5 μmol CO₂ m⁻² s⁻¹), with large improvement in drylands. EGO GPP clearly illustrates the diurnal progression of photosynthesis and captures observed diurnal metrics across diverse biomes, revealing strong midday depression and morning-skewed curves in drylands but near-symmetric cycles in high-latitude and humid tropical regions. Analyses of the June 2021 U.S. drought and the August 2003 European heatwave further show that EGO reliably tracks diurnal photosynthetic responses to extremes, including GPP reductions, earlier centroid/peak times and intensified midday depression, consistent with tower-based results. Looking ahead, EGO GPP provides a reliable foundation for investigating diurnal photosynthetic behavior, exploring vegetation–climate interactions and benchmarking Earth system models at a sub-daily scale. EGO GPP is available at https://doi.org/10.5281/zenodo.18253238 (Liu et al., 2026).
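The diurnal centroid named among the diurnal metrics can be illustrated with a minimal sketch. It assumes the common definition of the centroid as the flux-weighted mean time of day; the abstract does not state the exact formula used for EGO, and the function name `diurnal_centroid` and the sample curves are hypothetical.

```python
def diurnal_centroid(hours, gpp):
    """Return the flux-weighted mean time of day, in hours.

    Assumed definition: sum(t * GPP) / sum(GPP) over one day.
    Negative GPP values are clipped to zero before weighting.
    """
    if len(hours) != len(gpp):
        raise ValueError("hours and gpp must have the same length")
    weights = [max(0.0, g) for g in gpp]
    total = sum(weights)
    if total == 0:
        raise ValueError("GPP is zero for the whole day")
    return sum(h * w for h, w in zip(hours, weights)) / total


# Illustrative (made-up) hourly curves, 06:00 to 18:00 local time.
hours = list(range(6, 19))
symmetric = [6 - abs(h - 12) for h in hours]      # peak at noon
morning_skewed = [g * 1.5 if h < 12 else g        # boosted morning hours
                  for h, g in zip(hours, symmetric)]

centroid_symmetric = diurnal_centroid(hours, symmetric)
centroid_skewed = diurnal_centroid(hours, morning_skewed)
```

Under this definition, a centroid earlier than the midpoint of the day marks a morning-skewed cycle of the kind the abstract reports for drylands, while a symmetric cycle has its centroid at the midpoint.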

Received: 16 Jan 2026 – Discussion started: 15 Apr 2026

Competing interests: At least one of the (co-)authors is a member of the editorial board of Earth System Science Data.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Download & links

Preprint (PDF, 2965 KB)

Supplement (2814 KB)

Status: open (until 21 Jun 2026)

RC1: 'Comment on essd-2026-40', Anonymous Referee #1, 23 May 2026

Liu et al. develop a new global GPP model called EGO using a CKML-XGBoost approach and compare it against FLUXCOM. The new model is interesting, but it’s generally hard to make a strong case for studying diurnal patterns of GPP given the massive uncertainties in partitioning GPP (e.g. the nighttime approach will give different patterns, and new studies like those of Keenan et al. emphasize the importance of underappreciated processes like the Kok effect). Regardless, it’s an interesting challenge for models. The major shortcoming, in my opinion, is that the authors did not benchmark the CKML-XGBoost model against traditional approaches (e.g. LSTM) or standard XGBoost models. How are we to know if it’s an improvement, and by how much? I don’t doubt that it makes an improvement, but without an explanation of what is really new (and how much difference the novelty makes), it’s hard to know whether CKML is a promising future direction or an approach that works in principle but may not be meaningfully better. Such an addition needn’t extend to the FLUXCOM comparisons, but it is important from a model development perspective. Minor comments follow.

39: GPP is just canopy photosynthesis; the way it’s being described reads as if you’re trying to talk around something.

46: these early studies like Wofsy et al. and Grace et al. and more should be cited

In general, fundamental (or at least more fundamental) references are missing throughout the manuscript. For example, both Ruehr et al. references cited beforehand are great to note, but these are not the first times that people realized that GPP is the largest flux in the global C cycle or that vegetation responds to environmental variability at multiple (including short) time scales.

The PCMCI is interesting but how does this compare to KGML?

106: this paragraph makes a great point, but what metric will be introduced instead? I’ve always wondered why people don’t just use Nash-Sutcliffe modeling efficiency, and there would be a whole world of time and frequency (e.g. wavelet)-based approaches for choosing a superior metric.
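To make the metric question concrete, here is a minimal sketch contrasting the Nash-Sutcliffe efficiency (NSE) with RMSE and the squared Pearson correlation. It is an illustration only: the manuscript's own R² definition is not given on this page, the function names are hypothetical, and the sample values are made up.

```python
import math


def rmse(obs, sim):
    """Root-mean-square error between observed and simulated values."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - SS_residual / SS_total."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def pearson_r2(obs, sim):
    """Squared Pearson correlation; insensitive to bias and scaling."""
    mean_obs = sum(obs) / len(obs)
    mean_sim = sum(sim) / len(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov ** 2 / (var_obs * var_sim)


# Made-up example: the simulation is perfectly correlated with the
# observations but overestimates them by a factor of two.
obs = [1.0, 2.0, 3.0, 4.0, 5.0]
sim = [2.0, 4.0, 6.0, 8.0, 10.0]

example_r2 = pearson_r2(obs, sim)
example_nse = nse(obs, sim)
example_rmse = rmse(obs, sim)
```

In this example the squared correlation is perfect while the NSE is strongly negative, which is the kind of bias a correlation-based score hides and an efficiency-based score exposes.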

111: Note Khan et al. (2022) here. The introduction is generally well reasoned but would benefit from a broader suite of references from the multiple groups who are working on these challenges, as well as the foundational papers only very briefly alluded to above.

140: this pixel/footprint matching requires more detail, especially given all the work that’s gone into this topic by Chu et al. (2021, Ag. For. Met.) and others.

Figure 1 is nice, would benefit from larger text in some of the subplots.

159-160: just say 2 m. Probably too technical a point but using non-breaking spaces on 164, 165 & elsewhere between mathematical characters and values would be an improvement, also a subscript on the 2 in CO2 around here. Throughout, there are a number of minor usage issues that just require a careful read to check.

3.2.1: I agree in principle with this approach but would be curious to know if a simple LSTM would fare much more poorly if the challenge is to incorporate causal inference, or, alternately, if XGBoost alone gives a reasonable depiction of instantaneous flux as it often does. I don’t question the CKML approach, rather I wonder if it’s the only thing that might beat XGBoost here because a number of products including FLUXCOM-X, CASS, ALIVE, and many studies (that didn’t give their model an acronym, e.g. https://doi.org/10.3390/land14010124) have arrived at XGBoost. But without benchmarking against XGBoost alone, the CKML-XGBoost model’s improvements can’t be quantified.

3.2.2: were any sites fully held out for the training/testing split (commonly 70:30 or 80:20 or similar) or was the fold approach adopted alone? I’m wondering whether this is a fully fair comparison against FLUXCOM; how does the model perform against data from sites that it has never seen before?
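A site-level hold-out of the kind asked about here can be sketched as follows. This is a minimal illustration, not the authors' procedure: the record layout, the site identifiers, and the function name `split_by_site` are all hypothetical, and the 80:20 ratio is just one of the common choices mentioned above.

```python
import random


def split_by_site(records, test_fraction=0.2, seed=0):
    """Split records so that every site falls entirely in train or test.

    Splitting on site IDs (rather than on individual hourly records)
    guarantees that test sites are never seen during training.
    """
    sites = sorted({r["site"] for r in records})
    rng = random.Random(seed)          # seeded for a reproducible split
    rng.shuffle(sites)
    n_test = max(1, round(len(sites) * test_fraction))
    test_sites = set(sites[:n_test])
    train = [r for r in records if r["site"] not in test_sites]
    test = [r for r in records if r["site"] in test_sites]
    return train, test


# Made-up data: 10 hypothetical sites with 3 hourly records each.
records = [
    {"site": f"SITE-{i:02d}", "hour": h, "gpp": float(i + h)}
    for i in range(10)
    for h in range(3)
]

train, test = split_by_site(records, test_fraction=0.2)
train_sites = {r["site"] for r in train}
test_sites = {r["site"] for r in test}
```

Scoring a model only on `test` then answers the question of how it performs at sites it has never seen, which a record-level random split cannot do.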

3.2.3 (really before): how was flux data quality considered?

272: the relationship between GPP and radiation isn’t linear, it saturates.
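The saturating response can be illustrated with a rectangular-hyperbola light-response curve, a widely used form. This is a sketch under stated assumptions: the parameter values (`alpha`, `gpp_max`) are illustrative and not taken from the manuscript, and the function name is hypothetical.

```python
def gpp_rectangular_hyperbola(par, alpha=0.05, gpp_max=30.0):
    """Rectangular-hyperbola light response.

    par     : incident PAR (umol photons m-2 s-1)
    alpha   : initial slope (umol CO2 per umol photons), illustrative
    gpp_max : asymptotic GPP at saturating light (umol CO2 m-2 s-1),
              illustrative
    """
    return alpha * par * gpp_max / (alpha * par + gpp_max)


gpp_0 = gpp_rectangular_hyperbola(0.0)
gpp_1000 = gpp_rectangular_hyperbola(1000.0)
gpp_2000 = gpp_rectangular_hyperbola(2000.0)

# Doubling PAR from 1000 to 2000 raises GPP by far less than a factor
# of two, and GPP never exceeds gpp_max.
doubling_ratio = gpp_2000 / gpp_1000
```

At low light the curve is close to linear with slope `alpha`; at high light it flattens toward `gpp_max`, which is why a linear GPP-radiation assumption overestimates midday uptake.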

320 and beyond: I’d be curious about the time dependencies of these variables as soil moisture should increase in importance at longer time scales and at annual timescales temperature will likely be more important following the findings of Jung et al. This is perhaps a different topic though.

Fig. 6 and related comparisons: as noted above can we be sure that the FLUXCOM products are treated to a fair comparison in these analyses?

For Figure 12, what time period is the reference? Also, the region shown covers more than the US; it includes a good chunk of Canada and Mexico.

Citation: https://doi.org/10.5194/essd-2026-40-RC1

Supplement

https://doi.org/10.5194/essd-2026-40-supplement

Data sets

EGO: a global 0.05° hourly GPP dataset for monitoring diurnal photosynthesis dynamics Xi Liu and Xing Li https://doi.org/10.5281/zenodo.18253237

Model code and software

EGO: a global 0.05° hourly GPP dataset for monitoring diurnal photosynthesis dynamics Xi Liu and Xing Li https://doi.org/10.5281/zenodo.18253237

Viewed

Total article views: 411 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	Supplement	BibTeX	EndNote
297	97	17	411	40	18	19

Views and downloads (calculated since 15 Apr 2026)

Month	HTML	PDF	XML	Total
Apr 2026	251	86	16	353
May 2026	46	11	1	58

Latest update: 25 May 2026

Short summary

Vegetation photosynthesis, quantified as Gross Primary Productivity (GPP), changes rapidly throughout the day. We developed a global hourly GPP dataset called EGO by combining tower observations with advanced artificial intelligence that accounts for causal effects. EGO outperforms existing hourly GPP products and captures diurnal photosynthesis dynamics well. It will provide a reliable foundation for investigating sub-daily ecosystem processes and benchmarking Earth system models.

