the Creative Commons Attribution 4.0 License.
SEEPS4ALL: an open dataset for the verification of daily precipitation forecasts using station climate statistics
Abstract. Forecast verification is an essential task when developing a forecasting model. How well does a model perform? How does the forecast performance compare with previous versions or other models? Which aspects of the forecast could be improved? In weather forecasting, these questions apply in particular to precipitation, a key weather parameter with vital societal applications. Scores specifically designed to assess the performance of precipitation forecasts have been developed over the years. One example is the Stable and Equitable Error in Probability Space (SEEPS, Rodwell et al., 2010). The computation of this score is, however, not straightforward because it requires information about the precipitation climatology at the verification locations. More generally, climate statistics are key to assessing forecasts of extreme precipitation and high-impact events. Here, we introduce SEEPS4ALL, a set of data and tools that democratize the use of climate statistics for verification purposes. In particular, verification results for daily precipitation are showcased for both deterministic and probabilistic forecasts.
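The abstract stresses that SEEPS needs station climatology. A small example shows which climate statistics actually enter the score. The sketch below assumes the three-category formulation of Rodwell et al. (2010). Days are classed as dry, light or heavy, and light days are twice as frequent as heavy days in the station climatology. Under that formulation, only two statistics are needed: the dry-day probability `p1` and the light/heavy threshold. The function names, the 0.2 mm dry threshold and the inclusive (≤) category boundaries are illustrative assumptions, not the SEEPS4ALL interface.

```python
def seeps_matrix(p1: float) -> list[list[float]]:
    """Return the 3x3 SEEPS error matrix, indexed as matrix[forecast][observed].

    Categories: 0 = dry, 1 = light, 2 = heavy. p1 is the station's
    climatological probability of a dry day; light and heavy days then have
    probabilities 2(1 - p1)/3 and (1 - p1)/3 (assumed 2:1 split of wet days).
    """
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must lie strictly between 0 and 1")
    p3 = (1.0 - p1) / 3.0  # climatological probability of a heavy day
    return [
        [0.0, 0.5 / (1.0 - p1), 0.5 * (1.0 / p3 + 1.0 / (1.0 - p1))],  # forecast dry
        [0.5 / p1, 0.0, 0.5 / p3],  # forecast light
        [0.5 * (1.0 / p1 + 1.0 / (1.0 - p3)), 0.5 / (1.0 - p3), 0.0],  # forecast heavy
    ]


def category(precip_mm: float, heavy_threshold_mm: float,
             dry_threshold_mm: float = 0.2) -> int:
    """Map a 24 h accumulation to 0 (dry), 1 (light) or 2 (heavy).

    The 0.2 mm default and the inclusive boundaries are assumptions.
    """
    if precip_mm <= dry_threshold_mm:
        return 0
    if precip_mm <= heavy_threshold_mm:
        return 1
    return 2


def seeps(forecast_mm: float, observed_mm: float, p1: float,
          heavy_threshold_mm: float, dry_threshold_mm: float = 0.2) -> float:
    """SEEPS penalty for a single forecast/observation pair at one station."""
    f = category(forecast_mm, heavy_threshold_mm, dry_threshold_mm)
    o = category(observed_mm, heavy_threshold_mm, dry_threshold_mm)
    return seeps_matrix(p1)[f][o]
```

Consider a station with `p1 = 0.5` and an 8 mm heavy threshold. Under these assumptions, a dry forecast for a day with 12 mm observed, `seeps(0.0, 12.0, p1=0.5, heavy_threshold_mm=8.0)`, costs a penalty of 4. Rows index the forecast and columns the observation. With this weighting, every constant forecast has an expected score of exactly 1 over the station climatology, which is the equitability property in the score's name.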
Status: final response (author comments only)
- RC1: 'Comment on essd-2025-553', Jonas Bhend, 16 Dec 2025
- RC2: 'Comment on essd-2025-553', Anonymous Referee #2, 23 Dec 2025
The manuscript introduces a new dataset focused on station-based precipitation over Europe, including climatological statistics that facilitate the computation of meaningful verification metrics, such as SEEPS. This dataset can be useful for verifying precipitation forecasts, not only from NWP but also from ML methods. The paper is well written and the data are accessible as described in the manuscript.
Regarding the dataset, there is already an open discussion about reformatting the data. I also think that including the maximum (100th percentile) is a good idea and agree with the change of dimensions.
Regarding the manuscript, I have some minor comments and clarifying questions:
- Line 68: The dataset includes observations from 2022 to 2024 coming from ECA&D. This is presented as a novelty (Line 60), but is this part of the dataset just a direct subset of ECA&D?
- Fig. 1: Information about the number of plotted stations would be appreciated.
- Lines 82-83: The study uses the blended dataset provided by ECA&D. Blending data has an impact on extreme values, which are smoothed. Please comment on the impact this blending might have on the extreme cases in the provided dataset.
Citation: https://doi.org/10.5194/essd-2025-553-RC2
Data sets
SEEPS4ALL version 1.0 Zied Ben Bouallègue https://doi.org/10.5281/zenodo.17052887
SEEPS4ALL review
The authors present a new station-based dataset to support the evaluation of precipitation forecasts with the SEEPS score and other verification metrics. This dataset and the corresponding verification software are useful contributions, and the paper is well written and concise. The datasets are well described and accessible, but I suggest reorganizing the data as detailed below.
The suggested changes to the datasets can be summarized as:
To facilitate working with the new data layout, the scripts should be adjusted to take into account the difference in time representation.
So, instead of the representation below:
```
>>> xr.open_zarr("obs_clim_tp24_2022_2024_ecad.zarr")
<xarray.Dataset> Size: 9GB
Dimensions: (stnid: 10562, time: 1097)
Coordinates:
elevation (stnid) int64 84kB dask.array<chunksize=(10562,), meta=np.ndarray>
lat (stnid) float64 84kB dask.array<chunksize=(10562,), meta=np.ndarray>
lon (stnid) float64 84kB dask.array<chunksize=(10562,), meta=np.ndarray>
* stnid (stnid) int64 84kB 13 14 15 16 21 ... 27706 27707 27708 27710
* time (time) datetime64[ns] 9kB 2022-01-01 2022-01-02 ... 2025-01-01
Data variables: (12/100)
observation (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
perc1 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
perc10 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
perc11 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
perc12 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
perc13 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
... ...
perc98 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
perc99 (time, stnid) float64 93MB dask.array<chunksize=(138, 1321), meta=np.ndarray>
Attributes:
description: observations with climate percentiles from 1 to 99
licence: CC-BY-NC. See also https://knmi-ecad-assets-prd.s3.amazonaw...
version: 1.0.0
```
I suggest the following:
```
>>> xr.open_zarr("obs_clim_tp24_2022_2024_ecad.zarr")
<xarray.Dataset> Size: 9GB
Dimensions: (stnid: 10562, time: 1097, month: 12, perc: 100)
Coordinates:
elevation (stnid) int64 84kB dask.array<chunksize=(10562,), meta=np.ndarray>
lat (stnid) float64 84kB dask.array<chunksize=(10562,), meta=np.ndarray>
lon (stnid) float64 84kB dask.array<chunksize=(10562,), meta=np.ndarray>
* stnid (stnid) int64 84kB 13 14 15 16 21 ... 27706 27707 27708 27710
* time (time) datetime64[ns] 9kB 2022-01-01 2022-01-02 ... 2025-01-01
* month (month) int64 1 2 3 ... 12
* perc (perc) int64 1 2 3 ... 100
Data variables:
observation (time, stnid) float64 93MB
percentile (month, stnid, perc) float64 …
Attributes:
description: observations with climate percentiles from 1 to 100
licence: CC-BY-NC. See also https://knmi-ecad-assets-prd.s3.amazonaw...
version: 1.0.0
```
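The following minimal xarray sketch illustrates how verification scripts might adapt to the suggested layout, under stated assumptions. It gathers the `perc1` to `perc99` variables into a single array with a `perc` dimension. It then collapses the daily `time` axis to calendar months and maps the monthly climatology back onto each verification date. The sketch assumes that the stored percentiles are constant within each calendar month. The helper functions and the three-station synthetic dataset are hypothetical stand-ins for the real files.

```python
import numpy as np
import pandas as pd
import xarray as xr


def stack_percentiles(ds: xr.Dataset, n_perc: int = 99) -> xr.DataArray:
    """Gather perc1 .. perc<n_perc> into one DataArray with a 'perc' dimension."""
    levels = pd.Index(np.arange(1, n_perc + 1), name="perc")
    stacked = xr.concat([ds[f"perc{k}"] for k in levels], dim=levels)
    return stacked.rename("percentile")


def to_monthly(percentiles: xr.DataArray) -> xr.DataArray:
    """Collapse the daily time axis to calendar months.

    Assumption: the climate percentiles do not change within a calendar month,
    so keeping the first value of each month loses no information.
    """
    monthly = percentiles.groupby("time.month").first()
    return monthly.transpose("month", "stnid", "perc")


def percentiles_for_dates(monthly: xr.DataArray, dates: xr.DataArray) -> xr.DataArray:
    """Return the (time, stnid, perc) climatology valid on each date."""
    return monthly.sel(month=dates.dt.month)


# Hypothetical synthetic stand-in for the current layout (3 stations, 90 days).
days = pd.date_range("2022-01-01", "2022-03-31", freq="D")
stnid = [13, 14, 15]
rng = np.random.default_rng(seed=0)
data_vars = {
    "observation": (("time", "stnid"), rng.gamma(0.5, 4.0, size=(days.size, len(stnid)))),
}
for k in range(1, 100):
    # Made-up values that depend only on the calendar month and percentile level k.
    per_day = days.month.to_numpy()[:, None] * k / 10.0
    data_vars[f"perc{k}"] = (("time", "stnid"), np.repeat(per_day, len(stnid), axis=1))
old = xr.Dataset(data_vars, coords={"time": days, "stnid": stnid})

# Proposed layout: daily observations plus one monthly percentile array.
new = xr.Dataset(
    {
        "observation": old["observation"],
        "percentile": to_monthly(stack_percentiles(old)),
    }
)

# Scripts that still need per-day percentiles can rebuild them on the fly.
daily_clim = percentiles_for_dates(new["percentile"], new["time"])
```

If the percentiles actually vary within a month, grouping by `time.dayofyear` instead of `time.month` would preserve that detail. Passing `n_perc=100` would also pick up a 100th-percentile (maximum) variable if one is added.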
Similarly, the `obs_seeps_tp24_2022_2024_ecad.zarr` dataset could be reorganized in the same way.
## Minor Editorial Comments:
L13: recognizes meteorological data as high value data, …
L143: PSS