SHELDA: Sub-hourly European Quality Controlled Sea Level Dataset

Balić, Marijana; Šepić, Jadranka

doi:10.5194/essd-2025-767

Preprints

28 Dec 2025

Status: a revised version of this preprint is currently under review for the journal ESSD.

Abstract. Availability of high-quality sub-hourly sea level data is essential for understanding a wide range of oceanic processes, including tidal oscillations, seiches, storm surges, tsunamis (including meteotsunamis), and their impact on sea level extremes and coastal flooding. Freely accessible sea level databases often contain time series measured with an hourly or even longer sampling step, or they contain high-frequency data that have not undergone quality control procedures. To address this gap, the SHELDA (Sub-Hourly European Quality Controlled Sea Level DAtaset) has been created. This dataset comprises 257 individual tide gauge records in NetCDF format (https://doi.org/10.14284/764, Balić and Šepić, 2025), each representing quality-controlled sea level time series sampled at intervals between 1 and 15 minutes, along with residual time series derived by removing tidal components. This paper outlines the rigorous quality control procedures implemented and describes the spatial and temporal coverage of the dataset, along with technical specifications. SHELDA enables precise identification and analysis of sea level variability at timescales from minutes to multi-yearly along the European coasts, including Greenland, Canary Islands, Israel, Lebanon and Türkiye.

Received: 11 Dec 2025 – Discussion started: 28 Dec 2025

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims made in the text, published maps, institutional affiliations, or any other geographical representation in this paper. While Copernicus Publications makes every effort to include appropriate place names, the final responsibility lies with the authors. Views expressed in the text are those of the authors and do not necessarily reflect the views of the publisher.

Status: final response (author comments only)

RC1:
'Comment on essd-2025-767', Anonymous Referee #1, 09 Apr 2026

The paper describes quality control performed on 257 high-frequency sea level records (sampled at intervals from 1 to 15 minutes) along European coastlines. The manuscript is generally well organized and clearly written, with a few shortcomings noted below.
The dataset, as described, has the potential to make a significant contribution to the scientific community, particularly for researchers investigating sea level processes on shorter timescales. However, the dataset available via the provided link currently contains only 30 station records, rather than the expected 257.
In light of this discrepancy, I provide comments on both the manuscript and the available subset of 30 records, and I recommend rejection with the possibility of resubmission once the complete dataset is made accessible.
Comments on the paper
L15: Please specify the license under which the data are distributed.
L17: “rigorous quality control procedures”: Please briefly outline the main steps undertaken, as well as those not applied (e.g. corrections of datum shifts and drifts). Also include in the Abstract the minimum and maximum lengths of the available records.
L20: Please use a consistent name for Turkey throughout the manuscript (see also Fig. 10).
L49: Please rephrase “some kind of averaging procedure”, as this is not appropriate. Also, I doubt you would measure wind waves with a tide gauge instrument typically installed on the coastline (as visible from your maps).
L58: ‘reduced’ > higher?
L91–94: Please rephrase for clarity; it is not clear whether MISELA includes only the high-frequency signal or both high-frequency and total sea levels.
Figure 1: Can you offer a better solution for records shorter than 3 years, as the white circles on the grey background are not visible?
Sections 2.2.1 and 2.2.2: These sections would benefit from clearer organization. The current structure with multiple (i), (ii), (iii), ... points, some connected, others not, makes them difficult to follow.
Figure 6: (%) > [%]
L211: I would refer to all data from one station (regardless of how many segments with different sampling frequencies it contains) as a “record,” as the current usage is confusing (e.g. the number changes from 257 to 275 and then back to 257).
L220: I understand that correcting datum shifts is not possible without additional data. However, users would benefit from knowing which stations are affected, so that they can exclude them if necessary. I recommend including this information in the manuscript and in Tables A1–A3 (and also in NetCDF files). Furthermore, please specify in the paper what you consider to be a shift, a datum shift, and a drift (including possible causes).
Figure 3: The bathymetry color bar is missing.
L331: Please clarify what specific issues were identified in the German data that led to their exclusion from SHELDA.
Table 1: See comments on NetCDF files below.
Figures 7, 8, 9: Please define what is meant by a step sensor.
L380: In some places it is unclear whether you are referring to the original or processed data. For example, is the 1-minute interval the most common in the original data or after processing? Please clarify.
L380–387: It would be useful to include a brief discussion explaining the observed distribution of sampling frequencies (e.g. related to dynamics?) and sensor types (e.g. infrastructure constraints?).
L394: Please include the data license.
L471: ‘article’ > ‘manuscript’
L19, L133, L397: In my opinion, the SHELDA dataset is most suitable for short-period processes, particularly since drifts and datum shifts were identified but not corrected. Therefore, the term “multi-yearly” appears too strong.
Comments on NetCDF files
1. For each station, please include information on the data owner or the institution responsible for maintaining the station.
2. The current qc_flags variable (0 or 1) provides limited added value, as the presence or absence of data can already be inferred directly from the dataset. It would be beneficial to enhance this by distinguishing between data that are originally missing (i.e., not available from data centers) and data that have been removed during quality control procedures. As a further improvement, the authors might also consider introducing an additional flag to indicate data points that are present but potentially questionable or of lower quality (as noted in the manuscript, where some data were identified as doubtful).
3. Users would benefit from additional metadata indicating how many segments are contained within each file (i.e., whether a file consists of one segment or multiple segments).
Comments on data (see attached Figures)
1. Check whether interpolation was applied across missing data in the first segment (Fig. bodr.png).
2. It appears that some stations still exhibit constant sea levels over time (see figures in Constant_SL).
3. Please check station batr (see figures in Suspect_data; ignore the red stars). Some high-frequency episodes appear to be “cut off” from above. Is this expected behaviour?

Citation: https://doi.org/10.5194/essd-2025-767-RC1
- AC1:
  'Reply on RC1', Marijana Balić, 21 May 2026
  The paper describes quality control performed on 257 high-frequency sea level records (sampled at intervals from 1 to 15 minutes) along European coastlines. The manuscript is generally well organized and clearly written, with a few shortcomings noted below.
  The dataset, as described, has the potential to make a significant contribution to the scientific community, particularly for researchers investigating sea level processes on shorter timescales. However, the dataset available via the provided link currently contains only 30 station records, rather than the expected 257.
  In light of this discrepancy, I provide comments on both the manuscript and the available subset of 30 records, and I recommend rejection with the possibility of resubmission once the complete dataset is made accessible.
  We thank the reviewer for the thorough review and positive feedback on the organization and clarity of the manuscript. We greatly appreciate your recognition of the dataset's potential contribution to sea level research on shorter timescales.
  
  As for the dataset availability, we apologize. We originally uploaded the entire dataset and were not aware that it is not fully available via the provided link. Thank you for noticing this. We will upload everything again and make sure that all records can be downloaded. We will also incorporate the requested changes to the NetCDF file.
  
  Comments on the paper
  L15: Please specify the license under which the data are distributed.
  The license information will be added.
  
  L17: “rigorous quality control procedures”: Please briefly outline the main steps undertaken, as well as those not applied (e.g. corrections of datum shifts and drifts). Also include in the Abstract the minimum and maximum lengths of the available records.
  We will include the requested information on quality control steps and record lengths in the abstract.
  
  L20: Please use a consistent name for Turkey throughout the manuscript (see also Fig. 10).
  We will revise Fig. 10 accordingly and ensure that Türkiye is used consistently throughout the manuscript.
  
  L49: Please rephrase “some kind of averaging procedure”, as this is not appropriate. Also, I doubt you would measure wind waves with a tide gauge instrument typically installed on the coastline (as visible from your maps).
  We will rephrase and correct.
  
  L58: ‘reduced’ > higher?
  Will be changed.
  
  L91–94: Please rephrase for clarity; it is not clear whether MISELA includes only the high-frequency signal or both high-frequency and total sea levels.
  We will rephrase to make it clear that MISELA includes only the high‑frequency sea‑level signal.
  
  Figure 1: Can you offer a better solution for records shorter than 3 years, as the white circles on the grey background are not visible?
  We will revise Figure 1 by changing the colour of the circle.
  
  Sections 2.2.1 and 2.2.2: These sections would benefit from clearer organization. The current structure with multiple (i), (ii), (iii), ... points, some connected, others not, makes them difficult to follow.
  We will reorganize Sections 2.2.1 and 2.2.2 to improve clarity, as suggested.
  
  Figure 6: (%) > [%]
  Will be changed.
  
  L211: I would refer to all data from one station (regardless of how many segments with different sampling frequencies it contains) as a “record,” as the current usage is confusing (e.g. the number changes from 257 to 275 and then back to 257).
  Thank you for pointing out the potential confusion in terminology. We will revise the paragraph accordingly to clearly distinguish records (full station datasets) from segments (subdivisions for QC due to varying sampling frequencies).
  
  L220: I understand that correcting datum shifts is not possible without additional data. However, users would benefit from knowing which stations are affected, so that they can exclude them if necessary. I recommend including this information in the manuscript and in Tables A1–A3 (and also in NetCDF files). Furthermore, please specify in the paper what you consider to be a shift, a datum shift, and a drift (including possible causes).
  We will add information about the types of erroneous data found in each station record, both in Tables A1–A3 and in the NetCDF files. Furthermore, we will define what we consider a shift, a datum shift, and a drift, and illustrate these where possible (if they are not already shown).
  
  Figure 3: The bathymetry color bar is missing.
  We will add the bathymetry colour bar.
  
  L331: Please clarify what specific issues were identified in the German data that led to their exclusion from SHELDA.
  We will clarify the specific issues identified (i.e., presence of non-physical oscillations appearing throughout the time series).
  
  Table 1: See comments on NetCDF files below.
  Figures 7, 8, 9: Please define what is meant by a step sensor.
  A step sensor is a specific type of tide gauge sensor that uses a vertical array of metal electrodes to detect water level. We will define this in the manuscript.
  
  L380: In some places it is unclear whether you are referring to the original or processed data. For example, is the 1-minute interval the most common in the original data or after processing? Please clarify.
  In this sentence we refer to the 1‑minute interval in the processed data, but the statement also holds true for the original data, where 1‑minute sampling is the most common. We will clarify this in the revised manuscript to avoid ambiguity.
  
  L380–387: It would be useful to include a brief discussion explaining the observed distribution of sampling frequencies (e.g. related to dynamics?) and sensor types (e.g. infrastructure constraints?).
  We will revise the manuscript as requested by providing a more detailed explanation of: (1) the relationship between sampling frequency distribution and coastal dynamics and monitoring objectives (e.g., 15-minute sampling for tide and storm surge monitoring versus 1–10 minute sampling in regions prone to tsunamis, seiches, and meteotsunamis), and (2) how the distribution of sensor types reflects infrastructure constraints, funding availability, and the development of national monitoring networks across Europe.
  
  L394: Please include the data license.
  The license information will be added.
  
  L471: ‘article’ > ‘manuscript’
  Will be replaced.
  
  L19, L133, L397: In my opinion, the SHELDA dataset is most suitable for short-period processes, particularly since drifts and datum shifts were identified but not corrected. Therefore, the term “multi-yearly” appears too strong.
  We agree. We will omit the term.
  
  Comments on NetCDF files
  For each station, please include information on the data owner or the institution responsible for maintaining the station.
  We will implement this by adding dedicated metadata fields in the netCDF files that clearly identify the original data providers for each station.
  
  The current qc_flags variable (0 or 1) provides limited added value, as the presence or absence of data can already be inferred directly from the dataset. It would be beneficial to enhance this by distinguishing between data that are originally missing (i.e., not available from data centers) and data that have been removed during quality control procedures. As a further improvement, the authors might also consider introducing an additional flag to indicate data points that are present but potentially questionable or of lower quality (as noted in the manuscript, where some data were identified as doubtful).
  
  We will expand the flags to distinguish between (i) high-quality data, (ii) data originally missing (i.e., not available from data centers), and (iii) data removed during quality control procedures (e.g., 1 = High quality data, 2 = Missing data, 3 = Removed data).
  
  Specific data that are doubtful or potentially of lower quality were not systematically flagged at the time of quality control. However, general issues at each station (e.g., numerous outliers, drifts, timing errors) were noted and will be added to the NetCDF files, following the suggestion.
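
To illustrate the three-value flag scheme proposed above, here is a minimal sketch. It assumes that both the originally downloaded series and the quality-controlled series are available on the same time grid, with `None` marking an absent value. The function and constant names are hypothetical and are not taken from the SHELDA files.

```python
# Flag values follow the example given in the reply above.
HIGH_QUALITY = 1  # value present and kept after quality control
MISSING = 2       # value was never delivered by the data centre
REMOVED = 3       # value was present but deleted during quality control


def assign_flags(original, controlled):
    """Return one flag per sample, comparing raw and quality-controlled series.

    Both inputs are equal-length lists on the same time grid; None marks
    an absent value (an assumption made for this sketch).
    """
    if len(original) != len(controlled):
        raise ValueError("series must share the same time grid")
    flags = []
    for raw, kept in zip(original, controlled):
        if raw is None:
            flags.append(MISSING)
        elif kept is None:
            flags.append(REMOVED)
        else:
            flags.append(HIGH_QUALITY)
    return flags


raw_series = [1.00, None, 2.50, 1.02]
qc_series = [1.00, None, None, 1.02]
print(assign_flags(raw_series, qc_series))
```

Keeping the raw series alongside the controlled one is what makes the distinction between "missing" and "removed" recoverable; from the controlled series alone the two cases look identical.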
  
  Users would benefit from additional metadata indicating how many segments are contained within each file (i.e., whether a file consists of one segment or multiple segments).
  
  We will add metadata specifying the number of segments contained in each file (i.e., whether a file consists of one segment or multiple segments) to improve clarity.
  
  Comments on data (see attached Figures)
  Check whether interpolation was applied across missing data in the first segment (Fig. bodr.png).
  
  Interpolation was not applied across the missing data between two segments. The issue is caused by the plotting function connecting the last point of the first segment to the first point of the second segment. In the revised version, we will define the gap between the two segments more clearly to avoid this visual connection. Corrected time series for all stations will be uploaded.
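
The plotting artefact described above can be avoided by placing an explicit missing value inside each gap, since common plotting libraries such as matplotlib break a line at NaN. The following is a minimal sketch of that idea, assuming times are given as numbers in minutes and that `max_step` is the largest spacing that still counts as continuous sampling. The function name and arguments are hypothetical.

```python
import math


def break_lines_at_gaps(times, values, max_step):
    """Insert a NaN sample inside every gap longer than max_step.

    times    -- sample times in minutes, strictly increasing
    values   -- sea level values, same length as times
    max_step -- largest spacing still treated as continuous
    """
    out_times, out_values = [], []
    for k, (t, v) in enumerate(zip(times, values)):
        if k > 0 and t - times[k - 1] > max_step:
            # Place the marker one step after the last valid sample,
            # which is guaranteed to fall inside the gap.
            out_times.append(times[k - 1] + max_step)
            out_values.append(math.nan)
        out_times.append(t)
        out_values.append(v)
    return out_times, out_values


t, v = break_lines_at_gaps([0, 1, 2, 10, 11], [5.0, 6.0, 7.0, 8.0, 9.0], max_step=1)
print(t)
print(v)
```

Only the gap marker is added; the measured samples are passed through unchanged, so no interpolation is introduced.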
  
  It appears that some stations still exhibit constant sea levels over time (see figures in Constant_SL).
  
  We have checked this and we will correct the time series. These short “constant” intervals were probably not automatically detected as erroneous, and they were unintentionally overlooked during the visual inspection. Corrected time series will be uploaded.
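
A simple automatic check for such "constant" intervals is to look for runs of identical consecutive values. The sketch below is an illustration only: it assumes that a stuck sensor repeats exactly the same reading, and the minimum run length `min_length` is a hypothetical parameter that would need tuning per station. It is not the detection procedure used for SHELDA.

```python
def find_constant_runs(levels, min_length):
    """Return (first_index, last_index) for each run of identical values.

    Only runs of at least min_length samples are reported. Missing
    samples (None) are never reported as a constant run.
    """
    runs = []
    n = len(levels)
    start = 0
    for i in range(1, n + 1):
        # A run ends at the end of the series or when the value changes.
        if i == n or levels[i] != levels[start]:
            if i - start >= min_length and levels[start] is not None:
                runs.append((start, i - 1))
            start = i
    return runs


series = [1.0, 1.2, 1.2, 1.2, 1.2, 1.3, 1.3, 1.1]
print(find_constant_runs(series, min_length=3))
```

A short `min_length` risks flagging genuine slack-water periods at coarse instrument resolution, which is one reason a visual check remains useful after an automatic pass.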
  
  Please check station batr (see figures in Suspect_data; ignore the red stars). Some high-frequency episodes appear to be “cut off” from above. Is this expected behaviour?
  
  We will correct this and upload corrected time series.
  
  Citation: https://doi.org/10.5194/essd-2025-767-AC1
RC2:
'Comments - manuscript essd-2025-767', Ivan Haigh, 21 Apr 2026
SHELDA: Sub-hourly European Quality Controlled Sea Level Dataset
Overall, I consider this to be a very strong paper based on an excellent and valuable dataset. The authors should be commended for the substantial effort involved in compiling and quality-controlling the data; this represents a meaningful contribution to the sea-level research community. I particularly welcome the fact that the dataset has undergone explicit quality control, which significantly enhances its usefulness. The manuscript itself is detailed, clearly written, and generally well structured.
That said, I have two major comments that I believe need to be addressed, along with a small number of minor issues. If these points are satisfactorily resolved, I would fully support acceptance of the paper for publication in Earth System Science Data.
Major comments
Relationship to existing datasets (MISELA): A key question that remains insufficiently addressed is how SHELDA differs from, and adds value beyond, the existing MISELA dataset. MISELA already provides high-frequency (1-minute), quality-controlled sea-level data from 331 sites. In the Introduction (particularly around page 3), the authors should explain much more clearly why a new dataset is required, what its specific advantages are relative to MISELA, and what scientific or practical gap it fills. Related to this, it is not clear why SHELDA is not presented as a sub-component or extension of MISELA. Clarifying this distinction would greatly strengthen the motivation for the dataset. In addition, the rationale for focusing exclusively on Europe should be explained more explicitly, as this choice is not currently well justified.

Temporal coverage of the dataset: The dataset ends in December 2021, meaning the most recent data are now almost five years old. While I fully appreciate the significant effort required to compile and quality-control datasets of this kind, it would substantially increase the value of SHELDA if more recent data could be included. If this is not feasible, the authors should provide a clearer explanation of why the dataset currently only extends to the end of 2021, for example due to data availability, access restrictions, or quality-control constraints.

Minor comments
Line 95– While it is true that IOC data are updated most frequently, it would be appropriate to note that data quality is often poor and inconsistent, which limits its suitability for many scientific applications.

Line 74– It is unclear which specific BODC datasets are being referred to here. The URL provided links to the general archive, and BODC no longer actively maintains a global sea-level dataset. I would recommend removing BODC from this list. Instead, the list would be clearer and more accurate if it included only PSMSL, UHSLC, IOC, GESLA, SONEL, and MISELA.

Line 203– The text states that the dataset ends in December 2020, which appears inconsistent with other parts of the manuscript. Please clarify how 2021 data are treated.

Line 228– The criteria used to remove values (a 50 cm difference from a neighbouring value, and a 30 cm difference from both neighbouring values) would benefit from a scientific justification. On what basis were these thresholds chosen? Were they informed by instrumental characteristics, tidal variability, or sensitivity testing?

Table 1: I strongly feel that the original data providers should be explicitly acknowledged in the netcdf files, not just IOC – this is to ensure each national tide gauge network gets the credit.

Table A1: It would also be good to list the actual data providers for each site here. The data providers run the networks that go into IOC, so I strongly think they should be named and get the credit – not just IOC.
Citation: https://doi.org/10.5194/essd-2025-767-RC2
- AC2:
  'Reply on RC2', Marijana Balić, 21 May 2026
  Overall, I consider this to be a very strong paper based on an excellent and valuable dataset. The authors should be commended for the substantial effort involved in compiling and quality-controlling the data; this represents a meaningful contribution to the sea-level research community. I particularly welcome the fact that the dataset has undergone explicit quality control, which significantly enhances its usefulness. The manuscript itself is detailed, clearly written, and generally well structured.
  That said, I have two major comments that I believe need to be addressed, along with a small number of minor issues. If these points are satisfactorily resolved, I would fully support acceptance of the paper for publication in Earth System Science Data.
  We thank the reviewer for the very positive comments and for recognizing the effort we put into compiling and quality‑controlling the SHELDA dataset. We are glad that the reviewer finds the paper strong and the dataset useful for the sea‑level research community. We will carefully address all major and minor points raised and revise the manuscript accordingly.
  
  Major comments
  Relationship to existing datasets (MISELA): A key question that remains insufficiently addressed is how SHELDA differs from, and adds value beyond, the existing MISELA dataset. MISELA already provides high-frequency (1-minute), quality-controlled sea-level data from 331 sites. In the Introduction (particularly around page 3), the authors should explain much more clearly why a new dataset is required, what its specific advantages are relative to MISELA, and what scientific or practical gap it fills. Related to this, it is not clear why SHELDA is not presented as a sub-component or extension of MISELA. Clarifying this distinction would greatly strengthen the motivation for the dataset. In addition, the rationale for focusing exclusively on Europe should be explained more explicitly, as this choice is not currently well justified.
  We will clearly highlight the differences between MISELA and SHELDA in the Introduction. Although both MISELA and SHELDA are based on data available at the IOC-UNESCO website, the two datasets differ in the data they contain. The MISELA dataset contains only the high-frequency component of the sea level time series, i.e. it contains high-pass filtered series (with a cut-off period of 2 h), which further implies that quality control was done only on the high-pass filtered series. The SHELDA dataset, on the other hand, contains the quality-controlled original time series. Thus, MISELA allows only for the study of short-period sea level oscillations (T < 2 h), whereas SHELDA allows for the study of both short- and long-period processes and, additionally, of the contribution of short-period oscillations to total sea level, in particular to extreme sea levels.
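
The difference between a high-pass filtered series and the original series can be illustrated with a small sketch. The filter actually used for MISELA is not described here, so the code below substitutes a centred running mean as a crude stand-in: the running mean approximates the slow part of the signal and the remainder approximates the fast part. For 1-minute data and a 2 h cut-off the window would be about 121 samples. The function name is hypothetical.

```python
def split_by_running_mean(levels, window):
    """Split a series into slow and fast parts with a centred running mean.

    levels -- list of sea level values without gaps
    window -- odd number of samples in the averaging window
    Returns (low, high) with low[i] + high[i] == levels[i].
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    n = len(levels)
    low, high = [], []
    for i in range(n):
        # The window is truncated near the ends of the series.
        first = max(0, i - half)
        last = min(n, i + half + 1)
        mean = sum(levels[first:last]) / (last - first)
        low.append(mean)
        high.append(levels[i] - mean)
    return low, high


total = [1.0, 2.0, 6.0, 2.0, 1.0]
low, high = split_by_running_mean(total, window=3)
print(high)
```

In these terms, a dataset that stores only `high` supports the study of short-period oscillations, while one that stores `total` also allows their contribution to the full sea level to be assessed.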
  
  The regional focus on Europe is justified by: (i) the high effort needed to quality control 1-minute sea level data – doing quality control of this extent for the entire world would be too demanding given our resources; (ii) along with North America, Europe is the continent for which the most long-term 1–15 minute sea level data are available; (iii) sea level variability along the European coasts is very diverse – this refers to tidal ranges, tidal types, ranges of low-frequency (e.g., planetary scale variability, synoptic scale variability, storm surges) and high-frequency (e.g., long ocean waves, tsunamis, seiches) oscillations, heights of extremes, exposure to seismic tsunamis and meteotsunamis, etc.
  
  Temporal coverage of the dataset: The dataset ends in December 2021, meaning the most recent data are now almost five years old. While I fully appreciate the significant effort required to compile and quality-control datasets of this kind, it would substantially increase the value of SHELDA if more recent data could be included. If this is not feasible, the authors should provide a clearer explanation of why the dataset currently only extends to the end of 2021, for example due to data availability, access restrictions, or quality-control constraints.
  We thank the reviewer for acknowledging the effort involved in compiling and quality-controlling this dataset. We fully recognize that more recent data would enhance its utility. The SHELDA dataset is limited to December 2020 due to the extensive time and effort needed to quality control the data. We plan to extend the dataset in the future as resources allow, and we now clearly indicate this in the manuscript. We further discuss alternative quality control methodologies, including AI-based ones, as these might speed up and improve the quality control process.
  
  Minor comments
  Line 95– While it is true that IOC data are updated most frequently, it would be appropriate to note that data quality is often poor and inconsistent, which limits its suitability for many scientific applications.
  We will briefly mention this point here, and we will emphasize that this aspect is explained more clearly and in greater detail at the beginning of Section 2.2.
  
  Line 74– It is unclear which specific BODC datasets are being referred to here. The URL provided links to the general archive, and BODC no longer actively maintains a global sea-level dataset. I would recommend removing BODC from this list. Instead, the list would be clearer and more accurate if it included only PSMSL, UHSLC, IOC, GESLA, SONEL, and MISELA.
  In line with your suggestion, we will remove BODC from the list of global data sources and retain only PSMSL, UHSLC, IOC, GESLA, SONEL, and MISELA.
  
  Line 203– The text states that the dataset ends in December 2020, which appears inconsistent with other parts of the manuscript. Please clarify how 2021 data are treated.
  To clarify, no data from 2021 were included or analyzed in this study. The final data point in our time series is recorded precisely at 00:00 UTC on 1 January 2021.
  
  We will update the text to ensure consistency with the rest of the manuscript.
  
  Line 228– The criteria used to remove values (a 50 cm difference from a neighbouring value, and a 30 cm difference from both neighbouring values) would benefit from a scientific justification. On what basis were these thresholds chosen? Were they informed by instrumental characteristics, tidal variability, or sensitivity testing?
  These values were chosen to flag physically implausible spikes while preserving genuine short‑duration sea‑level variability, consistent with the high temporal resolution of 1‑minute data. Inspection of the data by the authors of the original papers using these thresholds (Vilibić and Šepić, …, Zemunik et al…) revealed that these criteria remove only unphysical spikes, but not all of them, so visual control is needed as well.
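
The two thresholds discussed above can be written as a short check. The sketch below is one possible reading of the criteria, stated as an assumption: a value is flagged if it differs by more than 50 cm from at least one neighbouring value, or by more than 30 cm from both neighbouring values. The function name and the handling of missing neighbours are hypothetical and may differ from the procedure in the manuscript.

```python
def flag_spikes(levels_cm, single_limit=50.0, double_limit=30.0):
    """Return a list of booleans, True where a value looks like a spike.

    levels_cm    -- sea level values in centimetres, None for missing
    single_limit -- threshold for the difference from one neighbour
    double_limit -- threshold for the difference from both neighbours
    """
    n = len(levels_cm)
    flags = [False] * n
    for i in range(n):
        value = levels_cm[i]
        if value is None:
            continue
        # Collect differences from the neighbours that actually exist.
        diffs = []
        for j in (i - 1, i + 1):
            if 0 <= j < n and levels_cm[j] is not None:
                diffs.append(abs(value - levels_cm[j]))
        if not diffs:
            continue
        exceeds_single = any(d > single_limit for d in diffs)
        exceeds_double = len(diffs) == 2 and all(d > double_limit for d in diffs)
        flags[i] = exceeds_single or exceeds_double
    return flags


series_cm = [10.0, 12.0, 47.0, 13.0, 11.0, 12.0]
print(flag_spikes(series_cm))
```

In this example only the isolated value of 47 cm is flagged, because it differs by more than 30 cm from both neighbours while its neighbours each agree with one side.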
  
  Table 1: I strongly feel that the original data providers should be explicitly acknowledged in the netcdf files, not just IOC – this is to ensure each national tide gauge network gets the credit.
  We will add the original data providers to the NetCDF files.
  
  Table A1: It would also be good to list the actual data providers for each site here. The data providers run the networks that go into IOC, so I strongly think they should be named and get the credit – not just IOC.
  We will add information on the actual data providers to Table A1 and to the other relevant tables (Table A2 and Table A3).
  
  Citation: https://doi.org/10.5194/essd-2025-767-AC2

Data sets

SHELDA: Sub-hourly European quality controlled sea level dataset Marijana Balić and Jadranka Šepić https://doi.org/10.14284/764

Viewed

Total article views: 815 (including HTML, PDF, and XML)

HTML	PDF	XML	Total	BibTeX	EndNote
418	344	53	815	51	91

Views and downloads (calculated since 28 Dec 2025)

Month	HTML	PDF	XML	Total
Dec 2025	109	27	6	142
Jan 2026	125	96	7	228
Feb 2026	11	42	7	60
Mar 2026	48	50	13	111
Apr 2026	67	65	10	142
May 2026	42	44	7	93
Jun 2026	9	10	3	22
Jul 2026	7	10	0	17

Short summary

SHELDA is a sub-hourly European quality-controlled sea level dataset with 257 records in NetCDF format, each representing a quality-controlled time series sampled at intervals between 1 and 15 minutes, along with a residual series derived by removing tidal components. It provides reliable, high-quality data essential for accurately analysing short-term sea level variations and extreme events, which are often overlooked due to limited resolution or insufficient quality control in existing datasets.