MISELA: 1-minute sea-level analysis global dataset

Sea-level observations provide information on a variety of processes occurring over different temporal and spatial scales that may contribute to coastal flooding and hazards. However, global research on sea-level extremes is restricted to hourly datasets, which prevents quantification and analysis of processes occurring at timescales between a few minutes and a few hours. These shorter-period processes, like seiches, meteotsunamis, infragravity and coastal waves, may even dominate in low-tidal basins. Therefore, a new global 1-minute sea-level dataset, MISELA (Minute Sea-Level Analysis), has been developed, encompassing quality-checked records of nonseismic sea-level oscillations at tsunami timescales (T<2h) obtained from 331 tide-gauge sites (https://doi.org/10.14284/456, Zemunik et al., 2021b). This paper describes the data quality-control procedures applied to the MISELA dataset, the world and regional coverage of tide-gauge sites and the lengths of the time-series. The dataset is appropriate for global, regional or local research of atmospherically-induced high-frequency sea-level oscillations, which should be included in overall assessments of sea-level extremes.


Introduction
Extreme sea-level events represent a major hazard in coastal zones and have an immediate impact on the coasts, unlike processes acting on longer timescales such as the rise of the mean sea level, which leaves much more time to adapt (Menéndez and Woodworth, 2010). The sensitivity of coastal infrastructure and population to extreme sea levels emphasizes the need to investigate their sources and characteristics, estimate their incidence and strength, catalogue historical events, assess their behaviour under the future climate, develop warning systems and, ultimately, arrange possible adaptation measures. However, these efforts are significantly limited by the availability of sea-level data in terms of resolution, coverage and quality.
Tide gauge observations provide information on a wide range of oceanographic processes, including extreme events associated with tsunamis, storm surges and other causes of sudden coastal inundations. It has long been recognized that well-organised and accessible sea-level databases are a prerequisite for gaining knowledge on sea-level extremes (e.g. Vafeidis et al., 2008; Hunter et al., 2017) and, consequently, for the management of coastal hazards. However, no quality-checked global sea-level datasets exist with temporal resolutions higher than an hour, i.e. covering periods at which, in addition to extraordinary events like tsunamis, a variety of processes may contribute substantially to, or even dominate, the overall sea-level extremes (Vilibić and Šepić, 2017). Many research activities have been based on 1-minute sea-level records, mainly focused on specific regions known for frequent occurrence of meteotsunamis or high-frequency sea-level oscillations, such as the Mediterranean Sea (e.g. Šepić et al., 2015), Sicily (e.g. Šepić et al., 2018; Zemunik et al., 2021a), the Adriatic Sea (e.g. Šepić et al., 2016), the Balearic Islands (e.g. Marcos et al., 2009), the Finnish coast (e.g. Pellikka et al., 2014), the Great Lakes (e.g. Bechle et al., 2016), the U.S. East Coast (e.g. Pasquet et al., 2013), the Chilean coast (e.g. Carvajal et al., 2017), Japan (e.g. Heidarzadeh and Rabinovich, 2021), Australia (e.g. Pattiaratchi and Wijeratne, 2014), and many others.
Accessible global sea-level datasets differ in both sampling and latency, following the needs of the scientific and user communities, from quantification of climate change and sea-level rise (e.g. Jevrejeva et al., 2006) to studies of sea-level extremes (e.g. Menéndez and Woodworth, 2010). Global sea-level datasets coming from tide gauge observations are dominantly assembled and archived in the following data centres and datasets:
1. Permanent Service for Mean Sea Level (PSMSL, https://www.psmsl.org), providing monthly and annual mean values of sea level for ca. 1550 stations, mainly used in climate sea-level studies (Woodworth and Player, 2003);
2. British Oceanographic Data Centre (BODC, https://www.bodc.ac.uk), handling hourly sea-level data for ca. 215 stations in delayed mode (up to a year), during which the centre performs inspection and quality-control;
3. Global Extreme Sea Level Analysis dataset (GESLA, http://www.gesla.org, Woodworth et al., 2016, 2017), containing global sea-level data with an hourly resolution at the majority of 1355 tide gauges; however, the quality-check has not been undertaken centrally but relies on procedures undertaken by data providers;
4. University of Hawaii Sea Level Centre (UHSLC, https://uhslc.soest.hawaii.edu), distributing both preliminary quality-checked data in fast mode (1-2 months) for ca. 290 stations and a fully quality-checked hourly sea-level dataset through the Joint Archive for Sea Level (JASL) (Caldwell et al., 2015)
Only the last dataset contains global sea-level records coming from tide gauges measuring at a minute resolution; however, its disadvantage is that quality-control cannot be undertaken in real time, so these raw records may contain many different problems (UNESCO, 2020).
It should be noted here that some services freely share their 1-min data through specific databases, but these cover only national coastlines or limited areas, like the NOAA Tides and Currents dataset (https://tidesandcurrents.noaa.gov). In order to overcome these issues and provide a consistent global-scale dataset of research quality, the Minute Sea-Level Analysis (MISELA) dataset was developed and is presented in this paper. It contains records from 331 tide gauges worldwide for the period from 2004 to 2019. Having access to a global dataset of 1-minute sea-level data may accelerate research on various high-frequency sea-level phenomena such as seiches, meteotsunamis, infragravity and coastal waves (e.g. Monserrat et al., 2006; Yankovsky, 2009; Pellikka et al., 2014; Dodet et al., 2019), which is not possible using hourly measurements.
The paper is organized as follows. In Section 2 the sources of the data used for the MISELA dataset and the quality-check procedure are thoroughly described. Section 3 presents the MISELA dataset, the global and regional coverage of the quality-checked time-series and the basic statistics of the dataset. The paper finishes with the data availability statement and a discussion on applications, perspectives and possible improvements of the MISELA dataset.

Sources of data
The main source for constructing the MISELA dataset is the Intergovernmental Oceanographic Commission (IOC) Sea Level Station Monitoring Facility (SLSMF, http://www.ioc-sealevelmonitoring.org), which provides raw sea-level data received in real time from more than 160 providers that presently operate approximately 935 tide gauge stations. The network of tide gauges is not completely operational, as many stations do not report data regularly to the facility, but the coverage is still appropriate for global studies. The IOC database was established following the disastrous 2004 Indian Ocean tsunami (Chlieh et al., 2007), after which UNESCO, through the IOC, coordinated efforts in developing regional tsunami warning systems (Amato, 2020). Besides giving access to the data, the main objective of the facility is to inform users about the status of station availability and performance. This includes controlling the tide gauge station metadata and regularly checking the operational status of all stations, as well as contacting operators regarding non-operating stations. Another important objective is a display service through which one can undertake a quick visual inspection of the raw data in a selected half-daily, daily, weekly or monthly period during which the chosen station was operational (IOC, 2012). It is also possible to download the data.
As real-time data are mostly used for operational purposes, the IOC data have not undergone any quality-check procedure and are shared as received from providers. As expected, many time-series are of poor quality, with spikes, shifts, drifts and other errors due to instrument malfunctions (Fig. 1); their quality depends on the real-time quality procedures set up by the operators and on the quality of sensors and instrumentation at the sites. The majority of the tide gauges provide data with a 1-minute sampling interval, yet some of them are still recording on a multi-minute timescale and are thus not included in the MISELA dataset. Further, some stations have multiple sensors (e.g. pressure, radar and bubbler sensors) to provide cross-calibration between measurements. Each station comes with information on its reference code, the location and country of the tide gauge, contacts of the local agency operating the station, geographic position, type of sensor, sampling rate, and other details.
be included in the MISELA dataset. These stations, except Split, can also be found in the IOC SLSMF dataset, but only after October 2018, whereas the IOF provided the data from May 2017 onwards.

Quality-control (QC) procedures
The first step in the development of the MISELA dataset was implementing a procedure that reads and stores data from the IOC SLSMF portal for the period from the beginning of the station activity until June 2018. After obtaining the sea-level time-series from the IOC, FMI and IOF stations, we selected for further processing the stations having at least a 2-year-long series and containing no more than 30% of data gaps. For stations having multiple sensors we selected the series that was the longest or had the lowest percentage of data gaps. Stations having data records of very low quality (too many spikes, incorrect records), spotted by visual checking, were also excluded from processing. Along with 13 FMI and 4 IOF stations, 314 stations satisfying the above conditions were selected from the IOC, constituting 331 time-series in total.
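The selection criteria above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used: the `SeriesSummary` record, its field names and the helper functions are hypothetical, and because the text allows either the longest series or the one with the fewest gaps, the sketch arbitrarily prefers the longest series and uses the gap percentage as a tie-breaker.

```python
from dataclasses import dataclass
from typing import List, Optional

MINUTES_PER_YEAR = 365.25 * 24 * 60


@dataclass
class SeriesSummary:
    """Hypothetical per-sensor summary of a 1-minute record."""
    station: str
    sensor: str
    n_expected: int  # 1-minute slots between first and last sample
    n_valid: int     # slots that actually hold a measurement

    @property
    def length_years(self) -> float:
        return self.n_expected / MINUTES_PER_YEAR

    @property
    def gap_fraction(self) -> float:
        return 1.0 - self.n_valid / self.n_expected


def is_eligible(s: SeriesSummary,
                min_years: float = 2.0,
                max_gap_fraction: float = 0.30) -> bool:
    """At least a 2-year-long series with no more than 30% of data gaps."""
    return s.length_years >= min_years and s.gap_fraction <= max_gap_fraction


def pick_sensor(candidates: List[SeriesSummary]) -> Optional[SeriesSummary]:
    """Choose one series per station (assumption: longest first, then fewest gaps)."""
    eligible = [s for s in candidates if is_eligible(s)]
    if not eligible:
        return None
    return max(eligible, key=lambda s: (s.length_years, -s.gap_fraction))
```

The visual rejection of very low-quality records described in the text has no counterpart here; it would remain a manual step after this automatic screening.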
The dataset required further processing as it contained numerous data quality problems (Fig. 1). First, the series were detided by removing all significant tidal components using the Matlab software package T_Tide (Pawlowicz et al., 2002) in order to allow for simpler visual inspection of the residual signal. The automatic quality-control procedures included the removal of out-of-range values, i.e. values differing by 50 cm from one neighbouring value or by 30 cm from both neighbouring values (20 and 15 cm, respectively, in the case of the FMI stations). The automatic spike detection procedure then continued by applying the methodology described by Williams et al. (2019), removing the values that deviate from a spline fitted using a least-squares method. After the automatic control, remaining spikes were detected and removed by visual scanning of all records. In this time-consuming process, each series was inspected over 15-day-long windows, and spurious spikes and isolated data that had passed through the automatic procedures were manually removed. During these quality-control steps, a considerable amount of data was removed, in particular at the beginning or end of the time series. Therefore, the MISELA time-series might be shorter (down to 1.5 years) or contain more gaps than the raw series. Unlike the existing automatic quality-check systems SELENE (EuroGOOS DATA-MEQ working group, 2010) and the Automatic Tide Gauge Processing System from the NOC (Williams et al., 2019), our approach introduced a manual procedure as well, given the great variety of data problems coming from a wide range of operators, operating procedures and sea-level sensors. Not all problems were removed properly and thus a more robust approach than that provided by the fully automated system was combined with other existing datasets (at hourly resolutions) that are available from the known databanks (like those listed in Section 1).
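The out-of-range rule in the automatic quality control (a jump of 50 cm from one neighbour, or 30 cm from both) can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration: the function name, the use of `None` for missing samples, the strict "greater than" comparison, and the two-pass order (isolated spikes first, then one-sided jumps) chosen so that a single large spike does not also flag its two valid neighbours. The spline-based step of Williams et al. (2019) is not reproduced here.

```python
from typing import List, Optional

Series = List[Optional[float]]  # detided sea level in cm; None marks a gap


def _diff(a: Optional[float], b: Optional[float]) -> Optional[float]:
    """Absolute difference, or None if either sample is missing."""
    if a is None or b is None:
        return None
    return abs(a - b)


def remove_out_of_range(values: Series,
                        one_sided_cm: float = 50.0,
                        two_sided_cm: float = 30.0) -> Series:
    """Return a copy of `values` with out-of-range samples replaced by None."""
    n = len(values)

    # Pass 1: drop isolated spikes that differ from BOTH neighbours.
    cleaned = list(values)
    for i in range(1, n - 1):
        left = _diff(values[i], values[i - 1])
        right = _diff(values[i], values[i + 1])
        if (left is not None and right is not None
                and left > two_sided_cm and right > two_sided_cm):
            cleaned[i] = None

    # Pass 2: on the result, drop samples that jump away from ONE neighbour.
    result = list(cleaned)
    for i in range(n):
        left = _diff(cleaned[i], cleaned[i - 1]) if i > 0 else None
        right = _diff(cleaned[i], cleaned[i + 1]) if i < n - 1 else None
        if ((left is not None and left > one_sided_cm)
                or (right is not None and right > one_sided_cm)):
            result[i] = None
    return result
```

For the FMI stations the same function would be called with `one_sided_cm=20.0` and `two_sided_cm=15.0`, following the thresholds quoted in the text.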
Prior to the filtering, gaps shorter than one week were linearly interpolated, as digital filtering requires a continuous time-series. While the great majority of data outliers have been removed from the records, some have undoubtedly remained in the data, as visual control is subject to errors and omissions. The complete QC procedure is illustrated in Fig. 2, while Fig. 3 shows three examples of sea-level series before and after the procedures were applied.
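The gap-filling step can be sketched as follows. This is a hedged illustration rather than the code used for the dataset: the function name is hypothetical, missing samples are assumed to be stored as `None` on a regular 1-minute grid (so one week corresponds to 10080 samples), and gaps touching the start or end of the record are left untouched because they have no valid sample on one side.

```python
from typing import List, Optional

ONE_WEEK_MINUTES = 7 * 24 * 60  # 10080 samples at 1-minute resolution


def fill_short_gaps(values: List[Optional[float]],
                    max_gap: int = ONE_WEEK_MINUTES) -> List[Optional[float]]:
    """Linearly interpolate interior gaps shorter than `max_gap` samples."""
    out = list(values)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        start = i
        while i < n and out[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap, or n
        gap_len = end - start
        # Skip gaps at the record edges and gaps of one week or longer.
        if start == 0 or end == n or gap_len >= max_gap:
            continue
        left, right = out[start - 1], out[end]
        step = (right - left) / (gap_len + 1)
        for k in range(gap_len):
            out[start + k] = left + step * (k + 1)
    return out
```

Longer gaps stay as `None`, so a subsequent filter would have to be applied separately to each continuous segment.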

Description of the MISELA dataset
The MISELA dataset contains 331 data files in the NetCDF format, each corresponding to a high-frequency sea-level time-series from one tide gauge. Each file contains three variables: time, nslott (nonseismic sea-level oscillations at tsunami timescales, Vilibić and Šepić, 2017) and QC, along with global attributes on the station code, geographic position of the station, origin of data and contact person for the dataset. Table 1 shows an example of a MISELA file with the station name abas. This is a 4-letter station code taken from the IOC Sea Level Station Monitoring Facility website, so one can easily find additional metadata about each IOC station if needed (e.g. location, country, local contact, type of sensor, etc.). The variable nslott is the final product obtained after the whole quality-check process and contains the sea-level time-series filtered with a high-pass filter (cut-off period of 2 hours).
Figure 4 shows that the MISELA dataset has an acceptable geographical distribution, covering many of the world's coasts.
The tide gauge network is denser in the areas having a long history of sea-level monitoring, in particular at the tsunami timescale, like the Mediterranean Sea, both the East and West Coasts of the US and the coasts of Chile and Australia.
Additionally, many island countries and archipelagos have well-developed networks of tide gauges, such as Japan, New Zealand, the Aleutian Islands, Hawaii and the Caribbean. However, some areas still have lower spatial station coverage, including the east coast of South America, the entire African coast, the Middle East, and the Indonesian and Russian coasts, presumably due to under-investment in sea-level monitoring or to restrictive data-sharing policies. In general, the Northern Hemisphere dominates over the Southern Hemisphere in terms of spatial coverage (70% of stations are in the Northern Hemisphere), particularly the zone between 30 and 60°N, which contains 137 densely deployed stations spread over the coasts of North America, Europe and Japan.
In total, the MISELA dataset contains 2303 station-years of data spanning between 2004 and 2019, with an overall average record length of nearly 7 years, but varying from only 1.5 years at some stations to 12 years at others. Longer records (>10 years) are primarily located in the Baltic and Australia, while shorter ones (<4 years) are grouped in Chile, Central America and Indonesia. An important contribution to the overall dataset comes from densified sub-systems such as the Mediterranean, Japan, the Gulf of Mexico and New Zealand, in which records of various lengths can be found.
Table 2 shows that, on average, the longest time-series (8.3 years) are available for the stations of NWH, followed by the ANSA and EU regions (7.8 and 7.4 years), while the shortest average records are found in the SSA and ASWA regions (5.1 and 5.8 years).
Interestingly, some of the longest records are found in the ASWA and CSP regions that mostly have shorter time-series (Fig. 6b).

A new global dataset of high-frequency sea-level oscillations, the MISELA dataset, was specifically designed and created to serve as a tool for coastal hazard assessment, in particular for hazards coming from atmospherically-induced high-frequency sea-level oscillations. This hazard has so far been underrated (Pattiaratchi and Wijeratne, 2015), primarily due to a lack of sea-level measurements at a minute timescale. Fortunately, the "rate" of research on high-frequency sea-level oscillations, in particular on meteotsunamis, has strongly increased in recent years (Vilibić et al., 2021). It is not certain how high-frequency sea-level oscillations will change under future climate scenarios, although at least one study suggests that these oscillations might become more frequent and of higher magnitude (Vilibić et al., 2018). Therefore, the importance of having a dataset that provides quality-checked global data for coastal studies is evident.

The MISELA dataset merges data from different sources to create a consistent dataset, which may serve for researching the magnitude and incidence of moderate and extreme high-frequency sea-level phenomena, like meteotsunamis, on the global scale. The primary motivation stems from the need to gather measurements, standardize them and bring them to a research-quality level. To this day, none of the existing sea-level databanks has provided a global quality-checked dataset with a sampling interval of 1 minute. Obtaining a global dataset of 1-minute sea-level data has several objectives. One of the main aims is to encourage an increase of the sampling resolution at the numerous tide gauges that have retained a lower sampling frequency. Another objective, adopted from the Global Sea Level Observing System (GLOSS), is to stimulate the installation of tide gauges according to international standards on coasts where none exist at present (IOC, 2012). New tools and technologies for observing and processing sea-level data (e.g. Pérez et al., 2014; García-Valdecasas et al., 2021) have allowed instrumentation to reach a standard in sea-level measurements at a minute timescale, thereby contributing to the improvement of existing high-frequency sea-level networks and the development of new ones. This also includes the development of quality-check procedures in real time; however, for scientific purposes, such an automatic quality-check may not be enough to reach a fully controlled data product. A recent manual on the quality-check of sea-level data (UNESCO/IOC, 2020) has gathered all relevant aspects and recommendations on this topic. In summary, the quality-check must maintain common standards, achieve consistency and ensure reliability, and in that way may contribute to processing the data according to the 'FAIR' Guiding Principles for scientific data management and stewardship (Wilkinson et al., 2016).
Following these principles, all time-series stored in the MISELA dataset have undergone such standardized quality-check procedures (described in Sect. 2.2).
However, the vast majority of the effort during the quality-check was spent on visual (manual) inspection, as the series suffer from data problems undetectable by automatic procedures. With the development of new quality-check techniques and a great effort towards standardisation, more procedures can hopefully be automated in the future, so that the amount of time dedicated to visual inspection may be reduced.
In spite of all these arguments, there are tide gauges and tide gauge networks with a lower sampling resolution, thus providing data from which high-frequency sea-level oscillations can be neither extracted nor studied properly. For example, the tide gauge network of the United Kingdom is still operating with a resolution of 15 minutes, although such a coarse sampling resolution may strongly affect the estimate of coastal sea-level extremes (Tsimplis et al., 2009). For that reason, Vilibić and Šepić (2017) concluded that the global tide-gauge network should be standardized to sample at the minute resolution and to report in real time, with as many quality-check procedures as possible implemented before releasing data to the public.
Hopefully, that will be the way of global development of future sea-level networks.
There are a number of future improvements that can contribute to the evolution of the MISELA dataset. Specifically, some areas have low station coverage due to meagre sea-level station networks or restrictive data policies, while some regions stand out as having undergone significant development over the past years. For example, a major gap in the provision of data is related to the African coasts (an exception is part of the east African coast and nearby islands, where tide-gauge stations were installed following the Sumatra tsunami). This is not a new issue, as attempts have been made to construct a sea-level network in Africa since the last century (IOC, 1997). However, the problem remains in the long-term maintenance. Moreover, the MISELA dataset contains very few stations in the areas of the Middle East, India, Russia and the east coast of South America. The Global Sea-Level Observing System (GLOSS) core network of active tide gauge stations today contains a slightly higher number of stations in these regions, but these were excluded from the MISELA dataset as they do not meet the specific conditions on the length and continuity of the time-series and the resolution of the measurements. In addition, in some of these regions data ownership restricts data exchange (Woodworth et al., 2016), yet we hope that their operators may consider providing 1-minute sea-level data to the MISELA dataset. Last but not least, polar regions have always represented a great challenge for tide-gauge operations, and their records are highly desirable in all aspects of sea-level research.
In the future, the MISELA dataset can be updated with new data as these become available. Extending the time-series can make the results of studies more reliable. Also, as new tide gauges are being installed, the total number of stations in the MISELA dataset can increase, and a better global coverage can thus be achieved.

Author contributions
All authors participated in performing quality-check procedures. I.V. and P.Z. developed the concept of the manuscript, P.Z. wrote the initial version of the text, while all authors commented on and revised the text and approved the manuscript.

Competing interests
The authors declare that they have no conflict of interest.