Minute Sea-Level Analysis (MISELA): a high-frequency sea-level analysis global dataset

Sea-level observations provide information on a variety of processes occurring over different temporal and spatial scales that may contribute to coastal flooding and hazards. However, global research on sea-level extremes is restricted to hourly datasets, which prevent the quantification and analyses of processes occurring at timescales between a few minutes and a few hours. These shorter-period processes, like seiches, meteotsunamis, infragravity and coastal waves, may even dominate in low tidal basins. Therefore, a new global 1 min sea-level dataset – MISELA (Minute Sea-Level Analysis) – has been developed, encompassing quality-checked records of nonseismic sea-level oscillations at tsunami timescales (T < 2 h) obtained from 331 tide-gauge sites (https://doi.org/10.14284/456, Zemunik et al., 2021b). This paper describes data quality control procedures applied to the MISELA dataset, world and regional coverage of tide-gauge sites, and lengths of time series. The dataset is appropriate for global, regional or local research of atmospherically induced high-frequency sea-level oscillations, which should be included in the overall sea-level extremes assessments.


Introduction
Extreme sea-level events represent a major hazard in coastal zones and have an immediate impact on the coasts, unlike processes acting on longer timescales, such as the rise in mean sea level, which allow much more time for adaptation (Menéndez and Woodworth, 2010). The sensitivity of the coastal zone infrastructure and populations to extreme sea levels emphasizes the need for investigation of their sources and characteristics, estimation of their incidence and strengths, cataloguing of historical events, assessments of their behaviour under the future climate, development of warning systems, and, ultimately, the conception of possible adaptation measures to these phenomena. However, these attempts are significantly limited by the availability of sea-level data in terms of resolution, coverage and quality.
Tide-gauge observations provide information on a wide range of oceanographic phenomena, including extreme events associated with tsunamis, storm surges and other causes of sudden coastal inundations. It has long been recognized that well-organized and accessible sea-level databases are a prerequisite for gaining knowledge on sea-level extremes (e.g. Vafeidis et al., 2008) and, consequently, for the management of coastal hazards. However, no quality-checked global sea-level datasets afford sufficiently high temporal resolution to cover periods at which, in addition to extraordinary events like tsunamis, a variety of processes may contribute substantially to, or even dominate, the overall sea-level extremes (Vilibić and Šepić, 2017). Many research activities have been based on 1 min sea-level records and have mainly been focused on specific regions known for the frequent occurrence of meteotsunamis or high-frequency sea-level oscillations, such as the Mediterranean Sea (e.g. Šepić et al., 2015), Sicily (e.g. Šepić et al., 2018; Zemunik et al., 2021a), the Adriatic Sea (e.g. Šepić et al., 2016), the Balearic Islands (e.g. Marcos et al., 2009), the Finnish coast (e.g. Pellikka et al., 2014), the Great Lakes (e.g. Bechle et al., 2016), the East Coast of America (e.g. Pasquet et al., 2013), the Chilean coast (e.g. Carvajal et al., 2017), Japan (e.g. Heidarzadeh and Rabinovich, 2021), Australia (e.g. Pattiaratchi and Wijeratne, 2014), the Caribbean (Woodworth, 2017) and many others.
Published by Copernicus Publications.
Accessible global sea-level datasets differ in both sampling and latency, following the needs of the scientific and user communities, from the quantification of climate changes and sea-level rise (e.g. Jevrejeva et al., 2006) through to the study of sea-level extremes (e.g. Menéndez and Woodworth, 2010). Global sea-level datasets from tide-gauge observations are dominantly assembled and archived in the following data centres and datasets: [...] (Woodworth et al., 2016, 2017), which contains global sea-level data with an hourly or higher (e.g. 10 or 15 min) resolution at the majority of 1355 tide gauges, although quality control is not undertaken centrally and instead relies on procedures undertaken by data providers; [...] Only the last dataset contains global sea-level records from tide gauges measuring at a 1 min resolution. However, its disadvantage is that quality control cannot be undertaken in real time. Therefore, these raw records may contain many different problems (UNESCO, 2020). It should be noted here that some services freely share their 1 min data through specific databases, although the data only cover national coastlines or limited areas, like the NOAA Tides and Currents dataset (https://tidesandcurrents.noaa.gov, last access: 19 August 2021). In order to overcome these issues and provide a consistent global-scale dataset of research quality, the Minute Sea-Level Analysis (MISELA) dataset was developed and is presented in this paper. MISELA contains delayed-mode 1 min quality-checked and high-pass-filtered (2 h cut-off period) sea-level records from a large number of tide gauges worldwide for a period from 2004 to 2019. Having access to a global dataset of 1 min sea-level data may accelerate the research on various high-frequency sea-level phenomena such as seiches, meteotsunamis, infragravity and coastal waves (e.g. Monserrat et al., 2006; Yankovsky, 2009; Pellikka et al., 2014; Pattiaratchi and Wijeratne, 2015; Dodet et al., 2019), which cannot be researched using hourly measurements. The paper is organized as follows. In Sect. 2, the sources of the data used for the MISELA dataset and the quality control procedure are thoroughly described. Section 3 presents the MISELA dataset, the global and regional coverage of the quality-checked time series, and the basic statistics of the dataset. The paper finishes with the data availability statement and discussion on applications, perspectives and possible improvements of the MISELA dataset.

Sources of data
The main source for constructing the MISELA dataset is the Intergovernmental Oceanographic Commission Sea Level Station Monitoring Facility (Flanders Marine Institute (VLIZ) and Intergovernmental Oceanographic Commission (IOC), 2021), which provides raw sea-level data received in real time from more than 160 providers that presently operate approximately 935 tide-gauge stations. However, the network of tide gauges contains some stations that are in disrepair (total number of the IOC stations is ca. 1100).
The IOC database has been established following the disastrous 2004 Indian Ocean tsunami (Chlieh et al., 2007), after which UNESCO, through IOC, coordinated efforts to develop regional tsunami warning systems (Amato, 2020). The main objective of the facility is to inform users about the status of station availability and performance (Aarup et al., 2019). This includes displaying the tide-gauge station metadata and regularly checking the operational status of all stations, as well as contacting operators regarding nonoperating stations. Another important objective is a display service through which one can undertake a quick visual inspection of the raw data in a selected half-daily, daily, weekly or monthly period during which the chosen station was operational (IOC, 2012). It is also possible to download the data for the whole operational period. However, any research use of these data would require additional processing (e.g. quality control), in order to properly prepare and involve data in statistical analyses and avoid misleading results and conclusions (Aarup et al., 2019).
As real-time data are mostly used for operational purposes, the IOC data have not undergone any quality control procedure and are shared "as received" from providers (see http://www.ioc-sealevelmonitoring.org/disclaimer.php, last access: 19 August 2021). As expected, many time series are of poor quality with spikes, shifts, drifts and other errors due to instrument malfunctions (Fig. 1), with the quality being dependent on the real-time quality control procedures set up by the operators and on the quality of sensors and instrumentation at the sites. The majority of the tide gauges provide data with a 1 min sampling interval; however, some of them still record on a multi-minute timescale and are, thus, not included in the MISELA dataset. Further, some stations have multiple sensors (e.g. pressure, radar and bubbler sensors) to provide cross-calibration between measurements. Each of the stations comes with information such as the reference code, location and country of the tide gauge, the contact information for the local agency operating the station, the geographic position, the type of sensor for measurement and the sampling rate.
Furthermore, 13 stations operated by the Finnish Meteorological Institute (FMI, https://en.ilmatieteenlaitos.fi/, last access: 19 August 2021) and situated on the east coast of the Baltic Sea are included in the MISELA dataset. The 1 min sea-level records are available from 2004 and have already been used in several regional studies on meteorological tsunamis along the Finnish coast (e.g. Pellikka et al., 2014;Jylhä et al., 2018). The FMI data are not included in the IOC SLSMF database. Finally, sea-level data from four stations in the Adriatic Sea were provided by the Institute of Oceanography and Fisheries (IOF, https://acta.izor.hr/wp/en/, last access: 19 August 2021). These stations, except Split, can also be found in the IOC SLSMF dataset, although only after October 2018, whereas the IOF provided the data from May 2017 onwards.

Quality control (QC) procedures
The first step in the development of the MISELA dataset was implementing a procedure that reads and stores data from the IOC SLSMF portal for the period from the beginning of the station activity until June 2018. After obtaining the sea-level time series from the IOC, FMI and IOF stations, we selected stations with at least a 2-year-long series and no more than 30 % of data gaps for further processing. As the dataset is intended to be applicable for the statistical analysis of high-frequency sea-level processes, we chose a length of 1.4 years of valid data (70 % of 2 years) as a threshold, because short time series or those heavily interrupted by data gaps would not significantly contribute to the research. For stations with multiple sensors, we selected the longest series or the series with the lowest percentage of data gaps. These gaps were not filled with the data recorded by the other sensors at the same station, as it appeared that the sensors may measure the intensity of the sea-level oscillations at a 1 min timescale differently. The datum and clock shifts were also not considered, as this would require information that is not available at the IOC SLSMF. Stations with data records of very low quality (spikes that are distributed throughout most of the time series and appear on an hourly or multi-hourly basis, or obviously incorrect records like spurious oscillations produced by malfunctions of instruments), established via visual inspection, were also not included in the processing. Along with 13 FMI and 4 IOF stations, 314 stations were selected from the IOC that satisfied the above conditions, constituting 331 time series in total.
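The two selection criteria stated above can be sketched in a few lines of Python. This is an illustration only, not the code used to build MISELA: the function name `passes_selection`, its arguments and the use of a 365.25 d year are assumptions introduced here.

```python
# Minimal sketch of the station selection rule (hypothetical helper, not the authors' code).
MINUTES_PER_YEAR = 365.25 * 24 * 60  # about 525,960 one-minute samples per year

def passes_selection(n_samples, n_missing, min_years=2.0, max_gap_fraction=0.30):
    """Return True if a 1 min series is long enough and has few enough gaps.

    n_samples: number of 1 min time steps between the first and last record
    n_missing: number of those time steps without a valid value
    """
    if n_samples <= 0:
        return False
    span_years = n_samples / MINUTES_PER_YEAR
    gap_fraction = n_missing / n_samples
    return span_years >= min_years and gap_fraction <= max_gap_fraction
```

Under these assumptions, a series that just passes both tests holds at least 0.7 × 2 = 1.4 years of valid samples, which is the threshold quoted in the text.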
The dataset required further processing, as it contained numerous data quality issues (Fig. 1). The series were first detided by removing all significant tidal components using the MATLAB software package T_Tide (Pawlowicz et al., 2002) in order to allow for simpler visual inspection of the residual signal. The automatic quality control procedures included the removal of out-of-range values, i.e. values with a 50 cm difference from one neighbouring value or a 30 cm difference from both neighbouring values (in the case of the FMI stations, a 20 cm difference from one or a 15 cm difference from both neighbouring values). The automatic spike detection procedure was continued by applying the methodology described in Williams et al. (2019): removing the values that deviate by 3 standard deviations from a spline fitted using a least-squares method. After the automatic control, the remaining spikes were detected and removed by visual inspection of all records. During this time-consuming process, each series was inspected over 15 d windows, and spurious spikes and isolated data that had passed through the automatic procedures were manually removed. In these quality control steps, a considerable amount of data was removed, in particular at the beginning or end of the time series. Therefore, the MISELA time series might be shorter (down to 1.5 years) or have a percentage of gaps higher than 30 %, when compared to the raw series. Unlike the existing automatic quality control systems, SELENE (EuroGOOS DATA-MEQ working group, 2010) and the Automatic Tide Gauge Processing System from the National Oceanography Centre (NOC) (Williams et al., 2019), our approach also introduced a manual procedure, given the great variety of data issues stemming from a wide range of operators, operating procedures and sea-level sensors. Not all issues (e.g. spikes, spurious oscillations, "stucks of instruments"; see Williams et al., 2019, for an explanation of the latter) were removed properly by the automatic procedures; thus, a more robust approach than that provided by the fully automated system was necessary, although it required a lot of effort and time.
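The neighbour-difference test for out-of-range values can be illustrated as follows. This is a sketch under a literal reading of the thresholds given above, not the original MATLAB implementation; the function name `flag_out_of_range`, the use of `None` for gaps and the treatment of a missing neighbour as a zero difference are assumptions made for the example.

```python
# Sketch of the neighbour-difference check (hypothetical helper, literal reading of the text).
def flag_out_of_range(series, one_sided_cm=50.0, two_sided_cm=30.0):
    """Return a list of booleans, True where a sample fails the check.

    series: detided sea level in cm at 1 min steps, with None marking a gap.
    A sample is flagged if it differs by more than one_sided_cm from either
    neighbour, or by more than two_sided_cm from both neighbours.
    """
    flags = [False] * len(series)
    for i, value in enumerate(series):
        if value is None:
            continue
        prev_val = series[i - 1] if i > 0 else None
        next_val = series[i + 1] if i + 1 < len(series) else None
        # Assumption: a missing neighbour contributes no difference.
        d_prev = abs(value - prev_val) if prev_val is not None else 0.0
        d_next = abs(value - next_val) if next_val is not None else 0.0
        big_jump = d_prev > one_sided_cm or d_next > one_sided_cm
        spike = d_prev > two_sided_cm and d_next > two_sided_cm
        flags[i] = big_jump or spike
    return flags
```

For the FMI stations the same function would be called with `one_sided_cm=20.0` and `two_sided_cm=15.0`. Note that, read literally, the one-neighbour rule also flags the samples adjacent to a jump larger than 50 cm, since they too differ from one neighbour by more than the threshold.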
The next step in creating the MISELA dataset was to exclude sea-level records observed during seismic tsunamis, as the applications are directed towards research on atmospherically induced sea-level oscillations, which has been an emerging field during the last few decades (e.g. Pattiaratchi and Wijeratne, 2015; Vilibić et al., 2021). Using the National Geophysical Data Center/World Data Service (NGDC/WDS) Global Historical Tsunami Database (https://www.ngdc.noaa.gov/hazard/tsu_db.shtml, last access: 19 August 2021), we listed all tsunamis from 2006 to 2018 and deleted several days of data (depending on the tsunami intensity) during each recorded tsunami at all stations in the area. To restrict the data to the high-frequency sea-level signal only, the final step included digital filtering of the data by the high-pass Kaiser-Bessel filter (Thomson and Emery, 2014; Šepić et al., 2015; Vilibić and Šepić, 2017) with a cut-off period of 2 h. Therefore, the applications of the MISELA dataset are designed exclusively for researching atmospherically induced sea-level oscillations at tsunami timescales. However, the dataset might be combined with other existing datasets (at hourly resolutions) that are available from known databanks (like those listed in Sect. 1). Prior to filtering, linear interpolation of gaps shorter than 1 week was carried out, as digital filtering requires a continuous time series. While a great majority of data outliers were removed from the records, some undoubtedly remain in the data, as the visual control is subject to errors and omissions and is also, to a certain extent, subjective. It should be highlighted that sea-level data from the IOC SLSMF database up to June 2015 were downloaded, quality-controlled, processed and analysed by Vilibić and Šepić (2017); in this work, the data were further extended to June 2018, controlled following common quality control procedures and gathered into the MISELA dataset. The complete quality control (QC) procedure is illustrated in Fig. 2, while Fig. 3 demonstrates three examples of sea-level series before and after the procedures were applied.
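To make the final filtering step concrete, the sketch below builds a high-pass filter as a unit impulse minus a Kaiser-windowed low-pass filter and applies it to a gap-free 1 min series. Only the 2 h cut-off period comes from the text; the filter half-length (360 min), the Kaiser shape parameter (`beta = 6`) and all function names are assumptions chosen for illustration, and the filter actually used for MISELA may differ in these details.

```python
import math

def bessel_i0(x):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term, k = 1.0, 1.0, 0
    while term > 1e-12 * total:
        k += 1
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total

def kaiser_highpass_taps(cutoff_period_min=120.0, half_length=360, beta=6.0):
    """FIR high-pass weights for 1 min data (half_length and beta are assumed values)."""
    fc = 1.0 / cutoff_period_min  # cut-off frequency in cycles per sample (1 min)
    i0_beta = bessel_i0(beta)
    lowpass = []
    for n in range(-half_length, half_length + 1):
        # Ideal low-pass impulse response, tapered by the Kaiser window.
        ideal = 2.0 * fc if n == 0 else math.sin(2.0 * math.pi * fc * n) / (math.pi * n)
        window = bessel_i0(beta * math.sqrt(1.0 - (n / half_length) ** 2)) / i0_beta
        lowpass.append(ideal * window)
    gain = sum(lowpass)
    lowpass = [w / gain for w in lowpass]  # unit gain at zero frequency
    taps = [-w for w in lowpass]
    taps[half_length] += 1.0  # unit impulse minus low-pass gives the high-pass
    return taps

def apply_filter(series, taps):
    """Centred convolution; returns only the samples where the full window fits."""
    m = len(taps) // 2
    return [sum(t * series[i + j - m] for j, t in enumerate(taps))
            for i in range(m, len(series) - m)]
```

Because the weights sum to zero, a constant offset or a slow drift is removed, which is consistent with the remark that filtering eliminates vertical shifts. The convolution needs a value at every time step, which is why short gaps have to be interpolated before this step.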

Description of the MISELA dataset
The MISELA dataset contains 331 data files in the NetCDF format, each corresponding to high-frequency sea-level time series from one tide gauge. The file contains three variables: "time", "nslott" (nonseismic sea-level oscillations at tsunami timescales, Vilibić and Šepić, 2017) and "QC", along with global attributes including the station code, geographic position of the station, origin of data and contact person for the dataset. Table 1 shows an example of a MISELA file with the station name "abas". This is a four-letter station code taken from the IOC Sea Level Station Monitoring Facility website; therefore, one can easily find additional metadata about each IOC station if needed (e.g. location, country, local contact, type of sensor). The FMI and IOF stations differ from the IOC stations in that they have the full name of the station location in the title of the files (e.g. "helsinki", "degerby", "velaluka", "starigrad") instead of a shorter code name. The variable time is represented in the unit of minutes since 1 January 2000 00:00:00 UTC with the sea-level value noted in the same row as the nslott variable and the corresponding quality control flag of the data in the QC variable. The dimension of the variables provides quick information on the record length, considering that approximately half a million data points represent a 1-year-long record. The nslott variable is the final product obtained after the whole quality control process and contains the sea-level time series filtered with a high-pass filter (cut-off period of 2 h). Figure 4 shows that stations included in the MISELA dataset cover many of the world's coasts. The tide-gauge network is denser in the areas with a long history of sea-level monitoring, in particular at the tsunami timescale, like the Mediterranean Sea, both the East and West coasts of America and the coasts of Chile and Australia. 
Additionally, many island countries and archipelagos have well-developed networks of tide gauges, such as Japan, New Zealand, the Aleutian Islands, Hawaii and the Caribbean. However, some areas, including the east coast of South America and the entire African coast, the Middle East, and the Indonesian and Russian coasts, are still underrepresented in the IOC SLSMF, presumably due to underinvestment in sea-level monitoring or due to data-sharing restriction policies. In general, the Northern Hemisphere dominates over the Southern Hemisphere in terms of spatial coverage (70 % of stations are in the Northern Hemisphere), particularly in the zone between 30 and 60° N that contains 137 densely deployed stations spread over the coasts of North America, Europe and Japan. Figure 5 shows a close-up of areas populated by stations, revealing densely distributed tide gauges on the coasts of the western Mediterranean and Europe, the Finnish coast, the Gulf of Mexico, the Caribbean Islands, the East and West coasts of America, and the Japanese and Chilean coasts, indicating that satisfactory coverage exists for regional investigations.
In total, the MISELA dataset contains 2303 station-years of data spanning between 2004 and 2019, with an overall average record length of nearly 7 years, although this varies from only 1.5 years at some stations to 12 years at others. Longer records (> 10 years) are primarily located in the Baltic region and Australia, whereas shorter records (< 4 years) are grouped in Chile, Central America and Indonesia. An important contribution to the overall dataset comes from densified subsystems, such as the Mediterranean, Japan, the Gulf of Mexico and New Zealand, for which records of various lengths can be found.
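The time convention of the MISELA files (minutes since 1 January 2000 00:00:00 UTC) and the relation between the number of samples and the record length can be checked with a short sketch. The helper names below are hypothetical, and reading the NetCDF variables themselves is left out because it depends on the library the user prefers.

```python
from datetime import datetime, timedelta, timezone

# Reference epoch of the "time" variable, as stated in the file description.
EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)
MINUTES_PER_YEAR = 365.25 * 24 * 60  # 525,960, i.e. roughly half a million samples

def decode_time(minutes_since_epoch):
    """Convert a MISELA time value to a timezone-aware UTC datetime."""
    return EPOCH + timedelta(minutes=minutes_since_epoch)

def record_length_years(n_samples):
    """Approximate record length implied by the number of 1 min samples."""
    return n_samples / MINUTES_PER_YEAR

# Average record length implied by the totals quoted in the text.
average_years = 2303 / 331  # about 6.96 station-years per station
```

The last line reproduces the "nearly 7 years" average from the 2303 station-years and 331 stations reported above.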
For regional statistics, we classified stations into eight macro-regions: Europe (EU), Central and North-east America (CNEA), North-west America and Hawaii (NWH), East Asia (EA), Africa and South-west Asia (ASWA), Australia, New Zealand and South Asia (ANSA), southern South America (SSA) and the central and southern Pacific (CSP). Table 2 shows that, on average, the longest time series (8.3 years) are available for the stations in the NWH macro-region, followed by the ANSA and EU macro-regions (7.8 and 7.4 years respectively), whereas, on average, the shortest records are found in the SSA and ASWA macro-regions (5.1 and 5.8 years respectively). Interestingly, some of the longest individual records are found in the ASWA macro-region, which mostly has shorter time series (Fig. 6b). Most of the sea-level observations in the MISELA dataset were made after 2011, when many tide gauges were installed or added to the IOC Sea Level Station Monitoring Facility as a reaction to the disastrous 2011 Tōhoku earthquake and tsunami in Japan (Simons et al., 2011; Fig. 6a). The expansion of the sea-level network in 2012 is particularly evident for the EA, CNEA and NWH regions, and numerous stations were also added in the SSA region in 2013. The EU area continuously has the highest number of stations among all macro-regions. All macro-regions show a positive trend in the number of active stations over the period from 2006 to 2018. It should be highlighted that we obtained records from the IOC stations for the period from as early as 1 January 2006, when the portal began operating, up until 14 June 2018, when we last downloaded data. Unfortunately, we have not downloaded sea-level time series since 14 June 2018 due to the extended time requirements involved with the data quality control.
Nonetheless, most stations were installed or started providing data later than January 2006 and some were uninstalled or stopped providing data before June 2018; therefore, these stations contain shorter records. Records from the 4 IOF stations end in December 2019, and records from the 13 FMI stations begin in January 2004 (the EU region), resulting in a lower number of stations at the beginning and at the end of the whole MISELA period (2004-2019; Fig. 6a).

Data availability
The data described in this paper can be accessed through the Marine Data Archive of the Flanders Marine Institute (VLIZ) at https://doi.org/10.14284/456 (Zemunik et al., 2021b).

Conclusions and perspectives
A new global dataset of high-frequency sea-level oscillations, the MISELA dataset, was specifically designed and created to serve as a tool for coastal hazard assessment, in particular for hazards from atmospherically induced high-frequency sea-level oscillations. The ability to study this hazard has, until recently, been restricted by technological and computational limitations on data storage, computational power of data-processing systems and telecommunications of earlier tide-gauge technology. Fortunately, the "rate" of research on high-frequency sea-level oscillations, in particular on meteotsunamis, has strongly increased in recent years (Vilibić et al., 2021). It is not certain how high-frequency sea-level oscillations will change under the future climate scenarios; however, there are methods for estimating their future occurrence rates. Therefore, it is important to have a dataset that may provide the quality-checked global data for such coastal studies. The MISELA dataset merges data from different sources to create a consistent dataset, which may serve for research into the magnitude and incidence of moderate and extreme high-frequency sea-level phenomena, like meteotsunamis, on the global scale. The primary motivation stems from the need to gather measurements, standardize them and bring them to a research-quality level. To date, none of the existing sea-level databanks have provided a global quality-checked dataset with a sampling interval of 1 min. However, it should be emphasized here that the quality control procedure imposes some limitations on the dataset. Numerous issues (including shifts, drifts and spurious signals) in the raw data prevented the preparation of high-quality 1 min sea-level data from original measurements; instead, this work was forced to focus solely on the high-frequency part of the signal. Filtering of the data removed vertical shifts and drifts that could not be removed by other automatic procedures.
This has restricted the use of the MISELA dataset to research of high-frequency processes only. Furthermore, some issues have remained unresolved: for example, datum and clock shifts have not been processed, as this would require a tremendous amount of time and information that is not available at the IOC SLSMF. Nevertheless, we expect that these issues only impact a low percentage of the overall data. A future improvement of the dataset could be achieved by filling the data gaps with data from other sensors (where more than one is available), rather than interpolating. However, various sensors may measure sea-level oscillations at a 1 min timescale differently, due to the use of different averaging methods or the fact that some sensors are installed in a stilling well whereas others are not. Thus, the standardization of data from different sensors is required at locations where it can be achieved, although this depends on time, effort and financial investment. Nevertheless, this would be a way to improve the MISELA dataset.
Herein, we suggest several components of the future perspective in the research of high-frequency sea-level phenomena. The main component is concerned with an increase in the sampling resolution for numerous tide gauges that have retained a lower sampling frequency. Another component, emphasized by the Global Sea Level Observing System (GLOSS), refers to the installation of tide gauges according to all international standards on coasts where no gauges currently exist (IOC, 2012). New tools and technologies for observing and processing sea-level data (e.g. Pérez et al., 2013; García-Valdecasas et al., 2021) have enabled instrumentation to reach a standard in sea-level measurements at a 1 min timescale, thereby contributing to the improvement of existing high-frequency sea-level networks and the development of new ones. This also includes the development of quality control procedures in real time; however, for scientific purposes, such automatic quality control may not be sufficient to reach a fully controlled data product. The recent manual on the quality control of sea-level data (UNESCO/IOC, 2020) has gathered all relevant aspects and recommendations on this topic. In summary, quality checks must maintain common standards, acquire consistency and ensure reliability in order to contribute to processing the data according to the FAIR (Findability, Accessibility, Interoperability and Reusability) Guiding Principles for scientific data management and stewardship (Wilkinson et al., 2016). Following these principles, all time series stored in the MISELA dataset have undergone a standardized quality control procedure (described in Sect. 2.2). However, the vast efforts during the quality control were spent on visual (manual) inspection, as the series suffer from data issues that are not detectable by automatic procedures.
Together with the development of new techniques for quality control and a great effort towards standardization, more procedures can hopefully be automated in the future; hence, the amount of time dedicated to visual inspection may be reduced.
In spite of the above-mentioned arguments, there are tide gauges and tide-gauge networks that have a lower sampling resolution, thereby providing data from which high-frequency sea-level oscillations cannot be extracted or studied properly. For example, the tide-gauge network of the United Kingdom is still operating with a resolution of 15 min, although such a coarse sampling resolution may strongly affect the estimate of coastal sea-level extremes (Tsimplis et al., 2009). For that reason, Vilibić and Šepić (2017) concluded that the global tide-gauge network should be standardized to sample at a 1 min resolution and to report, as far as possible, near-real-time quality-controlled data. In addition to this, it is mandatory to regularly maintain installed tide-gauge stations to ensure the quality of the data. Hopefully, global sea-level networks will develop in this way in the future.
There are a number of future improvements that could contribute to the evolution of the MISELA dataset. Specifically, some areas have a low station coverage due to sparse sea-level station networks or restrictive data policies, whereas some regions stand out as having made significant developments over the past years. For example, a major gap in the provision of data is related to the African coasts (an exception is part of the East African coast and nearby islands where tide-gauge stations were installed following the Sumatra tsunami). This is not a new issue, as attempts have been made to construct a sea-level network in Africa since the last century (IOC, 1997; Woodworth et al., 2007). However, long-term maintenance remains a problem. Moreover, the MISELA dataset contains very few stations in the areas of the Middle East, India, Russia and the east coast of South America. The Global Sea Level Observing System (GLOSS) core network of active tide-gauge stations today contains a slightly higher number of stations in these regions, although they are excluded from the MISELA dataset as they do not meet specific conditions regarding the length and continuity of the time series and the resolution of the measurements. In addition, in some of these regions, data ownership restricts data exchange (Woodworth et al., 2016); however, we hope that their operators may consider providing 1 min sea-level data to the MISELA dataset in the future. Last but not least, polar regions have always represented a great issue for tide-gauge operations, and their records are highly desirable in all aspects of sea-level research.
In the future, the MISELA dataset can be updated with new data as these become available, although this would require the engagement of more human resources (necessary for carrying out such extensive quality control procedures), preferably from sea-level data centres. Further, putting these activities, which essentially fulfil a demand from the community carrying out research on high-frequency sea-level oscillations and meteotsunamis, under the umbrella of GLOSS or other sea-level programmes would institutionalize the efforts and would result in an improved-quality product. Extending the time series would also make study results more reliable. Moreover, as new tide gauges are installed, the total number of stations in the MISELA dataset can increase, and a better global coverage can be achieved.