An Integrated Marine Data Collection for the German Bight – Part I: Subaqueous Geomorphology and Surface Sedimentology

The German Bight located within the central North Sea is a hydroand morphodynamically highly complex 10 system of estuaries, barrier islands and part of the world’s largest coherent tidal flats, the Wadden Sea. To identify and understand challenges faced by coastal stakeholders, such as harbor operators or governmental agencies, to maintain waterways and employ numerical models for further analyses, it is imperative to have a consistent data base for both bathymetry and surface sedimentology. Current commercial and public data products are insufficient in spatial and temporal resolution and coverage for recent analyses method. Thus, this first part of a two-part publication series of the German joint 15 project EasyGSH-DB describes annual bathymetric digital terrain models in a 10 m gridded resolution for the German North Sea coast and German Bight from 1996 to 2016 (Sievers et al., 2020a, 10.48437/02.2020.K2.7000.0001), as well as surface sedimentological models of discretized cumulative grain size distribution functions for 1996, 2006 and 2016 on 100 m grids (Sievers et al., 2020b, 10.48437/02.2020.K2.7000.0005). Furthermore, basic morphodynamic and sedimentological processing analyses, such as the estimation of e.g. bathymetric stability or surface maps of sedimentological parameters, are 20 provided (Sievers et al., 2020a, 2020b, see respective download links). https://doi.org/10.5194/essd-2021-47 O pe n A cc es s Earth System Science Data D icu ssio n s Preprint. Discussion started: 15 March 2021 c © Author(s) 2021. CC BY 4.0 License.


Introduction
The German Bight and the adjacent central North Sea (Figure 1) are highly complex systems of barrier islands, the world's largest coherent system of coastal tidal flats and multiple estuaries (Elias et al., 2012;Kabat et al., 2012;Benninghof and Winter, 2019). Recent research interest frequently focuses on the morpho-and hydrodynamic processes in this area, e.g. 25 Elias et al. (2012), Zijl et al. (2013), Heyer and Schrottke (2015), Wang et al. (2015) and Benninghof and Winter (2019). For this, numerical models of various implementations require input parameters gained from e.g. bathymetric or sedimentological data sets to gain further insight and understand longer term processes (Zijl et al., 2013;Arns et al., 2015;Wang et al., 2018;Rasquin et al., 2020) such as the sea level rise. Harbor operators for small-and large-scale maritime economy and tourism and other marine actors need to be able to identify, understand and potentially counteract changes and 30 developments to the coastline and estuaries and its surface sedimentology to maintain and operate within a profitable margin as well as be prepared for potential hazardous situations (Roeland and Piet, 1995;Kirichek et al., 2018;Wölfl et al., 2019;Kiricheck et al., 2020). Additionally, further applications that are not directly obvious are dependent on bathymetric information, such as search efforts in open-sea scenarios (Wölfl et al., 2019). It is thus vital to have a consistent set of highresolution elevation and surface sediment information. 35 Currently available data sets are still highly volatile in spatial and temporal resolution and coverage are compiled in publicly accessible portals and services (Wölfl et al., 2019). Portals and services such as EMODnet Bathymetry (www.emodnetbathymetry.eu), GEBCO (www.gebco.net) or ETOPO (www.ngdc.noaa.gov) offer bathymetric data sets with spatial resolutions of 70 m to approximately 2 km, depending on the overall spatial coverage, yet in part no certain indication about important quality elements, such as currentness or timeliness and traceability or data source, of the utilized data set. Surface 40 sedimentological information of the central North Sea are coarse for the application in numerical models with spatial resolutions of approximately 2 to 13 km with predetermined classifications and analyses such as median grain diameter already performed and provided by e.g. Helmholtz Zentrum Geesthacht (coastmap.hzg.de), EMODnet Geology (www.emodnet-geology.eu) or Wilson et al. (2018), which hinders custom extraction of parameters such as sorting or skewness coefficients or grain size classes that for numerical models. Portals for individual data publication, such as Pangaea 45 (www.pangaea.de), usually offer local data sets at a high resolution. However, a low overall coverage results in a decrease of larger scale scientific value, as region-specific availability and consistency issues with adjacent data exist. Apart from spatial resolution and coverage issues, temporal information is in part inconsistent or unavailable. For hydro-and morphodynamically highly dynamic regions such as the German North Sea coast, a specific date or time span of validity is essential to adequately analyze the occurring processes. German joint project AufMod began the process of alleviating these 50 issues with both increased temporal and spatial coverage and resolution (Heyer and Schrottke, 2015). Bathymetric regular 50 m gridded digital terrain models (DTMs) were provided from 1982 to 2012 for the inner German Bight as well as surface sedimentological information including full discretized cumulative grain size distribution (GSD) functions for individual further processing in a 250 m grid resolution (Milbradt et al., 2015). However, state-of-the-art modeling studies deploy https://doi.org/10.5194/essd-2021-47 Usually, the only way to obtain data valid for a specific date is to measure it at that specific point in time. Studies to undertake surveys in a very high, quasi-continuous, temporal resolution alleviate this issue, they are, however, generally very 75 small scale or on singular transects (Gallagher et al., 1998;Moulton et al., 2014). Concepts to temporally interpolate hydrographic and bathymetric information between fewer temporal sample points to reduce time and costs of such studies are well known (Moulton et al., 2014;Kuusela et al., 2018;Genchi et al., 2020), yet still focus on singular series of samples of an isolated area. Missing synoptic analysis bringing together multiple such surveys of varying types and resolutions from different points in time and space classify most of these data sets as the previously mentioned highly regionalized 80 individually published information. As digital terrain models (DTMs) today already aggregate data and employ spatial interpolation to create high coverage information, the next step is combining spatial and temporal interpolation of multiple data sets to create not only a spatially but also temporally continuous model space for numerous applications.

Introduction to the Functional Seabed Model
The Functional Seabed Model is a holistic data-based hindcast simulation model (Milbradt et al., 2015) and was first utilized 85 in larger scale in joint project AufMod, (Heyer and Schrottke, 2015) to provide consistent DTMs and base data for numerical modelling and morphodynamic analyses from multiple data sets originating from separate surveys for user defined spatial and temporal extent and validity. In its current form, input data and thus spatial and temporal range cover primarily Germany, the Southern and Central North Sea and the British Isles from the early 1950s until today. Its data base contains approximately 127,000 elevation data sets with 115 billion single data points ( Figure 2a) and approximately 95,000 surface sediment samples represented by 95 continuous grain size distribution (GSD) functions (Figure 2b), to meet requirements of advanced model systems as described in part two of this publication (Hagen et al., 2021, in review). Approximately 45,000 surface sediment samples are located within the AWZ. For the highest attainable transparency of provenance of our products, each individual bathymetric and sedimentological data set hat metadata attached, which include among others sample title, source organization and survey time as either an isolated instance or a temporal span. 100 Both bathymetric and surface sedimentological surveys create information at isolated points, thus interpolation and approximation methods are applied to create spatially continuous products for specific points in time. As surveys have varying purposes, they are undertaken as various survey types and a consequence, the structure of the initial processed data set is highly variable as well. Consequently, the anticipated quality of spatial representation is highly dependent on the applied approximation or interpolation method itself. While linear interpolation may be sufficient in high-density surveys, 105 they might not be adequate in other cases. This is where the FSMs ability to connect geometric data sets to metadata is of utmost importance. The aforementioned metadata information contains not only trivial entries such as the descriptive attributes but also information about the method that is to be utilized for spatial interpolation of the specific data set in question, including parameters such as search radii or required point counts. A small, non-exhaustive example of this survey type dependent approximation and interpolation definition defined for and utilized in the FSM is displayed in Figure 3. 110

Spatio-Temporal Interpolation and Approximation of Bathymetric Datasets 115
Spatial interpolation or approximation of bathymetric data sets in the FSM follows the well-known and fundamental concept of the linear combination of a weighting factor from a so-called base function and the data points elevation information. The base function is generally dependent on the spatial position of the data point and commonly produces weighting factors based on distances between data points and points to be interpolated or approximated. Both interpolation and approximation follow therefore Eq. (1), where ( ⃗) is the interpolated or approximated elevation at a position ⃗, φ ( ⃗) is a weighting 120 factor for point , which commonly returns 1 at point and 0 for all other points with ≠ , = ( ) is the elevation of sample point and is the number of all data points within the data set. (1) While both approximation and interpolation methods rely on the same core concept, only interpolation algorithms, as a 125 special case of approximation, reliably reflect the original data points information at its position. Approximation methods do not need to fulfill this requirement. Regardless of the explicit implementation of the method itself, it is imperative to note that both interpolated and approximated values that do not coincide with the actual data points are estimations. Survey types and consequently interpolation or approximation methods (see Figure 3) define the base functions to be used.
Temporal interpolation between two data points in time at the same spatial position is equivalent to spatial interpolation, but 130 instead utilizes not spatial but temporal distances as parameters for the base functions. It is necessary do employ temporal interpolation as well, as surveys rarely take place at the desired point in time and even if by coincidence they do regionally, on a larger scale it is not reasonably to be expected to have consistent coverage. Due to the generally high density of regional bathymetric surveys spatially and temporally, linear temporal interpolation (compare Eq. (1)) between the closest older and younger data sets in regard to the point in time and space is determined to be sufficient and subsequently used in the FSM. 135 Combining both spatial and temporal approaches yields the FSMs first and foremost defining quality: The spatio-temporal interpolation. Spatio-temporal interpolation of bathymetric datasets gives the ability to derive an elevation value and consequently DTMs at any given point in space and time within the FSMs model boundaries, see and is the number of data sets used. (2)

Spatio-Temporal Interpolation and Approximation of Surface Sedimentological Datasets
Spatio-temporal interpolation of surface sedimentological data sets is met with additional obstacles as compared to 150 bathymetric information. Both the temporal and spatial density of surface sediment data sets, which are comprised of single sediment samples with single GSD functions, is much lower. Figure 5 displays the temporal distribution of the approximately 45,000 surface sediment samples within the AWZ (compare Figure 2B). Apart from being highly variable, even in one of the years with the most samples taken, 1963, one samples represents on average approximately 5.7 km 2 in the near coastal environment, which is insufficient. Utilizing all samples for the same point in time, however, would provide approximately one sample in 0.5 km 2 for the highly dynamic near coastal environment, which is in combination with the on average higher spatial density of samples in more active regions, 160 acceptable. As multiple samples at the same position are virtually non-existent, temporal interpolation as described before cannot be applied. Thus, an extrapolation approach was developed that relies on the parametrization of the cumulative GSD function into median grain size, sorting and skewness coefficients after Folk (1980) and an ordinary differential equation as an initial value problem. Equation (3)  Based on the change of the median grain size, the change of sorting and skewness coefficients can be approximated as well.
With these, the fully continuous cumulative GSD function can be regenerated with Eq. (4), modified after Tauber (1997), as a logistic function, where (Φ) (t) is the extrapolated cumulative GSD function depending on grain size in Φ and 175 extrapolation point in time t, Φ 50 ( ) is the time-dependent median grain diameter in Φ resulting from addition of the initial median grain diameter of the original GSD function and the time-dependent change after Eq. (3), is the approximated sorting coefficient and is the approximated skewness coefficient (Folk, 1980). 180 The combination of this temporal extrapolation allows for subsequent spatial interpolation or approximation as shown in Eq. Optimized data storage and access solutions were developed, since potentially up to 95,000 references to fully continuous functions had to be handled with acceptable memory usage for each interpolated GSD function in case of global interpolations without maximum search radii. The weighting factor φ ( ⃗) is in this application commonly based on Shepard 190 spatial interpolation approaches that is further extended by variable search radii depending on the data density around each specific sample point and hydrodynamic factors such as bed shear stress data, provided by the second part of this publication (Hagen et al., 2021, in review), to introduce anisotropic metrics. Thus, a GSD function can be interpolated at any given point in space and time within the FSMs model boundaries, see Figure 6.

Surface Sediment Porosity
As utilized in Eq. (3), the surface substrate porosity is an essential factor to consider in development of the sedimentological 200 surface composition of the ocean floor, as generally a lower porosity leads to a denser material that has higher thresholds to erode and consequently leads to less change in elevation and thus after Eq. (3) sedimentological composition. As this is a model approach that is continuously developed and adjusted, we are aware that very well sorted fine materials such as silt and clay actually increase their total porosity, yet as muds in our data base of surface sediment samples tend to be moderately sorted or lower, we determined this effect to be negligible for the current application. Based on the parameters 205 that can be extracted or extrapolated from the data base, Eq.

Gridded base products
By employing spatial and temporal interpolation as described in Sect. 2, the FSM was used 21 bathymetric digital terrain models (DTM) for the German Bight ranging from 1996 to 2016 with validity dates on 01.07., respectively. Each DTM is provided as a structured grid with elevation data in a GeoTiff format with a spatial grid resolution of 10 m (EPSG 25832). 215 Furthermore, a 100 m structured grid of the AWZ is provided for 01.07.1996, see Figure 7a.

220
By employing spatial and temporal interpolation and extrapolation as described in Sect. 2, the FSM was used to generate three surface sedimentological models for the German Bight for 1996, 2006 and 2016, with validity dates on 01.07., respectively. Each surface sedimentological model is provided to users serving as a base data set for individual analysis applications as a structured grid as a CSV-file in EPSG 25832 with a spatial grid resolution of 100 m. Each file contains the model information in a 0.25-phi discretized cumulative GSD function, as well as coordinates and meta data such as date of 225 validity. Furthermore, a 250 m structured grid of the EEZ is provided for 01.07.1996, see Figure 7b represented by a further analysis of the cumulative function (see also later sections): The median grain size 50 in mm.

Polygonal base products
Basic bathymetric and surface sedimentological information is in practice often utilized in form of polygonal isoline. We thus generated isobaths for each bathymetric DTM in full spatial coverage in 0.5 m steps for high resolution analyses and 230 10 m steps for general display purpose. Additionally, each median grain diameter ( 50 ) gridded product is provided with polygonal representation as well. As the 50 is provided in metric scale, a logarithmic discretization for the isolines is required and utilized as defined by the German language version standard DIN EN ISO 14688 (Deutsches Institut für Normung, 2018), which defines grain size fractions on a logarithmic scale.

Gridded analysis products 235
Based on the DTM products, geomorphological analyses i.e. the development of elevation, minimum and maximum elevation and their range (termed as the bed elevation range (Figure 8a) by Winter, 2011), are performed.
The bed elevation range provides valuable insight into recent morphological activity, as a high value imply strong morphological activity over the analysis time span. The morphological drive (MD, Figure 8b) further expands on the concept of stability by using rates of elevation changes instead of absolute changes, which helps to assess whether an area was 240 affected by gradual (e.g. tidal dynamics) or sudden (e.g. storm surges) change

245
Further sedimentological analyses utilize the full cumulative GSD function from the base products. Based on grids with the same extent and resolution as the base products, calculation of the median grain size d50 in mm, the sorting coefficient σ after Folk (1980), the skewness coefficient after Folk (1980)

Polygonal analytical products
For display purposes, sedimentological maps can be produced by a classification of grid points to their linguistic description 255 of their respective cumulative GSD function. The concept of generation of a (quasi-)bijective linguistic description from a function (Sievers et al., in prep.) is based on a reversal of initial concepts from Voss (1982) and Naumann et al. (2014). It relies on the definition of grain size classes and respective percentages within the cumulative function relative to each other to determine a description that would adequately regenerate the original GSD function. With respect to mostly German While Voss (1982) and Naumann et al. (2014) solely focus on SEP3, the developed reversal of their process could be transferred to multiple other description formats as well. In a similar concept to the Figge grain size classification maps (Figge, 1981), the description is then split into main and sub components to reduce the number of possible combinations to 265 be displayed. The grid points of these data sets are then summarized into polygonal maps of main and sub component A major advantage of these so called petrographical maps is that a cumulative GSD function can be reconstructed by recreating a full description from main and sub component descriptions following Naumann et al. (2014) at every single point within the map, without the need to store excessive spatial position data. For each base model, two sets of petrographical maps are created, one in the long, fully bijective form (i.e. a full description is stored) and a short form (i.e. 270 only the first main and first sub component is stored), which is displayed in Figure 10.

Plausibility Evaluation Products for Gridded Bathymetric Base Products
Commonly, it is not published with a DTM what data sets where used in its creation. A notable exception in this case is EMODnet Bathymetry (www.emodnet-bathymetry.eu), where the possibility to view certain meta data to data sets used for generating the DTM is provided. We aim to make the process of DTM generation fully transparent. As a consequence, each DTM itself generated by the FSM has specific meta data attached to it, which may give information about possibilities and 280 limitations for certain users and their desired applications. They include common information such as: 1. Dataset title 2. Spatial extent and coordinate reference system, commonly in EPSG notation 3. Elevation range and height reference system, commonly in linguistic notation 285 4. Short description for potential additional information 5. Interpolation or approximation methods optimal for this data set and possible parameters 6. Data provider and their contact information 7. Date or time span of validity 290 Additionally, the FSM has the ability to further provide metadata that allow for an evaluation of validity and reliability of the DTM. We supply data density maps and a set of data source maps, which are provided with each single DTM as separate datasets.

Data Density Maps
Data density maps are gridded datasets in identical extent and resolution to its DTM and hold information about the spatial 295 resolution of the underlying spatio-temporally interpolated bathymetric surveys (Figure 11a). The definition of resolution of a field survey is based on their structure, refer to Figure 3. For bi-linear grids it is determined to be the grid cell length, for unstructured datasets with elements it is the mean edge length or elements connected to the point and for unstructured datasets without elements it is the search radius as defined in the interpolation or approximation method (e.g. Shepardinterpolation). A low value is thus equivalent to a high resolution and is usually correlated with a higher quality, as they tend 300 to be from newer datasets close to or on land and are generally measured with more precise technology with relatively low uncertainty or error, such as airborne laser scanning or multi-beam hydrographic surveys.

Data Source Maps
Data source maps are polygonal datasets which hold the metadata of the survey datasets used for spatio-temporal interpolation of each individual point inside a DTM. As previously explained, spatio-temporal interpolation in the FSM 305 utilizes two datasets for each point before and after the interpolation date, thus two data source maps for datasets used before and after the interpolation date, respectively, are provided. To reliably trace provenance of a single elevation value within the DTM, both maps are to be considered. In this context, the provided metadata for each dataset utilized in interpolation are unique identifying IDs within the FSM, the name of the dataset, type and subtype of datasets (implicitly defining interpolation or approximation methods), data source organizations and date or time span of validity, compare Figure 11b. 310 Especially the date or time span of validity is of high interest, as higher temporal distances to older and younger datasets employed in spatio-temporal interpolation may imply lower reliability for a specific area, compared to other areas in the DTM with lower distances. Data source maps are an especially valuable tool for general quality assurance of a DTM, as potential implausible elevation values in the DTM can be traced back to the original survey datasets, in which a correction of potential erroneous information may be performed. 315 The generation of the data source map as such takes place during the spatio-temporal interpolation of the DTM, when each point of the model is individually handled. Attached to the calculation of the elevation itself, the meta data of the older and younger datasets used are stored in a grid of equal size and resolution as the DTM. After the DTM is successfully generated, an algorithm creates boundary polygons for each meta data group, which adhere to common multipolygon logic, in that a polygon shell can have holes that contain another meta data polygon. The traceability of density, interpolation method and 320 source data set for each individual elevation value of each DTM provides in our belief full transparency regarding potential user applications.

Conclusions and Outlook 340
The produced data are at writing of this publication temporally and spatially the most consistent and continuous available with the highest temporal and spatial resolution and versatility for the German Bight, as no fixed interpretation especially in surface sedimentological data is applied. Numerous new methodologies and validation approaches were developed and implemented, such as the morphological drive and petrographical surface sediment maps. The described Functional Seabed Model (FSM), as a data-based hindcast simulation model for the bathymetric development of the subaquatic surface and its 345 sedimentological composition, was formed and expanded over a time span of over a decade. As it was designed to be highly modular, it can therefore be expanded with new components very easily. Currently, the integration of the subsurface sediment composition of the morphologically active or activatable space is under development (see methodology excerpt in conference presentation Sievers et al., 2020) in the publicly funded KFKI project "Stratigraphic Model Components for the Improvement of High-Resolution and Regionalized Morphodynamic Simulation Models" (SMMS) in cooperation with the 350 German Federal Waterways Engineering and Research Institute and the Federal Maritime and Hydrographic Agency of Germany. SMMS aims to provide consistent and continuous stratigraphical data of the sub-seafloor and adjacent estuaries to further improve the ability of hydrodynamic numerical model systems to assess erosional processes. Additionally, ecological model components (see methodology excerpt in conference presentation Rubel et al., 2020) are developed in connection to the publicly funded KFKI project "Roughness effects of oyster reefs and blue mussel beds in the German Wadden Sea" 355 (BIVA-WATT). With this, we aim to be able to provide the most comprehensive synoptic base and validation data for numerous morphodynamic model systems and other general applications.