Development of soil and land cover databases for use in the Soil Water Assessment Tool from Irish National Soil Maps and CORINE Land Cover Maps for Ireland

The Soil and Water Assessment Tool (SWAT) is extensively used by hydrologists and environmental scientists to simulate river discharge and water quality at the watershed/basin scale across the world. SWAT is a physically based, semi-distributed rainfall-runoff model. To simulate runoff and water quality at the basin outlet, it requires watershed characteristics (elevation, land cover, and soil information for the entire river basin) and meteorological variables (rainfall, temperature, relative humidity, solar radiation, and wind speed). One drawback of SWAT is that its default database is available only for the United States, so the modeller must develop a separate database to apply the model to river basins outside the USA. This study generates a soil and land cover database for SWAT modelling of river basins in Ireland. The soil database was created from soil testing experiments conducted during the STRIVE programme by Teagasc and the Environmental Protection Agency Ireland. The land cover database was created by relating the land cover classes of the CORINE database to the default SWAT land cover database. Furthermore, detailed information on the five meteorological variables covering Ireland is provided. The resulting SWAT geodatabase can be used as a replacement for the default SWAT database when simulating runoff and water quality in river basins in Ireland. It contains a digital elevation model, soil and land cover maps, and the river network and river subbasins for Ireland, and is publicly available at: https://doi.org/10.5281/zenodo.4767926 (Basu, 2021).

basin/watershed outlet is the Soil and Water Assessment Tool (SWAT). The SWAT model is a conceptual, semi-distributed, continuous-time hydrological model developed by the United States Department of Agriculture to simulate water transport at the watershed scale on a daily/sub-daily time step (Arnold et al., 1993; Arnold et al., 1998). The model is useful for assessing several environmental phenomena, such as land management, hydrological processes and water use, bacteria and pathogen transport, and pollution apportionment in rivers (Santhi et al., 2005; Ghaffari et al., 2010; Baker and Miller, 2013). SWAT has also been widely reported in studies of tillage and crop management practices (Liu et al., 2013; Ulrich and Volk, 2009). The SWAT model has been used extensively worldwide for river basin scale modelling as well as for water quality analysis (Tuppad et al., 2011). It requires information on the elevation, land cover, and soil attributes of the watershed, along with the meteorological variables in the region (Arnold et al., 1993; Arnold et al., 1998).
Preparing the soil and land use databases for the SWAT model requires soil and land class information to be compiled from various reports (Cordeiro et al., 2018). As part of this pre-processing, publicly available information can be useful for preparing soil and land cover databases for the SWAT model (Pflugmacher et al., 2019). The State Soil Geographic database (STATSGO) and the Soil Survey Geographic database (SSURGO), developed by the US Department of Agriculture, are publicly available national soil databases for the United States that can readily be used with the SWAT model (Geza and McCray, 2008). The Soil Landscapes of Canada (SLC) database published by Agriculture and Agri-Food Canada is another publicly available database used for SWAT applications in Canadian watersheds (Cordeiro et al., 2018). This study develops a SWAT-compatible soil database for Ireland. The database was prepared from the publicly available soil information created through the Teagasc-EPA Soils and Subsoils Mapping Project (Fealy, 2009). This soil information contains various soil associations, which were categorized and assigned a relative ranking together with the extent of each soil (Fealy, 2009).
The CORINE land cover database classifies land across Europe into classes such as urban land, forestry, vegetation, and water bodies (Feranec et al., 2016), and it covers Ireland. Since the available soil and land cover data for Ireland cannot readily be used with the SWAT model, this paper develops a framework to create a new soil database for Ireland that can substitute for the SWAT model's default soil database. The land cover classes in the CORINE map have been reclassified into land cover classes recognisable by the SWAT model. Furthermore, the elevation and meteorological data files needed to run the model have been generated for Ireland and can readily be loaded into the model. The locations of the sub-basins and the river network for Ireland are shown in Figure 1. Sections 2, 3, 4, and 5 describe the soil, land cover, elevation, and meteorological data for Ireland, respectively, while Section 6 describes the steps needed to load the data into the SWAT model and integrate the newly developed soil database. The SWAT soil parameter values for the soil map of Ireland have been estimated, and a soil database was developed for SWAT analysis of river basins in Ireland. Details of the Ireland soil map, the SWAT soil database, and the steps used to create the SWAT database for Ireland are provided below.

Ireland Soil Map
The Irish National Soil Map was developed as part of the Irish Soil Information System project funded under the Science, Technology, Research and Innovation for the Environment (STRIVE) programme. Based on the soil data collected during the project, a total of 213 different soil types present in the topsoil and subsoil layers were identified. Based on these soil types, Ireland's topsoil has been subdivided into 69 classes, shown in Figure 2. A set of soil characteristics, namely soil colour, texture, Munsell colour value, structure, consistency, presence of roots, hydrochloric acid reaction, pH, percentage of nitrogen and carbon, percentage of free iron, cation exchange capacity, percentage of sand, silt and clay, and unsaturated bulk density, was estimated for those soil types using laboratory tests (Simo et al., 2007). The values of these characteristics are provided at http://gis.teagasc.ie/soils/soilguide.php.
Of these characteristics, soil depth, organic carbon content, sand, silt, clay and rock content (in percent), and Munsell values were used to develop the soil database for SWAT modelling.
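Sand, silt, clay and organic carbon percentages are enough to estimate one of the SWAT layer parameters, the USLE soil erodibility factor (USLE_K). The Python sketch below implements the texture-based erodibility equation given in the SWAT theoretical documentation. This section does not state whether the authors used this equation, so treat it as one possible option; the function name `usle_k_from_texture` and the example inputs are hypothetical.

```python
import math


def usle_k_from_texture(sand: float, silt: float, clay: float, org_c: float) -> float:
    """Estimate the USLE soil erodibility factor K from texture.

    All inputs are percentages by weight. Sketch of the texture-based
    equation in the SWAT theoretical documentation; verify against that
    document before production use.
    """
    if clay + silt <= 0:
        raise ValueError("clay + silt must be positive")
    # Coarse-sand factor: lower K for soils with much coarse sand
    f_csand = 0.2 + 0.3 * math.exp(-0.256 * sand * (1 - silt / 100))
    # Clay-to-silt factor: lower K for soils with a high clay:silt ratio
    f_cl_si = (silt / (clay + silt)) ** 0.3
    # Organic carbon factor: more organic carbon lowers erodibility
    f_orgc = 1 - 0.25 * org_c / (org_c + math.exp(3.72 - 2.95 * org_c))
    # High-sand factor: further reduces K for extremely sandy soils
    non_sand = 1 - sand / 100
    f_hisand = 1 - 0.7 * non_sand / (non_sand + math.exp(-5.51 + 22.9 * non_sand))
    return f_csand * f_cl_si * f_orgc * f_hisand


# Hypothetical layer: 40 % sand, 40 % silt, 20 % clay, 2 % organic carbon
print(round(usle_k_from_texture(40, 40, 20, 2.0), 3))
```

The result is in the SWAT USLE_K units quoted later in this paper; in this form each factor stays bounded, so K cannot exceed 0.5.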

SWAT soil database
The SWAT model has a default "usersoil" database developed for United States of America (USA) soil maps. Each row in the database represents one soil type. The soil database table has a total of 152 columns. The first column is an OBJECTID field that provides a unique identifier for each soil type. The second column, the map unit identifier (MUID), is used to map areas with the same soil characteristics. Where more than one soil type is present in the same MUID, a different sequence number (SEQN) is given in the third column (Cordeiro et al., 2018). The fourth, fifth and sixth columns record the soil name (SNAM), the soil interpretation record (S5ID), and the percentage of each soil component (CMPPCT) of the State Soil Geographic (STATSGO) soils data developed for the USA (Sheshukov et al., 2009). Columns 7 to 10 record the number of soil layers (NLAYERS), the hydrologic soil group (HYDGRP), the maximum rooting depth of the soil profile in mm (SOL_ZMX), and the fraction of porosity from which anions are excluded (ANION_EXCL); column 11 contains the potential (maximum) crack volume of the soil profile as a fraction of total soil volume (SOL_CRK), and column 12 records the soil layer texture (TEXTURE).
The SWAT soil database can represent up to 10 soil and subsoil layers. For each layer, 12 soil-related parameters need to be recorded in the usersoil database (Arnold et al., 2013). The next 120 columns, from column 13 to column 132, record the soil characteristics of the 10 soil layers. Where a soil has fewer than 10 layers, the parameter values for layers beyond NLAYERS are set to zero. The 12 soil parameters recorded in the database are i) depth of the soil layer in mm (SOL_Z), ii) moist bulk density in g/cm3 (SOL_BD), iii) available water capacity of the soil layer in mm H2O/mm soil (SOL_AWC), iv) saturated hydraulic conductivity in mm/hr (SOL_K), v) organic carbon content in percentage of soil weight (SOL_CBN), vi) clay content in percentage of soil weight (CLAY), vii) silt content in percentage of soil weight (SILT), viii) sand content in percentage of soil weight (SAND), ix) rock fragment content in percentage of the total weight of the layer (ROCK), x) moist soil albedo, only for the topsoil layer (SOL_ALB), xi) Universal Soil Loss Equation soil erodibility factor K in (metric ton m2 hr)/(m3 metric ton cm) (USLE_K), and xii) electrical conductivity in dS/m (SOL_EC). The electrical conductivity SOL_EC is currently inactive in the SWAT model and is assigned a value of zero for all layers. The SWAT database also records the CaCO3 content and the pH of the 10 soil layers, in columns 133 to 142 and 143 to 152, respectively.
However, these two parameters are currently also inactive in the SWAT model and are therefore assigned a value of zero.
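The 152-column layout of the usersoil table described above can be made concrete with a short Python sketch that lists the column names and builds one zero-padded row. The names for columns 8 and 9 (HYDGRP, SOL_ZMX) and for the CaCO3 and pH columns (SOL_CAL1 to SOL_CAL10, SOL_PH1 to SOL_PH10) are assumed here and should be checked against the usersoil table in SWAT2012.mdb; the helper names and example values are hypothetical.

```python
from typing import Any, Dict, List

# Columns 1-12, in the order described in the text (HYDGRP, SOL_ZMX assumed names)
HEADER_FIELDS = ["OBJECTID", "MUID", "SEQN", "SNAM", "S5ID", "CMPPCT",
                 "NLAYERS", "HYDGRP", "SOL_ZMX", "ANION_EXCL", "SOL_CRK", "TEXTURE"]
# Parameters i)-xii), repeated for layers 1..10 (columns 13-132)
LAYER_PARAMS = ["SOL_Z", "SOL_BD", "SOL_AWC", "SOL_K", "SOL_CBN", "CLAY",
                "SILT", "SAND", "ROCK", "SOL_ALB", "USLE_K", "SOL_EC"]
MAX_LAYERS = 10


def usersoil_columns() -> List[str]:
    """Return the 152 usersoil column names in table order."""
    cols = list(HEADER_FIELDS)
    for layer in range(1, MAX_LAYERS + 1):
        cols += [f"{p}{layer}" for p in LAYER_PARAMS]
    # Columns 133-142 (CaCO3) then 143-152 (pH); names are assumptions
    cols += [f"SOL_CAL{layer}" for layer in range(1, MAX_LAYERS + 1)]
    cols += [f"SOL_PH{layer}" for layer in range(1, MAX_LAYERS + 1)]
    return cols


def usersoil_row(header: Dict[str, Any], layers: List[Dict[str, float]]) -> Dict[str, Any]:
    """Build one usersoil row; NLAYERS is derived from the number of layers."""
    if not 1 <= len(layers) <= MAX_LAYERS:
        raise ValueError("a soil must have between 1 and 10 layers")
    row: Dict[str, Any] = {}
    for name in HEADER_FIELDS:
        row[name] = len(layers) if name == "NLAYERS" else header.get(name)
    for layer in range(1, MAX_LAYERS + 1):
        values = layers[layer - 1] if layer <= len(layers) else {}
        for p in LAYER_PARAMS:
            # Layers beyond NLAYERS and the inactive SOL_EC are zero;
            # in this sketch, unspecified parameters also default to zero
            row[f"{p}{layer}"] = 0.0 if p == "SOL_EC" else values.get(p, 0.0)
    # Inactive CaCO3 and pH columns are set to zero, as described above
    for layer in range(1, MAX_LAYERS + 1):
        row[f"SOL_CAL{layer}"] = 0.0
    for layer in range(1, MAX_LAYERS + 1):
        row[f"SOL_PH{layer}"] = 0.0
    return row


# Hypothetical two-layer soil class D1
example = usersoil_row({"OBJECTID": 1, "MUID": "D1", "SNAM": "D1"},
                       [{"SOL_Z": 300.0, "CLAY": 20.0}, {"SOL_Z": 800.0}])
print(len(example), example["NLAYERS"], example["SOL_Z3"])
```

Deriving NLAYERS from the layer list, rather than taking it as input, keeps the layer count and the zero-padding consistent by construction.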

Development of soil database in Ireland for Irish National Soil Map
Based on the Irish National Soil Map, Ireland can be subdivided into 69 different soil classes. Each soil class has 1 to 6 layers, and together the classes cover 213 different soil types. In the new database, the soil classes are assigned a unique OBJECTID ranging from 1 to 69, and both the MUID and SNAM fields are set to D1 to D69. The soil classes were categorized into one of the four hydrological soil groups A, B, C or D based on their infiltration characteristics. Following the SWAT manual (Arnold et al., 2013) and Gijsman et al. (2007), the hydrological soil groups for the 69 soil classes of Ireland were assigned as follows:
Hydrological group A: sandier, deeper soils (% sand > 86 and depth >= 1500 mm)
Hydrological group B: fairly sandy, mid-depth soils (% sand > 50, % clay < 35, and depth > 500 mm)
Hydrological group C: soils with slightly more clay than sand and fairly shallow depth (% clay >= 28, % sand <= 44, and depth <= 800 mm)
Hydrological group D: clay-rich soils (% clay >= 50)
Where a soil class cannot readily be classified into one of the four groups, a default of group B is assigned. The infiltration rate is highest for hydrological group A and decreases progressively to its lowest value for group D. A map of Ireland by hydrological group is shown in Figure 3.
The number of soil layers and the maximum soil depth for each of the 69 soil classes were obtained from the Irish Soil Information System soil profile handbook prepared by Simo et al. (2007). Out of the 11 soil-related parameters (excluding […]
[…] whereas Sentinel-2 and Landsat-8 satellite data collected for 2017-2018 were used to generate the 2018 land cover images. The CORINE land cover maps for Europe consist of 44 different land cover classes, shown in Table 1. The default land use and land cover classes available in the SWAT database do not completely match the land cover classes in the CORINE land cover maps.
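Returning to the soil database, the hydrological soil group rules listed above for the 69 Irish soil classes can be written as a small Python function. This is a sketch: the text does not say how a soil satisfying two rules is resolved, so the order of checks below (A before B, D before C) is an assumption, as is using a single representative sand and clay percentage and the total profile depth for each class. The function name `hydrologic_group` is hypothetical.

```python
def hydrologic_group(sand_pct: float, clay_pct: float, depth_mm: float) -> str:
    """Assign a hydrological soil group from the thresholds in the text.

    Rule order (A, B, D, C) is an assumption; only A/B and C/D can overlap.
    """
    # Group A: sandier, deeper soils
    if sand_pct > 86 and depth_mm >= 1500:
        return "A"
    # Group B: fairly sandy, mid-depth soils
    if sand_pct > 50 and clay_pct < 35 and depth_mm > 500:
        return "B"
    # Group D: clay-rich soils, checked before C so that shallow,
    # very clayey soils are not labelled C
    if clay_pct >= 50:
        return "D"
    # Group C: slightly more clay than sand, fairly shallow
    if clay_pct >= 28 and sand_pct <= 44 and depth_mm <= 800:
        return "C"
    # Default used in the text when no rule applies
    return "B"


# Hypothetical (sand %, clay %, depth mm) values for illustration
for sand, clay, depth in [(90, 3, 1600), (60, 20, 900), (30, 35, 600), (20, 55, 1000)]:
    print(sand, clay, depth, hydrologic_group(sand, clay, depth))
```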
A mapping between the CORINE land cover classes and the default SWAT land use and land cover classes is therefore necessary. In Table 1, each CORINE land cover class is linked with a default land cover class recognizable by the SWAT model. To run the SWAT model, a user-defined land cover file prepared from Table 1 needs to be loaded. For brevity, only the CORINE land cover map for the year 2018 and the corresponding reclassified, SWAT-compatible land cover map for Ireland are shown in Figure 4. Of the 44 CORINE land cover classes, 35 are present in Ireland.
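Reclassification through Table 1 amounts to a lookup from CORINE class codes to SWAT land cover codes. The Python sketch below assumes a two-column CSV lookup (header `Value,Landuse`, hypothetical here) and a land cover raster already read into a nested list of integer codes. The two example rows, CORINE 512 (water bodies) to SWAT `WATR` and CORINE 311 (broad-leaved forest) to SWAT `FRSD`, are illustrative and not copied from Table 1. Some CORINE raster products may store grid codes 1 to 44 instead of three-digit class codes, so the lookup keys must match the raster actually used.

```python
import csv
import io
from typing import Dict, List, Set, Tuple


def load_lookup(csv_text: str) -> Dict[int, str]:
    """Parse a 'CORINE code,SWAT code' CSV whose first row is a header."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    lookup: Dict[int, str] = {}
    for record in reader:
        if not record:
            continue  # ignore blank lines
        lookup[int(record[0])] = record[1].strip()
    return lookup


def reclassify(grid: List[List[int]], lookup: Dict[int, str],
               nodata: str = "NODATA") -> Tuple[List[List[str]], Set[int]]:
    """Map every cell to a SWAT code; report codes missing from the lookup."""
    unmapped: Set[int] = set()
    out: List[List[str]] = []
    for row in grid:
        new_row: List[str] = []
        for code in row:
            swat_code = lookup.get(code)
            if swat_code is None:
                unmapped.add(code)  # flag, rather than silently drop, the class
                swat_code = nodata
            new_row.append(swat_code)
        out.append(new_row)
    return out, unmapped


# Hypothetical lookup rows, for illustration only
LOOKUP_CSV = "Value,Landuse\n512,WATR\n311,FRSD\n"
swat_grid, unmapped_codes = reclassify([[512, 311], [999, 512]], load_lookup(LOOKUP_CSV))
print(swat_grid, unmapped_codes)
```

Returning the set of unmapped codes makes it easy to confirm that all 35 CORINE classes present in Ireland are covered by the lookup before the file is loaded into SWAT.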

Digital Elevation Model data
The SWAT model requires elevation information along with the soil and land cover information for analysis. […] needed for the SWAT model, as it is used to delineate the watershed/basin boundary, develop the stream network in the watershed, and estimate the slope of the watershed. A fine-resolution DEM can delineate the watershed and generate the stream network with considerable accuracy, which is important for simulating surface runoff at the basin outlet. The EU-DEM was generated for 39 countries across Europe by DHI GRAS Geocenter Denmark (DHI GRAS, 2014) as a hybrid DEM based on the SRTM DEM, the ASTER DEM, and publicly available topographic maps, and hence was found to be more accurate than both SRTM and ASTER. Even though the Bluesky DEM has the finest resolution, it is available only for a portion of Ireland and, unlike the other three datasets, is not openly available. For this reason, the EU-DEM is recommended for analysis with the SWAT model. It should be noted that although the EU-DEM can be considered accurate enough for the majority of watersheds, it is mainly generated from satellite imagery, so elevation values at certain pixels (each pixel is 25 m x 25 m) can be erroneous where elevation changes abruptly at the local scale. Such errors can lead the SWAT model to generate discontinuous river networks in the river basin. One option is to fill the EU-DEM before using it for modelling. The filled DEM for Ireland is shown in Figure 5.
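This section does not state which filling algorithm was applied; one widely used approach is the priority-flood algorithm, sketched below in plain Python for a small DEM held as a nested list. The sketch treats every edge cell as a potential outlet, uses 8-neighbour connectivity and ignores NoData cells; all three are simplifying assumptions. For the full 25 m EU-DEM raster of Ireland, a GIS tool or a compiled implementation would be more practical. The function name `fill_sinks` is hypothetical.

```python
import heapq
from typing import List

NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
              (0, 1), (1, -1), (1, 0), (1, 1)]


def fill_sinks(dem: List[List[float]]) -> List[List[float]]:
    """Priority-flood sink filling; returns a new grid, input is unchanged."""
    rows, cols = len(dem), len(dem[0])
    filled = [row[:] for row in dem]
    visited = [[False] * cols for _ in range(rows)]
    heap = []
    # Seed the queue with all edge cells (assumed outlets)
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r][c], r, c))
                visited[r][c] = True
    # Always expand from the lowest cell reached so far
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and not visited[nr][nc]:
                visited[nr][nc] = True
                # A cell lower than its spill elevation is raised to it
                filled[nr][nc] = max(filled[nr][nc], z)
                heapq.heappush(heap, (filled[nr][nc], nr, nc))
    return filled


# Tiny example: a one-cell pit surrounded by higher ground
print(fill_sinks([[5, 5, 5], [5, 1, 5], [5, 5, 5]]))
```

Because cells are always processed from the lowest elevation reached so far, each pit is raised only to the level at which it can spill towards an outlet, which removes the artificial depressions that break the delineated river network.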

Meteorological variables
Along with the three watershed-related inputs (soil, land cover, and DEM), the SWAT model requires time series of five meteorological variables at a daily/sub-daily time scale covering the entire period for which model outputs are required. The five meteorological variables are rainfall, daily maximum and minimum temperature, daily average relative humidity (RH), daily average wind speed (WS), and daily average solar radiation (SR). The rainfall data can be at a daily or an hourly time scale, and the SWAT model output will have the same time scale as the rainfall data. The SWAT model uses the time series of these five variables at locations within or surrounding the river basin to simulate runoff. The model output is highly dependent on the accuracy of these meteorological inputs, particularly the rainfall data. For this reason, it is advisable to provide observed meteorological data (Arnold et al., 2013). However, where the watershed is large and relatively few rain gauges, temperature gauges, and weather stations measuring RH, WS, and SR are located inside or in the near vicinity of the watershed, alternative databases need to be considered for modelling purposes. Figure 6 shows the locations of the rain gauges, temperature gauges, and weather stations in Ireland. These gauges/stations are maintained by Met Éireann, the government organization that provides meteorological services to the Republic of Ireland. It should be noted that some of these gauges/stations have considerable missing data and/or are no longer active. Where sufficient gauges/stations are located in close proximity to the target watershed, the historical meteorological data can readily be used.
However, when there are few gauges/stations, or considerable data are missing at those stations, alternative data sources need to be considered for developing the SWAT model. Figure 6 shows that, compared with the rain gauges, the numbers of temperature gauges and weather stations in Ireland are considerably lower.
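One simple way to act on these data-availability concerns is to discard stations whose records are too incomplete and then take the nearest remaining station for each subbasin. The Python sketch below does this with great-circle (haversine) distances. The 10 % missing-data threshold, the `Station` structure and the coordinates in the example are hypothetical, and nearest-station matching is an assumed procedure, not one described in this paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Station:
    name: str
    lat: float
    lon: float
    values: List[Optional[float]]  # daily series; None marks a missing day


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def missing_fraction(values: List[Optional[float]]) -> float:
    return sum(v is None for v in values) / len(values) if values else 1.0


def nearest_usable_station(lat: float, lon: float, stations: List[Station],
                           max_missing: float = 0.1) -> Optional[Station]:
    """Nearest station whose missing-data fraction is within the threshold."""
    usable = [s for s in stations if missing_fraction(s.values) <= max_missing]
    if not usable:
        return None  # fall back to a gridded product such as E-OBS or ERA5-Land
    return min(usable, key=lambda s: haversine_km(lat, lon, s.lat, s.lon))


# Hypothetical subbasin centroid and stations
stations = [Station("station_a", 53.36, -6.25, [1.0, None]),
            Station("station_b", 53.00, -6.50, [0.5, 1.2])]
chosen = nearest_usable_station(53.35, -6.26, stations)
print(chosen.name if chosen else "no usable station")
```

Here the closer station is skipped because half of its record is missing; when no station passes the threshold, the function returns `None`, which is the situation where the gridded datasets discussed next become relevant.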
A readily available source of meteorological variables is the Copernicus E-OBS dataset. The dataset was generated from a network of station observations from European National Meteorological and Hydrological Services and other European data centres. It covers the majority of Europe and provides rainfall, maximum and minimum temperature, relative humidity, and shortwave solar radiation on a 0.1° grid at a daily time scale from 1950 to the present. The data can be obtained from the following link: https://cds.climate.copernicus.eu/cdsapp#!/dataset/insitu-gridded-observations-europe?tab=form. Note that wind speed is not available in this dataset. The ERA5-Land data from the Copernicus database are another alternative source, providing four of the five meteorological variables (all except relative humidity) at an hourly time scale from 1981 to the present, available from https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form. The dew point temperature is available in the ERA5-Land database and can be used to estimate relative humidity. The ERA5-Land grid spacing is 0.1° and covers the entire globe. The locations of the E-OBS and ERA5 grids for Ireland are shown in Figure 7.
The soil, land cover, and filled DEM maps that can readily be used with the SWAT model for Ireland are available at https://doi.org/10.5281/zenodo.4767926 (Basu, 2021). The data folder contains the following data and databases: i) SWAT2012.mdb, ii) DEM_IRL_ITM.tif, iii) […]
[…] Environmental Protection Agency Ireland, while the land cover database has been created to link the land cover classes in the CORINE map with the land cover classes recognisable by the SWAT model.
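As noted in the meteorological section, ERA5-Land provides dew point temperature rather than relative humidity, and it provides wind as separate 10 m u and v components, so both SWAT inputs have to be derived. The Python sketch below assumes the Magnus approximation for saturation vapour pressure (coefficients 6.112 hPa, 17.62 and 243.12 °C) and hourly ERA5-Land temperatures supplied in kelvin. The function names are hypothetical, and whether a given SWAT input file expects relative humidity as a percentage or a fraction should be checked in the SWAT input documentation before writing files.

```python
import math
from typing import Dict, List


def kelvin_to_celsius(t_k: float) -> float:
    return t_k - 273.15


def saturation_vapour_pressure_hpa(t_c: float) -> float:
    """Magnus approximation over water; temperature in degrees Celsius."""
    return 6.112 * math.exp(17.62 * t_c / (243.12 + t_c))


def relative_humidity_pct(t_c: float, td_c: float) -> float:
    """RH (%) from air and dew point temperature, capped at 100 %."""
    rh = 100.0 * saturation_vapour_pressure_hpa(td_c) / saturation_vapour_pressure_hpa(t_c)
    return min(rh, 100.0)


def wind_speed(u: float, v: float) -> float:
    """Wind speed magnitude from the u and v components."""
    return math.hypot(u, v)


def daily_stats(t_k: List[float], td_k: List[float],
                u: List[float], v: List[float]) -> Dict[str, float]:
    """Aggregate one day of hourly values into SWAT-style daily inputs."""
    t_c = [kelvin_to_celsius(t) for t in t_k]
    td_c = [kelvin_to_celsius(t) for t in td_k]
    rh = [relative_humidity_pct(t, td) for t, td in zip(t_c, td_c)]
    ws = [wind_speed(a, b) for a, b in zip(u, v)]
    return {"tmax": max(t_c), "tmin": min(t_c),
            "rh_mean": sum(rh) / len(rh), "ws_mean": sum(ws) / len(ws)}


# Two hypothetical hourly values (a real day would have 24)
print(daily_stats([283.15, 293.15], [283.15, 283.15], [3.0, 0.0], [4.0, 1.0]))
```

In this sketch RH is computed hourly and then averaged, rather than derived from daily mean temperature and dew point; the two approaches give somewhat different daily values, so whichever is chosen should be applied consistently across the basin.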

Author contribution
Basu created the database and prepared the manuscript.

Competing Interests
The authors declare that they have no conflict of interest.