The Soil and Water Assessment Tool (SWAT) model has been commonly used in
Canada for hydrological and water quality simulations. However,
preprocessing of critical data such as soils information can be laborious
and time-consuming. The objective of this work was to preprocess the Soil
Landscapes of Canada (SLC) database to offer a country-level soils dataset in
a format ready to be used in SWAT simulations. A two-level screening process
was used to identify critical information required by SWAT and to remove
records with information that could not be calculated or estimated. Out of
the 14 063 unique soil records in the SLC, 11 838 records with complete
information were included in the dataset presented here. Important variables
for SWAT simulations that are not reported in the SLC database (e.g., hydrologic soils groups (HSGs) and erodibility factor (
Integrated environmental modeling is inspired by modern environmental problems and enabled by transdisciplinary science and computer capabilities that allow the environment to be considered in a holistic way (Laniak et al., 2013). In an agricultural context, synthesis and quantification of multidisciplinary knowledge via process-based modeling are essential to manage systems that can be adapted to continual change (Ahuja et al., 2007). The Soil and Water Assessment Tool (SWAT) (Arnold et al., 1998) is an example of such a process-based model. It has been developed over the past 30 years to evaluate the effects of alternative management decisions on water resources and nonpoint-source pollution in large river basins through the simulation of major processes including hydrology, soil temperature and properties, plant growth, nutrient and pesticide dynamics, bacteria and pathogen transport, and land management (Arnold et al., 2012; Douglas-Mankin et al., 2010). Furthermore, a weather generator is included in the model to fill gaps that may exist in meteorological records.
The SWAT model has been extensively tested around the world for a wide range of hydroclimatic conditions, water and land management practices, and timescales (Douglas-Mankin et al., 2010; Arnold et al., 2012; Gassman et al., 2014). The wide adoption of the SWAT model has been prompted by preprocessing and post-processing software tools such as a GIS interface and extensive user documentation (Arnold et al., 2012), as well as several linked databases for crops, soils, fertilizers, tillage, and pesticides (Santhi et al., 2005). Among these, soil properties are especially important as they are needed for the simulation of influential processes such as evapotranspiration, soil water balance, nutrient dynamics, and sediment transport (Neitsch et al., 2005). However, the existing built-in database is only valid for SWAT applications in the USA. Accordingly, studies outside the USA require the development of a soils dataset by preprocessing available soils data into a format readable by SWAT, a time-consuming process as not all data required by SWAT are readily available for countries outside of the USA.
Worldwide, SWAT has emerged as one of the most widely used water quality watershed- and river-basin-scale models for simulation of a broad range of hydrologic and/or environmental problems (Gassman et al., 2014). These applications in different regions are described in the extensive body of peer-reviewed SWAT literature (Arnold et al., 2012). Specifically in Canada, the SWAT model has been used for hydrological simulations in most provinces, including Prince Edward Island (Edwards et al., 2000), New Brunswick (Chambers et al., 2011; Yang et al., 2009), Nova Scotia (Ahmad et al., 2011), Ontario (Asadzadeh et al., 2015; Rahman et al., 2012), Quebec (Lévesque et al., 2008), Manitoba (Yang et al., 2014), Saskatchewan (Mekonnen et al., 2016), Alberta (Mapfumo et al., 2004; Watson and Putz, 2014; Faramarzi et al., 2015), and British Columbia (Zhu et al., 2012). However, preparation of Canadian soils information in a consistent and usable format for SWAT is time-consuming (Rahman et al., 2012), as information has to be collected from soil reports and cross-checked against GIS datasets, missing soil variables have to be calculated from other physical and hydraulic properties, and all parameters have to be attributed to specific soil grids or polygons.
Figure 1. Spatial extent of the Soil Landscapes of Canada (SLC) database showing coverage in the provinces of Newfoundland and Labrador (NL), Prince Edward Island (PE), Nova Scotia (NS), New Brunswick (NB), Quebec (QC), Ontario (ON), Manitoba (MB), Saskatchewan (SK), Alberta (AB), and British Columbia (BC), as well as the Northwest Territories (NT).
Some of this preprocessing work can be alleviated by using publicly available databases that contain most of the information required by SWAT. The Soil Landscapes of Canada (SLC) database published by Agriculture and Agri-Food Canada (Soil Landscapes of Canada Working Group, 2010) is an example, and has been used in SWAT applications in Ontario (Asadzadeh et al., 2015; Rahman et al., 2012), Saskatchewan (Mekonnen et al., 2016), Alberta (Faramarzi et al., 2015), and British Columbia (Zhu et al., 2012). The SLC contains a GIS dataset series that provides information about the country's agricultural soils at the provincial and national levels. It was compiled at a scale of 1 : 1 million, and the information is organized according to a uniform national set of soil and landscape criteria based on permanent natural attributes (Soil Landscapes of Canada Working Group, 2010). The SLC encompasses the southern portions of the provinces of Ontario and Quebec and a larger portion of the Prairie provinces of Manitoba, Saskatchewan, and Alberta, extending as far north as the boreal shield. Coverage in the maritime provinces of New Brunswick, Nova Scotia, and Prince Edward Island is nearly complete (Fig. 1).
Although there are more detailed soil datasets available at provincial levels (e.g., AGRASID dataset in Alberta), selection of SLC for integration with SWAT was based on the fact that (i) it covers most soils across the agricultural regions of Canada in a single dataset; (ii) it has been used in regional studies in Canada, as described above; and (iii) it is more easily applicable to large-scale national studies as broad-scale datasets require reduced resources to prepare and process data (Moriasi and Starks, 2010). Modeling studies comparing the performance of a single model (calibrated and uncalibrated) but using soil datasets with varying spatial resolution in the USA (i.e., the State Soil Geographic database (STATSGO) compiled at 1 : 250 000 scale, and the Soil Survey Geographic database (SSURGO) with scales ranging from 1 : 12 000 to 1 : 63 360) also revealed that using either dataset produced comparable results (Mednick, 2008).
Besides the American databases (i.e., STATSGO and SSURGO), the authors are not aware of any other effort to produce a similar dataset from a national soils database for specific use with SWAT, such as the one presented here for Canada. Past efforts in preparing a large-scale soils dataset for modeling include the standardization of the FAO–UNESCO, but this dataset was not optimized for SWAT and is presented at a much coarser spatial resolution (i.e., 1 : 5 000 000; Batjes, 1997). The SOTER (Soil Terrain) database is another initiative to provide a global soils dataset, which was intended to have global coverage at 1 : 1 million scale but was later degraded to 1 : 5 million scale due to a lack of means (Dobos et al., 2005). However, SOTER is not optimized for SWAT use and requires some variables to be calculated or estimated to this end (Bossa et al., 2012). Other databases at continental scale, such as the HYPRES in Europe, only cover soil hydrologic properties (Wösten et al., 1999). At the national level, only a few countries besides the USA and Canada have an electronic soil database (e.g., Australia, Brazil, and China; Shi et al., 2004; Cooper et al., 2005), while these data are not available in most countries (Cooper et al., 2005). The reduced number of available datasets, coupled with the technicalities involved in translating these datasets into SWAT format and the required variables not reported in them, contributes to the lack of large-scale soil databases fully compatible with SWAT. These limitations emphasize the novelty and importance of the dataset presented in this paper. Besides presenting a soils database ready to use in SWAT simulations in Canada, the present work provides a framework to support similar initiatives in other regions using data from global soil databases.
Description of variables in SWAT's “usersoil” table.
Due to the importance of the SWAT model for integrated environmental modeling in Canada, and the prominence of the SLC database as a potential input dataset for this model at a national level, the objective of this work was to offer a country-level soils dataset in a format ready to be used in SWAT simulations. The dataset provides over 20 parameter values for each soil type, many of which vary by soil layer. It was prepared in a format suitable for the ArcSWAT version of the model, in which soil records are attributed to a grid- or polygon-based soil map. Such laborious preprocessing is widely, but inconsistently, performed in SWAT simulations reported in the literature. Finally, deficiencies in the dataset are also presented and discussed.
The SLC database (
The Polygon Attribute Table (PAT) provides the linkage between geographic locations (polygons in the SLC GIS coverage) and soil landscape attributes in the associated database tables (e.g., unique soil ID in the Soil Name Table (SNT) and respective number of layers in the Soil Layer Table (SLT)).
The Component Table (CMP) describes each of the individual soil and landscape features comprising the polygons. That is, it describes which soil records are present in each spatial unit (i.e., polygon) in the GIS layer.
The Soil Name Table (SNT) describes the general physical and chemical characteristics for all of the soils identified in a geographic region.
The Soil Layer Table (SLT) contains soil information that varies in the vertical direction (i.e., layered attributes).
The CMP table describes the proportion of each nonspatially defined soil
component in a polygon if more than one soil component exists (the soil
components are the soil elements that comprise each polygon). The
component numbering follows a sequence of decreasing proportion in a polygon
(i.e., the first component has the highest proportion; the last component has
the smallest proportion). This component-based structure of the SLC database
does not affect the analysis since all the soil records listed in the SNT
table were processed to generate the present dataset. However, it has
implications for the SWAT model user, who has to decide how to handle the
relationship between the polygon (spatially defined) and each nonspatially
defined soil component in multicomponent polygons (e.g., selecting the largest
component in a polygon or generating a hybrid soil incorporating properties
of each soil component).
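The two options mentioned above (taking the largest component, or building a hybrid soil) can be sketched as follows. This is only an illustration under stated assumptions: the row layout, the polygon and soil identifiers, and the clay values are hypothetical and do not reproduce the actual CMP schema.

```python
from collections import defaultdict

# Hypothetical CMP-like rows: (polygon_id, soil_id, percent_of_polygon).
cmp_rows = [
    ("P1", "SOIL_A", 60),
    ("P1", "SOIL_B", 40),
    ("P2", "SOIL_C", 100),
]

def dominant_component(rows):
    """Map each polygon to the soil component with the largest percentage."""
    best = {}
    for polygon_id, soil_id, percent in rows:
        if polygon_id not in best or percent > best[polygon_id][1]:
            best[polygon_id] = (soil_id, percent)
    return {polygon_id: soil_id for polygon_id, (soil_id, _) in best.items()}

def weighted_property(rows, values):
    """Percentage-weighted mean of one soil property per polygon (a 'hybrid' soil)."""
    totals = defaultdict(float)
    weights = defaultdict(float)
    for polygon_id, soil_id, percent in rows:
        totals[polygon_id] += percent * values[soil_id]
        weights[polygon_id] += percent
    return {polygon_id: totals[polygon_id] / weights[polygon_id] for polygon_id in totals}

clay = {"SOIL_A": 20.0, "SOIL_B": 40.0, "SOIL_C": 10.0}  # illustrative clay contents (%)
print(dominant_component(cmp_rows))
print(weighted_property(cmp_rows, clay))
```

Selecting the dominant component keeps every polygon tied to a real soil record, whereas the weighted approach produces a synthetic record; which is preferable depends on the application.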
The SWAT soils information is stored in the “usersoil” table, located within the SWAT 2012 database in Microsoft Access format (i.e., SWAT2012.mdb). Each soil record is stored as a new record (i.e., row) in the table. Specific soil variables (Table 1) comprise the 152 columns of the usersoil table. The first column is an OBJECTID field assigning a unique identifier for each record. Columns two through six pertain to soil classification. The second column is the map unit identifier (MUID), which is used for mapping a collection of areas grouped by the same soil characteristics. A single MUID may describe different soil types, which are stored with a record counter in the third column (SEQN), while a soil identifying name (SNAM), a soil interpretation record (S5ID), and the percent of each soil component (CMPPCT) are recorded in the fourth, fifth, and sixth columns, respectively (Sheshukov et al., 2009). Columns 7 through 12 describe major soil properties pertaining to the soil record, namely, the number of layers (NLAYERS), the hydrological soil group to which that soil belongs (HYDGRP), the maximum rooting depth of the soil profile (SOL_ZMX), the fraction of soil porosity from which anions are excluded (ANION_EXCL), the potential of maximum crack volume of the soil profile expressed as a fraction of the total soil volume (SOL_CRK), and the texture of the soil layer (TEXTURE).
The next 120 columns starting from column 13 (i.e., columns 13 to 132)
describe the information for each layer of the soil record. These columns are
arranged in sets of 12 variables each for 10 possible soil layers. The
variable NLAYERS indicates how many of these sets should be populated.
Variables for any sets beyond NLAYERS should be assigned a value of zero. The
variables included in each set of soil layers are the depth from the soil surface
to the bottom of the layer (SOL_Z), moist bulk density (SOL_BD), available
water capacity of the soil layer (SOL_AWC), saturated hydraulic
conductivity (SOL_K), organic carbon (SOL_CBN), clay (CLAY), silt
(SILT), sand (SAND), and rock fragment (ROCK) contents, moist soil albedo
(SOL_ALB), erodibility factor (USLE_K), and electrical conductivity
(SOL_EC). Beyond the columns describing layered soil information, there
are 20 columns (i.e., columns 133 to 152) describing two variables (i.e.,
soil
Despite its usefulness as a source of soil information for hydrological simulations, the SLC dataset is not assembled in a format readable by SWAT or other similar models. For example, SWAT stores all the properties for a specific soil record in a single row in the usersoil table, while this information is stored in the SLC as multiple rows in two different tables (i.e., SNT and SLT). Thus, the information contained in the SLT database has to be processed to satisfy SWAT's format requirements. In addition, all properties in the usersoil table are spatially defined, while those of SLC are often stored in a multi-polygon structure with no unique spatial identification. Variables required by SWAT and contained in the dataset presented here were either extracted from SNT and SLT, or calculated from the information therein. Some other variables were estimated from published values. Extraction or calculation of variables was done through an R code that imported both SNT and SLT, screened the data for missing records and missing SWAT-required information (data screening is described in Sect. 5), and sequentially populated unique soil records in the database. The next section describes how these variables were defined.
Variables included in the SWAT usersoil table.
Adapted from Arnold et al. (2013) and Sheshukov et al. (2009).
The use of the SNT is necessary as it links the soils information to the GIS
coverage containing the PAT. However, a first screening was required to
remove soil records from the SNT that are not present in the SLT, as soil
layer information is required by SWAT. The mismatch among soil records in
both tables can occur for a number of reasons. For example, there are records
in both tables that pedologists have identified but whose properties have not
yet been characterized. Also, soil records listed in one table may be absent
from another table due to changes in soil classification. Finally, soil
records listed as unclassified in the SNT (i.e., variable KIND
Out of the 14 063 unique soil records in the SNT, 489 were missing in the SLT and, therefore, removed from the analysis. These 489 soil records correspond to around 3.5 % of the soils listed in the SNT. Most of the missing records were reported as unclassified (305 soils; 62.2 %), suggesting that these soils have been identified, but their properties have not yet been characterized. Mineral soil records corresponded to 29.4 % (144 soils) of the total, likely a reflection of changes in classification. The other two classes comprised non-true soils (e.g., mine tailings, urban land; 33 soils; 6.7 %) and organic soils (8 soils; 1.6 %). Also, only 58 of the 489 missing soil records (11.0 %) could be mapped through linking with the CMP table, making it impossible to do any spatial analysis on the distribution of these soils across the country. However, since the SNT assigns a province for each soil record, it is possible to identify where these missing records occur. Most of the missing soil records were in British Columbia (167 soils; 34.2 %), Manitoba (151 soils; 30.9 %), and Saskatchewan (133 soils; 27.2 %), with smaller proportions in Yukon (13 soils; 2.7 %), Ontario (11 soils; 2.3 %), Nova Scotia (9 soils; 1.8 %), and Newfoundland (5 soils; 1.0 %).
The SWAT data requirements were used as a second level of screening to build
the present dataset. The soil input variables in SWAT can be either required
or optional (Table 2; Arnold et al., 2013). Required variables that could not
be calculated or estimated (e.g., SOL_BD, SOL_K, SOL_CBN, CLAY,
SILT, and SAND) were used to separate complete from incomplete records. Soil
records in the SLT containing or allowing derivation of all the variables
required by SWAT were compiled in a dataset comprising 11 838 unique records
that were importable into the model. Soils in the SLT with missing records
(i.e., variables entered as
As for the nonmatching soil records in the SNT and SLT, only 547 out of 1736 (i.e., 31.5 %) records with missing information could be mapped through linking with the CMP table, which makes any spatial representation of these records not meaningful. However, the provinces where these records occur could also be identified. The highest proportions of soil records with incomplete information were in British Columbia (490 records; 28.2 %) and Manitoba (391 records; 22.5 %). Ontario (182 records; 10.5 %) and Alberta (180 records; 10.4 %) had intermediate values, while Newfoundland (123 records; 7.1 %), Saskatchewan (102 records; 5.9 %), New Brunswick (93 records; 5.4 %), the Northwest Territories (80 records; 4.6 %), Nova Scotia (47 records; 2.7 %), Quebec (30 records; 1.7 %), and Yukon (17 records; 1.0 %) each accounted for less than 10 % of the records with missing information.
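The two-level screening described above can be sketched in a few lines. This is a minimal illustration, not the R code used to build the dataset: the data structures are hypothetical, and missing values are represented as `None` purely as an assumption.

```python
# Required variables that cannot be calculated or estimated if missing.
REQUIRED = ("SOL_BD", "SOL_K", "SOL_CBN", "CLAY", "SILT", "SAND")

def screen(snt_ids, slt_layers):
    """Split soil IDs into (no layer data, incomplete, complete).

    snt_ids: iterable of soil IDs listed in the name table.
    slt_layers: dict mapping soil ID -> list of layer dicts.
    A missing value is represented here as None (an assumption).
    """
    no_layers, incomplete, complete = [], [], []
    for soil_id in snt_ids:
        layers = slt_layers.get(soil_id)
        if not layers:
            # Level 1: the soil has no layered information at all.
            no_layers.append(soil_id)
        elif any(layer.get(var) is None for layer in layers for var in REQUIRED):
            # Level 2: at least one required variable is missing in some layer.
            incomplete.append(soil_id)
        else:
            complete.append(soil_id)
    return no_layers, incomplete, complete

full_layer = {"SOL_BD": 1.3, "SOL_K": 10.0, "SOL_CBN": 2.0,
              "CLAY": 20.0, "SILT": 40.0, "SAND": 40.0}
example_slt = {"S1": [full_layer], "S2": [dict(full_layer, SOL_K=None)]}
print(screen(["S1", "S2", "S3"], example_slt))

# Record counts reported in the text are mutually consistent.
assert 14063 - 489 - 1736 == 11838
```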
The variables in SWAT's usersoil table refer to record indexing and soil classification, as well as soil properties pertaining to the entire profile or specific layers. The variables in each of these groups are described in the following subsections. The usersoil table starts with a number of columns that define the database and soil classification variables, followed by soil profile and layer information, and inactive soil properties (Table 2).
The SWAT soil classification variables include the OBJECTID (general listing number), MUID (map unit identifier), SEQN (sequence number), SNAM (soil name), S5ID (Soils5 ID number for USDA soil series data), and CMPPCT (percentage of the soil component in the MUID). The numbering system used for the OBJECTID variable was chosen to avoid conflicts with existing soil records in the usersoil table. The SWAT model comes with more than 200 soil records in a built-in database that cannot be easily overwritten, and any new soil record with the same OBJECTID as an existing record will not be imported. Thus, the OBJECTID field was populated sequentially from 1001 to the number of unique soil records in the SLC database plus 1000 (i.e., OBJECTID ends at 12 838 in the case of the COMPLETE dataset, which has 11 838 unique soil records). The map unit ID (MUID) was assigned the SOIL_ID code in the SLC dataset, which is a concatenation of the province code (two digits), a soil code (three digits), a modifier code (five digits), and a profile code (one digit). The sequence number (SEQN) variable was assigned the same value as the OBJECTID variable. This process created a unique SEQN for each recurrence in the SLC dataset.
Similar to the MUID variable, the soil name variable (SNAM) was also assigned the SOIL_ID code in the SLC, despite the soil name being in the database, so as to link the soil information to the GIS layer. The S5ID variable was created as a concatenation between the acronym “SLC” and the province two-digit abbreviation code. For example, all the soil records in the province of Alberta have an S5ID equal to “SLCAB”. The CMPPCT variable was assigned a value of 100, meaning that the soil comprises 100 % of this component. As stated in Sect. 2, the user has to make a decision on how to handle multipart polygons in the preprocessing of the SLC GIS dataset since the soil records in multicomponent polygons are not spatially defined.
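The assignment rules for the six classification variables can be summarized in a short sketch. The soil identifiers below are placeholders rather than real SOIL_ID codes, and the province abbreviation is passed in separately to avoid assuming how it is encoded.

```python
def classification_fields(soils, offset=1000):
    """Build the six classification fields for each soil record.

    soils: list of (soil_id, province_abbreviation) tuples, in the order
    in which records are written to the dataset.
    """
    rows = []
    for position, (soil_id, province) in enumerate(soils, start=1):
        object_id = offset + position      # 1001, 1002, ...
        rows.append({
            "OBJECTID": object_id,
            "MUID": soil_id,               # SOIL_ID code links to the GIS layer
            "SEQN": object_id,             # same value as OBJECTID
            "SNAM": soil_id,               # SOIL_ID again, not the soil name
            "S5ID": "SLC" + province,      # e.g. "SLCAB" for Alberta
            "CMPPCT": 100,
        })
    return rows

# Placeholder identifiers, for illustration only.
example = classification_fields([("SOIL_0001", "AB"), ("SOIL_0002", "MB")])
print(example[0])
```

With 11 838 records and an offset of 1000, the last OBJECTID produced this way is 12 838, matching the range stated above.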
The following six variables in the dataset (i.e., columns 7 to 12) pertain to soil profile information. The number-of-layers variable (NLAYERS) was defined as the number of soil layers in the SLT below the soil surface. The SLT table also contains information for layers above the soil surface, such as litter, which have negative values for upper and lower depths (i.e., the ground surface corresponds to the zero depth, while above-surface and below-surface layers have negative and positive values, respectively). Above-surface layers were removed from the dataset prior to analysis by filtering out layers with a lower depth at or above the soil surface (i.e., lower depth less than or equal to zero).
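A minimal sketch of this filtering step is given below, assuming each layer is a dictionary with hypothetical `upper_depth` and `lower_depth` keys (positive downwards from the soil surface).

```python
def below_surface_layers(layers):
    """Drop layers whose lower depth is at or above the soil surface (<= 0)."""
    kept = [layer for layer in layers if layer["lower_depth"] > 0]
    # Order the remaining layers from the surface downwards.
    return sorted(kept, key=lambda layer: layer["lower_depth"])

profile = [
    {"name": "litter", "upper_depth": -5, "lower_depth": 0},
    {"name": "B", "upper_depth": 20, "lower_depth": 60},
    {"name": "A", "upper_depth": 0, "lower_depth": 20},
]

layers = below_surface_layers(profile)
nlayers = len(layers)  # value that would be stored in NLAYERS
print(nlayers, [layer["name"] for layer in layers])
```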
The hydrologic soil group (HSG) variable (HYDGRP) is an influential parameter for estimation of runoff using the SCS curve number method and, consequently, for hydrological simulations in SWAT (Gao et al., 2012; Neitsch et al., 2005). The HSGs were calculated according to the method outlined by USDA-NRCS (1993), which is based on depth to the impermeable layer (e.g., bedrock), depth from soil surface to shallowest water table during the year, hydraulic conductivity of the least conductive layer of the soil profile, and depth range of the hydraulic conductivity. The specific criteria used are provided in tabular form in the Supplement. Soils in the dual HSG classes were assigned to the less restrictive class since most agricultural areas in Canada exhibit some degree of drainage (e.g., municipal drainage network, surface drains, or tile drainage). SWAT translates HSG alphabetical classification into a numeric system, where HSGs A, B, C, and D, are interpreted as 1, 2, 3, and 4, respectively. The runoff potential increases with increasing numeric designations.
Spatial distribution of the hydrologic soil groups (HYDGRP)
variable calculated for the Soil Landscapes of Canada (SLC) database. HSG
A
The depth to the impermeable layer is not reported in the SLC database and was estimated based on the soil layers available in the SLT. When a bedrock layer or specific soil horizons were present (i.e., fragipan; duripan; petrocalcic; ortstein; petrogypsic; cemented horizon; densic material; placic; bedrock, paralithic; bedrock, lithic; bedrock, densic; or permafrost; USDA-NRCS, 1993), its upper depth was used as the depth to impermeable layer. When a bedrock layer was absent, the lower depth of the deepest mineral soil layer was used as an alternative. The shallowest annual depth to water table is also not reported and was estimated based on drainage class reported in the SNT. Very poorly drained, poorly drained, imperfectly drained, moderately well drained, and well drained (or better) soils were assigned water table depths of 0, 25, 75, 100, and 125 cm, respectively. The variables pertaining to hydraulic conductivity of the least conductive layer of the soil profile and depth range of the hydraulic conductivity were both calculated using information from the SLT.
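The estimation rules in this paragraph, together with the dual-class and numeric conventions described earlier, can be sketched as follows. The full HSG criteria are given in the Supplement and are not reproduced here. The layer keys (`restrictive`, `mineral`) are hypothetical, and listing "rapidly" and "very rapidly" drained as the classes "better" than well drained is an assumption.

```python
# Water table depth (cm) assigned to each drainage class, as stated in the text.
WATER_TABLE_DEPTH_CM = {
    "very poorly drained": 0,
    "poorly drained": 25,
    "imperfectly drained": 75,
    "moderately well drained": 100,
    "well drained": 125,
    # Assumed names for the "or better" classes.
    "rapidly drained": 125,
    "very rapidly drained": 125,
}

HSG_TO_NUMBER = {"A": 1, "B": 2, "C": 3, "D": 4}

def water_table_depth(drainage_class):
    """Shallowest annual water table depth (cm) estimated from drainage class."""
    return WATER_TABLE_DEPTH_CM[drainage_class.strip().lower()]

def depth_to_impermeable(layers):
    """Upper depth of the shallowest restrictive layer, else the lower depth
    of the deepest mineral layer."""
    restrictive = [l["upper_depth"] for l in layers if l["restrictive"]]
    if restrictive:
        return min(restrictive)
    return max(l["lower_depth"] for l in layers if l["mineral"])

def resolve_dual_class(hsg):
    """Assign a dual class such as 'B/D' to its less restrictive member."""
    return min(hsg.split("/"), key=HSG_TO_NUMBER.__getitem__)

layers = [
    {"upper_depth": 0, "lower_depth": 30, "mineral": True, "restrictive": False},
    {"upper_depth": 30, "lower_depth": 90, "mineral": True, "restrictive": False},
]
print(water_table_depth("Poorly drained"), depth_to_impermeable(layers),
      HSG_TO_NUMBER[resolve_dual_class("B/D")])
```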
Out of the 11 838 soil records in the generated dataset, 21.3 %, 24.6 %, 39.0 %, and 15.1 % belonged to HSGs 1, 2, 3, and 4, respectively. These results suggest that more than half of the agricultural soil records in Canada have a relatively high or high runoff generation potential (i.e., HSGs 3 and 4, respectively). A spatial analysis indicated that 20.0 %, 26.8 %, 36.7 %, and 16.5 % of the areal extent of the soil records belonged to HSGs 1, 2, 3, and 4, respectively. Many of the soil records with higher potential for runoff generation are in the humid regions of Ontario, Quebec, and the Maritimes (Fig. 2). Not surprisingly, this region has extensively adopted measures to address excess moisture in agricultural soils, such as tile drainage (Stonehouse, 1995; Rasouli et al., 2014). Excess moisture is also a problem in areas of Canadian Prairies, such as the Red River Valley in Manitoba, where surface drainage (Bower, 2007) and a growing use of tile drainage (Cordeiro and Sri Ranjan, 2012, 2015) have been used to address this problem. Conversely, soil records with low potential for runoff generation are located in Saskatchewan and southeastern Alberta (along the Saskatchewan border), which are among the most arid regions in Canada (Wolfe, 1997).
The maximum rooting depth of the soil profile (SOL_ZMX) was assumed to be the lower depth of the deepest layer in the SLC soil profile. The fraction of soil porosity from which anions are excluded (ANION_EXCL) was not available in the SLC database and was set to the default value of 0.5 in SWAT (Arnold et al., 2013). This variable affects the concentration of nitrate in the mobile water fraction, which is directly related to nitrate leaching. The SOL_CRK variable, which controls the potential maximum crack volume of the soil profile expressed as a fraction of the total soil volume, can be calculated with the FLOCR model using 30-year weather data (Bronswijk, 1989). However, because the model is not readily available for download, running it for such a large number of soil records would take an unreasonable amount of time, and SOL_CRK is optional in SWAT, its value was set to 0.5. This value was selected because all of the built-in soil records in the SWAT soils database have the SOL_CRK variable set to 0.5. In large-scale studies this value is further adjusted through a spatially explicit calibration scheme (Whittaker et al., 2010). The TEXTURE variable, although not required for simulations with the SWAT model, was estimated for reference using the “TT.points.in.classes” function from the “soiltexture” R package (Moeys, 2016). The Canadian soil texture classification system was used as a reference.
The soil profile variables are followed by 10 sets of 12 variables (i.e.,
columns 13 to 132) pertaining to layered soil information. The lower depth of
each soil layer in the SLT was used as the depth from soil surface to the
bottom layer (SOL_Z). The soil bulk density (SOL_BD) was extracted
directly from the SLT. The available water capacity of the soil layer
(SOL_AWC) was calculated from the water retention of the soil reported in
the SLT at different matric potentials. The water moisture content at
Average soil moisture content at matric potentials
The variables saturated hydraulic conductivity (SOL_K) and soil organic carbon content (SOL_CBN), as well as the clay (CLAY), silt (SILT), sand (SAND), and rock fragment (ROCK) contents, were extracted directly from the SLT. The moist soil albedo (SOL_ALB) variable was only required for the top layer as subsequent layers were assigned a value of zero. Since this variable is not reported in the SLC database, it was estimated as the average (i.e., 0.10) of the range reported by Maidment (1993) for moist, dark, plowed fields (i.e., 0.05–0.15). Again, this value was selected since the SLC version 3.2 focuses on agricultural areas, which is also the major domain simulated by SWAT.
Another important variable for SWAT is the erodibility factor (USLE_K),
used as an input to the Universal Soil Loss Equation (USLE). This equation is
used to calculate soil erosion, which is inherently linked to sediment and
nutrient transport (Sharpley et al., 1992, 2002; He et al., 1995; Aksoy and
Kavvas, 2005; Koiter et al., 2013) and therefore, critical for simulations of
non-point sources of pollution. The erodibility factor was calculated using
the method presented by Sharpley and Williams (1990), which is based on the
sand, silt, clay, and organic carbon content of the soil (Eq. 1):
As for SOL_ALB, USLE_K is only required for the top layer and
subsequent layers were also assigned a value of zero. When converted from
imperial to SI units (Foster et al., 1981), the range of calculated values
(Table 4) generally agrees with the ranges reported for Canada (Wall et al.,
2002), taking into consideration that
Spatial distribution of the erodibility factor (
Comparison between the average erodibility factor (
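For readers who want to experiment with the erodibility calculation described above, the sketch below implements the EPIC-type relation as it is commonly cited in the erosion modeling literature: a product of four factors depending on sand, silt, clay, and organic carbon contents. The coefficients are reproduced from general knowledge of that literature and should be treated as assumptions to be checked against Sharpley and Williams (1990). In particular, the sand coefficient appears as 0.0256 in some sources and 0.256 in others, so it is exposed as a parameter. The result is in the original (non-SI) units of the method.

```python
import math

def usle_k(sand, silt, clay, org_c, sand_coeff=0.0256):
    """EPIC-type soil erodibility factor (coefficients are assumptions).

    sand, silt, clay, org_c: contents in percent of the top soil layer.
    """
    if clay + silt <= 0:
        raise ValueError("clay + silt must be positive")
    sn1 = 1.0 - sand / 100.0
    # Lower erodibility for soils with high coarse-sand content.
    f_csand = 0.2 + 0.3 * math.exp(-sand_coeff * sand * (1.0 - silt / 100.0))
    # Lower erodibility for soils with high clay-to-silt ratios.
    f_clsi = (silt / (clay + silt)) ** 0.3
    # Lower erodibility for soils with high organic carbon content.
    f_orgc = 1.0 - 0.25 * org_c / (org_c + math.exp(3.72 - 2.95 * org_c))
    # Lower erodibility for soils with extremely high sand content.
    f_hisand = 1.0 - 0.7 * sn1 / (sn1 + math.exp(-5.51 + 22.9 * sn1))
    return f_csand * f_clsi * f_orgc * f_hisand

# Illustrative loam-like inputs; not taken from the SLC database.
print(round(usle_k(sand=40.0, silt=40.0, clay=20.0, org_c=1.0), 3))
```

In this form, the value decreases as the clay-to-silt ratio or the organic carbon content increases.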
The soil electrical conductivity (SOL_EC) information was extracted directly from the SLT. The last 20 columns of the dataset (i.e., columns 133 to 152), which correspond to SOL_CAL for the 10 soil layers followed by SOL_PH for the same layers, were all populated with zeros since these variables are not currently active in SWAT. These variables also had values of zero for all the preexisting soil records in the built-in database in the model.
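The resulting 152-column layout, and the conversion from one row per layer to one row per soil record, can be sketched as below. The column order follows the description in the text. The numeric suffix convention for layered columns (e.g., `SOL_Z1`) is an assumption that should be checked against the usersoil table itself.

```python
PROFILE_COLUMNS = ["OBJECTID", "MUID", "SEQN", "SNAM", "S5ID", "CMPPCT",
                   "NLAYERS", "HYDGRP", "SOL_ZMX", "ANION_EXCL", "SOL_CRK",
                   "TEXTURE"]
LAYER_VARIABLES = ["SOL_Z", "SOL_BD", "SOL_AWC", "SOL_K", "SOL_CBN", "CLAY",
                   "SILT", "SAND", "ROCK", "SOL_ALB", "USLE_K", "SOL_EC"]
INACTIVE_VARIABLES = ["SOL_CAL", "SOL_PH"]
MAX_LAYERS = 10

def usersoil_columns():
    """Column names: 12 profile + 10 x 12 layered + 2 x 10 inactive = 152."""
    columns = list(PROFILE_COLUMNS)
    for n in range(1, MAX_LAYERS + 1):
        columns += [f"{var}{n}" for var in LAYER_VARIABLES]
    for var in INACTIVE_VARIABLES:
        columns += [f"{var}{n}" for n in range(1, MAX_LAYERS + 1)]
    return columns

def flatten(profile, layers):
    """Combine profile fields and per-layer dicts into one wide row.

    Layer sets beyond NLAYERS and the inactive variables stay at zero.
    """
    if len(layers) > MAX_LAYERS:
        raise ValueError("at most 10 layers can be stored")
    row = dict.fromkeys(usersoil_columns(), 0)
    row.update(profile)
    row["NLAYERS"] = len(layers)
    for n, layer in enumerate(layers, start=1):
        for var in LAYER_VARIABLES:
            row[f"{var}{n}"] = layer[var]
    return row

print(len(usersoil_columns()))
```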
Soil properties are inherently uncertain due to spatial variability and precision of measurement methods (Lacasse and Nadim, 1996). This uncertainty has direct implications for hydrologic simulations and their interpretation (Beven, 2011). The SWAT model simulations, therefore, are subject to the uncertainty of the soil properties used as input. For example, hydraulic conductivity is highly spatially and temporally variable (Hillel, 1998), and these uncertainties are very difficult to avoid. The processing applied to the original SLC database in the present analysis did not introduce any further uncertainty to the variables reported by SLC (e.g., saturated hydraulic conductivity). There is, however, some uncertainty relating to estimated and calculated parameters. These uncertainties are discussed in this section, although their quantification is beyond the scope of the present work.
An example of introduced uncertainty is the moist soil albedo in the present dataset (0.10), which is the average of a range reported in the literature (Sect. 6.3). However, any value selected would have some uncertainty associated with it from a modeling standpoint because a single value cannot represent the variability in moist soil albedo as the soil dries up. This is a recognized problem when trying to represent spatially or temporally variable parameters (e.g., soil moisture) using a point measurement or single value in hydrological models (Beven, 2011).
Another example of uncertainty is the HSG calculations, which required a number of assumptions. For example, the shallowest annual depth to water table was unavailable in the SLC and therefore estimated based on the drainage class of each soil record. Also, the assumption of artificial drainage resulted in assignment of dual-class HSGs to the less restrictive one. An assessment of HSG in the USA indicated a standard error of about one HSG (Stewart et al., 2012), suggesting that classifying soils in the neighboring groups is not uncommon and that there is some uncertainty associated with those estimates.
The estimation of erodibility factor (
Finally, one last variable worth discussing in terms of uncertainty is
the available water capacity of the soil layers. This variable was estimated
as the difference between field capacity and permanent wilting point, which
is the same procedure used by SWAT (Neitsch et al., 2005) and is described
elsewhere in the soil physics literature (Hillel, 1998). Therefore, it would
not introduce any further uncertainty. However, using the soil moisture content at
Overall, prediction of uncertainty in regional hydrologic modeling and a careful input data discrimination analysis prior to calibration are unavoidable (Faramarzi et al., 2015). Especially in large transboundary river basins where a consistent soil dataset is not available from a single source, a careful uncertainty assessment provides information on data and model quality. Although the authors are unaware of SWAT hydrologic simulations in binational watersheds that use soil datasets from both the USA and Canada, possibly due to the lack of large-scale datasets for Canada, it is expected that the model output is subject to the quality and quantity of both datasets. Some aspects contributing to this uncertainty are (i) possible discontinuity in the mapping units (i.e., polygons) between the GIS layers of both datasets, (ii) the soil record being mapped in multicomponent polygons in the GIS coverage (Soil Survey Staff, 1999; Agriculture and Agri-Food Canada, 1998), (iii) differences in soil taxonomy between the USA system (Soil Survey Staff, 1999) and the Canadian system (Agriculture and Agri-Food Canada, 1998) of soil classification, (iv) the methods used to measure/estimate the physicochemical variables, which may differ in accuracy and precision, and (v) the natural variability in the calculation of some variables that cannot be measured (e.g., HSG; Stewart et al., 2012). Given the number of aspects influencing transboundary uncertainty and the large spatial scale of both the USA and the dataset discussed here, an assessment of such uncertainty is beyond the scope of the present study. However, this assessment is suggested to quantify the share of errors from these sources in hydrologic model projection in both upstream and downstream tributaries. These are the subjects of our continuing studies.
Although the SWAT database is in a proprietary format (i.e., Microsoft Access), the present soils dataset has been published in a nonproprietary format (i.e., comma-separated values (CSV) file) that can be opened in a variety of software packages. The dataset can nevertheless be easily imported into the SWAT soils database using an automated import routine in Microsoft Access (Fig. 4). This import process consists of opening the SWAT2012 database and using the “Import Text File” tool under the “Import & Link” section of the “External Data” tab to read the CSV file. This action opens a window where the user can select the path to where the present dataset is stored and specify how and where the data are stored in the database. The option “Append a copy of the record to the table” should be selected, which activates a drop-down menu from which the usersoil table should be selected. Once these options have been confirmed, an “Import Text Wizard” window opens, where the option “Delimited – Characters such as comma or tab separate each field” should be selected. Confirming this selection opens another window where the option “comma” should already be selected by the wizard. The user must, however, check the box “First Row Contains Field Names”, since the first row of the present dataset contains the variable labels. Confirming the remaining windows finalizes the import process, after which the data are ready to be used in SWAT simulations.
Figure 4. Flowchart showing the steps for importing the present soils dataset into SWAT's database.
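Before running the Access import, it may be useful to confirm that the CSV file has the properties the import wizard relies on, namely comma delimiters and a first row of field names. The following is a minimal Python sketch of such a check, not part of the published workflow. The function name `check_csv_for_import` and the sample column names and values are illustrative assumptions; the actual header of the dataset should be compared against the usersoil table.

```python
import csv
import io

# Illustrative sample only: column names and values are assumptions,
# not taken from the published dataset.
SAMPLE = "SNAM,NLAYERS,SOL_ZMX\nSOIL_A,2,1000\nSOIL_B,3,1200\n"


def check_csv_for_import(text):
    """Check that text is a comma-delimited table with a header row.

    Returns (header, number_of_data_rows). Raises ValueError if the file
    is empty, the header has duplicate names, or any data row has a
    different number of fields than the header.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=","))
    if not rows:
        raise ValueError("file is empty")
    header = rows[0]
    if len(set(header)) != len(header):
        raise ValueError("duplicate field names in header")
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            raise ValueError(
                f"row {i}: expected {len(header)} fields, got {len(row)}"
            )
    return header, len(rows) - 1


header, n_records = check_csv_for_import(SAMPLE)
```

For a file on disk, the text can be read with `open(path, newline="").read()` and passed to the same function; a mismatch raised here would likely also surface as an import error in Access.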
PANGAEA, an open access library to archive, publish, and
distribute georeferenced data, supports database-dependent research.
Therefore, the entire dataset (Cordeiro et al., 2017) is published and
archived in the PANGAEA database (
The soils dataset presented and discussed in this work represents an effort to facilitate hydrological simulations using the SWAT model in Canada. The dataset consists of a compilation of 11 838 different soil records from the SLC database with all the information required by SWAT and is ready to be imported into the model's soils database. A two-level data screening procedure removed 489 soil records with missing layered information (i.e., not present in the SLT), while 1736 records were removed due to the lack of critical information required by SWAT, such as soil bulk density or saturated hydraulic conductivity. Among the major contributions of this dataset, the calculation and/or estimation of variables not reported in the SLC database are of special importance. The hydrologic soil groups (HSGs) calculated from the SLC database suggest that about half of the soil records in Canada belong to classes with higher potential to generate runoff (i.e., HSG classes 3 and 4). The occurrence of soils in HSG 3 and 4 agrees with management practices aimed at addressing excess moisture conditions in agricultural fields, such as subsurface drainage in southern Ontario and Manitoba. The erodibility factor, another important variable for SWAT simulations of nonpoint-source pollution, appears to be related to runoff potential in portions of southern Ontario and Nova Scotia. However, low erodibility potential, likely driven by high clay-to-silt ratios or high organic carbon content, was found in areas with higher runoff potential in Manitoba and British Columbia.
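The record counts reported above can be reconciled with a short Python sketch of a two-level screening. This is an illustration of the logic only, not the authors' R code; the function `screen`, the field names `soil_id`, `bulk_density`, and `ksat`, and the toy records are all hypothetical.

```python
def screen(records, layer_ids, critical=("bulk_density", "ksat")):
    """Split records into kept, missing-layer, and incomplete groups.

    Level 1: drop records whose soil_id has no layer information.
    Level 2: drop records lacking any critical variable.
    """
    kept, no_layers, incomplete = [], [], []
    for rec in records:
        if rec["soil_id"] not in layer_ids:
            no_layers.append(rec)
        elif any(rec.get(name) is None for name in critical):
            incomplete.append(rec)
        else:
            kept.append(rec)
    return kept, no_layers, incomplete


# Toy input (hypothetical values) exercising both screening levels.
toy_records = [
    {"soil_id": "A", "bulk_density": 1.3, "ksat": 12.0},
    {"soil_id": "B", "bulk_density": None, "ksat": 5.0},
    {"soil_id": "C", "bulk_density": 1.1, "ksat": 8.0},
]
toy_layer_ids = {"A", "B"}  # "C" has no layer information
kept, no_layers, incomplete = screen(toy_records, toy_layer_ids)

# Counts reported in the text: 14 063 unique records, 489 removed at
# level 1, and 1736 removed at level 2.
retained = 14063 - 489 - 1736
```

With the reported counts, `retained` evaluates to 11 838, which matches the number of records included in the dataset.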
The supplement related to this article is available online at:
MRCC and RK conceived the dataset. GL interpreted the soil information contained in the SLC database. MRCC and GL developed the methodology for deriving the soil variables. MRCC developed the code in the R programming language to process the SLC dataset and performed the data analysis. All the authors revised the dataset and participated in manuscript preparation.
The authors declare that they have no conflict of interest.
This research was supported by the Beef Cattle Research Council and Agriculture and Agri-Food Canada through the Beef Cluster, Environmental Footprint of Beef Project, and the Alberta Livestock and Meat Agency (ALMA) of the Alberta Agriculture and Forestry (grant no. 2016E017R).
Edited by: David Carlson
Reviewed by: two anonymous referees