A global monthly climatology of oceanic total dissolved inorganic carbon: a neural network approach

. Anthropogenic emissions of CO 2 to the atmosphere have modiﬁed the carbon cycle for more than 2 centuries. As the ocean stores most of the carbon on our planet, there is an important task in unraveling the natural and anthropogenic processes that drive the carbon cycle at different spatial and temporal scales. We


Introduction
The ocean is the major carbon reservoir of the Earth. Most of this carbon occurs as dissolved inorganic carbon (TCO 2 , also known as DIC or C T ; Ciais et al., 2013;Tanhua et al., 2013). Three species make up TCO 2 : dissolved CO 2 (generally considered to be the sum of the dissolved CO 2 itself, CO 2 (aq), and carbonic acid, H 2 CO 3 ), bicarbonate ion (HCO − 3 ), and carbonate ion (CO 2− 3 ). The relative concentrations of these species with respect to each other determine the seawater pH (Zeebe and Wolf-Gladrow, 2001). The seawater CO 2 chemistry system can be represented as a set of chemical equilibria reactions that describe the speciation of the various ions of TCO 2 as follows: CO 2 (g) CO 2 (aq) Since the Industrial Revolution, the concentration of TCO 2 in the global ocean has increased, generally to a certain depth level (depending on the particular processes in each ocean area) due to the entry of CO 2 into the seawater from the atmosphere (Sarmiento and Gruber, 2002;Doney et al., 2009;Vázquez-Rodríguez et al., 2009;Bates et al., 2012;Sallée et al., 2012;Khatiwala et al., 2013). The uptake is driven by the increasing partial pressure of CO 2 (pCO 2 ) in the atmosphere relative to the ocean, generated by the anthropogenic emissions of CO 2 that cause an annual net flux of this gas into the ocean (Le Quéré et al., 2018). Accompanying the change in TCO 2 , the pH and carbonate ion concentration have been declining because of the anthropogenic process previously mentioned, these changes being reflected in the proportions of the chemical species of TCO 2 (Kleypas and Langdon, 2000;Orr et al., 2005). These changes in seawater chemistry framed in the ocean acidification process can negatively influence various processes involving marine organisms such as calcification, growth, and survival (Orr et al., 2005;Fabry et al., 2008;Hendriks et al., 2010;Hoegh-Guldberg and Bruno, 2010;Kroeker et al., 2013). In addition to the secular trends driven by the uptake of anthropogenic CO 2 , ocean TCO 2 varies both temporally and spatially as a consequence of several natural processes. This variability may reach values of 15 % of the mean TCO 2 value in the ocean (Lee et al., 2000). The processes that increase TCO 2 are net flux of CO 2 from the atmosphere to the ocean, organic matter remineralization, and the dissolution of calcium carbonate (CaCO 3 ). The processes that reduce TCO 2 are net flux of CO 2 from the ocean to the atmosphere, primary production, and calcification. Advection and mixing also influence the variability of TCO 2 in these two ways (Sabine et al., 2002). In the surface ocean, the main vari-ables influencing the variability of TCO 2 are temperature and salinity (Weiss et al., 1982;Lee et al., 2000;Wu et al., 2019), through the modification of the solubility of CO 2 , affecting the seawater pCO 2 (which is almost instantaneous) and thus the air-sea CO 2 flux, which eventually drives the change in TCO 2 over time. Nutrients and oxygen can also reflect the processes that modify the concentration of TCO 2 through their consumption and release, like during the cycling of organic matter (Körtzinger et al., 2001;Bauer et al., 2013). From products generated with measured data (Key et al., 2004;Takahashi et al., 2014; and in modeling studies (e.g., Doi et al., 2015), it is known that the global surface distribution of TCO 2 follows a zonal gradient: there is a reduction in its concentration from the poles to the Equator, reflecting the processes that control its variability. Key et al. (2004) emphasize that this distribution is associated with the distribution pattern of nutrients. Recently, Wu et al. (2019) found that the distribution of surface-salinitynormalized TCO 2 (nDIC) has two main drivers: temperature and upwelling. At depth, the variation shown in almost any measured profile of TCO 2 mainly reflects the remineralization of organic matter and, to a lesser extent, the dissolution of CaCO 3 (Millero, 2007), resulting in an increase in TCO 2 from the surface to intermediate depths.
Understanding the distribution and variability of TCO 2 in the ocean and its secular trends driven by anthropogenic carbon uptake is needed to assess the magnitude and possible impacts of ocean acidification. It is also necessary for the evaluation of numerical models that include the carbon cycle and their estimates of past, current, and future ocean carbon cycle behavior (e.g., Yool et al., 2013;Aumont et al., 2015;Butenschön et al., 2016;Le Quéré et al., 2016;) Goris et al., 2018). Seasonality of TCO 2 and the horizontal and vertical variability underscore the necessity to design a climatology with both monthly and spatial resolutions according to the processes that influence this variable on a global scale. The existing climatologies of TCO 2 do not include all these characteristics collected together. Key et al. (2004) and  built an annual climatology at 33 depth levels using interpolation techniques with data from the Global Ocean Data Analysis Project version 1 (GLODAPv1; Key et al., 2004) and version 2 (GLO-DAPv2; Key et al., 2015;Olsen et al., 2016), respectively. Takahashi et al. (2014) published a monthly climatology for the surface ocean computed from climatologies of pCO 2 and total alkalinity (A T ). Other studies used the covariability between TCO 2 and other more commonly measured variables discussed above for mapping and gap-filling via empirical regressions and neural networks. Lee et al. (2000) used temperature and nitrate to compute surface nDIC with an areaweighted error of ±7 µmol kg −1 . Sauzède et al. (2017) and Bittig et al. (2018) trained neural networks with GLODAPv2 data to compute TCO 2 over the depth range 0-8000 m with an accuracy of ±9 and ±7.1 µmol kg −1 , respectively. The input variables used in those studies were location, pressure, temperature, salinity, dissolved oxygen, and time.
In the present study, we introduce the use of neural networks for going one step further in the design of a climatology. We have generated a climatology of TCO 2 with a resolution consistent with that of the climatology of A T of Broullón et al. (2019): horizontal resolution of 1 • × 1 • , 102 depth levels between 0 and 5500 m, and a monthly (0-1500 m) and annual (1550-5500 m) temporal resolution. The availability of global databases containing variables of the seawater CO 2 system with more and more data -e.g., GLODAPv2.2019; the Lamont-Doherty Earth Observatory (LDEO) database (Takahashi et al., 2017); the Surface Ocean CO 2 Atlas (SO-CAT; Bakker et al., 2016) -and the great ability of the neural networks to interpolate as shown in other climatological studies about CO 2 system variables (Landschützer et al., 2014;Broullón et al., 2019) show the appropriateness of this approach for generating a global monthly climatology covering more than the surface ocean.

Neural network design
A feedforward neural network was configured to compute TCO 2 in the global ocean and to create a global climatology based on the good results previously obtained with this method in similar studies (e.g., Broullón et al., 2019). Briefly, a neural network of this type (Fig. S1a in the Supplement) is used to extract relationships between a set of input variables and a target one through a training process. At this stage, the inputs are passed through different parallel layers composed of a tunable number of neurons to reach values as closest as possible to the target ones (Fig. S1a). Initially, all inputs enter each neuron of the first layer, where they are multiplied by different weights depending on the neuron they go to. Inside the neurons (Fig. S1b), the results of the previous operation are summed, and a bias is added. The obtained value inside each neuron is passed through an activation function, which yields an output. The outputs of each neuron in each layer go to the following layer undergoing the same process described to this point. In the last layer, which is composed of one neuron, a unique value for the target variable is calculated for each pair consisting of inputs and a target. This value is compared to the desired one, and the difference between both values is backpropagated through the entire network in order to adjust the weights and biases and to start the processes again and reach an accurate output value after multiple iterations. A complete description of the most common algorithms used to backpropagate and minimize the errors can be found in Rumelhart et al. (1986), Levenberg (1944), and Marquardt (1963).
The method used here is equivalent to that fully described by Broullón et al. (2019) for A T . In addition to the target vari-able (TCO 2 instead of A T ), the main changes in the present study compared to that of Broullón et al. (2019) are the inclusion of the input variable "year", accounting for the anthropogenic increase in the TCO 2 pool, and the use of the pCO 2 database from the LDEO (Takahashi et al., 2017) in addition to the extended GLODAPv2.2019  to enable more robust TCO 2 estimates in the surface ocean. Similar to Broullón et al. (2019), the neural networks were trained using the Levenberg-Marquardt method (Levenberg, 1944;Marquardt, 1963) through the trainlm function (detailed in Beale et al., 2018) in MATLAB. The splitting of the database used in the present study (see Sect. 2.2) into the sets needed for training and testing the network is depicted in Fig. 1. The data were randomly associated with each dataset to capture (training) and evaluate (test) all possible variability. The input variables are temperature, salinity, phosphate, nitrate, silicate, oxygen, sample position, and year (Fig. S1a). The number of neurons tested in the unique hidden layer to find the best neural network was 16, 32, 64, 128, and 256. Ten networks were trained for each number of neurons. The criteria for selecting the final number of neurons are based on a trade-off between the root mean square error (RMSE; between the measured TCO 2 and that estimated by the neural network) on the one hand and the generalization of the network (to prevent overfitting, maintaining a similar error in the training and in the test sets) on the other hand. Furthermore, an additional criterion based on the influence of each input variable on the TCO 2 extracted with the connection weight approach (Olden and Jackson, 2002) was followed to ensure that biogeochemical input variables have a larger influence on the TCO 2 estimates than the input variables related to sample position for selecting a proper network. The influence of each input variable on the computed TCO 2 was obtained from Eq. (1): where C i is the relative importance of the input variable i, H the number of neurons in the hidden layer, w ik the weight of the connection between the variable i and the neuron k of the hidden layer, and w k the weight of the connection between the neuron k of the hidden layer and output layer.

Data
We included the LDEO database version 2016 (Takahashi et al., 2017; https://www.nodc.noaa.gov/ocads/oceans/ LDEO_Underway_Database, last access: 13 November 2017) because it contains significantly more data in the surface layer than GLODAPv2.2019. Since the higher variability in the surface layer may lead to high errors in modeling variables of the seawater CO 2 system (e.g., Carter et al., 2018;Bittig et al., 2018;Broullón et al., 2019), including the LDEO database should force the network to reach a more Figure 1. Division of the complete database in the datasets needed to train the neural network. The percentages in each level are relative to the number of data in the previous one. Data in the datasets of the first level are always the same for each network. Data in the sets of the second level are randomly associated with each set for each network to find the best network weights because of the different starting points in the error weight space of the training process (see also Broullón et al., 2019).
robust fit. The idea is that these additional data probably have more different relationships between input variables and TCO 2 to help the neural network to adequately capture spatiotemporal variability. The pCO 2 , temperature, and salinity data from the LDEO were monthly averaged for each year in a 1 • × 1 • grid. The points where the standard deviation (SD) of the averaged pCO 2 , temperature, and salinity were greater than ±20 µatm, 1.5 • C, and 0.5, respectively, were discarded since the objective is to capture the monthly variability, and therefore an extremely high submonthly variability could lead to errors. To obtain TCO 2 values from the LDEO data, an additional variable of the CO 2 system is necessary, for which we take A T computed using the neural network NNGv2 of Broullón et al. (2019). The input variables required by NNGv2 were obtained from (1) temperature and salinity from the LDEO; (2) filtered oxygen from the World Ocean Atlas version 2013 (WOA13; see Broullón et al., 2019); and (3) phosphate, nitrate, and silicate computed with CANYON-B (Bittig et al., 2018) using the previous variables as inputs. Finally, TCO 2 was calculated from this A T and the averaged pCO 2 using the MATLAB version of the CO 2S YS program (van Heuven et al., 2011); we used the dissociation constants of Mehrbach et al. (1973; as refit by Dickson and Millero, 1987) and the borate dissociation constant of Dickson (1990). Note that we used this software and set of constants for all seawater CO 2 chemistry calculations in the present study. The TCO 2 calculated this way and the associated input variables were used as a part of the training and testing data for the neural networks created here. The final number of data points derived from the LDEO was 54572.
To represent interior ocean conditions, the GLO-DAPv2.2019 database  was added to the LDEO dataset for training and testing the neural network. Only samples which had data for all input variables and TCO 2 were used. This database was included in two ways: (1) only samples where all variables passed the second quality control (n = 287 953; Olsen et al., 2016;Olsen et al., 2019; hereafter abbreviated Gv2QC) and (2) all samples (n = 321 647; hereafter abbreviated Gv2). Therefore, two neural network options were trained and tested: NNGv2QCLDEO and NNGv2LDEO, respectively.

Comparison of methods
We compared our method with CANYON-B of Bittig et al. (2018), where TCO 2 values were also computed from multiple input variables. Both methods are based on neural networks but with certain differences as summarized in Table 1.
An error analysis was carried out in the same areas for which this was done by Broullón et al. (2019) for A T and in several depth ranges (0-50, 50-200, 200-500, and 500-1000 m and 1000 m-bottom) for the two methods (our method and CANYON-B) and for the two datasets (Gv2QC and LDEO). The Gv2QC database was analyzed in this section instead of Gv2 because in the designing of CANYON-B only quality-controlled data were included. The analysis of CANYON-B using the LDEO dataset is useful to evaluate the validity of the approach followed by converting pCO 2 to TCO 2 since CANYON-B has not been trained with this dataset.
Computed pCO 2 from A T and TCO 2 derived from neural networks was also evaluated in the LDEO dataset to assess the adequacy of including this dataset in our approach and to assess the ability of NNGv2 of Broullón et al. (2019) and the present TCO 2 neural network to compute other variables of the seawater CO 2 system. Furthermore, we compared the magnitude of the errors with the ones obtained by Landschützer et al. (2014), in which pCO 2 is computed directly with a neural network, to evaluate the accuracy of our computed pCO 2 .

Validation
In addition to the ability to compute TCO 2 using the Gv2 and LDEO test sets, the neural network has been tested using independent data from 10 ocean time series, located in different regions of the world ocean (data were obtained from https://www.nodc.noaa.gov/ocads/oceans/time_series_ moorings.html, last access: 4 June 2019): Hawaii Ocean Table 1. Differences between the methods used in the present study and in CANYON-B (Bittig et al., 2018). Bittig et al. (2018) This study Training technique Bayesian regularization Levenberg-Marquardt Input variables Temperature, salinity, oxygen, position, and time Temperature, salinity, oxygen, phosphate, nitrate, silicate, position, and time Datasets GLODAPv2 ( Gislefoss et al., 1998), and Kerguelen Islands in the Indian sector of the Southern Ocean (KERFIX; Jeandel et al., 1998). CANYON-B was also used to compute TCO 2 in the time series to show the differences between that method and ours. The TCO 2 values were obtained by feeding the neural networks with the measured values of the input variables at each time series. The data from these time series allow us to test the ability of the neural network to reconstruct not only the seasonal variability of TCO 2 at the various locations and depths sampled but also its long-term trends. For the trend analyses, the measured and estimated TCO 2 values were deseasonalized following Bates et al. (2014).
As an additional test, the measured pCO 2 or the pCO 2 calculated from measured TCO 2 and A T at the time series stations was compared with pCO 2 calculated from the neuralnetwork-generated values of A T and TCO 2 . This provides insight into the combined performance of the NNGv2 of Broullón et al. (2019) and the neural network designed in the present study. Furthermore, we compared the magnitude of the errors to that obtained by Landschützer et al. (2014) for some of the time series.

Climatology of TCO 2
We used the selected network, based on the results of the analyses described above, to construct a climatology of TCO 2 . Climatologies of the input variables were passed through the network to obtain the climatological fields of TCO 2 . The spatiotemporal resolution of the product is determined by that of the climatologies used as inputs: 1 • ×1 • horizontal resolution, 102 upper depth levels of the WOA13, and monthly (for 0-1500 m depth) to annual (for 1550-5500 m depth) temporal resolution. Temperature and salinity climatologies were obtained from objectively analyzed WOA13 fields Zweng et al., 2013; https: //www.nodc.noaa.gov/OC5/woa13/woa13data.html, last ac-cess: 6 February 2017). Oxygen, phosphate, nitrate, and silicate climatologies were taken from Broullón et al. (2019; https://doi.org/10.20350/digitalCSIC/8644, last access: 1 August 2019). These climatologies of nutrients were created using the objectively analyzed climatologies of temperature, salinity, and oxygen (Garcia et al., 2014; oxygen climatology from WOA13 has been filtered by applying a fifth-order one-dimensional median filter through the depth dimension; see Broullón et al., 2019) from the WOA13 in CANYON-B (Bittig et al., 2018). As a year input is needed, we decided to center the TCO 2 climatology around 1995 based on the time distribution of the data used to create the WOA13 climatologies: the World Ocean Database 2013 .
The computed climatological values were compared with those from measured data to assess the uncertainty of the climatology since the WOA13 does not offer an uncertainty field with the objectively analyzed climatologies. Unfortunately, only two locations have enough measured data to calculate a pure climatological value of TCO 2 for each month: HOT ALOHA and BATS. The measured values were monthly averaged at several depth levels, and the anthropogenic carbon as calculated by  was added or subtracted to correct the data to the reference year of the climatology according to where TCO year 2 2 is the TCO 2 corrected for year 2 , which is the reference year of the climatology; TCO year 1 2 is the TCO 2 measured in year 1 , C ant 2002 is the anthropogenic carbon for 2002; and 0.0191 is the annual increase rate derived from the scaling factor determined by Gruber et al. (2019) for the global ocean between 1994 and 2007.
We compared our climatology with previously published climatologies of TCO 2 . The monthly surface climatology created by Takahashi et al. (2014) was used to assess the spatiotemporal differences in the surface layer. The annual climatology of  was used to evaluate the spatial differences in the deeper parts of the ocean. For the comparisons, the climatologies of Takahashi et al. (2014) and  were adjusted to the year 1995, subtracting the anthropogenic carbon (C ant ) of  as in Eq. (2).

D. Broullón et al.: A global monthly climatology of TCO 2
Finally, a surface climatology of pCO 2 was computed from the TCO 2 climatology of the present study and the A T climatology of Broullón et al. (2019) to assess the potential of computing climatologies of other variables of the seawater CO 2 system. For comparison, the updated monthly pCO 2 climatology from Landschützer et al. (2016Landschützer et al. ( , 2017 was used. The values between 1981 and 2010 were averaged to obtain the climatological year 1995. The variable selected from Landschützer et al. (2017) was that labeled as spco2_raw (sea surface pCO 2 ) in the netCDF file.
It should be noted that the RMSE and the bias were obtained for all the comparisons, the last statistic being computed as the difference between the measured (or computed by the method to compare) TCO 2 and the one obtained with the neural network of the present study.

Neural network analysis
Following the established criteria to obtain the optimal number of neurons, the configuration with 128 neurons in the hidden layer was selected. From the 10 networks trained with this number of neurons for each approach (NNGv2LDEO and NNGv2QCLDEO), the ones with the lowest influence of the position input variables were selected. These two networks present a similar RMSE in both training and test datasets, showing there is no overfitting. Because in Gv2QC, both NNGv2LDEO and NNGv2QCLDEO produce the same global RMSE (6.1 µmol kg −1 ), it is likely that the Gv2 dataset contains high-quality measurements, and the possible errors in the non-QC data of this dataset are clearly avoided by the network; otherwise NNGv2LDEO should have a higher RMSE in the test dataset than NNGv2QCLDEO because of an overfitting of the errors in the Gv2 dataset. The same holds for the LDEO dataset. The network properly fitted TCO 2 derived from the LDEO dataset since it does not significatively increase the global RMSE relative to a network only trained with Gv2. Therefore, we decided to continue with NNGv2LDEO only since it has fitted more relationships between variables (e.g., Gv2 has more data points than Gv2QC in the Mediterranean Sea), providing a more robust fitting. For this network, the influence of each input variable on the computed TCO 2 is depicted in Fig. S2. The position variables together (latitude, clongitude, slongitude, and depth) have no more than 30 % influence, allowing biogeochemical variables to be the main ones responsible for the variability of TCO 2 . Furthermore, the input variable year has an influence lower than 5 %. This is probably responsible for capturing the positive interannual trend due to the TCO 2 increase derived from anthropogenic emissions of CO 2 to the atmosphere (see Sect. 3.2).
The global RMSE is quite low for the Gv2 dataset and for the LDEO dataset (Fig. 2). The measured and the computed data are highly correlated (Fig. 2), and the bias is negligi-ble in both datasets. The higher RMSE in the LDEO dataset likely results from the higher variability of TCO 2 in the surface layer and from uncertainties in its calculation from pCO 2 .
The RMSE by area and depth for NNGv2LDEO and CANYON-B in Gv2QC is shown in Table 2. The highest errors for the two methods are in the 0-50 m layer for the Gv2QC dataset and the LDEO dataset. These errors get smaller with increasing depth for all areas, and the depthweighted RMSE of the two methods is not significantly different below 50 m. In the LDEO dataset, NNGv2LDEO produces a lower error than CANYON-B, except for two areas: East GIN (Greenland, Iceland, and Norwegian) Seas and the Bengal Basin (Table 2), although there are only 9 and 13 data points, respectively, in each area. Interestingly, CANYON-B is able to reproduce the TCO 2 data derived from the complete LDEO dataset with a lower error than the one it obtains for the complete Gv2QC dataset in the surface ocean (RMSE LDEO: 16.4 µmol kg −1 ; RMSE Gv2QC, 0-5 m: 17.8 µmol kg −1 ), supporting the approach of computing reliable TCO 2 values from the pCO 2 of the LDEO dataset and the A T computed with NNGv2 (Broullón et al., 2019) since CANYON-B was not trained with the LDEO database. A similar result was obtained for NNGv2LDEO but with a higher difference between the two errors (RMSE LDEO: 11.4 µmol kg −1 ; RMSE Gv2QC, 0-5 m: 17.1 µmol kg −1 ). Finally, the surface RMSE towards LDEO data of NNGv2LDEO is clearly lower than that of CANYON-B. This shows the value of including pCO 2derived surface TCO 2 among the training data, through which there are more fitted relations in our new method.
For data from Gv2 where no QC was performed for at least one of the variables used in the present study (Gv2noQC), the RMSE also decreases with increasing depth ( < 50 m: 22.5 µmol kg −1 ; 50-200 m: 9.8 µmol kg −1 ; 200-500 m: 7 µmol kg −1 ; 500-1000 m: 5.4 µmol kg −1 ; > 1000 m: 5.4 µmol kg −1 ). Thus, the error in Gv2noQC is similar to that in the areas with the highest error in Gv2QC (Table 2; except in Beaufort Sea, where the error is considerably higher). However, the higher error in Gv2noQC is mainly caused by the samples located in the Arctic Ocean since cruises in the Atlantic and Pacific oceans are modeled with a very low error. Therefore, using Gv2noQC does not imply the introduction of low-quality data in our study; otherwise the network would not compute TCO 2 with low errors in Gv2QC because of an overfitting of the possible low-quality data that Gv2noQC could contain.
In general, the highest differences between measured and estimated TCO 2 occurs in the high-latitude surface oceans (Figs. 3 and 4). In Gv2, 40 % of the samples with differences beyond ±3RMSE (3 times RMSE; threshold selected to refer to samples with large residuals) are at latitudes greater than 70 • N. In the LDEO dataset, 39 % of the samples with differences beyond ±3RMSE are from latitudes south of 70 • S. These samples where RMSE is high are 7.5 % of the total   (Fig. 4). A total of 41.5 % of the samples in Gv2 and 43 % in the LDEO dataset with differences beyond ±3RMSE have salinities below 33. Furthermore, in the LDEO dataset, the number of samples with residuals beyond ±3RMSE increases with increasing SD of both pCO 2 and salinity in the monthly averaging in each pixel in the LDEO subset (Fig. S3). This result shows the difficulty of modeling areas with a high submonthly variability in pCO 2 and salinity and supports the exclusion of the averaged LDEO data with a high SD since it could cause the network to interpret the submonthly variability as monthly variability (note that the purpose of this study is to capture the monthly variability).
Like for modeling A T (Takahashi et al., 2014;Broullón et al., 2019), the Arctic Ocean is one of the regions with the highest RMSE of neural-network-estimated TCO 2 . The major Arctic rivers contribute TCO 2 concentrations ranging between 400 and 3600 µmol kg −1 (estimated by Tank et al., 2012), derived mainly from carbonated rocks in the watersheds. Other areas like the Okhotsk Sea also show a high RMSE (Table 2 and Fig. 3), probably because of the high riverine input of TCO 2 (Watanabe et al., 2009). An input variable accounting for the contribution of the rivers to the TCO 2 pool would improve the neural network performance in areas like these, but it is not available.
The errors of the pCO 2 computed in the LDEO dataset with TCO 2 from NNGv2LDEO and A T from NNGv2 (Broullón et al., 2019) are similar to the errors obtained by Landschützer et al. (2014) for the SOCAT database in some of the areas (10-16 µatm; Table 2). This result shows the potential of computing pCO 2 values with neural networks trained for other variables of the seawater CO 2 system, at least in some ocean regions. The global error of the pCO 2 in the LDEO dataset is clearly higher than that obtained by Table 2. RMSE (bias) by area and depth for TCO 2 and pCO 2 , computed with CANYON-B and NNGv2LDEO in Gv2QC and LDEO datasets. For each depth range, the RMSE (bias) in each area was weighted by the contribution of its data to the total. Units are micromoles per kilogram (µmol kg −1 ) for TCO 2 and microatmospheres (µatm) for pCO 2 .    Landschützer et al. (2014) for the SOCAT dataset (22 vs. 12 µatm, respectively), although the critical areas are mainly the same (Fig. S4): equatorial Pacific upwelling system, Arctic and subarctic waters around the Alaska Peninsula, the Southern Ocean, the Gulf Stream, and the North Atlantic Current. At this point, the following should be considered: (1) the pCO 2 computed in the present study is derived from A T and TCO 2 and not from specific modeling for pCO 2 , and therefore it contains errors associated with this computation (∼ 6 µatm; Millero, 1995) and the neural network estimates of A T and TCO 2 ; (2) the present study includes the Arc-tic region where the highest errors occur (Table 2; Beaufort Sea and High Arctic areas); and (3) there is a longer temporal range in the present study . The analysis of Landschützer et al. (2014) in the LDEO dataset for data that differ from SOCAT shows a global error higher than the one obtained in the present study for all LDEO data between 1998 and 2011 (25.9 vs. 21.3 µatm, respectively). The error between 40 • S and 40 • N is similar in the two studies (Landschützer et al., 2014: 16.5 µatm;NNGv2LDEO: 16.4

µatm).
Although it is not the main objective of this work, these last two results show how NNGv2LDEO and NNGv2 (Broullón et al., 2019) have the potential to compute pCO 2 values between 40 • S and 40 • N with similar errors as the method with the lower error in the pCO 2 modeling to obtain a climatology and with lower errors in high latitudes for the LDEO dataset, even taking into account the inclusion of the critical area of the Arctic in the computation of the error of the pCO 2 from the present study (it is not included in Landschützer et al., 2014) and the higher number of data from high latitudes in the present study (15 479 vs. 3799).

Time series validation
The good generalization of the network in the test dataset containing data from Gv2 and the LDEO dataset with a similar RMSE as the one reached in the training set is also evidenced through independent time series data (Table 3). Except for KERFIX, where the number of data points is very low and Olsen et al. (2019) suggested an adjustment to the original data of −39 µmol kg −1 , TCO 2 computed using NNGv2LDEO and CANYON-B at the time series locations is characterized by low errors and biases (Table 3). NNGv2LDEO computes TCO 2 with a lower RMSE and bias than CANYON-B for most of the time series stations (Table 3). CANYON-B reaches a lower RMSE in HOT ALOHA SURFACE and ESTOC than NNGv2LDEO, but the bias is considerably higher in these time series for CANYON-B. The seasonal variability is well captured by NNGv2LDEO, showing its great potential to design a monthly climatology. In the surface layer, where the seasonal variability is the highest, the computed values are strongly correlated with the measured TCO 2 in all the time series (Fig. 5). In addition, the high correlation holds for all depths (Table S1). The TCO 2 computation with a low error in these time series located in different oceanic regimes as well as in the areas of Table 2 shows the good performance of NNGv2LDEO in almost any region of the ocean.
Assessing the potential of neural networks to obtain values of other variables of the seawater CO 2 system in the time series, pCO 2 calculated with A T from NNGv2 (Broullón et al., 2019) and TCO 2 from NNGv2LDEO compared quite well with pCO 2 as measured or calculated from A T and TCO 2 at the time series stations (Table 4). Except for BATS, the pCO 2 obtained in the present study has a lower error than that reported by Landschützer et al. (2014; Table 4). In contrast, the bias in the present study is higher, except for ESTOC. Considering the error involved in the calculation of pCO 2 from A T and TCO 2 (∼ 6 µatm); Millero, 1995), and the error in the computed A T and TCO 2 with the neural networks (Table 4), our results demonstrate again the ability of NNGv2 and NNGv2LDEO to calculate other variables of the seawater CO 2 system with a relatively low error.
Using NNGv2LDEO, it is also possible to reproduce the secular trends in TCO 2 . Using seasonal detrending to enhance the multiannual changes, similar trends in the longer time series are found for the measured TCO 2 and the neural- network-computed TCO 2 ( Table 5). The same holds for pCO 2 (Table 5), although at the IRMINGER site the trend obtained from the neural-network-generated data is significantly lower than that from measured data. The neural networks seem to capture the anthropogenic influence in the seawater CO 2 system and thus the ocean acidification process (Fig. 6). Furthermore, using NNGv2LDEO increases the number of TCO 2 data where the various inputs were measured but not TCO 2 itself. This allows for the evaluation of high-frequency changes (Fig. 6) and for the calculation of Table 3. RMSE and bias between measured and computed TCO 2 concentrations in several time series. The comparison was done using only water samples where all the input variables for NNGv2LDEO and the TCO 2 were measured in the same water sample.

NNGv2LDEO
CANYON-B  Table 4. RMSE and bias between measured pCO 2 (and in some cases, computed from measured A T and TCO 2 in time series where pCO 2 was not measured) and computed pCO 2 with A T from NNGv2 (Broullón et al., 2019) and TCO 2 from NNGv2LDEO in several time series. The time period for pCO 2 from this study is the same as in Table 3. Consult Table 2  interannual trends with a low error (as temporal sampling biases are reduced).

Climatology
Using NNGv2LDEO we have demonstrated its ability to compute TCO 2 values with low errors and especially to capture the monthly variability of this variable. In addition, the climatologies of the input variables used to create the climatology of TCO 2 have been satisfactorily evaluated previously for the construction of an A T climatology (Broullón et al., 2019). Considering these results, a monthly climatology of TCO 2 is obtained by passing the input climatologies through NNGv2LDEO.
The spatial distribution of the surface annual mean climatology of TCO 2 (Fig. 7a) is similar to two recent climatologies: those of Takahashi et al. (2014) and . The largest surface TCO 2 concentra-tions occur in the Southern Ocean, subpolar North Atlantic, Nordic Seas, and Mediterranean Sea (note that the latter is not included in these other climatologies). In general, surface TCO 2 decreases from high to low latitudes. The Indian and the Pacific oceans are characterized by lower concentrations of TCO 2 at higher latitudes than the Atlantic, the latter being the ocean with the highest surface TCO 2 by area. TCO 2 increases with depth in all oceans, in particular in the upwelling regions, where this increase is expanded eastwards with depth ( Fig. 7b and video at https://doi.org/10.20350/digitalCSIC/10551, Broullón et al., 2020). Depending on the area, the values reach a maximum at certain intermediate depths, and below it the concentration gradually decreases or remains almost constant (Fig. S5).
The largest seasonal variability occurs at the surface at high latitudes, in the Pacific upwelling region, the equatorial African coasts, and in the area under influence of the Amazon River (Fig. 8a). At depth, the seasonal variability de-  creases, except for the Pacific upwelling region, where it increases and moves progressively northward between 30 and 150 m (Fig. 8b). This increase is correlated with the high seasonal variability of the climatologies of nutrients, oxygen, and temperature at these depths. Czeschel et al. (2012) also showed an increase in the subsurface variability of oxygen from measured profiles. Similar increases also occur in the Indian Ocean north of 20 • S between 50 and 100 m and in the equatorial Atlantic Ocean in the same depth range. At the 1500 m level, the seasonal variability is below 10 µmol kg −1 in most of the ocean (Fig. 8c). This last result shows that an annual climatology below 1500 m is sufficient.
Although the surface patterns of the annual mean of the TCO 2 climatology are very similar to those of the other recent climatologies (Takahashi et al., 2014;, differences do occur. The annual mean climatology of the present study is closest to that of Takahashi et al. (2014;Table 6). The largest differences between these two climatologies are located in the Arctic, North Pacific, Peru upwelling area, western South Pacific, and the area of influence of the Antarctic Circumpolar Current (Fig. S6a). The Atlantic and the Indian oceans do not show significant differences. Our climatology shows more deviations from that of , compared in the grid of Takahashi et al. (2014; Table 6). The highest differences are found in the North Pacific, around Antarctica, the Nordic Seas, the South and North Atlantic, and in several less localized areas around the oceans (Fig. S6b). When the climatology of Takahashi et al. (2014) is compared to that of , the differences are even higher (Table 6), and the critical areas are the same of those of the previous comparison. Although it is clear that discrepancies between the three climatologies are derived from the different methods used, the higher similarity between ours and the one of Takahashi et al. (2014) is probably due to the influence of the same source used to create them, the World Ocean Atlas.
The comparison of our climatology with that of  at the 33 depth levels of  shows a reduction in the RMSE with depth. Between 0 and 1000 m, the RMSE is reduced from ∼ 32 to 7 µmol kg −1 (Table S2; note the higher RMSE at surface compared to the one obtained for the grid of Takahashi et al., 2014, because of  the inclusion of areas which are not included in the latter's grid, and the difficulty of modeling TCO 2 in some areas, like the Arctic and the Mediterranean Sea). This reduction with depth is probably due to the reduction in the variability in most of the ocean below the surface. The surface values in  are likely characteristic of months in which most of the sampling was carried out. Because of the lower variability of TCO 2 at depth, the values are closer to the annual mean, and therefore the two compared climatologies are more similar at depth than at surface depth levels. Below 1000 m, the differences between the two climatologies are not significant, with an RMSE around 5 µmol kg −1 and a bias around 0.5 µmol kg −1 at each depth level (Table S2).
Our monthly climatology shows a high correspondence with that of Takahashi et al. (2014), although the RMSE values show that there are also large differences in certain areas (Table 7). These areas are mainly the same as those in the comparison of the annual mean climatologies, but some other small regions with high differences appear for each month all through the ocean (Fig. S7).
Unfortunately, the uncertainty of the TCO 2 climatology cannot be assessed globally and robustly. As Broullón et al. (2019) stated, the unavailability of an uncertainty field  Takahashi et al. (2014) 13.2 23.7 - * The domain analyzed is the same as in Takahashi et al. (2014) for coherency reasons. associated with the objectively analyzed WOA13 climatologies does not allow us to perform a proper global uncertainty assessment. Therefore, the analysis is relegated to the areas where repeated sampling of TCO 2 has been carried out monthly over a long period, that is, the HOT ALOHA and BATS time series stations. The climatology of TCO 2 from NNGv2LDEO is consistent with the monthly climatological values at these two places (Fig. 9). In general, the profiles from the TCO 2 climatology are within the variability range (shadow area in Fig. 9) of the monthly averaged measured data for each depth level. In the upper 30 m of the water column, the climatology of TCO 2 differs from  Fig. 5a, where the computed TCO 2 decreases from maximum to minimum sooner than the measured TCO 2 . For HOT ALOHA, the RMSE of the profiles of the TCO 2 climatology oscillates between 3.6 and 9.2 µmol kg −1 with a mean value of 6.3 µmol kg −1 (bias range: −3.8 to 1.2 µmol kg −1 ; mean bias: −1.4 µmol kg −1 ). At BATS, the RMSE is lower than for HOT ALOHA: 1.1 to 8.5 µmol kg −1 with a mean value of 4.4 µmol kg −1 (bias range: −1.4 to 7.6 µmol kg −1 ; mean bias: −2.5 µmol kg −1 ). Furthermore, the seasonal variability of the TCO 2 climatology is quite similar to that of the measured data at BATS and HOT ALOHA (Fig. S8). Although in other time series there are not enough measured data to obtain climatological values, these pseudoclimatological values also correlate very well with the TCO 2 climatology (data not shown). These results suggest that the climatology is robust in different oceanographic regimes and adequately captures the seasonal cycle of TCO 2 . It has been demonstrated in this study that pCO 2 and possibly other variables of the seawater CO 2 system can be computed from A T and TCO 2 derived from neural networks with a relatively low error in different datasets (LDEO in Sect. 3.1 and time series in Sect. 3.2). The pCO 2 climatology (Fig. S9) computed from the TCO 2 climatology of the present study and the A T climatology of Broullón et al. (2019) is very similar to that of Landschützer et al. (2017). The differences between the annual mean climatology of the two studies are below 15 µatm in most of the ocean (RMSE: 8.3 µatm; bias: 2.9 µatm; r 2 : 0.82). The differences above this threshold are mainly located in the Pacific equatorial upwelling system, the eastern part of the South Pacific Gyre, Nordic Seas, Labrador Sea, Atlantic section of the Southern Ocean, Bay of Bengal, and the waters surrounding the eastern margin of Asia (Fig. S10). In most of these areas, both methods have the greatest errors (Figs. 2 and 4 in Landschützer et al., 2014, and Fig. S4 of the present study).
On a monthly basis, the RMSE between the two pCO 2 climatologies is between 13.6 and 15.6 µatm, and the correlation is lower than for the annual mean comparison (r 2 : 0.55-0.72 vs. 0.82). The areas with the higher differences are the same as in the annual comparison, but other small regions appear along the ocean month by month (Fig. S11). Furthermore, the seasonal variability in the two climatologies matches in a great extension of the ocean, although there are areas with notable differences (Fig. S12). In general, the pCO 2 climatology is quite similar to that of Landschützer et al. (2017), and this result helps show that both the TCO 2 climatology of the present study and the A T climatology of Broullón et al. (2019) are mostly robust and suggest that climatologies of other seawater CO 2 system variables can be confidently computed.

Data availability
The climatologies of TCO 2 and pCO 2 as well as NNGv2LDEO designed in this study are available at the data repository of the Spanish National Research Council (CSIC; https://doi.org/10.20350/digitalCSIC/10551; Broullón et al., 2020).

Conclusions
We presented a tool for computing TCO 2 in the global ocean. Compared to previous methods, the uncertainties in such computations have been reduced. Including two updated datasets containing thousands of measurements of inorganic carbon variables across the ocean in the training of the neural network, we were able to capture a wide range of variability of TCO 2 . The low errors obtained in independent subsets at time series stations are further evidence of the potential of the network in computing TCO 2 .
Our global monthly climatology created with a neural network is the first that covers the oceans from the surface to the abyss at such temporal resolution. In addition to the accuracy of the network, the low uncertainty of the climatology in different regions and its usefulness in creating climatologies of other seawater CO 2 chemistry variables (i.e., pCO 2 ) show its robustness. Therefore, we present the global climatology of TCO 2 to the scientific community to complement the recently designed climatology of A T by Broullón et al. (2019) for its use in the initialization and evaluation of models or any other analysis related to the carbon cycle.
Author contributions. DB, FFP, and AV designed the study. The manuscript was written by DB and revised and discussed by all the authors. The dataset of the climatology and the neural network were created by DB.
Competing interests. The authors declare that they have no conflict of interest.
Acknowledgements. The authors want to thank the three referees, whose comments improved the study. The paper is dedicated to Taro Takahashi, who contributed immensely to the understanding of the ocean's role in the carbon cycle and passed away in December 2019.
Financial support. This research was supported by Ministerio de Educación, Cultura y Deporte (FPU grant no. FPU15/06026); Ministerio de Economía y Competitividad through the ARIOS (grant no. CTM2016-76146-C3-1-R) project, cofunded by the Fondo Europeo de Desarrollo Regional 2014-2020 (FEDER); and EU Horizon 2020 through the AtlantOS project (grant agreement no. 633211). Are Olsen was supported by the Norwegian Research Council through ICOS (grant no. 245927). Mario Hoppema was partly supported by the European Union's Horizon 2020 program under grant agreement no. 821001 (SO-CHIC).
Review statement. This paper was edited by Jens Klump and reviewed by three anonymous referees.