Feasibility of reconstructing the summer basin-scale sea surface partial pressure of carbon dioxide from sparse in situ observations over the South China Sea

Sea surface partial pressure of CO2 (pCO2) data with a high spatiotemporal resolution are important in studying the global carbon cycle and assessing the oceanic carbon uptake. However, the observed sea surface pCO2 data are usually limited in spatial and temporal coverage, especially in marginal seas. This study provides an approach to reconstruct the complete sea surface pCO2 field in the South China Sea (SCS) with a grid resolution of 0.5× 0.5 over the period of 2000–2017 using both remote-sensing-derived pCO2 and observed underway pCO2, among which the gridded underway pCO2 data in 2004, 2005, and 2006 are presented for the first time. Empirical orthogonal functions (EOFs) were computed from the remote-sensing-derived pCO2. Then, a multilinear regression was applied to the observed pCO2 as the response variable with the EOFs as the explanatory variables. EOF1 explains the general spatial pattern of pCO2 in the SCS. EOF2 shows the pattern influenced by the Pearl River plume on the northern shelf and slope. EOF3 is consistent with the pattern influenced by coastal upwelling along the northern coast of the SCS. When pCO2 observations cover a sufficiently large area, the reconstructed fields successfully display a pattern of relatively high pCO2 in the mid and southern basin. The rate of sea surface pCO2 increase in the SCS is 2.4±0.8 μatmyr−1 based on the spatial average of the reconstructed pCO2 over the period of 2000–2017. This is consistent with the temporal trends at Station SEATS (SouthEast Asia Time-series Study; 18 N, 116 E) in the northern basin of the SCS and at Station ALOHA (A Long-Term Oligotrophic Habitat Assessment; 2245 N, 158W) in the North Pacific. We validated our reconstruction with a leave-one-out cross-validation approach, which yields the root-mean-square error (RMSE) in the range of 2.4–5.2 μatm, smaller than the spatial standard deviation of our reconstructed data and much smaller than the spatial standard deviation of the observed underway data. The RMSE between the reconstructed summer pCO2 and the observed underway pCO2 is no larger than 31.7 μatm, in contrast to (a) the RMSE from 12.8 to 89.0 μatm between the remote-sensing-derived pCO2 and the underway data and (b) the RMSE from 32.6 to 44.5 μatm between the neural-network-produced pCO2 and the underway data. The difference between the reconstructed pCO2 and those calculated from observations at Station SEATS is in the range from−7 to 10 μatm. These comparison results indicate the reliability of our reconstruction method and output. All the data for this paper are openly and freely available at PANGAEA under the link https://doi.org/10.1594/PANGAEA.921210 (Wang et al., 2020). Published by Copernicus Publications. 1404 G. Wang et al.: Feasibility of reconstructing the summer basin-scale sea surface pCO2

G. Wang et al.: Feasibility of reconstructing the summer basin-scale sea surface pCO 2

Introduction
The ocean plays an important role in absorbing atmospheric CO 2 and consequently helps slow down global warming (Le Quere et al., 2018a). Over the last half-century the ocean has taken up approximately 24 % of the total emitted CO 2 at an increasing rate from 1.0 ± 0.5 in the 1960s to 2.6 ± 0.6 Gt C yr −1 in 2019 (Friedlingstein et al., 2020;Le Quere et al., 2018b). The ocean has been found to be responsible for up to 40 % of the decadal variability of CO 2 accumulation in the atmosphere (DeVries et al., 2019). However, the regional and global patterns of the oceanic carbon sink vary both spatially and temporally (Doney et al., 2009;Fay and McKinley, 2013;Landschutzer et al., 2014;Le Quere et al., 2010;Rodenbeck et al., 2015;Turi et al., 2014). Thus, it is necessary to improve the spatiotemporal coverage and accuracy of the data in the evaluation of oceanic carbon uptake capacity in order to better understand the global carbon cycle and to better project the future climate.
The sea-air CO 2 flux is primarily determined by the difference in the atmospheric and sea surface partial pressure of CO 2 (pCO 2 ). The measurements of sea surface fugacity of CO 2 (f CO 2 , which is equal to pCO 2 corrected for the nonideal behavior of the gas; Pfeil et al., 2013) have increased to 28.2 million and are presently available in almost all ocean basins based on the Surface Ocean CO 2 Atlas version 2020 (Bakker et al., 2020). However, for a given year the observations of sea surface pCO 2 may still have sparse spatial coverage. Thus, interpolation and/or extrapolation methods are needed to obtain a complete pCO 2 field in space and time over the concerned oceanic areas. Various methods have been applied for this purpose in the past 2 decades, including statistical interpolation (Chou et al., 2005) and empirical formulas between pCO 2 and proxies such as sea surface temperature, salinity, chlorophyll a, sea surface height, and mixed-layer depth (Boutin et al., 1999;Denvil-Sommer et al., 2019;Jo et al., 2012;Laruelle et al., 2017;Lefevre and Taylor, 2002;Ono et al., 2004;Zhai et al., 2005a). These studies usually present their pCO 2 fields in a monthly timescale and at a 1 • × 1 • or even coarser grid. In marginal seas a finer grid resolution is needed to discern influences posed by local forces such as plumes and upwelling.
The South China Sea (SCS) is the largest marginal sea in the western Pacific. Measurements of sea surface pCO 2 in the SCS started as early as 2000 (Zhai et al., 2005b). Seasonal and spatial variations are present in different domains of the SCS Zhai et al., 2013). However, the data coverage is still so sparse each year that on global compilation maps the SCS has large blank areas (Bakker et al., 2016;Fay and McKinley, 2013;Takahashi et al., 2009). For example, the summer observations of 2017 cover 7 % of the SCS, and those of 2001 cover only 1 %. Consequently, the observational data themselves cannot quantitatively depict the pCO 2 Figure 1. Reconstruction procedure of the sea surface pCO 2 in the SCS. Here, RS pCO 2 means the original remote-sensingderived pCO 2 ; Obs. pCO 2 represents the original observed in situ pCO 2 . RS estimates are the grid-aggregated remote-sensingderived pCO 2 , and Obs. data are the grid-aggregated observed pCO 2 . The standard deviations are the temporal standard deviation of the RS pCO 2 estimates on each grid box. field over the entire SCS basin. Thus, it is necessary to reconstruct a space-time complete pCO 2 field in the SCS in order to better assess the CO 2 source and sink features in the SCS and to supplement the global pCO 2 map.
The purpose of this paper is to demonstrate the feasibility of reconstructing the pCO 2 field over the SCS basin from the sparse in situ observations in the SCS with a grid resolution of 0.5 • × 0.5 • , using a method illustrated in the flowchart of Fig. 1. This paper focuses on the pCO 2 reconstruction for the summer season. As indicated in Fig. 1, we need to use an auxiliary dataset, the remote-sensing-derived pCO 2 estimates, e.g., from Bai et al. (2015a), to calculate empirical orthogonal functions (EOFs) for spatial patterns of pCO 2 . The remote sensing pCO 2 estimates are relatively complete in the space-time grid but less accurate, compared with in situ observations. The singular-value decomposition (SVD) method is applied to the remote sensing estimates to compute the EOFs. These EOFs form an orthogonal basis for the spectral optimal gridding (SOG) method (Shen et al., 2014Gao et al., 2015;Lammlein and Shen, 2018). The method uses a multilinear regression to blend the in situ data (treated as the data of the response variable in the regression) and the EOFs (treated as the explanatory variables) together to reconstruct the complete summer pCO 2 field at 0.5 • × 0.5 • over the SCS. Section 2 will describe the datasets and methods; Sect. 3 includes results and discussion; and the conclusions are in Sect. 4.

Observed data in the SCS
In the SCS, the underway sea surface pCO 2 data are hardly available for every month of each year, so we decided to compile the data seasonally. This study focuses on the summer data, since the greatest temporal coverage of the sampling occurs in summer. The available underway summer pCO 2 data from 2000 to 2017 are compiled in this study and shown in Table 1, in which the pCO 2 data in August 2004, July 2005, and June 2006 are new and obtained continuously with a non-dispersive infrared gas analyzer (LI-COR 7000). The summer data are the June-August mean for each year in this period excluding 2002excluding , 2003excluding , 2010excluding , 2011excluding , and 2013excluding (Li et al., 2020Zhai et al., 2005a). Thus, we have observed underway pCO 2 data for 13 summers during 2000-2017. Figure 2 shows that these data are distributed mainly on the northern shelf and slope and in the northern and mid basin of the SCS with a different frequency of summer observations on different grid boxes. The observational data were aggregated onto 0.5 • × 0.5 • grid boxes in the 5-25 • N, 109-122 • E region of the SCS. The aggregation used a simple space-time average of the data in a grid box. The aggregated data for the 13 summers are shown in Fig. 3, which shows the distribution pattern of the observed underway pCO 2 data of each year. The aggregated pCO 2 in general falls in the range of 160-480 µatm with relatively larger spatial variation nearshore and smaller spatial variability in the basin. In addition, the large differences are apparent in the spatial coverage from year to year. For example, in the summer of 2007 the observed underway pCO 2 data cover a spatial range of 12 • in latitude and 13 • in longitude, with 231 grid boxes with data that cover 22 % of the SCS. The data fall in the range of 281-480 µatm. In the summer of 2017 the observed data cover a spatial range of 13 • in latitude and 6 • in longitude, with 77 grid boxes with data that cover 7 % of the SCS. The data are in the range of 279-440 µatm. The summer of 2000 has only five grid boxes (covering 0.5 % of the SCS) with data in the range of 400-425 µatm. The lowest observational pCO 2 values appear on the northern SCS shelf due to the influence of the Pearl River plume (see Fig. 2), where nutrient-stimulated phytoplankton uptake consumes CO 2 . The relatively high sea surface pCO 2 values occur mainly in the basin, which are often higher than the atmospheric pCO 2 . The high pCO 2 values off the northeastern coast of the SCS and the southern coast of Hainan Island in the summer of 2007 are consistent with local upwelling occurrences, which bring CO 2 -enriched water from the subsurface . In the summer of 2012, the spatial coverage is 7 • in latitude and 9.5 • in longitude. The pCO 2 data are  (Gan et al., 2015;Jing et al., 2015). Some other data, for example, in the summer of 2000, however, are relatively localized so that no certain spatial pattern is shown before the reconstruction. Our reconstruction results will help display the spatial patterns of the complete sea surface pCO 2 field. As a dataset for our reconstruction validation, we calculated the sea surface pCO 2 from the observed temperature, salinity, total alkalinity, dissolved inorganic carbon, phosphate, and silicate at Station SEATS (SouthEast Asia Timeseries Study; 18 • N, 116 • E) in the northern basin of the SCS in the summer of 2007, 2009, 2012, 2014, and 2017. The nutrient sample collection and measurement are described in Du et al. (2013Du et al. ( , 2017. The samples of total alkalinity and dissolved inorganic carbon were collected and measured following the same procedure in Guo et al. (2015). The calculation of pCO 2 was made using the program of Lewis and Wallace (1998), in which the apparent dissociation constants for carbonic acid of Mehrbach et al. (1973) refit by Dickson and Millero (1987) and the dissociation constant for bisulfate ion from Dickson (1990) were employed. Another sea surface pCO 2 dataset calculated in the same way at Station SEATS in the summer of 2000, 2001, 2004, and 2006was compiled from Lui et al. (2020.

Remote-sensing-derived sea surface pCO 2 data
The satellite remote-sensing-derived sea surface pCO 2 in the SCS was estimated for the years of 2000-2014 using a "mechanistic semi-analytical algorithm" (MeSAA) developed by Bai et al. (2015a). In the summer in the SCS, the thermodynamic, mixing, and biological effects on the sea surface pCO 2 were parameterized in the MeSAA algorithm as a function of major controlling factors derived from multiple-satellite-derived sea surface temperature, colored dissolved organic matter, and chlorophyll. The spatial resolution of the remote-sensing-derived pCO 2 estimates is 1 ×1 . These estimates were aggregated into 0.5 • ×0.5 • grid boxes in our study region (5-25 • N, 109-122 • E). As shown in Fig. 4, the gridded remote-sensing-derived pCO 2 data cover almost all the areas of the SCS (see the boxes of RS pCO 2 and RS estimates of full coverage in Fig. 1). We made a validation study for the remote-sensing-derived pCO 2 by comparison with the observed underway pCO 2 (Fig. 5). In general, most of the remote-sensing-derived pCO 2 overestimate the sea surface pCO 2 but not by more than 50 µatm. The root-mean-square error (RMSE) falls in the range from 12.8 to 89.0 µatm with a median of 33.8 µatm ( Table 2). The RMSE values are high in the years when the underway data covered only the shelf regions. With the MeSAA algorithm, the derived pCO 2 dataset represents the major CO 2 variation in large scales. However, variations shown by these remotesensing-derived pCO 2 data are much less than those shown by the observed pCO 2 data because the current MeSAA al-gorithm does not consider some local processes, such as eddies and potentially different carbonate system patterns in coastal areas. Larger spatial variations are expected especially in areas influenced by river plumes. This makes it necessary to reconstruct a pCO 2 field not only from the remotesensing-derived pCO 2 but also constrained by the observed in situ pCO 2 data from cruises. Figure 1 is a flowchart of our method. We used the remotesensing-derived data to compute the EOFs for the SOG reconstruction. The grid with 0.5 • ×0.5 • resolution covered 5-25 • N and 109-122 • E with 1040 grid boxes in total. The land area data were marked with NaN. The data were arranged in a 1040 × 15 space-time matrix with rows for grid boxes and columns for time. Then, we removed the 143 land grid boxes from the data and computed the climatology and standard deviation for the remaining 897 non-NaN grid boxes from the 15 years of remote-sensing-derived data from 2000 to 2014. The standardized anomalies were computed for each grid box using the remote-sensing-derived data minus the climatology and subsequently dividing the difference by the standard deviation. The singular-value decomposition (SVD) method was applied to the standardized anomalies in the space-time matrix to compute the EOFs. The results are shown in Sect. 3. The climatology and standard deviation calculated from the remote-sensing-derived data were also used to compute the standardized anomalies of the observed data, which were used as the response variable in the SOG regression reconstruction. Following the reconstruction of the standardized anomalies, the remote-sensing-derived climatology and standard deviation were then used to produce the full field as the final reconstruction result.

Reconstruction method
The SOG reconstruction method is basically a multilinear regression model for the space-time field at grid box x and time t, expressed as follows: (1) where P (x, t) is the response variable whose data are the standardized anomalies of the observed data, β 0 (t) is the re- is the area factor, φ x is the centroid's latitude (expressed in radians) of the grid box x, and e (x, t) is the regression error. The error is assumed to be normally distributed with a zero mean and has an independent error variance of where E denotes the mathematical operation of the expected value. The explanatory variables in the above multilinear regression are E m (x), computed from the area-weighted standardized anomalies of the remote-sensing-derived data. The Table 2. The RMSE between the remote-sensing-derived pCO 2 estimates and the observed underway pCO 2 data (RMSE RS ), of the crossvalidation (RMSE CV ), and between the reconstructed pCO 2 and the observed underway pCO 2 data (RMSE RC ) (unit: µatm).
Year 2000 2001 2004 2005 2006 2007 2008 2009 2012 2014 2015 2016  x 1 , x 2 , . . ., x 17 . The data in these 17 boxes were used to estimate the intercept β 0 (t) and coefficients β m (t) of the regression. With the estimates b 0 (t) and b m (t), m ∈ M, the reconstructed standardized anomalies are expressed aŝ where x runs through the entire 893 grid boxes over our study region in the SCS. These anomalies were converted to the full field by adding the climatology and multiplying the standard deviation computed from the remote-sensing-derived data for each of the 893 grid boxes. In this way, the full reconstructed field was produced and is presented in Sect. 3. Many computer software packages are available to compute the EOFs using SVD and to compute multilinear regres- sions. This paper chose to use R, a computer program language that has become a popular data science tool in the last few years, for this purpose.
SOG usually uses the first few EOFs or the first M EOFs that account for more than 80 % of the total variance or that are determined by response data via a correlation test (Smith et al., 1998). Here, M is the size of set M. The current paper used eight EOFs that explain 87 % of the total variances (Fig. 6).

EOFs and PCs
EOF1 demonstrates the mode of an average level of pCO 2 with lower or higher values near the coastal regions of mainland China (Fig. 7a). This mode accounts for 49 % of the variance, which indicates the dominance of the average field and hence a small overall spatial variation, except in the coastal regions. The remote-sensing-derived pCO 2 data support this mode well. EOF2 shows a north-south dipole (Fig. 7b), which is supported by the observed data shown in Fig. 3, particularly in the summer of 2017, showing lower values in the north on the shelf and slope and higher values in the south in the ocean basin. The minimum values in the north occur where the Pearl River plume dominates Zhai et al., 2013). EOF3 shows an east-west pattern (Fig. 7c), in addition to the north-south dipole in EOF2. EOF3 thus reflects a spatial variation of a smaller scale. This pattern is consistent with that influenced by coastal upwelling along the northeastern China coast and off eastern Hainan Island (Gan et al., 2015;Jing et al., 2015).
The PCs are a temporal stamp of the occurrence of the spatial patterns. PC1 basically shows the temporal trend (Fig. 8d). It has been concluded that surface SCS pCO 2 has an increasing trend with time (Tseng et al., 2007). PC2 indicates the strength of the north-south dipole. This strength seems to be related to the strength and extent of the Pearl River plume on the northern shelf and slope (Bai et al., 2015b;Li et al., 2020;Zhai et al., 2013). PC3 shows the temporal variation corresponding to the east-west spatial pattern of EOF3. Figure 8 shows that the reconstructed pCO 2 fields in the SCS have successfully displayed the spatial patterns of the observed pCO 2 and in general are consistent with previous studies Zhai et al., 2013). Relatively low values appear in the northern coastal region where the Pearl River plume is dominant in summer, and generally high values occur in the mid and southern basins.

Reconstruction results in the SCS
The reconstructions have taken the advantages of both the observed underway data for retaining spatial and temporal variations and the remote-sensing-derived data for EOF patterns. By default, the reconstructed field has fidelity to the in situ data because the SOG reconstruction method is a fit of EOFs to the in situ data. The reconstruction is, thus, consistent with the in situ observations. When the in situ data cover a sufficiently large area and hence provide a proper constraint to the EOF fitting through the SOG procedure, the reconstruction result is more faithful to the reality. For example, the reconstructions of the summers of 2004, 2007, 2009, 2012, and 2014-2017 nicely demonstrate the spatial pCO 2 patterns (Fig. 8c, f, h, and i-m) that are consistent with observations Zhai et al., 2013) and ocean dynamics (Gan et al., 2015;Jing et al., 2015).
When the observational data are scarce, as long as the in situ data provide a proper constraint to the EOFs, the reconstruction can still yield reasonable results. For example, the summer of 2001 has few in situ data, but its reconstruction, with an RMSE of 7.3 µatm between the reconstructed data and the observed data, appears reasonable (Fig. 8b).
In cases of extreme data scarcity, the reconstruction may not be reliable. For example, the reconstructed data in the summer of 2000 appear to be of poor quality (Fig. 8a), since the relatively low values in the mid-SCS basin may not be realistic. These poorly reconstructed data may be due to the poor spatial coverage of the in situ pCO 2 data in the summer of 2000, which had only five grid boxes with data (Fig. 3a). These five boxes are all located together and cover only 0.5 % of the SCS. Similarly, the reconstructed pCO 2 data for the summers of 2005, 2006, and 2008 are not well constrained by Figure 5. The difference between the remote-sensing-derived summer pCO 2 estimates and the observed underway pCO 2 (unit: µatm) in 2000, 2001, 2004-2009, 2012, and 2014. the in situ pCO 2 data that cover only the northern shelf and slope of the SCS so that the reconstructed pCO 2 in the mid basin are less than 350 µatm (Figs. 8d, e, and g). These small values are unlikely, since the sea surface pCO 2 in the basin is generally higher than the atmospheric pCO 2 (380-420 µatm) Zhai et al., 2013). Another cause of the less ideal reconstruction results for the summers of 2005, 2006, and 2008 may be the large spatial variations in the in situ data. These variations, such as those for the summer of 2008 (Fig. 3g), in the in situ data can cause a large deviation of the regression coefficients because the linear regression is not robust.
The reconstruction results have demonstrated the feasibility of the SOG reconstruction of the sea surface pCO 2 over the SCS, as long as the in situ data provide a proper constraint to the EOFs. The percentage of the in situ data coverage needs not necessarily be large. However, large spatial variations in the in situ data can distort the reconstruction and lower the quality of reconstruction because the linear regression method is not robust.

Reconstruction validation and uncertainty quantification
To quantitatively validate our reconstruction, we conducted a leave-one-out cross-validation study: withholding a grid box datum, making the reconstruction using the remaining in situ data, and computing the difference between the withheld datum and the reconstructed datum at the same grid box. This was repeated for every grid box with in situ data for each year. The final cross-validation result is output as RMSE (Table 2). The maximum RMSE is 5.2 µatm, which occurred in 2006 when there were only 25 grid boxes with in situ pCO 2 data, and the in situ data had the largest spatial standard deviation, 49.4 µatm, among the 13 years under consideration. The minimum RMSE is 2.4 µatm, which occurred in 2017 with 77 in situ data grid boxes and a spatial standard deviation of 17.6 µatm for the in situ data. This accuracy is very good compared to the spatial standard deviation of the in situ data in the same year. Compared to the 2006 data, a more accurate reconstruction for 2017 is expected because of more grid boxes with in situ data and smaller spatial variability. This is supported by the cross-validation result. The spatial standard deviation of the reconstructed data is in the range of 2.1-6.6 µatm. The cross-validation RMSEs are in the range of 2.4-5.2 µatm. We thus conclude that the reliability of our reconstruction is well supported by the leave-one-out crossvalidation result.
We have also considered other types of cross-validations, such as leaving out data in half of the study region. A numerical test was made for the following situation: leaving out the western or eastern half of the data in a year, making the reconstruction using the remaining half of data, and computing the RMSE between the removed data and the reconstructed data. The analysis was done for the years with better spatial coverage : 2007, 2009, and 2012. When the western halves (longitude < 115.5 • E) of data in these years were removed, the resulted RMSEs were 2.77, 4.46, and 3.82 µatm for 2007, 2009, and 2012, respectively. When the eastern halves (longitude > 115.5 • E) of data were removed, the RMSEs were 4.32, 3.66, and 3.55 µatm in 2007, 2009, and 2012, respectively. These RMSEs fall within the range of the RMSEs of the leave-one-out cross-validation. These numerical results are another confirmation of the reliability of our reconstruction.
The uncertainty in our reconstruction was quantified by grid-by-grid comparisons of the reconstructed pCO 2 with the observed pCO 2 in two ways. One is the comparison with the observed underway data (Fig. 9). The difference between the reconstructed data and the observed underway data mostly falls within the range from −30 to 30 µatm (Fig. 9). The greatest deviation from the underway data appears near the coast, likely due to the lack of some typical patterns in coastal areas transferred via EOFs from the remote sensing estimates. The RMSE between the reconstructed data and the observed underway data is no larger than 31.7 µatm with a median of 16.5 µatm, which is smaller than the RMSE between the remote-sensing-derived pCO 2 and the underway data with the relative difference between the two RMSEs (rows 1 and 3 in Table 2) by at least 29 %. When comparing the pCO 2 data produced by Jo et al. (2012) in the northern SCS by a neural network approach in the summer of the years [2004][2005][2006][2007] with the underway pCO 2 , the resultant RMSE falls in the range from 32.6 to 44.5 µatm and is twice as much as the median RMSE between our reconstructed pCO 2 and the underway pCO 2 (Table 2). Another uncertainty quantification for our reconstruction is to compare it with the pCO 2 calculated from long-term observations at Station SEATS (Fig. 10). The difference between the reconstructed pCO 2 and the observed data at Station SEATS ranges from −7 to 10 µatm with the relative difference from −1.5 % to 2.1 %. Both comparisons confirm that our reconstruction results are reliable.
As an application of our reconstruction and a validation, we examine the temporal trend of sea surface pCO 2 over the SCS. The rate based on the linear temporal trend of the spatial average of the reconstructed sea surface pCO 2 over the SCS from 2000 to 2017 is 2.4 ± 0.8 µatm yr −1 (see Fig. 11a). It is lower than the rate of f CO 2 increase (4 µatm yr −1 ) for the period of 1999-2003(Tseng et al., 2007, while it is higher than the rate of pCO 2 increase for the period of 1998-2006 (0.8 µatm yr −1 ) at Station SEATS (Lui et al., 2020). The differences between their rates and ours exist because (a) our rate is a spatial average in summer and their rates are based on data collected in spring, summer, fall, and winter at a basin station and (b) the period to derive our rate is much longer than theirs. Using the summer data in Lui et al. (2020), the rate we estimated from the year 2000, which is the beginning year of our data, to the year 2006 at Station SEATS is 2.5 ± 1.0 µatm yr −1 . Although the period of 2000-2006 is much shorter than our period of 2000-2017, the summer rate at Station SEATS is almost the same as our rate based on the reconstructed data over the SCS. When compared with the summer rate of observed pCO 2 at Station ALOHA (A Long-Term Oligotrophic Habitat Assessment station) (22 • 45 N, 158 • W) in the North Pacific, which is 2.0 ± 0.2 µatm yr −1 over 2000-2017 (Dore et al., 2009) (see Fig. 11b), our rate is consistent with the rate in the North Pacific. The consistency of the trend in our reconstructed sea surface pCO 2 over the SCS with the local trend at Station SEATS and the North Pacific trend at Station ALOHA confirms that our reconstruction is reasonable.

Outliers of the observed data in the reconstruction
The SOG method is basically a linear regression method, which is known to be sensitive to the outliers of the response Figure 9. The difference between the reconstructed summer pCO 2 and the observed underway pCO 2 (unit: µatm) in 2000, 2001, 2004-2009, 2012, and 2014-2017.  data. Some outliers, whether due to observational biases or extreme events, can cause a large change in the regression coefficients and hence the regression results and can even make the regression results outside the physically valid domain, such as negative pCO 2 values in the reconstructed data. Although we cannot conclude that the outliers of 3σ away from the mean in the observed data are due to data biases, we have decided not to use them in our reconstruction to avoid the unphysical reconstruction results. Table 3 shows the 14 outlier entries excluded from our response data for regression. These outliers are located in the region of 21. 25-23.25 • N, 113.25-116.75 • E. This region is near the Pearl River estuary. Thus, these extremely low pCO 2 values may result from the Pearl River plume where the observed pCO 2 can be very low. These very low values, such as those at least 3σ away from the mean, may cause a very large gradient in the observed pCO 2 . Our reconstruction has excluded these extremely low values influenced by the river plumes. Our reconstructed data may therefore overestimate the pCO 2 values in the Pearl River estuary and its nearby region.

Code availability
The R computer codes and their required files for this paper are freely available at the following link: https://doi.org/10.5281/zenodo.4567859 (Wang et al., 2021).

Data availability
The gridded underway sea surface pCO 2 data, the remote-sensing-derived sea surface pCO 2 estimates, and the reconstruction result data are openly and freely available at PANGAEA at the following link: https://doi.org/10.1594/PANGAEA.921210 (Wang et al., 2020).
This study has demonstrated the feasibility of using the SOG method to reconstruct the sea surface pCO 2 data into regular grid boxes. We compiled the observed underway and remote-sensing-derived sea surface pCO 2 data in the SCS in summer over the period of 2000-2017 and aggregated these data with a grid resolution of 0.5 • × 0.5 • for reconstruction. The SOG method based on the multilinear regression was applied to reconstruct the space-time complete pCO 2 field in the SCS. The method took the EOFs calculated from the remote-sensing-derived pCO 2 as the explanatory variables and treated the observed pCO 2 as the response variable. The EOFs reflect reasonably well the general spatial pattern of the sea surface pCO 2 in the SCS and reveal features affected by regional physical forcing such as the river plume and coastal upwelling in the northern SCS. As long as the in situ data provide a proper constraint to the EOFs, the reconstructed pCO 2 fields are, in general, consistent with the patterns of the observed pCO 2 and demonstrate relatively low values along the northern coast affected by the Pearl River plume and consistently high values in the ocean basin of the SCS. The leave-one-out cross-validation result validates our reconstruction with an RMSE smaller than the spatial standard deviation of the observed underway data in the same year. The grid-by-grid comparison of the reconstructed summer pCO 2 with the observed underway pCO 2 has an RMSE smaller than that of the remote-sensing-derived pCO 2 , as well as that of the neural-network-produced pCO 2 in the same year. Moreover, our reconstructed pCO 2 compares well with the pCO 2 calculated from observations around Station SEATS in the northern basin of the SCS. These comparisons confirm that our reconstruction is reliable. The temporal rate of our reconstructed sea surface pCO 2 over the SCS is consistent with the local rate at Station SEATS and the North Pacific rate at Station ALOHA, which further validates our reconstruction. These reconstructed pCO 2 fields provide full spatial coverage of the sea surface pCO 2 of the SCS in summer over a temporal scale of almost 2 decades and therefore help fill the long-lasting blanks on the global sea surface pCO 2 map. Thus, the reconstruction products will help improve the accuracy of the estimate of the oceanic CO 2 flux of the largest marginal sea of the western Pacific so as to better constrain the global oceanic carbon uptake capacity.
Although the SOG method can optimize the information from both the in situ data and the remote-sensing-derived data, the reliability of the reconstructed results is still limited by the observed data. When the observed data are limited to only a few grid boxes in a small region, the reconstruction results may not be realistic. Additional constraints have to be considered.
Author contributions. MD conceptualized and directed the field program of the in situ observations. BC and XG participated in the in situ data collection. YB provided the remote-sensing-derived data. GW, YC, and SSPS developed the reconstruction method, wrote the MATLAB and R codes, analyzed the data, and plotted the figures. ZW participated in the uncertainty analysis of the reconstruction and plotted some of the figures. HQ developed the data repository and revised and tested the R codes. GW and SSPS wrote the paper. All the authors contributed to the original writing, editing, and revisions of the paper.