The Berkeley Earth Land/Ocean Temperature Record

. A global land/ocean temperature record has been created by combining the Berkeley Earth monthly land temperature field with spatially-kriged version of the HadSST3 dataset. This combined product spans the period from 1850 to present and covers the majority of the Earth’s surface: approximately 57% in 1850, 75% in 1880, 95% in 1960, and 99.9% by 2015. It includes average temperatures in 1°x1° lat/lon grid cells for each month when available. It provides a global mean temperature record quite similar to records from Hadley’s HadCRUT4, NASA’s GISTEMP, NOAA’s GlobalTemp, and Cowtan and Way, and provides a 10 spatially complete and homogeneous temperature field. Two versions of the record are provided treating areas with sea ice cover as either air temperature over sea ice or sea surface temperature under sea ice, the former being preferred for most applications. The choice of how to assess the temperature of areas with sea ice coverage has a notable impact on global anomalies over past decades due to rapid warming of air temperatures in the Arctic. Accounting for rapid warming of Arctic air suggests ~0.1 ºC additional global-average temperature rise since the 19 th century than temperature series that do not capture the changes in the 15 Arctic. Updated versions of this dataset will be presented each month at the Berkeley Earth website (http://berkeleyearth.org/data/), and a convenience copy of the version discussed in this paper has been archived and is freely available at https://doi.org/10.5281/zenodo.3634713 (Rohde & Hausfather, 2020).

Both GISTEMP and GlobalTemp utilize NOAA's pairwise homogenization algorithm to detect and correct inhomogenities such as station moves or instrument changes in land stations (Menne and Williams 2009), though NASA applies an additional satellite nightlight-based urbanity correction (Hansen et al., 2010). GISTEMP and GlobalTemp both use NOAA's Extended Reconstruction Sea Surface Temperature (ERSST) version 4 (Huang et al., 2014) for SSTs, HadCRUT4 and Cowtan and Way use HadSST3 (Kennedy et al., 2011), and JMA uses COBE- SST (Ishii et al., 2005). HadCRUT4, GlobalTemp, and JMA include 40 no spatial interpolation outside of 5-by-5 latitude/longitude gridcells, while GISTEMP and Cowtan and Way spatially interpolate temperatures out to regions with no direct station coverage (GISTEMP using a simple linear interpolation technique, while Cowtan and Way uses Kriging).
Here we describe the global land/ocean surface temperature product from Berkeley Earth that combines the Berkeley Earth land 45 temperature data (Rohde et al 2013a;Rohde et al 2013b) with SST data from HadSST3 (Kennedy et al., 2011). It uses a Krigingbased spatial interpolation to provide the greatest possible spatial coverage for the period from 1850 to present. The land data utilizes significantly more land station data (over 40,000 stations) compared to the ~10,000 land stations used by some of the other groups. The land component also includes the novel homogenization technique of the Berkeley Earth temperature record that detects breakpoints through neighbor difference series comparisons, cuts land stations into fragmentary records at 50 breakpoints, and combines these fragmentary records into a temperature field. The ocean component of the land/ocean product uses an interpolated variant of HadSST v3, whose construction is described below. A version of this dataset has been publically available for some time, but has not been formally described.

Methods
The Berkeley Earth Land/Ocean temperature record combines the Berkeley Earth land record (Rohde et al 2013a) with SST data 55 from HadSST3 (Kennedy et al. 2011a, Kennedy et al. 2011b. The HadSST3 data is adjusted in several ways. The primary manipulation is to replace the gridded data with an interpolated field using a Kriging-based approach. The HadSST3 data set provides grid cell averages on a 5° by 5° grid and only reports monthly averages for cells where data was present during the month in question. HadSST3 often reports no data for ~40% of ocean grid cells. As described below, the interpolation produces a more complete field and reduces the component of uncertainty associated with incomplete coverage. While providing a more 60 complete field, the interpolation does not materially change the apparent rate of warming in the oceans. After interpolation, the ocean temperature anomaly field is merged with Berkeley Earth land anomaly field using the fraction of land / water in each grid cell (typically reported with a 1° by 1° latitude/longitude resolution). As described below, two versions are considered with respect to the role of sea ice. 65

Interpolation Method
The HadSST3 gridded fields provide several critical components, the temperature anomaly, the number of observations, and several estimates of the uncertainty (Kennedy et al. 2011a, Kennedy et al. 2011b). The grid cell uncertainties and observation counts allow one to treat some grid cells as having greater confidence than others. Unlike land surface station data, where each 70 monthly average represents many temperature observations, the ocean observation counts are a true measure of the number of instantaneous SST measurements. Analogous to Rohde et al 2013a, the core of the interpolation approach is to generate a Kriging-based field using an assumed distance-based correlation function. As with Rohde et al 2013a, a correlation-based approach is used rather than the more 75 common covariance-based approach to simplify the computational considerations, and should be adequate as long as the variance changes relatively slowly with changes in position. A review of both the HadSST data and climate model outputs suggested that the temperature to distance correlation function could be modeled effectively via the same spherical correlation function approach used for land surface temperatures: The empirically estimated distance parameter was found to have a value of 2,680 km based on the spatial variance of the HadSST monthly averages. This is similar to, though somewhat smaller than, the 3,310 km scale adopted in the land surface 85 temperature study (Rohde et al. 2013a). By contrast, the local correlation parameter 0 = 0.47 was estimated to be much lower in the oceans (compared to 0.86 on land). This is due to two factors. Firstly, ocean observations are individual measurements whereas land observations reflect monthly averages. Secondly, the typical monthly fluctuations in the oceanic environment are much smaller in than on land, causing a reduced signal-to-noise ratio. The estimation of 0 was based on a comparison of the variance in HadSST grid cells with a single measurement to those with > 100 observations. The latter condition provides a proxy 90 for cells where the random portion of measurement and sampling uncertainty could plausibly be neglected.  The distance correlation function gives rise to a Kriging formulation that: the effect of giving greater weight to grid cells with less uncertainty. For integer values of ( , ), the formulation of ( , ) is mathematically equivalent to having appear ( , ) independent times in the correlation matrix. Note also that any empty HadSST grid cells at time t are omitted from the matrix formulation for K.
120 is a free parameter at each time t and effectively represents the global ocean average temperature anomaly. Its value is found iteratively by insisting that the spatial average of ( , ) − = 0.
It is instructive to note that this Kriging formulation has the property that ( , ) → ( , ) in the limit that ( , ) → ∞, but will ordinarily produce a temperature estimate based on a weighted average of multiple HasSST grid points in the case that 125 ( , ) is small or moderate. The latter property can be useful in suppressing noise at grid locations with high uncertainty and/or very few measurements.
It is also important to recognize that that though the correlation function ( ) has a very long tail, this does not mean that average necessarily extends over a large area. In general, the Kriging coefficients ( , , ) constructed in this way will heavily 130 favor the nearest several data points. As long as nearby data is available, little weight will be given to distant grid cells.
However, the long-tail of the correlation function means that the Kriging will attempt to fill large holes using distant data if no nearby data is available.
An absolute value field was also created by applying a similar interpolation to the HadSST climatology. 135 latitude of x. It is described as a piece-wise cubic spline with 11 knots as free parameters equally spaced in the cosine of latitude.
These free parameters are chosen to minimize the spatial average of ( , ) − ( , ). By construction, ( , ) = ( , ) for all , and this construction merely provides a way of interpolating between grid cell centers.

145
In addition to the above description, a physical cutoff was applied to the absolute temperature ( , ) + ( , ) at a fixed minimum temperature of -1.8 C, which is freezing temperature of seawater. If the interpolation would suggest a value lower than this, ( , ) was adjusted accordingly to maintain the minimum value of -1.8 C. Such adjustments are rare.
Finally, one last interpolation is performed using an assumption of temporal persistence. Unlike land temperature anomalies, 150 where the temporal correlation is often only a couple weeks, ocean temperature anomalies typically have a temporal correlation measured in months. This can be exploited to estimate ocean temperatures based on adjacent months when no other information is available.
Here, t-1 and t+1 refer to the temperature field one month earlier and one month later, respectively. This adjustment allows for a modest reduction in uncertainty at early times when data is temporally sparse.

165
As described, this analysis is agnostic about the resolution used to sample the final temperature field. In practice, we generally use the same 15984-element equal-area grid as Rohde et al. 2013a to calculate ( , ), though with non-ocean elements masked out.

Ocean Uncertainty
The ocean-average uncertainty in our ocean reconstruction is estimated following essentially the same model as adopted by 170 HadSST3. HadSST3 estimates the total reconstruction uncertainty as the combination of measurement uncertainty, coverage uncertainty, and bias uncertainty (Kennedy et al. 2011a, Kennedy et al. 2011b. Bias uncertainty, , which reflects biases created due to variations over time in the ways that SST has been measured, is brought forward essentially unchanged by our analysis process ( Figure 2). Due to its slowly varying nature, this uncertainty remains the most important limitation of the detection of long-term averages. 175 The coverage uncertainty, , is the uncertainty in the large-scale average arising due to incomplete sampling of the spatial field. As with HadSST3, our estimate of the coverage uncertainty is constructed by sampling a known field, applying our interpolation procedure, and seeing how well we reproduce the underlying average of the known field. Following HadSST3, we used the SST fields provided by HadISST v2 as our target. The HadISST fields are spatially complete, observation-based 180 historical reconstructions of SST and sea ice concentration (Titchner and Rayner 2014). To estimate the coverage uncertainty associated with a specified HadSST sampling field, we mask every month of the HadISST dataset using that sampling field, interpolate the remaining data, and measure the error in the interpolated average relative to the true ocean-average of the whole HadISST field. The deviations in the ocean-average are then collected across all HadISST months and the uncertainty for that coverage mask is reported as the root-mean-square average of the deviations. Using this technique, which is directly analogous 185 to the HadSST3 coverage assessment technique, we estimate that the application of our interpolation approach typically reduces the coverage uncertainty by 20-40% (Figure 2). vary on timescales shorter than a month and spatial scales smaller than a grid box. Though interpolation does not change the underlying uncertainty associated with individual measurements, by adjusting the weight of individual observations in the overall average, we affect the way that individual measurement errors propagate into the global average. In particular, in the presence of sparse data, limited measurements may be extrapolated over a large area. In some circumstances, this can cause the effective uncertainty in the global average due to these uncertainties to increase. In essence, the interpolation may trade improvements in 195 coverage uncertainty against a greater impact for measurement uncertainty. This largely limits our ability to reduce the overall uncertainty by interpolation.
The impact of measurement uncertainty on a large-scale average depends on the error correlation. If the measurement uncertainties were uncorrelated, then the error would generally be expected to decline with the square root of the number of 200 measurements. In actuality, the measurement uncertainties are frequently correlated. In most cases, single ships report many measurements per month. Each of those measurements can have both random errors and a potential for systematic bias. We find that the measurement uncertainty reported by HadSST3 in the ocean-average is typically ~2.1 times larger than , with some variation over time.

220
We use this estimate as a benchmark to approximate the effect of error correlation on our analysis of measurement uncertainty. The total uncertainty in the ocean-average is then found by assuming the components are independent. 230 In the early part of the time series, we find that interpolation does significantly reduce the uncertainty in the ocean-average. At late times, though coverage uncertainty is improved, bias uncertainty plays a large role and the total uncertainty in the oceanaverage is little changed from the HadSST values. However, even if the ocean-average uncertainty is not changed, some users 235 will nonetheless benefit from having a more spatially complete interpolated SST field.

Land and Ocean Combination
The combined field is constructed by merging the Berkeley Earth Land Surface temperature with the interpolated SST field described above. Two versions are considered that differ only in their treatment of sea ice, using either the land air temperature (LAT) or the SST field to estimate the temperature anomaly at sea ice locations. From 1850 to near-present, the sea ice locations 245 are estimated using the ice concentration fields in HadISST v2 (Titchner and Rayner 2014).
To combine LAT and SST data, both data sets are expressed on the same grid. To simplify the combination at cells that are partland and part-ocean, we have taken to adding in the spatial climatology and doing the combination in absolute temperatures.
Where ( ) is the fraction of the grid cell at location x that is land, and and are respectively the LAT as estimated by 255 Rohde et al. 2013a and the interpolated SST as described above.
In the case where sea ice regions are treated as land: Where ( , ) is the ice fraction at location x at time t as reported by HadISST v2 (Titchner and Rayner 2014). For this purpose, HadISST is also regridded on to the same grid as LAT and SST. As HadISST is frequently delayed by a few months compared to other climate data, it is necessary to supplement this data set when producing near real-time estimates. For this purpose, the 265 Sea Ice Index of the National Snow and Ice Data Center (Fetterer et al. 2017) is used for months that are not yet available in HadISST. The modern ice distribution in both HadISST and the Sea Ice Index are based on satellite observations; however, we found that the Sea Ice Index tended to have systematically more partial melting than HadISST. To maintain consistency, a distribution transform was applied to the sea ice fractions provided in the Sea Ice Index based on comparing the 2014-2018 ice fields in each dataset. 270 It is useful to note that regardless of whether one is using SST or LAT to estimate temperatures in association with sea ice, most such estimates involve a considerable extrapolation. In the case of LAT, for example, conditions over sea ice will usually be extrapolated from Greenland, Canada, Scandinavia and Russia. Whereas, when using SST, one extrapolates from rare SST measurements that may be far removed from the sea ice edge. Or, in the case that analysis of the sea ice regions is excluded 275 entirely, some methods are effectively substituting the ocean-average temperature anomaly.
It is our belief that the anomaly field generated by extrapolating air temperatures over sea ice locations is a more sensible approach to characterizing climate change at the poles. The air temperature changes over the sea ice can be quite large even while the water temperatures underneath are not changing at all. In particular, over the last decades Arctic air has shown a very 280 large warming trend during the winter.
Regardless of the approach used, the spatial climatology can then be calculated and removed (differing from the original only in cells with a mix of land and water/sea ice). Then the long-term trend in the climate can be computed using the spatial average of the anomaly fields. 285 Uncertainties for the combined record are calculated by assuming the uncertainties in LAT and SST time series are independent and can be combined in proportion to the relative area of land and ocean. In the case that LAT is used over sea ice, the uncertainties for both LAT and SST have to be slightly recalculated by assuming that the time varying mask * ( , ) is applied the relevant spatial averages in the uncertainty estimations described in Rohde et al. 2013a and in the SST section above. Doing 290 this adjustment causes a slightly increase in LAT uncertainty (due to the extrapolation over sea ice), and similar small decrease in SST uncertanty.

Results and Conclusions
The global mean anomalies obtained from the Berkeley Earth land/ocean temperature record are quite similar to other published records, as shown in Figure 3. With the exception of some short periods prior to 1880 and before and after World War 2, all four 295 other temperature records examined lie within the uncertainty envelope of the Berkeley Earth record. Differences around World War 2 relate primarily to differences in adjustments to ERSST v4 and HadSST3 sea surface temperature records during that period. Berkeley Earth has the highest trend of any temperature record examined for the period from 1880 to 2015, largely due to lower 305 surface temperature estimates prior to 1900. These differences are driven both by increased spatial coverage from the inclusion of additional land records and the spatial interpolation of both land and ocean records (which is absent in NOAA and Hadley records). Similarly, Berkeley Earth has among the highest warming rates in the recent period  due primarily to greater Arctic coverage (where warming was unusually rapid during that period). The other record that provides robust arctic interpolation, Cowtan and Way, also shows higher trends during this period. 310 From 1955 to present (after the availability of data in Antarctica), Berkeley Earth provides globally complete coverage via spatial interpolation, similar to NASA's GISTEMP and Cowtan and Way. This contrasts with HadCRUT4 and NOAA GlobalTemp which exclude any grid cells lacking station coverage or SST measurements. As shown in Figure 2, the patterns of spatial anomalies between the different groups tend to be quite similar, apart from differences due to spatial coverage or gridded 315 field resolution. When constructing a global surface temperature record, sea ice produces a challenging edge-case. The water temperature under sea ice is tightly constrained by the freezing point of water, and can only change with changes in sea ice cover. Air temperatures over sea ice are less well constrained, and can vary significantly over time. Whether areas with sea ice coverage are estimated using sea surface temperatures or surface air temperatures will have a notable result on the record. While most groups 325 (GISTEMP, Cowtan and Way) that interpolate temperatures over areas with sea ice cover use air temperatures, Berkeley Earth has provided both variants to allow researchers to select the series that best supports their needs. Both variants of the Berkeley Earth record are shown in Figure 5 as well as the HadCRUT temperature series for comparison; the use of SSTs under sea ice leads to lower warming trends in recent years, as it excludes air temperatures in parts of the Arctic that have been warming rapidly over the past two decades. 330

335
Figure 5 also aids in understanding the difference between Berkeley Earth and HadCRUT. The interpolated SST field adopted here has a nearly identical trend to the HadSST field, differing by less than 0.01 ºC / century. Part of the difference between Berkeley Earth's global temperature series and HadCRU is due to differences in the amount of warming estimated ot have occurred over land. This is the primary source of difference when comparing the Berkeley Earth series with SST at sea ice to the HadCRUT series (blue line in Figure 5). While this difference is not insignificant, the larger difference is due to the treatment of 340 the Arctic and the extrapolation of land temperature over sea ice areas (red line in Figure 5). Inclusion of the rapid warming above Arctic sea ice suggests the global average has increased an additional ~0.1 C during the last 100 years compared to estimates that do not include the changes in this region.
In addition to monthly temperature anomalies, Berkeley Earth produces monthly absolute temperature fields. A climatology field 345 is estimated via Kriging observations, using elevation as a factor in the kriging process over land. Both absolute temperature https://doi.org/10.5194/essd-2019-259

Data Availability 365
The Land/Ocean temperature product will be updated monthly on the berkeleyearth.org website, and is freely available for use to all interested researchers. A convenience copy of the dataset available at the time this paper was created has been registered with Zenodo and is available at DOI:10.5281/zenodo.3634713 (Rohde & Hausfather 2020).