A new merge of global surface temperature datasets since the start of the 20th century

Global surface temperature (ST) datasets are the foundation for global climate change research. Several global ST datasets have been developed by different groups in NOAA NCEI, NASA GISS, UK Met Office Hadley Centre & UEA CRU, and Berkeley Earth. In this study, a new global ST dataset named China Merged Surface Temperature (CMST) was presented. CMST is created by merging the China-Land Surface Air Temperature (C-LSAT1.3) with sea surface temperature (SST) data from the Extended Reconstructed Sea Surface Temperature version 5 (ERSSTv5). The merge of C-LSAT and ERSSTv5 shows a high spatial coverage extended to the high latitudes and is more consistent with a reference of multi-dataset averages in the polar regions. Comparisons indicated that CMST is consistent with other existing global ST datasets in interannual and decadal variations and long-term trends at global, hemispheric, and regional scales from 1900 to 2017. The CMST dataset can be used for global climate change assessment, monitoring, and detection. The CMST dataset presented here is publicly available at https://doi.org/10.1594/PANGAEA.901295 (Li, 2019a) and has been published on the Climate Explorer website of the Royal Netherlands Meteorological Institute (KNMI) at http://climexp.knmi.nl/select.cgi?id=someone@somewhere&field=cmst (last access: 11 August 2018; Li, 2019b, c).


Introduction
A common measure in observing climate change is the long-term trend of the global mean surface temperature (GMST). Therefore, the biases of the observed surface temperature (ST) datasets, especially the sampling bias of stations at high latitudes, have received much attention in the past few years (Cowtan and Way, 2014; Jones, 2016; Li et al., 2017; Simmons et al., 2017; Huang et al., 2017a). The optimization and improvement of observed climate data are a long-term task, because such data serve as a reference base for climate change research and as a verification benchmark for other climatic data products. [...] Hansen et al., 2010), and Berkeley Earth Surface Temperature group land temperature (Berkeley; Rohde et al., 2013). The three global ST series are the Met Office Hadley Centre and Climatic Research Unit Temperature version 4 (HadCRUT4; Morice et al., 2012), Merged Land-Ocean Surface Temperature (MLOST; Vose et al., 2012b), and Goddard Institute for Space Studies Surface Temperature Analysis (GISTEMP; Hansen et al., 2010).
All these datasets indicated that the Earth experienced a "warming hiatus" from 1998 to 2012, and this issue has attracted the attention of many researchers around the world. However, by analyzing the sea surface temperature (SST) and global ST from the National Oceanic and Atmospheric Administration / National Centers for Environmental Information (NOAA/NCEI), Karl et al. (2015) suggested that the "warming hiatus" is an artifact of data processing. Similarly, Lewandowsky et al. (2015) noted that this short-term "hiatus" is a conditional statistical artifact, not a real scientific fact. After correcting the sampling biases of the temperature data over the Arctic region, several studies reached a similar conclusion by using reanalysis data (Simmons et al., 2017), satellite remote sensing data (Cowtan and Way, 2014), and Arctic buoy data.
These global ST data products have been updated over the past several years since the publication of IPCC (2013). For instance, NOAA has updated the Extended Reconstructed Sea Surface Temperature (ERSST) version 3 to ERSSTv4 and ERSSTv5, updated the LSAT dataset GHCNm v3 to GHCNm v4 (Menne et al., 2018), and renamed MLOST to NOAA Global Surface Temperature (NOAAGlobalTemp). GISTEMP has updated its SST component to ERSSTv5 (Huang et al., 2017b). CRUTEM has been updated to CRUTEM4.6. The Met Office has updated the Hadley Centre SST to version 3 (HadSST3) using the median of 100 ensemble members. The Berkeley team used the median of the HadSST3 ensemble to form the Berkeley Earth Surface Temperature (BEST) dataset.
The updates of these products are based on advanced knowledge of data analysis methodology or improved data availability. In general, the GMST has continuously been improved by the increased number and area coverage of observational data over the land (LSAT) and in the oceans (SST). There are two aspects to improving the LSAT datasets. The first is to increase the density of stations and data coverage, especially in key areas with sparse observations. For example, the number of observations is increased in both C-LSAT (Xu et al., 2018) and GHCNm v4 (Menne et al., 2018) using the newly released International Surface Temperature Initiative (ISTI) datasets (Thorne et al., 2011) or datasets obtained through regional cooperation with Asian countries such as Vietnam and South Korea.
The larger number of observations increases the coverage of the datasets and therefore reduces the sampling biases, particularly at high latitudes (polar regions) and in observation-sparse regions (such as South America and Africa). The second aspect is to improve the accuracy of regional climate changes. For example, the latest C-LSAT (Xu et al., 2018) has integrated more regional homogenization results, especially over China (Xu et al., 2013), East Asia, Europe, Australia (Trewin, 2013), and Canada (Vincent et al., 2012).
Similarly, there are two aspects to improving the SST datasets. The first is to integrate better raw observational data. For example, ERSSTv5 uses the most recently available International Comprehensive Ocean-Atmosphere Data Set release 3.0 (ICOADS R3.0; Freeman et al., 2017), uses more accurate buoy data to adjust ship data, and uses optimized climate modes. The second is to replace a single analysis with multi-member ensemble analyses. For example, HadSST3 introduces a variety of bias correction models and uses the median SST from the 100 ensemble members as the best estimate.
Among the existing global ST datasets (for example, HadCRUT and NOAAGlobalTemp), the methods for merging the land and ocean datasets are very similar to each other. The merging process of HadCRUT is as follows. First, the land and ocean data are processed into 100 ensemble datasets according to the bias evaluation parameters, and the anomaly values are calculated separately for each grid box. Then, anomalies in the grid boxes on the land-ocean boundary are weighted by the fractions of land and ocean areas. If the land area covers less than 25 %, it is treated as 25 %. If there is a measured SAT anomaly in a grid box covered with sea ice, the SAT is used to represent the SST anomaly. The HadCRUT ensemble datasets have a reference period of 1961 to 1990 and a resolution of 5° × 5° (Morice et al., 2012).
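The area-fraction weighting with a 25 % floor can be illustrated with a minimal sketch. The function name `blend_land_ocean`, its arguments, and the handling of boxes with only one data source are assumptions made for illustration; they are not taken from the HadCRUT code.

```python
import numpy as np

def blend_land_ocean(land_anom, ocean_anom, land_frac, min_land_frac=0.25):
    """Blend land and ocean anomalies by area fraction (illustrative sketch).

    Assumption: the 25 % floor is applied wherever a land anomaly exists;
    boxes with only one source simply take that source.
    """
    land, ocean, frac = np.broadcast_arrays(
        np.asarray(land_anom, dtype=float),
        np.asarray(ocean_anom, dtype=float),
        np.asarray(land_frac, dtype=float),
    )
    has_land = ~np.isnan(land)
    has_ocean = ~np.isnan(ocean)
    # Land fractions below the floor are treated as equal to the floor.
    w = np.where(has_land & (frac < min_land_frac), min_land_frac, frac)
    blended = np.where(has_land & has_ocean, w * land + (1.0 - w) * ocean, np.nan)
    blended = np.where(has_land & ~has_ocean, land, blended)
    blended = np.where(~has_land & has_ocean, ocean, blended)
    return blended
```

For a coastal box with 10 % land, a land anomaly of 1.0 °C and an ocean anomaly of 0.0 °C, the floor raises the land weight to 0.25, so the blended anomaly is 0.25 °C rather than 0.10 °C.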
The merging process in NOAAGlobalTemp includes three steps. The first step is to identify the LSAT (or SST) low-frequency changes by calculating the moving average of the temperature anomaly data, and to identify the LSAT (or SST) residual high-frequency changes via the Empirical Orthogonal Teleconnection (EOT) modes. Then, the low- and high-frequency components are integrated. Finally, the SST data at 2° × 2° resolution are averaged onto the 5° × 5° grid, and the land and ocean reconstructions are merged into a global reconstruction in a way similar to HadCRUT (Vose et al., 2012).
This study presents a new merged global ST dataset based on the recently developed C-LSAT and the latest ERSSTv5, using a method similar to those of HadCRUT and NOAAGlobalTemp, and thus provides a new reference for climate and climate change studies. The remainder of this paper is arranged as follows. The land and ocean datasets and their updates are briefly introduced in Section 2. The merging process of CMST is given in Section 3. Section 4 discusses the comparisons of CMST with other existing ST datasets. The availability of the resulting dataset is reported in Section 5, and a summary of results is presented in Section 6.

Land surface air temperature data
The C-LSAT1.0 dataset (Xu et al., 2018) processed the SAT data since 1900 from a total of 14 data sources, including three global data sources (CRUTEM4.6, GHCNv3, and BEST), three regional data sources from the Scientific Committee on Antarctic Research (SCAR), the daily dataset for European Climate Assessment (ECA&D), and the Historical Instrumental Climatological Surface Time Series of the Greater Alpine Region (HISTALP), and eight national data sources from China, America, Russia, Canada, Australia, Korea, Japan, and Vietnam. Two steps were taken to ensure the homogeneity of the station time series: (1) the data series from the existing national homogenized datasets, which account for approximately 50 % of the stations in C-LSAT, were directly integrated into C-LSAT without any change; (2) the inhomogeneities in the rest of the station series were detected and adjusted with the penalized maximal t-test method (Wang et al., 2007).
The C-LSAT version 1.3 dataset is used in this study. Compared to C-LSAT version 1.0, which ranges from 1900 to 2014 (Xu et al., 2018), the data in version 1.3 are updated to December 2017. According to Xu et al. (2018), national, regional and global datasets are ranked as higher, middle and lower priorities, respectively. Based on the priority of the data sources, a total of 4917 stations with higher priority were added, while a total of 1364 stations with lower priority were deleted. Most of the newly added raw data were obtained from the International Surface Temperature Initiative (ISTI) project and have been homogenized through the same approach as in Xu et al. (2018). The distribution of these extra 3553 stations is shown in Figure 1.
According to Xu et al. (2018), C-LSAT version 1.0 had some advantages over the existing global LSAT datasets in station numbers and spatial coverage. Thus, the current C-LSAT version 1.3 has more stations than the existing datasets in many regions over the global land surface. Figure 1 shows the extra stations compared to version 1.0, and Table 1 shows the comparison of the station numbers for different datasets, indicating an enhanced coverage and distribution/sampling of LSAT observations.
From a global and hemispheric perspective, the C-LSAT version 1.3 dataset has more stations than the other datasets globally and in the Southern Hemisphere (Table 1). C-LSAT also has the largest number of stations in seven regions (Asia, Africa, Australia, South America, Europe, Antarctic, and Arctic) as defined in Xu et al. (2018).
The only exception is North America, where BE has the largest number of stations. However, in the BE dataset, the stations from North America account for 85.7 % of those from the Northern Hemisphere, meaning that the stations from other parts of the Northern Hemisphere account for only 14.3 %. In the C-LSAT dataset, stations from North America account for 30.7 % of the Northern Hemisphere, and those from other parts of the Northern Hemisphere account for 69.3 % (Figures 2a and 2b). Furthermore, when the number of effective 5° × 5° grid boxes containing observations is calculated between 1900 and 2017, we find that C-LSAT has more effective grid boxes although the Berkeley dataset has more stations. In other words, although the number of Berkeley stations in the Northern Hemisphere is slightly higher than that of C-LSAT, the latter has better data coverage over the whole hemisphere (Figure 2c).
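The distinction between station counts and effective grid boxes can be illustrated with a short sketch. The function name `count_effective_boxes` and the box-indexing convention (boxes aligned to 90° S and 0° E) are assumptions for illustration only.

```python
import numpy as np

def count_effective_boxes(lats, lons, box_deg=5.0):
    """Count distinct lat-lon boxes that contain at least one station.

    Assumption: boxes are aligned to 90 S and 0 E; longitudes may be
    given in either the -180..180 or the 0..360 convention.
    """
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    n_lat = int(round(180.0 / box_deg))
    n_lon = int(round(360.0 / box_deg))
    rows = np.clip(np.floor((lats + 90.0) / box_deg).astype(int), 0, n_lat - 1)
    cols = np.floor((lons % 360.0) / box_deg).astype(int) % n_lon
    return len(set(zip(rows.tolist(), cols.tolist())))
```

Three stations clustered inside one 5° × 5° box plus one station elsewhere give four stations but only two effective boxes, which is why a dataset with more stations does not necessarily have better coverage.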

Sea surface temperature data
Currently, the following SST datasets are widely used in the community: HadSST3, ERSSTv5, the Hadley Centre Sea Ice and Sea Surface Temperature dataset version 1 (HadISST1), and the Centennial in situ Observation-Based Estimates of sea surface temperature version 2 (COBE2). HadSST3 was derived from ICOADS R2.5 (1850-2006) and GTS (2007-present) observations. The ERSSTv5 dataset was developed by NOAA NCEI; its data sources include ICOADS R3.0 SST data (including ships and buoys), near-surface Argo buoy data, and HadISST2 sea ice data (Huang et al., 2017a). HadISST1 was derived from the Met Office Marine Data Bank (MDB), supplemented by the ICOADS SST data where the MDB data were missing. A two-stage reduced-space optimal interpolation method was used in HadISST1 to obtain the sea surface temperature dataset (Rayner et al., 2003). COBE2 was developed by the Japan Meteorological Agency (JMA), using the original SST data from ICOADS R2.5 and sea ice concentration data (Hirahara et al., 2014). A brief comparison between these datasets is shown in Table 2.
In general, only in situ observational data are used when merging LSAT and SST for the commonly used global ST datasets. For example, HadCRUT4 and BE use HadSST3 (the median of 100 ensemble datasets), whereas NOAAGlobalTemp and GISTEMP use ERSSTv5. Both HadSST3 and ERSSTv5 use in situ data only. Other datasets, such as COBE and HadISST, which use both in situ and satellite data, are not used as a source in the merging of global ST data, although they are frequently used in SST studies. Therefore, the HadSST3 and ERSSTv5 datasets were selected and merged with C-LSAT1.3. The other two SST datasets, which incorporate some satellite data (HadISST and COBE2), were used for comparisons in this study.

Merging Schemes
As in other studies, the global ST dataset is formed by merging an LSAT dataset with an SST dataset. In this study, C-LSAT1.3 is merged with HadSST3 and ERSSTv5 separately. The final merged global ST dataset is selected based on a comparison of the quality of the different merging schemes. Before the merging, the two SST datasets are reprocessed. The median of the 100-member ensemble datasets in HadSST3 is calculated for each grid box. ERSSTv5 has a value of -1.8 °C in many grid boxes in the Arctic and Southern Ocean, which refers to the areas where the sea ice coverage is above 90 %. Therefore, special treatment is needed for these grid boxes: if the anomaly is 0 °C and the SST is -1.8 °C, we replace the value of -1.8 °C in ERSSTv5 with a missing value. The reference period for both HadSST3 and ERSSTv5 is taken as 1961-1990.
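The treatment of the ice-covered ERSSTv5 grid boxes can be sketched as follows. The function name `mask_ice_cells`, the numerical tolerance, and the choice to mask the anomaly together with the SST are assumptions for illustration.

```python
import numpy as np

ICE_SST = -1.8  # value (deg C) found in ERSSTv5 boxes with heavy sea ice cover

def mask_ice_cells(sst, anom, ice_sst=ICE_SST, tol=1e-6):
    """Set boxes to missing where SST equals -1.8 deg C and the anomaly is 0 deg C.

    Assumptions: missing values are represented by NaN, and the anomaly
    field is masked at the same boxes as the SST field.
    """
    sst = np.asarray(sst, dtype=float)
    anom = np.asarray(anom, dtype=float)
    ice = (np.isclose(sst, ice_sst, rtol=0.0, atol=tol)
           & np.isclose(anom, 0.0, rtol=0.0, atol=tol))
    return np.where(ice, np.nan, sst), np.where(ice, np.nan, anom)
```

A box with an SST of -1.8 °C but a non-zero anomaly is kept, since both conditions have to hold before the value is treated as missing.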
The two merging schemes are described as follows:
(1) Merge1: C-LSAT1.3+HadSST3 (ensemble). Given that the resolution of both datasets is 5° × 5°, the two datasets are directly merged using the ratios of ocean and land surface areas in each grid box.
(2) Merge2: C-LSAT1.3+ERSSTv5. Since the resolutions of these two datasets are different, they are unified onto the same resolution (1° × 1°) and then merged using the ratios of ocean and land areas.
The merging process of C-LSAT1.3 and ERSSTv5 is described as follows:
(1) The anomalies are calculated in each grid box with respect to the 1961-1990 base period for C-LSAT and ERSSTv5, respectively.
(2) For the ocean-land boundary, the fractions of land and ocean areas are considered (see Figure 3, taking January 2017 as an example). The detailed procedures are:
(a) Downscale the land (C-LSAT1.3) and ocean data to 1° × 1° resolution. The resolution of the ocean data is 2° × 2°, so each grid box is distributed into 4 grids of 1° × 1°. The resolution of the land data is 5° × 5°, so each grid box is distributed into 25 grids of 1° × 1°.
(b) Use the ocean-land mask file to classify all grids globally as land or ocean (download link: http://www.ncl.ucar.edu/Applications/Data/cdf/landsea.nc). The ocean-land mask file is based on Rand's global elevation and depth data, and the ocean-land mask is re-gridded to 1° × 1°. The mask file contains five types of markers: 0 for ocean, 1 for land, 2 for lakes, 3 for islands, and 4 for ice sheets. Marine data are used for ocean and ice sheets, and land data are used for land, lakes, and small islands.
(c) The 1° × 1° ocean grid data and the 1° × 1° land grid data are spliced by the ocean-land mask to obtain 1° × 1° global ST grid data.
(d) The averaged surface temperature anomaly (STA) in each 5° × 5° grid is calculated as:
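Steps (a) to (d) can be sketched in a few lines. This is a minimal illustration under stated assumptions: all grids are taken to be cell-aligned, missing values are NaN, and the 5° × 5° average is assumed to be a cosine-of-latitude weighted mean of the valid 1° × 1° cells, since the averaging expression itself is not given here. The function names are hypothetical.

```python
import numpy as np

LAND_CODES = (1, 2, 3)  # mask markers for land, lakes, islands (0 = ocean, 4 = ice sheets)

def downscale(field, factor):
    """Step (a): copy each coarse box into factor x factor cells of 1 degree."""
    field = np.asarray(field, dtype=float)
    return np.repeat(np.repeat(field, factor, axis=0), factor, axis=1)

def splice(land_1deg, ocean_1deg, mask_1deg):
    """Steps (b)-(c): take land data on land, lakes and islands; marine data elsewhere."""
    use_land = np.isin(mask_1deg, LAND_CODES)
    return np.where(use_land, land_1deg, ocean_1deg)

def average_to_5deg(field_1deg, lat_centers_1deg):
    """Step (d): weighted mean of valid 1-degree cells in each 5 x 5 block.

    Assumption: weights are cos(latitude); blocks with no valid cell give NaN.
    """
    field_1deg = np.asarray(field_1deg, dtype=float)
    valid = ~np.isnan(field_1deg)
    weights = np.cos(np.deg2rad(np.asarray(lat_centers_1deg, dtype=float)))[:, None]
    weights = np.where(valid, weights, 0.0)
    values = np.where(valid, field_1deg, 0.0)
    n_lat, n_lon = field_1deg.shape
    blocks = (n_lat // 5, 5, n_lon // 5, 5)
    w_sum = weights.reshape(blocks).sum(axis=(1, 3))
    v_sum = (weights * values).reshape(blocks).sum(axis=(1, 3))
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(w_sum > 0, v_sum / w_sum, np.nan)
```

With a global 5° × 5° land field of shape (36, 72) and a 2° × 2° ocean field of shape (90, 180), `downscale(land, 5)` and `downscale(ocean, 2)` both return arrays of shape (180, 360), which can then be spliced with the 1° × 1° mask and averaged back to 5° × 5°.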

Comparison of two merged schemes
Based on the methods above, the C-LSAT1.3 grid data are merged with HadSST3 and ERSSTv5 data to form the C-LSAT+HadSST (Merge1) and C-LSAT+ERSST (Merge2) global ST datasets, respectively. To choose the better merging scheme for CMST, Merge1 and Merge2 are compared in two aspects: spatial coverage and representativeness in high latitudes.

Global Coverage
HadSST3 has not been interpolated, while ERSSTv5 was interpolated by EOTs. Although the difference between the two merges in Coverage Max is not very large, the differences in Coverage Mean and Coverage Min are very large. This suggests that the coverage is mostly smaller in Merge1 than in Merge2. Therefore, although the original data coverage of HadSST3 and ERSSTv5 is similar, the interpolation by EOTs greatly increased the coverage of the latter. Thus, from the perspective of overall coverage, Merge2 is superior to Merge1 (Figure 4).
Furthermore, Figure 5 shows the spatial coverage of the 20-year average temperature anomalies for Merge1 and Merge2. The six panels in Figures 5a and 5b correspond to the 20-year mean temperature anomaly distributions over 1900-1919, 1920-1939, 1940-1959, 1960-1979, 1980-1999 and 2000-2017, respectively. In the early 20th century, Merge1 clearly lacked data over large areas of the equatorial region, the western part of the Southern Hemisphere and the high latitudes of the Southern Hemisphere. In the mid-20th century, Merge1 still lacked much data in the high latitudes of the Southern Hemisphere, and this remained the case at the end of the 20th century. In contrast, Merge2 had data across the globe, especially after 2000. This is due to the rapid increase in the number of observations from Argo5obs (Argo floats between 0 and 5 m depth) between 2000 and 2006.
Since 2006, Argo5obs has maintained near-global coverage. In the high-latitude regions, the coverage of the Merge1 dataset is also smaller than that of Merge2, which may critically impact the assessment of climate over the Arctic. This is mainly because the spatial coverage of ICOADS R3.0 used in Merge2 is slightly higher than that of R2.5 used in Merge1, especially south of 60° S and north of 60° N. Therefore, the coverage of Merge1 is clearly lower than that of Merge2, particularly in the equatorial region and the Southern Hemisphere. With respect to the spatial coverage in each period, Merge2 is much better, especially in the early 20th century.

Representativeness in high latitudes
To accurately compare the global and regional temperature changes between Merge1 and Merge2, COBE2 and HadISST1, which have satellite data integrated, were introduced. First, C-LSAT1.3 was merged with COBE2 and with HadISST1 in a similar way to form the Merge3 (C-LSAT+COBE) and Merge4 (C-LSAT+HadISST) datasets. Second, the monthly temperature anomalies of Merge1-4 relative to the same baseline period were calculated. The arithmetic mean of the four merged datasets was then calculated for the monthly temperature anomalies at each grid. Each merging scheme might have uncertainties caused by the different SST datasets, while the ensemble mean of all the merged datasets could have the least uncertainty.
Therefore, the annual mean time series was calculated from the mean monthly temperature anomalies as a benchmark (reference series) for the two schemes. In summary, compared to Merge1, the Merge2 dataset is superior in terms of global coverage, spatial distribution and consistency of the temporal change with the reference series. A possible reason is that the ocean data used by the ERSSTv5 dataset are the latest ICOADS R3.0 data, whereas the ocean data used by the HadSST3 dataset were obtained from ICOADS R2.5. Also, the ERSSTv5 data incorporate more observations (such as Argo5obs). Based on the analysis above, Merge2 is used as the final scheme and is named CMST (China Merged Surface Temperature) in the following sections.
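The construction of the reference series can be sketched as follows. The function names are hypothetical, and two choices are assumptions rather than statements of the method used here: a grid box is averaged over whichever of the four merges have data there, and the global mean uses cosine-of-latitude weights.

```python
import numpy as np

def ensemble_mean(merges):
    """Arithmetic mean over merged datasets at each grid box and month.

    Assumption: a box is averaged over the merges that have data there;
    it is NaN only when all merges are missing.
    """
    stack = np.stack([np.asarray(m, dtype=float) for m in merges])
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

def annual_global_mean(monthly, lat_centers):
    """Annual series from monthly fields of shape (n_years * 12, n_lat, n_lon).

    Assumption: the global mean is weighted by cos(latitude) over valid boxes.
    """
    monthly = np.asarray(monthly, dtype=float)
    valid = ~np.isnan(monthly)
    weights = np.cos(np.deg2rad(np.asarray(lat_centers, dtype=float)))[None, :, None]
    weights = np.where(valid, weights, 0.0)
    num = (weights * np.where(valid, monthly, 0.0)).sum(axis=(1, 2))
    den = weights.sum(axis=(1, 2))
    with np.errstate(invalid="ignore", divide="ignore"):
        global_mean = np.where(den > 0, num / den, np.nan)
    return global_mean.reshape(-1, 12).mean(axis=1)
```

Each candidate scheme can then be compared against `annual_global_mean(ensemble_mean([merge1, merge2, merge3, merge4]), lat_centers)`, for instance through the correlation or the root-mean-square difference of the annual series.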

Comparison of CMST with other existing datasets

Spatial Coverage
Spatial coverage may differ among the following products because different spatial smoothing or interpolation methods are applied. HadCRUT4.6.0.0 is a non-interpolated observation dataset. NOAAGlobalTemp v4 is first interpolated by EOTs in both LSAT and SST, and then masked according to the actual observation availability. GISTEMP v3 with 250 km smoothing (defined as GISTEMP1) is interpolated with a small scan radius. CMST is interpolated by EOTs in SST, but no interpolation is applied in LSAT.
First, the monthly coverage is calculated as the ratio of the area of valid grid boxes to the area of all grid boxes in HadCRUT4, NOAAGlobalTemp, CMST, and GISTEMP1 (Figure 7). Figure 7a shows that the area coverage in CMST is larger than those in the other datasets in terms of Coverage Max, Coverage Min, and Coverage Mean. In particular, the Coverage Min in CMST is much larger than those in the other datasets. Second, the monthly coverage is averaged to obtain the annual average. The coverage of CMST is larger than those of the other three datasets at any time (Figure 7b). Furthermore, the multi-year averaged coverage between 1900 and 2017 was calculated, which is 76 %, 58 %, 71 %, and 70 % in CMST, HadCRUT4, NOAAGlobalTemp, and GISTEMP1, respectively. In other words, the coverage in CMST is not only much larger than that in the dataset without interpolation (HadCRUT4), but also larger than those in the interpolated datasets (GISTEMP1 and NOAAGlobalTemp).
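The coverage calculation can be sketched as follows. The function names are hypothetical, missing values are assumed to be NaN, and the area of a grid box is approximated by the cosine of its central latitude.

```python
import numpy as np

def area_coverage(field, lat_centers):
    """Percentage of the total grid area occupied by boxes with valid data.

    Assumption: box area is proportional to cos(latitude of the box centre).
    """
    field = np.asarray(field, dtype=float)
    weights = np.cos(np.deg2rad(np.asarray(lat_centers, dtype=float)))[:, None]
    weights = weights * np.ones(field.shape)
    valid = ~np.isnan(field)
    return 100.0 * weights[valid].sum() / weights.sum()

def annual_coverage(monthly_fields, lat_centers):
    """Average the monthly coverage over each calendar year (12 fields per year)."""
    monthly = np.array([area_coverage(f, lat_centers) for f in monthly_fields])
    return monthly.reshape(-1, 12).mean(axis=1)
```

On a 5° × 5° grid, a field with data only in the Northern Hemisphere yields a coverage of 50 %, although high-latitude boxes contribute much less area per box than tropical ones.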
The reasons why the coverage of CMST is greater than those of the other datasets are as follows. The spatial coverage of the land data (CRUTEM4) in HadCRUT4 is smaller than that of C-LSAT in CMST (Xu et al., 2018), and the spatial coverage of the marine data (HadSST3) in HadCRUT4 is also smaller than that of ERSSTv5 in CMST. The higher coverage of marine data results from two aspects: (a) the ocean data (ERSSTv5) used by CMST have additional sources of Argo data and use ICOADS R3.0, which contains more ship and buoy data; (b) the ocean data of HadCRUT4 have not been interpolated, while the ocean data used by CMST were interpolated. The spatial coverage of the land dataset (GHCNm v3) in NOAAGlobalTemp is less than that of C-LSAT in CMST. The spatial coverage of its marine dataset (ERSSTv4) is also less than that of ERSSTv5, as ERSSTv5 incorporated new ICOADS data and added a decade of Argo float data. Additionally, GISTEMP1 has the same land dataset as NOAAGlobalTemp, so its land coverage is less than that in CMST, while its marine dataset is the same as that of CMST. Therefore, the spatial coverage of GISTEMP1 is less than that of CMST.
It should be noted that the data coverage of GISTEMP1 increases rapidly during the 1950s (Figure 7b), which is mainly due to the rapid increase in the Antarctic (60° S-90° S; Figure 8b). As in CMST, the station data of GISTEMP1 in the Antarctic are mostly from SCAR (Hansen et al., 2010). The difference between these two datasets is that GISTEMP1 uses the baseline period of 1951 to 1980, while CMST uses the period of 1961 to 1990. Therefore, GISTEMP1 retains more short-term stations within 1951-1980.
Figure 7 shows that HadCRUT4 and NOAAGlobalTemp have two coverage minima, around 1918 and 1943/1944. However, CMST and GISTEMP1 do not have these minima.
Similar to Section 3.3, we calculated the data coverage in five latitude zones and noticed that the data coverages of HadCRUT4 and NOAAGlobalTemp have greater fluctuations in the latitude zones of 30° N-30° S and 30° S-60° S. To find the latitude zone with the greatest impact on global coverage within 30° N-60° S, we divided this range into 20° N-10° S, 10° N-20° S, 0°-30° S, 10° S-40° S, and 20° S-50° S. It is found that the minimum coverage of 30° S-60° S is the smallest, which has the greatest impact on global coverage. Therefore, the small spatial coverage of HadCRUT4 and NOAAGlobalTemp is mainly due to the small coverage of the 30° S-60° S latitude zone.
Since the 30° S-60° S latitude zone is dominated by oceans, the change of ST coverage in this zone is likely related to the change of SST coverage. This result is consistent with the study by Vose et al. (2012), who noted that from the early twentieth century to the present day, the coverage of SST increased from 30 % to 70 % and the coverage of marine data decreased significantly during the two World Wars. The decrease in the coverage of HadCRUT4 and NOAAGlobalTemp is very clear during the two World Wars. For CMST and GISTEMP, coverage is less affected during the two World Wars because ERSSTv5 has been interpolated in many grid boxes with missing observations.

Surface temperature trends
The study of Li et al. (2019) shows that the recent global mean ST warming trend since 1998 derived from CMST increases slightly compared with the existing datasets and is statistically significant. In addition, the trends from the newly developed global observational data (CMST), the remotely sensed/buoy-network infilled dataset, and the adjusted reanalysis data become closer to one another (Cowtan and Way, 2014; Simmons et al., 2017). Similar to Li et al. (2019), [...] (Table 4).
First, the ST trends in every region were compared. The temperature trend is the largest in the Northern Hemisphere high latitudes (≥ 0.116 °C/decade) and becomes lower in the mid-latitudes of the Northern Hemisphere, the mid-latitudes of the Southern Hemisphere, and the low latitudes. The lowest temperature trend is found in the high latitudes of the Southern Hemisphere.
Second, the differences in the long-term ST trends in different latitude zones were compared. The largest differences in temperature trends occur in the high latitudes. In the high latitudes of the Southern Hemisphere, the temperature trend is the highest in HadCRUT4 (0.114 ± 0.019 °C/decade) and the lowest in NOAAGlobalTemp v4 (0.031 ± 0.011 °C/decade); the difference between the highest and the lowest temperature trends is 0.083 °C/decade. In the high latitudes of the Northern Hemisphere, the highest temperature trend is found in GISTEMP2 (0.164 ± 0.014 °C/decade) and the lowest in CMST (0.116 ± 0.012 °C/decade); the maximum difference is 0.048 °C/decade. Between the middle and low latitudes, the largest difference is found in the low latitudes (0.018 °C/decade).
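A trend in °C/decade with an uncertainty range can be estimated from an annual series as sketched below. How the ± ranges quoted above were obtained is not stated in this section, so the ordinary least-squares standard error (ignoring autocorrelation) is used here purely as a stand-in, and the function name `trend_per_decade` is hypothetical.

```python
import numpy as np

def trend_per_decade(years, anomalies):
    """Ordinary least-squares trend and its standard error, both per decade.

    Assumption: residuals are treated as independent, so the standard error
    is likely too small for autocorrelated temperature series.
    """
    x = np.asarray(years, dtype=float)
    y = np.asarray(anomalies, dtype=float)
    ok = ~np.isnan(y)
    x, y = x[ok], y[ok]
    xm = x - x.mean()
    slope = (xm * (y - y.mean())).sum() / (xm ** 2).sum()
    resid = (y - y.mean()) - slope * xm
    std_err = np.sqrt((resid ** 2).sum() / (len(x) - 2) / (xm ** 2).sum())
    return 10.0 * slope, 10.0 * std_err
```

The inter-dataset spread follows directly from the quoted trends: 0.114 - 0.031 = 0.083 °C/decade in the Southern Hemisphere high latitudes and 0.164 - 0.116 = 0.048 °C/decade in the Northern Hemisphere high latitudes.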
Finally, the uncertainty ranges of the temperature trends of each dataset in different latitudes were compared. The uncertainty of every dataset is very small in the middle and low latitudes, and the largest uncertainty is in the high latitudes. In the high latitudes of the Southern Hemisphere, the uncertainty in CMST is the smallest. In the Northern Hemisphere high latitudes, the uncertainty in CMST is larger than that of BEST2 but smaller than those of the other datasets.
The temperature anomalies showed a clear warming trend from 1900 to 2017. For CMST, the highest temperature anomaly is 0.82 °C in 2016. There is a significant warming trend from the 1910s to the 1940s and from the 1960s to 2017. In contrast, there is a cooling trend from the 1940s to the 1950s. These changes are highly consistent with the other datasets and are related to the changes in El Niño and La Niña events, volcanic eruptions, sea ice cover, and other factors (Simmons et al., 2017). Overall, the global ST changes in CMST and other datasets are similar over the period 1900-2017. From the 1920s to the 1970s, CMST is slightly lower than the other datasets, whereas HadCRUT4 is slightly higher than the other datasets. The maximum difference between CMST and HadCRUT4 is in 1938 and 1948, and 1908, 1909, and 1910 [...] In the mid-latitude zone (Figure [...]
In the high latitudes of the Southern Hemisphere (Figure 10d), CMST is consistent with all the series derived from other datasets after 1960. There are far fewer stations/grid boxes in the Antarctic/higher latitudes, and therefore larger variances were found before 1960.

Data Availability
The datasets used in CMST were derived from data published by NHMS (China, Russia, USA, Canada, Australia, some Asian countries, etc.) or climate data research institutions (UK/CRU, NOAA/NCEI). Part of the data were obtained through exchange with some countries or regions and are therefore conditionally available to the public. Details of the data sources are as follows: The C-LSAT [...]
[...] we found:
1) The spatial coverage is larger when C-LSAT1.3 and ERSSTv5 are merged than when C-LSAT1.3 and HadSST3 are merged, particularly in the polar regions. The former (CMST) is also superior in terms of spatial distribution and consistency of the temporal change with the reference series (derived from the average of the merges of C-LSAT1.3 and four SST datasets).
2) The LSAT in CMST uses the high-quality C-LSAT1.3. More than 6,000 stations were added to the previous version, C-LSAT1.0 (Xu et al., 2018), which has increased the data coverage. The newly added stations are mainly from the ISTI dataset. The SST in CMST uses ERSSTv5, which uses the ocean data from the latest ICOADS R3.0 and incorporates multiple types of observations. Compared with other existing global ST datasets, CMST increases the overall coverage over the global land and ocean surface.
3) The time series of CMST at the global scale and in the mid-low latitudes are overall consistent with the other merged datasets at both interannual and interdecadal timescales. Therefore, the temperature trend of CMST from 1900 to 2017 is consistent with those of the other datasets. In the high-latitude zones, where the differences in temperature trend are usually large, the trend of CMST has a small uncertainty range. This allows us to capture the major climate changes in the high latitudes of the Northern and Southern Hemispheres.
[...] GYHY201406016), and the China Postdoctoral Science Foundation (Grant: 2018M640848). We thank the many contributors who contributed to the establishment of this dataset.
Table 1. Comparison of the station numbers of the LSAT datasets during 1900-2017 (data length > 15 years)