Local models reveal greater spatial variation than global grids in an urban mosaic: Hong Kong climate, vegetation, and topography rasters

The recent proliferation of high quality global gridded GIS datasets has spurred a renaissance of studies in many fields, particularly biogeography. However, these data, often 1 km at the finest scale available, are too coarse for applications such as precise designation of conservation priority areas and species distribution modeling, or purposes outside of biology such as city planning and precision agriculture. Further, these global datasets likely underestimate local climate variation because they do not incorporate locally relevant variables. Here we describe a comprehensive set of 30 m resolution rasters for Hong Kong, a small subtropical territory with highly variable terrain where intense anthropogenic disturbance meets a robust protected area system. The data include topographic variables, NDVI, and interpolated climate variables based on weather station observations. We present validation statistics that convey each climate variable's reliability, and compare our results to a widely used global dataset, finding that our models consistently reflect greater climatic variation. To our knowledge, this is the first set of published environmental rasters specific to Hong Kong. We hope this diverse suite of geographic data will facilitate future environmental and ecological studies in this region of the world, where a spatial understanding of rapid urbanization, introduced species pressure, and conservation efforts is critical. The dataset is accessible at https://figshare.com/s/3a5634e36e80dc33444c.


Distance to coast and water proximity
Water bodies adjacent to land areas can act as a temperature buffer, contribute to evaporative cooling (Lookingbill and Urban, 2003), and influence precipitation patterns (Heiblum et al., 2011; Paiva et al., 2011); therefore considering their distribution is important for climatic predictions. A distance to coastline raster was produced, using the distance() function in the raster package and a Hong Kong coastline shapefile. However, because of the complexity of Hong Kong's coastline, it appears simple distance to coast may not be the best representation of water proximity for climate predictions. Therefore, water proximity rasters at varying scales were also calculated, using a circular moving window approach similar to that described in other climate interpolation studies (Aalto et al., 2017). The radii used were 0.75 km, 1.5 km, 3 km, 6 km, and 12 km. A value of 1 means that the area within the given radius is entirely terrestrial, while 0 indicates it is entirely aquatic.
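The circular moving window calculation can be illustrated with a minimal Python sketch. This is not the authors' implementation (which used R); the function names, the toy land mask, and the choice to ignore cells that fall outside the grid are all assumptions made for illustration.

```python
def radius_in_cells(radius_km, cell_size_m=30.0):
    """Convert a window radius in km to a whole number of grid cells."""
    return round(radius_km * 1000.0 / cell_size_m)


def land_fraction(land_mask, row, col, radius_cells):
    """Fraction of cells inside a circular window that are land.

    land_mask holds 1 for land and 0 for water. Cells outside the grid
    are ignored (an assumption; other edge rules are possible).
    """
    n_rows, n_cols = len(land_mask), len(land_mask[0])
    land = total = 0
    for r in range(max(0, row - radius_cells), min(n_rows, row + radius_cells + 1)):
        for c in range(max(0, col - radius_cells), min(n_cols, col + radius_cells + 1)):
            # Keep only cells whose centre lies within the circular window.
            if (r - row) ** 2 + (c - col) ** 2 <= radius_cells ** 2:
                total += 1
                land += land_mask[r][c]
    return land / total


# Hypothetical 6 x 6 mask: the three western columns are land, the rest water.
mask = [[1, 1, 1, 0, 0, 0] for _ in range(6)]
```

With 30 m cells, the 0.75 km radius corresponds to 25 cells and the 12 km radius to 400 cells. A cell on the toy shoreline with a one-cell radius has four land neighbours out of five cells, giving a value of 0.8.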

Urbanicity
In densely constructed areas, the urban heat island effect is expected to influence temperatures (Nichol et al., 2013; Shi et al., 2018). High rise buildings can influence temperature by blocking wind, creating shade, acting as heat sinks, and producing thermal pollution. These effects are particularly relevant for this study, as some of Hong Kong's weather observation stations are adjacent to or inside urban centers. To quantify the distribution of developed area, we used a 30 m resolution dataset of percent impervious surface (Brown de Colstoun et al., 2017), which we expect to strongly correlate with urban development. However, because bulk air temperature is not expected to vary at a granular (30 m) scale, these data were smoothed using a Gaussian moving window at three scales (sigma = 10, 50, 100) to create 'urbanicity' layers.
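A Gaussian moving window of this kind can be sketched in Python as two one-dimensional passes. The sketch assumes that sigma is expressed in grid cells (the units are not stated in the text), truncates the kernel at three sigma, and renormalizes weights at the grid edge; all names are hypothetical and this is not the code used to produce the published layers.

```python
import math


def gaussian_kernel_1d(sigma, truncate=3.0):
    """Normalized 1-D Gaussian weights out to truncate * sigma cells."""
    half = int(truncate * sigma + 0.5)
    weights = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth_rows(grid, kernel):
    """Convolve every row with the kernel, renormalizing at the edges."""
    half = len(kernel) // 2
    out = []
    for row in grid:
        n = len(row)
        new_row = []
        for c in range(n):
            acc = wsum = 0.0
            for k, w in enumerate(kernel):
                j = c + k - half
                if 0 <= j < n:  # skip kernel taps that fall off the grid
                    acc += w * row[j]
                    wsum += w
            new_row.append(acc / wsum)
        out.append(new_row)
    return out


def transpose(grid):
    return [list(col) for col in zip(*grid)]


def gaussian_smooth(grid, sigma):
    """Separable 2-D Gaussian smoothing: rows first, then columns."""
    kernel = gaussian_kernel_1d(sigma)
    row_pass = smooth_rows(grid, kernel)
    return transpose(smooth_rows(transpose(row_pass), kernel))
```

Applied to a percent impervious surface grid, larger sigma values spread each paved cell's influence over a wider neighbourhood, which is the intended effect of the three urbanicity scales.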

Climate variables
Climate interpolators are often faced with the challenge of estimating climate parameters over a large area with sparse weather station observations, at least in part of the region considered (e.g. Hu et al., 2016). In contrast, interpolation in Hong Kong benefits from a relatively small geographic area and a quite dense network of weather data provided by dozens of permanent weather stations. Here we use thin plate spline interpolation, which fits a curved surface to irregularly distributed points.
Weather station observation data and geographic coordinates were downloaded from the web portal of the Hong Kong Observatory (2018). As the goal was to produce a representation of long-term but modern climate, measurements over 20 years (1998 to 2017) were included. To ensure averages were reliable, weather stations were only included for interpolation of each variable if at least 8 years of complete data were available within the 20 year window. The minimum number of stations used for each model is provided in Table 2. Monthly observations of ten variables were obtained: maximum temperature, mean daily maximum temperature, mean daily temperature, mean daily minimum temperature, minimum temperature, mean dew point, mean relative humidity, mean wind speed, mean air pressure, and total rainfall.
'northness', the cross product of aspect and slope. The water proximity layers were products of combining multiple scales into fewer predictors: fine was the sum of the 0.75 km, 1.5 km, and 3 km scale rasters, while coarse was the sum of the 6 km and 12 km rasters. All model predictors were tested for collinearity and no problems were found. All predictors were initially included, then pared down in each regression model using stepwise bidirectional selection based on AIC, using 4 degrees of freedom as a penalty to make predictor selection stricter than the default. The resulting model was used to calculate a climate value at each grid cell based on a linear relationship with the selected predictors.
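The effect of the stricter penalty can be illustrated with a small Python sketch. It assumes a Gaussian linear model, for which the criterion is n log(RSS/n) + k p up to an additive constant, and it assumes that "4 degrees of freedom as a penalty" corresponds to k = 4 in place of the conventional k = 2. The function names and the residual sums of squares are hypothetical.

```python
import math


def penalized_ic(rss, n_obs, n_params, k=2.0):
    """Information criterion for a Gaussian linear model, up to a constant.

    k = 2 gives the usual AIC; a larger k penalizes extra predictors more.
    """
    return n_obs * math.log(rss / n_obs) + k * n_params


def keep_extra_predictor(rss_small, rss_large, n_obs, n_params_small, k):
    """True if adding one predictor lowers the penalized criterion."""
    small = penalized_ic(rss_small, n_obs, n_params_small, k)
    large = penalized_ic(rss_large, n_obs, n_params_small + 1, k)
    return large < small


# Hypothetical case: 30 stations, and one extra predictor reduces the
# residual sum of squares from 10.0 to 9.0.
kept_default = keep_extra_predictor(10.0, 9.0, 30, 2, k=2.0)
kept_strict = keep_extra_predictor(10.0, 9.0, 30, 2, k=4.0)
```

In this hypothetical case the criterion changes by 30 log(0.9) + k, which is negative for k = 2 and positive for k = 4, so the marginal predictor is retained under the default penalty and dropped under the stricter one.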
Second, to adjust for local variation in climate that is not associated with topography, the linear model residuals at each station were calculated and interpolated using the thin plate spline approach implemented in the fields R package. The lambda smoothing parameter, which determines how closely the fitted surface matches input values, was set to 0.01. A fairly low lambda value was selected because of the relatively high confidence in the long-term averaged weather station values. This effectively produces a smoothed layer of local deviation from the linear model, which was used to additively adjust the results of the linear models and produce finalized climate rasters.
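The residual correction step can be sketched with a small, self-contained thin plate spline in Python. This is an illustration under stated assumptions, not the fields implementation: lambda is added directly to the diagonal of the kernel matrix (fields may scale its lambda differently), coordinates are unitless, and the station locations, observations, and linear predictions are hypothetical.

```python
import math


def _phi(r):
    """Thin plate spline radial basis r^2 * log(r), with phi(0) = 0."""
    return 0.0 if r == 0.0 else r * r * math.log(r)


def _solve(matrix, rhs):
    """Solve a dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system (are the stations collinear?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_tps(points, values, lam):
    """Fit a 2-D smoothing thin plate spline; returns (weights, affine terms)."""
    n = len(points)
    size = n + 3
    a = [[0.0] * size for _ in range(size)]
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            a[i][j] = _phi(math.hypot(xi - xj, yi - yj))
        a[i][i] += lam  # smoothing: larger lam lets the surface miss the data
        for k, p in enumerate((1.0, xi, yi)):
            a[i][n + k] = p
            a[n + k][i] = p
    coef = _solve(a, list(values) + [0.0, 0.0, 0.0])
    return coef[:n], coef[n:]


def predict_tps(points, weights, affine, x, y):
    """Evaluate the fitted spline at location (x, y)."""
    out = affine[0] + affine[1] * x + affine[2] * y
    for (xi, yi), w in zip(points, weights):
        out += w * _phi(math.hypot(x - xi, y - yi))
    return out


# Hypothetical stations, observed values, and linear model predictions.
stations = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.5, 0.2)]
observed = [23.1, 22.4, 21.9, 20.8, 22.9]
linear_pred = [22.8, 22.6, 21.7, 21.1, 22.6]
residuals = [o - p for o, p in zip(observed, linear_pred)]
weights, affine = fit_tps(stations, residuals, lam=0.01)


def corrected(linear_value, x, y):
    """Final value: linear model prediction plus the interpolated residual."""
    return linear_value + predict_tps(stations, weights, affine, x, y)
```

With lam set to zero the surface passes exactly through every station residual; a small positive value such as 0.01 keeps the surface close to the stations while damping station-to-station noise.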
We measured the spatial predictive ability of models using ten-fold cross validation (Dobesch et al., 2007). In each validation round, 10% of weather stations were reserved as a test dataset and the remainder were used for training. Average root mean squared error of the test data subset from the final model prediction was used as an error measurement. To normalize these error measures across the climate variables, we adjusted them as a percentage of the standard deviation of the initial weather station values measured.
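The validation procedure can be sketched in Python as follows. The sketch assumes random assignment of stations to folds and the sample standard deviation as the normalizer (the text does not specify either); `fit` and `predict` are hypothetical callables standing in for the regression plus spline model.

```python
import math
import random
import statistics


def k_fold_indices(n_items, k=10, seed=0):
    """Shuffle item indices and deal them into k roughly equal folds."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rmse(observed, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def relative_cv_error(values, fit, predict, inputs, k=10, seed=0):
    """Mean test RMSE over k folds, as a percentage of the SD of all values."""
    errors = []
    for test_idx in k_fold_indices(len(values), k, seed):
        if not test_idx:  # can only happen when there are fewer items than folds
            continue
        held_out = set(test_idx)
        train_idx = [i for i in range(len(values)) if i not in held_out]
        model = fit([inputs[i] for i in train_idx], [values[i] for i in train_idx])
        preds = [predict(model, inputs[i]) for i in test_idx]
        errors.append(rmse([values[i] for i in test_idx], preds))
    return 100.0 * statistics.mean(errors) / statistics.stdev(values)
```

A result of 100 means the average test error is as large as the spread of the station values themselves, so lower values indicate a model that predicts held-out stations better than the overall variability would suggest.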
The monthly models were then summarized into raster layers that characterize yearly climatic means and variation. These include 19 "bioclimatic" variables using the biovars() function in the dismo R package (Hijmans et al., 2017), which are specifically suited for species distribution modeling and other ecological purposes. This also allows our data to be compared with other climate data products that use the same calculations. Because those calculations only use rainfall and average daily maximum and minimum temperatures in each month, we also produced yearly average layers of dewpoint, relative humidity, mean daily temperature, air pressure, and wind speed. Also provided are layers of highest and lowest average monthly extreme temperatures, and their difference (extreme temperature annual range). These two variables characterize temperature extremes experienced in a given location better than the bioclimatic variables.
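For a single grid cell, several of the bioclimatic summaries reduce to simple arithmetic on the twelve monthly values. The Python sketch below covers only the variables that need no quarterly windows, following the standard definitions used by biovars(); the function name is hypothetical and this is not a substitute for the dismo implementation.

```python
def bioclim_subset(tmax, tmin, prec):
    """Selected bioclimatic variables from 12 monthly values at one cell.

    tmax and tmin are mean daily maximum and minimum temperature per month,
    prec is total monthly rainfall.
    """
    if not (len(tmax) == len(tmin) == len(prec) == 12):
        raise ValueError("expected 12 monthly values per variable")
    tavg = [(hi + lo) / 2 for hi, lo in zip(tmax, tmin)]
    bio5 = max(tmax)  # max temperature of warmest month
    bio6 = min(tmin)  # min temperature of coldest month
    return {
        "bio1": sum(tavg) / 12,                                   # annual mean temperature
        "bio2": sum(hi - lo for hi, lo in zip(tmax, tmin)) / 12,  # mean diurnal range
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,                                      # temperature annual range
        "bio12": sum(prec),                                       # annual precipitation
        "bio13": max(prec),                                       # precipitation of wettest month
        "bio14": min(prec),                                       # precipitation of driest month
    }
```

Applying the same function to every cell of the twelve monthly rasters yields the corresponding annual layers.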
For comparison with global climate data products, we resampled bioclimatic variables to the same (1 km) resolution as WorldClim using bilinear interpolation. Only pixels present in both data products were used for comparisons.

Remote sensing data
areas, and is free of clouds. This was supplemented with an image from March 2018 after adjustment, so that all land areas of the region were included. NDVI calculations were completed using the standard equation:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}$$

where NIR is near-infrared (Landsat band 5: 0.851 to 0.879 µm) and Red is visible red radiation (Landsat band 4: 0.636 to 0.673 µm). The resulting NDVI value varies between -1 and 1, where higher values correspond with denser vegetation.
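The standard NDVI ratio, (NIR - Red) / (NIR + Red), can be sketched per pixel in Python. The function name and the choice to return None where both bands are zero are assumptions for illustration; the band values in the example are hypothetical reflectances.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel.

    Returns None where both bands are zero, since the ratio is undefined.
    """
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


# Hypothetical reflectances: a vegetated pixel and a water-like pixel.
vegetated = ndvi(0.5, 0.1)
water_like = ndvi(0.1, 0.5)
```

Because the numerator can never exceed the denominator in magnitude for non-negative reflectances, the result stays within -1 and 1.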

Results and discussion
Results of this environmental analysis of Hong Kong include 48 rasters and one vector file. All rasters are provided at an identical 1 arc second (approximately 0.03 km) resolution and in the WGS84 geographic coordinate system. Summary values and filenames are provided in the data repository.
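The ground size of a 1 arc second cell can be checked with a short Python calculation. The sketch assumes a spherical Earth with a mean radius of 6,371 km and takes roughly 22.3°N as a representative latitude for Hong Kong; both are approximations introduced here, not values from the dataset.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean spherical radius; an approximation


def arcsecond_size_m(latitude_deg):
    """North-south and east-west extent of one arc second, in metres."""
    north_south = EARTH_RADIUS_M * math.radians(1.0 / 3600.0)
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


# Approximate latitude of Hong Kong (assumed here as 22.3 degrees north).
ns_m, ew_m = arcsecond_size_m(22.3)
```

Under these assumptions a cell spans about 31 m north to south and slightly less east to west, consistent with the nominal 30 m resolution.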

Topographic variables
Distance to coast results show that approximately 42% of Hong Kong's land area is within 1 km of the coastline. However, it is apparent that inland areas often feature steep inclines, as half of Hong Kong's land is above 84 m elevation.
For variables like relative elevation, urbanicity, and water proximity, the ideal scale of raster calculation is dependent on the desired effect to be captured, and perhaps other characteristics of the landscape in question.For this reason, we provide the rasters at multiple scales.
Urbanicity results show that the majority of land in Hong Kong is not near urban areas, as the median raster value is below 4% urban at all scales calculated. It is also apparent that inferring urban development from impervious surface is not ideal, as bare soil or rock is sometimes sensed as impervious. Also, such a measure has little ability to differentiate between a dense urban core of high-rises and large paved areas (such as parking lots or airports). Unfortunately, accessible data on the geographic distribution of the urban environment in Hong Kong are limited. For climate modeling, an urbanicity measure that takes into account building height or population density at a 30 m or finer scale could be preferable.

Climate variables
At least 32,024 monthly weather station measurements over 20 years were used to construct climate models for all months and variables at finer resolution compared to global datasets (Fig. 2). High weather station density and availability of data on multiple candidate topographic climate-forcing factors allowed for high confidence in many climate variable models, especially those related to temperature (Figs. 3, 4). The climate interpolation results include monthly models of ten variables including temperature, precipitation, and humidity, making a total of 120 individual models produced (monthly models of three temperature variables are shown in Fig. 5). For all variables, the predictors included in monthly models are displayed in Figure 6, and the number of stations with data included is in

Temperature
Temperature was found to vary considerably across Hong Kong, with more than 6°C difference in mean annual temperature between the highest mountain peaks (>900 m, <18°C) and some low-lying urbanized areas (>24°C). While mean and minimum temperature are highest in urban areas, maximum temperature shows a different pattern with a maximum in inland valleys in the northern New Territories. The high accuracy of temperature models is likely due to a strong association with elevation; elevation was by far the most commonly included predictor for temperature models (Fig. 6). Urbanicity was important for mean and minimum temperature, but not maximum temperature. Water proximity and coast distance were differentially included depending on the variable, while aspect*slope rarely had an effect. A few special notes on the limitations of predicted temperature values in dense urban areas: temperature predictions of our landscape-level climate models will not reflect the high spatial variation in temperature found in urban microclimates. Although the manned Kowloon HKO weather station is inside a densely populated area, as pointed out by Nichol and To (2012) it is still in a small parklike area surrounded by trees, and therefore is not representative of the most densely urbanized areas of Hong Kong. Other stations in urban areas are similarly near green spaces or otherwise open areas. Higher resolution (say 5 m or 1 m) studies of urban thermal distributions would strongly benefit from analysis of wind patterns, building height, thermal pollution, and other factors (e.g. Shi et al., 2018).
Therefore, granular, ground-level temperatures in urban areas are likely substantially different from the broader air temperature values our models provide. One area of particular interest is the Hong Kong Airport, a massive area reclaimed from the ocean, north of Lantau Island. The weather station here has the highest urbanicity value, because the airport is mostly impervious surface. However, it lacks properties of truly urbanized areas, having no permanent population and lacking typical urban morphology.
Therefore, climate variables in this area may be biased, especially for variables like wind speed that would be affected by the presence of tall buildings. The airport weather station often had the highest mean temperatures recorded, perhaps indicating that extensive impervious surface is more important than wind blockage or thermal pollution for maintaining high temperatures.

Rainfall
In our models, the highest annual rainfall (bio12) areas in Hong Kong (>2500 mm annually) are inland and at high elevations, presumably because of condensation from humid air as it passes over mountains. Areas near the coast, particularly small outlying islands and the eastern coast in Lung Kwu Tan, receive the lowest amount of annual rainfall (<1600 mm). Precipitation of driest month (bio14) was uniformly low, ranging from 20 to 40 mm, but the relative pattern of high and low precipitation areas remained similar. The most commonly included model predictor was fine-scale water proximity. Elevation was predictive for 5 out of 12 months, but few other topographic predictors were useful. Seasonality of rainfall in Hong Kong is strong.
Averaged across all locations, 52% of total yearly rainfall was recorded in three months (June through August). Although rainfall models were informed by more weather stations than any other climate variable (Table 2), they have the highest relative standard error (Fig. 3) and therefore the lowest accuracy. Because they are influenced by both global and locally variable wind patterns, precipitation distributions are notoriously difficult to predict, especially in urban areas (Cristiano et al., 2017). Our relatively poor results may be explained by this, as well as lack of appropriate local predictors. We did not explore
Spatial climate modeling consisted of two main steps. First, a generalized linear model was built for each climate variable for each month of the year. Six topographic climate predictors were used as model building candidates: elevation, log-transformed distance to coast, exponentially transformed fine and coarse water proximity, log-transformed urbanicity (sigma = 50), and
Earth Syst. Sci. Data Discuss., https://doi.org/10.5194/essd-2018 Manuscript under review for journal Earth Syst. Sci. Data. Discussion started: 21 December 2018. © Author(s) 2018. CC BY 4.0 License.
Normalized difference vegetation index (NDVI) is a common metric of vegetation presence and density derived from satellite imagery. To calculate NDVI, Landsat images (U.S. Geological Survey, 2018) of Hong Kong were obtained. We downloaded one image from March 2016 that covers much of Hong Kong except for the far eastern

Figure 1. Hong Kong geography. The three highest peaks in the territory, as well as the highest point on Hong Kong Island, are marked. Areas protected as Country Parks are highlighted in green.

Figure 2. Comparison of average high of warmest month (bio5) model results for Hong Kong. (a) is from our newly interpolated climate models at 30 m resolution, while (b) is 1 km resolution data available as part of WorldClim 2 (Fick and Hijmans, 2017). Not only is the resolution markedly improved, but also the temperature values are more varied, for instance on the large southern islands.

Figure 3. Adjusted R² values of initial (pre-spline) regression models. Each boxplot includes 12 points, one for each monthly model. Temperature variation, especially mean temperature, was best explained by linear modeling, while rainfall was predicted the most poorly.

Figure 4. Relative magnitude of training and testing dataset errors, from 10 validation rounds of climate variable modeling. A value of 100 indicates that, for that climate model, the average difference between the value recorded at a given weather station and the value predicted by the model at that location is equal to the standard deviation of the initial set of all values recorded at all weather stations for that climate variable.

Figure 5. Model results for three of ten interpolated climate variables. (a) Maximum temperature, (b) Mean temperature, and (c) Minimum temperature.

Figure 6. Regression predictors included in monthly models for 10 climate variables. Each predictor is represented by a different color. Minimum and mean temperature variables were most predictable, consistently including elevation and urbanicity. Rainfall patterns were most difficult, with the fewest predictors included.

Figure 7. Differences between results of this study and WorldClim 2 (Fick and Hijmans, 2017) values. (a) is average low temperature of coldest month (bio6), with red where the local model is warmer than WorldClim, and blue where it is colder. (b) shows annual precipitation (bio12), with blue where the local model predicts more rainfall than WorldClim, and tan where it predicts less. Our model results were resampled to 1 km resolution using bilinear interpolation to allow for these comparisons.

Figure 8. NDVI class composition over Hong Kong's elevational range. The majority of land area near sea level is below NDVI 0.1, while Hong Kong's highest elevation areas are between 0.1 and 0.2, indicating short vegetation. The elevation range with proportionally the most dense vegetation (0.4 to 0.5 NDVI) is 300 to 400 m.