Articles | Volume 11, issue 3
Peer-reviewed comment
22 Jul 2019

New 30 m resolution Hong Kong climate, vegetation, and topography rasters indicate greater spatial variation than global grids within an urban mosaic

Brett Morgan and Benoit Guénard

The recent proliferation of high-quality global gridded environmental datasets has spurred a renaissance of studies in many fields, including biogeography. However, these data, often 1 km at the finest scale available, are too coarse for applications such as precise designation of conservation priority areas and regional species distribution modeling, or purposes outside of biology such as city planning and precision agriculture. Further, these global datasets likely underestimate local climate variations because they do not incorporate locally relevant variables. Here we describe a comprehensive set of 30 m resolution rasters for Hong Kong, a small tropical territory with highly variable terrain where intense anthropogenic disturbance meets a robust protected area system. The data include topographic variables, a Normalized Difference Vegetation Index raster, and interpolated climate variables based on weather station observations. We present validation statistics that convey each climate variable's reliability and compare our results to a widely used global dataset, finding that our models consistently reflect greater climatic variation. To our knowledge, this is the first set of published environmental rasters specific to Hong Kong. We hope this diverse suite of geographic data will facilitate future environmental and ecological studies in this region of the world, where a spatial understanding of rapid urbanization, introduced species pressure, and conservation efforts is critical. The dataset (Morgan and Guénard, 2018) is accessible at

1 Introduction

Scale of analysis has long been considered a key concern in biogeographic research (Levin, 1992). Multiple types of scale are relevant to environmental data, including analysis grain, response grain, spatial structure, and study extent (Mertes and Jetz, 2018). Analysis grain, the minimum unit of spatial resolution in a spatial grid, is commonly referred to as a pixel or cell. In research that uses environmental raster data, the pixel size directly dictates the types of biogeographic questions that can be reasonably addressed.

This relationship between analysis grain and study suitability is complex, and higher resolutions are not always advantageous. For example, in global analyses excessively high-resolution data would be computationally cumbersome and unnecessary if the goal is to characterize broad patterns. However, as shown below, many studies have found notable benefits of higher-resolution climatic predictors. Unfortunately, regional analyses lacking local data are limited to using global datasets and the grain size at which they are available (e.g., Cheng and Bonebrake, 2017).

Species distribution modeling (SDM) is a common application of gridded environmental data, where the selected analysis grain has important consequences. In SDM, one or more geographic predictors are associated statistically with the location of known observations of a species (Peterson et al., 2011). The resulting statistical model can be converted to a geographic model: a spatially continuous measure of species occurrence likelihood across the landscape of interest. SDMs are used for many applications, including predicting potential ranges of invasive species, characterizing ecological constraints on species ranges, discovering biodiversity, and planning protected areas (Peterson et al., 2011). The effects of SDM grain size manipulation are an active area of research. Below, we summarize findings on four main effects: estimated distribution size, inclusion of fine-scale features, predictor variable selection, and model predictive ability.

Coarser environmental data consistently result in SDMs that predict larger areas of species presence (Connor et al., 2018; Franklin et al., 2013; Seo et al., 2009). Overestimation of SDMs is especially a concern for conservation purposes, where inferred size of suitable habitat is often used to inform extinction risk assessments. Mistakenly large calculated distributions could result in species that are assigned artificially low risk levels.

Coarse-resolution predictors can cause SDMs to omit small but important areas. Particularly of interest are microrefugia, climatically unique patches of land that can harbor rare species and are especially important for conservation as species distributions respond to climate change (Dobrowski, 2010). Meineri and Hylander (2017) demonstrated that because high-resolution climate models included such microrefugia, the resulting species distribution models predicted lower extinction rates for plant species than coarser predictors. Nezer et al. (2017) found that 10 or 100 m resolution SDMs can reveal other distribution features invisible at lower resolutions (1 km): movement corridors, isolated habitat patches, geomorphologic features, and anthropogenic effects on distributions.

SDM scale can also affect which predictors are selected for model calculation. Certain predictors may be excluded in SDMs because they lack explanatory power at the chosen scale of analysis (Mertes and Jetz, 2018). For example, vegetation measures like the Normalized Difference Vegetation Index (NDVI) in fragmented forests are unlikely to be relevant if the grain size is much larger than the forest patch size, because each grid cell will be a single averaged value. This means that coarse models might not only mischaracterize the distribution pattern itself, but they also may fail to explicate important environmental relationships that determine species occurrence. Indeed, Nezer et al. (2017) found that the most important predictors (vegetation, slope) in their highest-resolution models (10 m) were “nearly meaningless” at 1 km resolution. Another study found similar differences in predictor importance related to variation in scale (Lasseur et al., 2006). Of course, predictor importance is always relative and thus is subject to which predictors are included in model building. Therefore this pattern is not expected to be observed in all studies, but should not be overlooked as a potential source of bias.

Last, any consistent effects of SDM grain size on the overall predictive ability of SDMs are unclear. The most commonly used measure of SDM performance is area under curve (AUC), where a higher value indicates a greater ability to differentiate between area where the species is present or absent. Some studies found increased SDM resolution resulted in increased AUC (Seo et al., 2009; Nezer et al., 2017), while others found no effect (Pradervand et al., 2014) or mixed effects depending on dataset (Guisan et al., 2007). These studies used different species, predictors, scales, regions, and modeling algorithms, so further research is required to investigate any association between SDM grain size and AUC.

The above advantages of higher-resolution environmental data in SDM may be dependent on project-specific factors, such as the quality of species records available and the goals of the research. For example, using environmental grids of a smaller grain size than the locational accuracy of the available species records is untenable. Additionally, stationary species (e.g., lichens) may be more strongly affected by local factors, while highly mobile species (e.g., birds) may only be limited at broader scales. Indeed, it has been shown that plant (rather than bird or mammal) species models with the highest locational accuracy were those most improved by higher resolution (Guisan et al., 2007). Lastly, the utility of fine-grain environmental grids can depend on habitat; flat deserts may have less biologically relevant fine-scale spatial variation compared to mountainous forests or tropical areas fragmented by human activity, like Hong Kong.

In this study, a new series of rasters for Hong Kong is introduced that is particularly suited for SDM. The layers produced focus on long-term climate averages, topography, and vegetation. We asked how the new 30 m scale rasters provide new information on climatic variables in Hong Kong in comparison to a global dataset already available. We hypothesize that our new climate data will indicate greater variation (measured as raster standard deviation) in climate variables. The development of high-resolution environmental rasters is particularly important in tropical regions where species exhibit small distribution ranges (as predicted by Rapoport's rule: Stevens, 1989) and where understanding interactions between organisms and their changing habitats is paramount.

2 Study area: Hong Kong

Geographic data of appropriate resolution are critically important for conducting research within the Hong Kong Special Administrative Region of China, because of its complex landscape. Hong Kong exhibits dramatically variable topography, fitting numerous small islands, dozens of mountain peaks over 500 m, 733 km of coastline, and a human population of over 7 million into a land area of only 1104 km² (Fig. 1). Seasonally variable monsoon winds deliver equatorial heat and torrential precipitation in summer, while northerly winds carry chilly dry air from continental Asia during the winter (Dudgeon and Corlett, 1994). However, daily temperature fluctuations are attenuated by the surrounding South China Sea and Pearl River estuary. Hong Kong's terrain typically exhibits a stark bifurcation between some of the most densely constructed areas in the world (Lau and Zhang, 2015) and steep, vegetated slopes. Uninhabited expanses are protected as part of 24 Country Parks and additional special areas that cover over 40 % of the territory's land (Agriculture, Fisheries and Conservation Department, 2017). Even within these more natural areas, a strong disturbance gradient encompasses grasslands, shrublands, evergreen secondary forests, and old-growth feng shui woods that have been protected from deforestation. Historically Hong Kong has been largely stripped of its trees, and only since the end of World War II and later the establishment of the Country Park system have large swathes of forest begun to regenerate (Zhuang and Corlett, 1997). However, this process is frequently reset by human-induced hill fires, which maintain predominantly upland areas as shrubland or grassland (Marafa and Chau, 1999). Hong Kong harbors several unique and restricted habitats, including mangroves in coastal areas and freshwater wetlands in the far northwest.

Figure 1. Hong Kong geography. The three highest peaks in the territory as well as the highest point on Hong Kong Island are marked. Areas protected as Country Parks are highlighted in green.

Hong Kong climate data are available within a variety of global gridded climate datasets (WorldClim 2 – Fick and Hijmans, 2017; MerraClim – Vega et al., 2017; CHELSA – Karger et al., 2017), but none of these has a resolution higher than 1 km. We suspect those global climate models underestimate variation in local climate values, even after consideration of the coarser scale. Local studies of Hong Kong meteorology have largely focused on characterizing and mitigating the effects of urbanization (e.g., Shi et al., 2018; Wang et al., 2018; Nichol et al., 2014; Liu and Zhang, 2011; Ng, 2009; Giridharan et al., 2004). Unfortunately, it appears the climate of Hong Kong's landscape as a whole has been given little notice, and we are unaware of long-term averaged climate rasters available for the region. Relevant studies that do exist include limited variables, and the data appear to be publicly unavailable. We are additionally unaware of Hong Kong data publicly available for vegetation indices such as the NDVI or topographic data other than elevation. Therefore Hong Kong is in dire need of a comprehensive suite of accessible environmental geographic information system (GIS) data, at a resolution finer than 1 km, suitable for species distribution modeling and other local applications. To this end, we developed new, 30 m resolution rasters of topography, NDVI, and 10 interpolated climate variables for each month of the year.

3 Methods

All data manipulation and geographic analyses were conducted in the R statistical computing environment (v3.3.2, R Core Team, 2016) using RStudio (v1.0.136, RStudio Team, 2015) unless otherwise noted. Analyses are divided into three broad categories of data products, detailed in the sections below: topographic variables, climate variables, and remote sensing variables. The variables developed were selected based on their utility in environmental research, especially SDM, as well as the availability of appropriate source data. An overview schematic of the data workflow is available in Fig. S1 in the Supplement.

3.1 Topographic variables

Data on the physical characteristics of Hong Kong's landmass were assembled from remote sensing inputs, crowdsourced coastline polygons, and a digital terrain model. The topographic variables developed are coastline, elevation, slope, aspect, terrain roughness, relative elevation, distance to coast, water proximity, and urbanicity.

3.1.1 Coastline

As reclamation of land from the ocean in Hong Kong is ongoing, obtaining current data for the coastline can be challenging. Natural coastline and reservoir vectors were downloaded from OpenStreetMap (2018) and merged in QGIS (v3.01, QGIS Development Team, 2018) to produce a shapefile of polygons representing Hong Kong land area as of January 2018. All output rasters were masked to this area.

3.1.2 Elevation, slope, aspect, and roughness

A 5 m resolution Hong Kong digital terrain model (Lands Department, 2017) was upscaled using bilinear resampling. The resulting 30 m digital elevation model (DEM) was used as the elevation data throughout the study. Four other topographic predictor layers were derived directly from this DEM: aspect, slope, aspect × slope, and a roughness index. These were calculated using the Hong Kong elevation raster with the terrain() function in the raster R package (Hijmans, 2017), using all eight neighboring cells (queen case). Aspect was transformed from degrees to a measure of north–south exposure (“northness”) by cos(aspect×π/180).
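The aspect transformation above can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and the function names are hypothetical. It assumes aspect is measured in degrees clockwise from north, and it reads the "aspect × slope" layer as northness multiplied by slope, which is an interpretation rather than a confirmed detail of the original workflow.

```python
import math

def northness(aspect_deg):
    """North-south exposure: cos(aspect * pi / 180).

    Assumes aspect is in degrees clockwise from north, so a north-facing
    cell gives 1 and a south-facing cell gives -1.
    """
    return math.cos(aspect_deg * math.pi / 180.0)

def northness_times_slope(aspect_deg, slope):
    """Hypothetical helper: northness weighted by slope steepness."""
    return northness(aspect_deg) * slope

# Print northness for the four cardinal directions.
for aspect in (0, 90, 180, 270):
    print(aspect, round(northness(aspect), 3))
```

East- and west-facing cells both map to values near zero, so the transformed layer removes the artificial jump between 359° and 0° that raw aspect would carry into a regression.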

3.1.3 Relative elevation

Relative elevation is a measure of the difference in elevation between the pixel of interest and the lowest pixel within a given radius. A pixel on a mountain peak has a high relative elevation, while a pixel on a flat plain has a relative elevation of 0 (regardless of its elevation above or below sea level). A set of relative elevation layers for Hong Kong was calculated at multiple scales, following the moving window approach of Bennie et al. (2010). The radii used were 60, 120, 240, 480, and 960 m. These layers are expected to be most applicable as measures of surface water drainage, and therefore soil moisture as well. Relative elevation has been used as a covariate in climate interpolation as a proxy for cool air draining (Bennie et al., 2010; Ashcroft and Gollan, 2012), but was not included here as a predictor as Hong Kong lacks large valleys and other sheltered areas where this effect would be most relevant.
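As an illustration of the moving-window definition above, the following sketch computes relative elevation on a small grid. It is written in Python rather than the R used in the study and is not the implementation of Bennie et al. (2010). The function name, the use of None for no-data cells, and the radius expressed in cells (a 60 m radius would be 2 cells at 30 m resolution) are assumptions.

```python
def relative_elevation(dem, radius_cells):
    """Elevation of each cell minus the lowest elevation within a circular window.

    dem is a list of rows; None marks no-data cells (for example, sea).
    Cells outside the grid or without data are ignored (an assumption).
    """
    nrow, ncol = len(dem), len(dem[0])
    r2 = radius_cells * radius_cells
    out = [[None] * ncol for _ in range(nrow)]
    for r in range(nrow):
        for c in range(ncol):
            if dem[r][c] is None:
                continue
            lowest = dem[r][c]
            for dr in range(-radius_cells, radius_cells + 1):
                for dc in range(-radius_cells, radius_cells + 1):
                    if dr * dr + dc * dc > r2:
                        continue  # outside the circular window
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrow and 0 <= cc < ncol and dem[rr][cc] is not None:
                        lowest = min(lowest, dem[rr][cc])
            out[r][c] = dem[r][c] - lowest
    return out

# Hypothetical 3 x 3 grid with a single peak in the middle.
peak = [[10, 10, 10], [10, 50, 10], [10, 10, 10]]
print(relative_elevation(peak, 1))
```

A flat grid returns 0 everywhere regardless of its absolute elevation, which matches the behavior described for a pixel on a plain.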

3.1.4 Distance to coast and water proximity

Water bodies adjacent to land areas can act as temperature buffers, contribute to evaporative cooling (Lookingbill and Urban, 2003), and influence precipitation patterns (Heiblum et al., 2011; Paiva et al., 2011); therefore, considering their presence is important for climatic predictions. Here, two different methods were used to quantify water body distribution in Hong Kong: distance to coast and water proximity. First, a distance to coast raster, measured in meters, was produced using the distance() function in the raster R package (Hijmans, 2017) with the Hong Kong coastline shapefile described in Sect. 3.1.1. Distance to coast did not incorporate inland water bodies. Second, water proximity (including inland water bodies) was calculated as the proportion of the area surrounding a given pixel that is covered by land. A value of 1 means that the area within a given radius is entirely terrestrial, while 0 indicates it is entirely aquatic. Multiple water proximity rasters were calculated with varying radii using a circular moving window approach like that described by Aalto et al. (2017), to represent buffering processes at different scales. The radii used were 0.75, 1.5, 3, 6, and 12 km.
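The water proximity measure can be sketched as a land fraction within a circular window. This is a minimal Python illustration (the study used R), not the procedure of Aalto et al. (2017). The function name, the 1/0 land mask encoding, the radius in cells, and the choice to ignore cells beyond the grid edge are all assumptions.

```python
def water_proximity(land_mask, radius_cells):
    """Fraction of cells within a circular window that are land.

    land_mask is a list of rows with 1 for land and 0 for water.
    Returns 1.0 where the window is entirely land and 0.0 where it is
    entirely water. Cells beyond the grid edge are ignored (an assumption).
    """
    nrow, ncol = len(land_mask), len(land_mask[0])
    r2 = radius_cells * radius_cells
    out = []
    for r in range(nrow):
        row = []
        for c in range(ncol):
            land = total = 0
            for dr in range(-radius_cells, radius_cells + 1):
                for dc in range(-radius_cells, radius_cells + 1):
                    if dr * dr + dc * dc > r2:
                        continue  # outside the circular window
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < nrow and 0 <= cc < ncol:
                        total += 1
                        land += land_mask[rr][cc]
            row.append(land / total)
        out.append(row)
    return out

# Hypothetical coastline: two columns of land beside one column of water.
coast = [[1, 1, 0], [1, 1, 0], [1, 1, 0]]
print(water_proximity(coast, 1)[1][1])
```

Larger radii average over more of the surrounding sea, which is why rasters at several radii can represent buffering at different scales.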

3.1.5 Urbanicity

Urbanicity rasters were developed because in densely constructed areas, urban heat island effects are expected to influence temperatures (Nichol and To, 2012; Shi et al., 2018), and therefore urbanicity may be an important predictor in climate interpolation. High-rise buildings can influence temperature by blocking wind, creating shade, acting as heat sinks, and producing thermal pollution. These effects are particularly relevant for this study, as some of Hong Kong's weather observation stations are adjacent to or inside urban centers. To quantify the distribution of developed area, we used a 30 m resolution dataset of percent impervious surface (Brown de Colstoun et al., 2017), which we expect to strongly correlate with urban development. Because bulk air temperature is not expected to vary at a granular (30 m) scale, these data were smoothed for use in climate predictions with a Gaussian moving window at three buffer scales (σ = 10, 50, 100), using the focalWeight() and focal() functions in the raster R package (Hijmans, 2019) with type set to “Gauss”. The resulting “urbanicity” layers were later used as climate predictors. In these rasters, completely impervious locations have a value of 100, while vegetated areas have a value of 0.
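The Gaussian smoothing step can be illustrated with a small sketch. It is written in Python and is not a reimplementation of focalWeight() or focal() from the raster R package. The function names are hypothetical, σ is expressed in cells here because the source does not state its units, and the renormalization of weights at the grid edge is an assumption.

```python
import math

def gaussian_kernel(sigma, half_width):
    """Square Gaussian weight matrix normalized so the weights sum to 1."""
    offsets = range(-half_width, half_width + 1)
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma)) for dx in offsets]
           for dy in offsets]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]

def smooth(grid, kernel):
    """Weighted moving-window mean; weights are renormalized at grid edges."""
    h = len(kernel) // 2
    nrow, ncol = len(grid), len(grid[0])
    out = [[0.0] * ncol for _ in range(nrow)]
    for r in range(nrow):
        for c in range(ncol):
            acc = wsum = 0.0
            for i, krow in enumerate(kernel):
                for j, w in enumerate(krow):
                    rr, cc = r + i - h, c + j - h
                    if 0 <= rr < nrow and 0 <= cc < ncol:
                        acc += w * grid[rr][cc]
                        wsum += w
            out[r][c] = acc / wsum
    return out

# Hypothetical 5 x 5 percent-impervious grid with one fully paved cell.
impervious = [[0.0] * 5 for _ in range(5)]
impervious[2][2] = 100.0
urbanicity = smooth(impervious, gaussian_kernel(sigma=1.0, half_width=2))
print(round(urbanicity[2][2], 2))
```

The single paved cell is spread over its neighbors, so the smoothed value at a weather station reflects nearby development rather than only the 30 m cell the station occupies.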

3.2 Climate variables

Climate interpolators are often faced with the challenge of estimating climate parameters over a large area using sparse weather station observations, at least in part of the region considered (e.g., Hu et al., 2016). In contrast, interpolation in Hong Kong is benefitted by a relatively small geographic area and a quite dense network of weather data provided by dozens of permanent weather stations (Hong Kong Observatory, 2018; see Fig. S2). Here we use multiple linear regression to predict geographic climate patterns using weather station training points and raster covariates. This is followed by thin plate spline (TPS) interpolation (see Wahba, 1979) of the regression model residuals. TPS is a widely used approach in climate interpolation (e.g., New et al., 2002; Fick and Hijmans, 2017), which fits a curved surface to irregularly distributed points. This two-step interpolation (regression followed by TPS) was based on the approach of Meineri and Hylander (2017).

Weather station observation data and geographic coordinates were downloaded from the web portal of the Hong Kong Observatory (2018). As the goal was to produce a representation of long-term but modern climate, measurements over 20 years (1998 to 2017) were included. To ensure averages were reliable, weather stations were only included for interpolation of each variable if at least 8 years of complete data were available within the 20-year window. The minimum number of stations used for each model is provided in Table 2. Monthly observations of 10 variables were obtained: maximum temperature, mean daily maximum temperature, mean daily temperature, mean daily minimum temperature, minimum temperature, mean dew point, mean relative humidity, mean wind speed, mean air pressure, and total rainfall.
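The station inclusion rule can be sketched as a simple filter. This Python example (the study used R) assumes a hypothetical data layout in which each station maps years to a list of 12 monthly values, with None for a missing month, and it treats a year as complete only when all 12 months are present. The source does not define completeness in this detail, so that reading is an assumption.

```python
def eligible_stations(records, first_year=1998, last_year=2017, min_years=8):
    """Return stations with at least min_years complete years in the window.

    records: {station_name: {year: [12 monthly values, None if missing]}}
    A year counts as complete only if all 12 months are present (assumption).
    """
    keep = {}
    for station, years in records.items():
        complete = [
            year for year, months in years.items()
            if first_year <= year <= last_year
            and len(months) == 12
            and all(m is not None for m in months)
        ]
        if len(complete) >= min_years:
            keep[station] = sorted(complete)
    return keep

# Hypothetical records: station "A" has 8 complete years, station "B" only 7.
full_year = [20.0] * 12
gappy_year = [20.0] * 11 + [None]
records = {
    "A": {y: full_year for y in range(2005, 2013)},
    "B": {**{y: full_year for y in range(2005, 2012)}, 2012: gappy_year},
}
print(sorted(eligible_stations(records)))
```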

Climate interpolation consisted of two main steps. First, a linear model was built for each climate variable for each month of the year. Independent variables were selected by searching the literature for similar studies and choosing predictors we expected to have an influence on climate at this regional scale. When necessary, each predictor was statistically transformed to approach a normal distribution. The six topographic predictors used as model building candidates were elevation, log-transformed distance to coast, exponentially transformed fine and coarse water proximity, log-transformed urbanicity (σ= 50), and “northness” – the cross product of aspect and slope. The water proximity layers were products of additively combining multiple scale rasters into fewer predictors: fine water proximity was the sum of 0.75, 1.5, and 3 km scale rasters, while coarse was the sum of 6 and 12 km. The six model predictors were tested for collinearity using vifstep() in the usdm R package (Naimi et al., 2014) with a variance inflation factor threshold of 6, and no problems were found. Linear models were built using the lm() R function. All predictors were initially included, and then, using the step() function, pared down in each regression model using stepwise bidirectional selection based on the Akaike information criterion, using 4 degrees of freedom as a penalty to make predictor selection stricter than the default. The resulting regression model was used to calculate a climate value at each grid cell based on a linear relationship with the selected predictors.
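The effect of the stricter penalty can be illustrated with a minimal sketch. R's step() scores linear models as n·ln(RSS/n) + k × (number of coefficients), with k = 2 by default; the text above indicates k = 4 was used. The Python code below (the study used R) compares an intercept-only model with a one-predictor model on hypothetical station values. It does not perform the full bidirectional search, and the function names and data are assumptions.

```python
import math

def rss_intercept_only(y):
    """Residual sum of squares of a model that predicts the mean."""
    mean = sum(y) / len(y)
    return sum((v - mean) ** 2 for v in y)

def rss_simple_regression(x, y):
    """Residual sum of squares of an ordinary least squares line y ~ x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))

def penalized_aic(rss, n, n_coef, k=4.0):
    """AIC-style score for a linear model: n * ln(RSS / n) + k * n_coef."""
    return n * math.log(rss / n) + k * n_coef

# Hypothetical station elevations (m) and mean temperatures (degrees C).
elevation = [5, 50, 120, 300, 560, 950]
temperature = [23.4, 23.0, 22.7, 21.5, 20.1, 17.6]
n = len(temperature)

score_null = penalized_aic(rss_intercept_only(temperature), n, n_coef=1)
score_elev = penalized_aic(rss_simple_regression(elevation, temperature), n, n_coef=2)
print("keep elevation:", score_elev < score_null)
```

With k = 4, each additional coefficient must reduce the first term by more than 4 to be retained, twice the default threshold, which is why fewer predictors survive selection.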

Second, to adjust for local variation in climate that is not associated with topography, the linear model residuals at each station were calculated and interpolated using the thin plate spline approach implemented in the fields R package (Nychka et al., 2017). The lambda smoothing parameter, which determines how closely the fitted surface matches input values, was set to 0.01. This low lambda value was selected because of the relatively high confidence in the long-term averaged weather station values (based on at least 8 years of data). This effectively produces a smoothed layer of local deviation from the linear model, which was used to additively adjust the results of the linear model predictions and produce finalized climate rasters.
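The two-step prediction described above can be summarized in the following sketch of the notation. The symbols are introduced here for illustration and do not appear in the source: s is a grid cell location, x_j(s) are the selected predictors, e_i is the regression residual at station i, and J(f) is the usual thin plate spline roughness penalty. The exact scaling of λ differs between implementations, so this form is an assumption rather than the precise criterion used by the fields package.

```latex
\hat{y}(s) = \hat{\beta}_0 + \sum_{j} \hat{\beta}_j \, x_j(s) + \hat{f}(s),
\qquad
\hat{f} = \arg\min_{f} \; \sum_{i=1}^{n} \bigl( e_i - f(s_i) \bigr)^2 + \lambda \, J(f),
\qquad
J(f) = \iint \left( f_{xx}^2 + 2 f_{xy}^2 + f_{yy}^2 \right) \, dx \, dy
```

As λ approaches 0 the fitted surface passes through the station residuals exactly, so a small value such as 0.01 keeps the final raster close to the observed long-term averages at station locations.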

We measured the spatial predictive ability of models using 10-fold cross-validation (Dobesch et al., 2007). In each validation round, 10 % of weather stations were reserved as a test dataset and the remainder were used for training. While randomly selected test points may be subject to spatial sampling bias (Hijmans, 2012), this may be less of a concern for this study because in Hong Kong the weather stations are fairly stratified (Fig. S2). The average root mean squared error of the test data subset from the final model prediction was used as an error measurement. To normalize these error measures across the climate variables, we adjusted them as a percentage of the standard deviation of the initial weather station values measured. This cross-validation procedure was used only to produce these validation measurements. The finalized monthly climate rasters described above were trained using all available data.
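The error normalization described above can be sketched as follows. This Python example (the study used R) shows a random 10-fold split of station indices and the conversion of average test RMSE to a percentage of the standard deviation of station values. The function names are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import math
import random
import statistics

def k_fold_indices(n, k=10, seed=0):
    """Randomly partition indices 0..n-1 into k disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rmse(observed, predicted):
    """Root mean squared error between paired observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))

def relative_error_percent(fold_rmses, station_values):
    """Average fold RMSE as a percentage of the SD of all station values."""
    mean_rmse = sum(fold_rmses) / len(fold_rmses)
    return 100.0 * mean_rmse / statistics.stdev(station_values)

# Hypothetical example: 40 stations split into 10 test folds of 4 stations each.
folds = k_fold_indices(40, k=10)
print([len(fold) for fold in folds])
```

A result of 100 means the typical prediction error is as large as the spread of the station values themselves, so lower percentages indicate that the model explains more of the spatial variation.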

The finalized monthly rasters were then summarized into layers that characterize yearly climatic means and variation. These include 19 “bioclimatic” variables using the biovars() function in the dismo R package (Hijmans et al., 2017) which are specifically suited for species distribution modeling and other ecological purposes. This also allows our data to be compared with other climate data products that use the same calculations. Because those calculations only use rainfall and average daily maximum and minimum temperatures in each month, we also produced yearly average layers of dew point, relative humidity, mean daily temperature, air pressure, and wind speed. Also provided are layers of the highest and lowest average monthly extreme temperatures, and their difference (extreme temperature annual range). Because they are derived from monthly extremes rather than averaged daily extremes, these variables represent the full range of temperatures experienced in a given location better than the bioclimatic variables.
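To make the monthly-to-annual summary concrete, the sketch below computes five of the standard bioclimatic variables for a single grid cell from 12 monthly values. It is written in Python and is not the biovars() implementation from the dismo R package. The function name and the input values are hypothetical; the definitions follow the standard bioclimatic naming (bio5, bio6, bio7, bio12, bio14).

```python
def bioclim_subset(tmax, tmin, prec):
    """Compute a few bioclimatic variables from 12 monthly values for one cell.

    tmax, tmin: mean daily maximum and minimum temperature per month.
    prec: total rainfall per month.
    """
    if not (len(tmax) == len(tmin) == len(prec) == 12):
        raise ValueError("expected 12 monthly values for each input")
    bio5 = max(tmax)   # max temperature of warmest month
    bio6 = min(tmin)   # min temperature of coldest month
    return {
        "bio5": bio5,
        "bio6": bio6,
        "bio7": bio5 - bio6,   # temperature annual range
        "bio12": sum(prec),    # annual precipitation
        "bio14": min(prec),    # precipitation of driest month
    }

# Hypothetical monthly values for one grid cell (January to December).
tmax = [18, 19, 21, 25, 28, 30, 31, 31, 30, 28, 24, 20]
tmin = [14, 15, 17, 20, 24, 26, 26, 26, 25, 23, 19, 15]
prec = [25, 50, 80, 170, 300, 450, 380, 430, 330, 100, 40, 30]
print(bioclim_subset(tmax, tmin, prec))
```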

For comparison with global climate data products, we resampled bioclimatic variables to the same (1 km) resolution as WorldClim using bilinear interpolation. Only pixels present in both data products were used for comparisons.
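Once both products share a grid, the comparison reduces to computing a summary statistic over cells that have data in both. The sketch below, in Python rather than the R used in the study, computes the standard deviation of each raster over shared cells only. The function name, the use of None for no-data cells, and the choice of the population standard deviation are assumptions.

```python
import statistics

def shared_pixel_sd(raster_a, raster_b):
    """Standard deviation of each raster over cells with data in both.

    Rasters are lists of rows on the same grid; None marks no-data cells.
    Uses the population standard deviation (an assumption).
    """
    pairs = [
        (a, b)
        for row_a, row_b in zip(raster_a, raster_b)
        for a, b in zip(row_a, row_b)
        if a is not None and b is not None
    ]
    a_vals = [a for a, _ in pairs]
    b_vals = [b for _, b in pairs]
    return statistics.pstdev(a_vals), statistics.pstdev(b_vals)

# Hypothetical 2 x 2 rasters on a common grid; each has one no-data cell.
local_model = [[1.0, 3.0], [None, 5.0]]
global_model = [[2.0, 2.0], [2.0, None]]
print(shared_pixel_sd(local_model, global_model))
```

Restricting both rasters to the same cells keeps differences in coastline masks from inflating or deflating either standard deviation.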

3.3 Remote sensing data

The Normalized Difference Vegetation Index (NDVI) is a common metric of vegetation presence and density derived from satellite imagery. To calculate the NDVI, Landsat 8 images (U.S. Geological Survey, 2018) of Hong Kong were obtained. We downloaded one image from March 2016 that covers much of Hong Kong except for the far eastern areas and is free of clouds. This was supplemented with an image from March 2018 after adjustment, so that all land areas of the region were included. NDVI calculations were completed using the standard equation (Pettorelli et al., 2005):

NDVI = (NIR − Red) / (NIR + Red),   (1)

where NIR is near-infrared (Landsat band 5: 0.851 to 0.879 µm) and Red is visible red radiation (Landsat band 4: 0.636 to 0.673 µm). The resulting NDVI value varies between 1 and −1, where higher values correspond to denser vegetation.
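Equation (1) can be applied per pixel as in the following minimal sketch, written in Python rather than the R used in the study. The function name and the reflectance values are hypothetical, and returning None when both bands are zero is an assumption made to avoid division by zero.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    nir: near-infrared reflectance (Landsat 8 band 5).
    red: visible red reflectance (Landsat 8 band 4).
    Returns None when both bands are zero (assumption: treat as no data).
    """
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator

# Hypothetical reflectance pairs: dense vegetation, bare ground, water.
for nir, red in [(0.50, 0.05), (0.30, 0.25), (0.02, 0.05)]:
    print(nir, red, round(ndvi(nir, red), 3))
```

Vegetation reflects strongly in the near-infrared and absorbs red light, so densely vegetated pixels approach 1, while water and built surfaces fall near or below 0.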

4 Results and discussion

Results of this environmental analysis of Hong Kong include 48 rasters and one vector file. All rasters are provided at an identical 1 arcsec (30 m) resolution and in the WGS84 geographic coordinate system. Summary values and filenames are provided in the data repository.

4.1 Topographic variables

Distance to coast results show that approximately 42 % of Hong Kong's land area is within 1 km of the coastline. However, it is apparent that inland areas often feature steep inclines, as half of Hong Kong's land is above 84 m elevation.

For variables like relative elevation, urbanicity, and water proximity, the ideal scale of raster calculation is dependent on the desired effect to be captured and perhaps other characteristics of the landscape in question. For this reason, we provide these rasters calculated at multiple buffer scales.

Table 1. Raster product descriptions, units, and 5th, 50th, and 95th percentile values.


Urbanicity results show that the majority of land in Hong Kong is not near urban areas, as the median raster values are below 4 % urban at all scales calculated (Table 1). This shows that although Hong Kong has extremely dense urban cores, most of its mountainous terrain is unpopulated.

4.2 Climate variables

In total, at least 32 024 monthly weather station measurements over 20 years (1998 to 2017) were used to construct climate models for all months and variables, at a finer resolution than global datasets (Fig. 2). High weather station density and availability of data for multiple candidate topographic climate-forcing factors allowed for high confidence in many climate variable models, especially those related to temperature (Figs. 3, 4). The climate interpolation results include monthly models of 10 variables including temperature, precipitation, and humidity, making a total of 120 individual models produced (monthly models of three temperature variables are shown in Fig. 5). As an example, one of these models represents minimum temperatures recorded in all Januaries with data available from 1998 to 2017. For all variables, the predictors included in monthly models are displayed in Fig. 6, and the number of stations with data included is in Table 2.

Figure 2. Comparison of the average high of warmest month (bio5) model results for Hong Kong. Panel (a) is from our newly interpolated climate models at 30 m resolution, while panel (b) is 1 km resolution data available as part of WorldClim 2 (Fick and Hijmans, 2017). Not only is the resolution markedly improved, but the temperature values are also more varied, for instance on the large southern islands.

Figure 3. Adjusted r² values of initial (pre-spline) regression models. Each boxplot includes 12 points, 1 for each monthly model. Temperature variation, especially mean temperature, was best explained by linear modeling, while rainfall was predicted the most poorly.


Figure 4. Relative magnitude of training and testing dataset errors, from 10 validation rounds of climate variable modeling. A value of 100 indicates, for that climate model, that the average difference between the value recorded at a given weather station and the value predicted by the model at that location is equal to the standard deviation of the initial set of all values recorded at all weather stations for that climate variable.


Figure 5. Model results for 3 of 10 interpolated climate variables. (a) Maximum temperature, (b) mean temperature, and (c) minimum temperature.

Figure 6. Regression predictors included in monthly models for 10 climate variables. Each predictor is represented by a different color. Minimum and mean temperature variables were most predictable, consistently including elevation and urbanicity. Rainfall patterns were most difficult, with the fewest predictors included.


Table 2. Number of weather stations that contributed data for each climate model.


4.2.1 Temperature

Temperature was found to vary considerably across Hong Kong, with a more than 6 °C difference in mean annual temperature between the highest mountain peaks (> 900 m, < 18 °C) and some low-lying urbanized areas (> 24 °C). While mean and minimum temperatures are highest in urban areas, maximum temperature shows a different pattern, with a maximum in inland valleys in the northern New Territories. This pattern may be explained by urban heat retention: buildings act as heat sinks which absorb solar radiation during the day, and slowly release heat at night, causing increased minimum temperatures (see Oke, 1982). The high maximum temperatures in inland valleys may be due to reduced air circulation in sheltered locations and lack of complex vegetation or urban structures providing shade. The high accuracy of temperature models (Figs. 3, 4) is likely due to a strong association with elevation; elevation was by far the most commonly included predictor for temperature models (Fig. 6). Urbanicity was important for mean and minimum temperature, but not maximum temperature. Water proximity and coast distance were differentially included depending on the variable, while aspect × slope rarely had an effect.

4.2.2 Rainfall

In our models, the highest annual rainfall (bio12) areas in Hong Kong (> 2500 mm annually) are inland and at high elevations, presumably because of condensation from humid air as it passes over mountains. Areas near the coast, particularly small outlying islands and the western coast in Lung Kwu Tan, receive the lowest amount of annual rainfall (< 1600 mm). Precipitation of the driest month (bio14) was uniformly low, ranging from 20 to 40 mm, but the relative pattern of high- and low-precipitation areas remained similar. The most commonly included model predictor was fine-scale water proximity (Fig. 6). Elevation was predictive for 5 out of 12 months, but few other topographic predictors were useful. Seasonality of rainfall in Hong Kong is strong. Averaged across all locations, 52 % of total yearly rainfall was recorded in 3 months (June through August). Rainfall models were informed by more weather stations than any other climate variable (Table 2), but they have the highest relative standard error (Fig. 4) and therefore the lowest accuracy. Because they are influenced by both global and locally variable wind patterns, precipitation distributions are notoriously difficult to predict, especially in urban areas (Cristiano et al., 2017).

4.2.3 Dew point, humidity, pressure, and wind speed

Dew point exhibits a similar pattern to other temperature variables, with mean annual dew point ranging from 15.5 °C at mountain peaks to around 19 °C on small islands and lower areas. Mean annual relative humidity reaches a maximum of about 90 % at Tai Mo Shan, while many urban areas in Kowloon, Tuen Mun, and Yuen Long are between 70 % and 75 %. Surprisingly, mean annual air pressure has a positive correlation with elevation; the highest values (reaching 1014 hPa) are at mountain peaks, and particularly low values (as low as 1012.5 hPa) occur in coastal areas of southern and western Hong Kong. Mean annual wind speed is also strongly associated with elevation, ranging from above 30 km h⁻¹ on Lantau Island mountain peaks to below 5 km h⁻¹ in interior low-elevation areas of the New Territories.

4.2.4 Comparisons with global climate data

Our new climate models are compared with a recent global climate dataset to identify differences in predictions of Hong Kong climate values (Fig. 7). WorldClim 2 was produced using a similar interpolation approach with regression modeling and thin plate spline interpolation, but also included satellite-derived covariates in addition to topography (Fick and Hijmans, 2017). Because WorldClim incorporates vast amounts of data from multiple databases covering overlapping geographic and political entities, it is difficult to ascertain exactly which individual weather stations were included, and we were unable to determine whether any Hong Kong weather stations were included or if the datasets are completely independent. However, the model predictions differ substantially (Figs. 2, 7; Table 3). Our models generally indicate greater spatial variation than WorldClim, with cool areas colder, warm areas hotter, and wet areas wetter. For example, in the average low temperature of the coldest month (bio6), high-elevation areas could be more than 2 °C lower and urban areas more than 2 °C higher than WorldClim indicates (Fig. 7a). To further quantify differences in values between these two datasets, for each of the 19 bioclimatic variables we calculated the standard deviation of raster values (Table 3). All of our interpolated climate rasters had a higher standard deviation than their WorldClim 2 counterparts. Though there is a temporal discrepancy between weather station data used in WorldClim 2 (1970–2000) and this study (1998–2017), climate change is unlikely to explain the observed differences in temperature variability. Evidence suggests that, if anything, mountains are experiencing climate warming faster than low-elevation areas (Pepin et al., 2015), which would produce the opposite of our finding that Hong Kong's mountains are cooler than WorldClim indicates (Fig. 7a).
Unless global climate models increase in resolution and accuracy, regional models will remain critical for local applications.
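The standard deviation comparison described above can be sketched as follows. This is an illustration only: the function name and cell values are hypothetical, and the population standard deviation is an assumption, since the text does not say whether the population or sample form was used. With equal cell counts the ratio is the same either way.

```python
from statistics import pstdev


def sd_ratio(local_values, global_values):
    """Return (local SD, global SD, ratio) for two collections of raster cell values."""
    local_sd = pstdev(local_values)
    global_sd = pstdev(global_values)
    if global_sd == 0:
        raise ValueError("global raster has no variation; ratio undefined")
    return local_sd, global_sd, local_sd / global_sd


# Hypothetical cell values (degrees C) for illustration only, not taken from either dataset.
local_bio6 = [8.0, 10.0, 12.0, 14.0, 16.0]
global_bio6 = [10.0, 11.0, 12.0, 13.0, 14.0]

local_sd, global_sd, ratio = sd_ratio(local_bio6, global_bio6)
print(f"local SD = {local_sd:.2f}, global SD = {global_sd:.2f}, ratio = {ratio:.1f}x")
```

A ratio above 1 means the local raster spans a wider range of conditions than the global one over the same area, which is the sense in which Table 3 reports increases.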

Figure 7. Differences between results of this study and WorldClim 2 (Fick and Hijmans, 2017) values. Panel (a) shows average low temperature of the coldest month (bio6), with red where the local model is warmer than WorldClim and blue where it is colder. Panel (b) shows annual precipitation (bio12), with blue where the local model predicts more rainfall than WorldClim and tan where it predicts less. Our model results were resampled to 1 km resolution using bilinear interpolation to allow for these comparisons.
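The caption above mentions bilinear interpolation for resampling. As a rough sketch of what that operation does at a single target location, the function below interpolates a grid value at a fractional row and column position from the four surrounding cells. The function name and the toy grid are hypothetical, and the software actually used for resampling may handle grid alignment and edges differently.

```python
def bilinear(grid, row, col):
    """Interpolate `grid` (a list of rows) at a fractional (row, col) position.

    Positions are in cell-index units and must satisfy 0 <= row <= n_rows - 1
    and 0 <= col <= n_cols - 1.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    if not (0 <= row <= n_rows - 1 and 0 <= col <= n_cols - 1):
        raise ValueError("position lies outside the grid")
    r0, c0 = int(row), int(col)
    r1 = min(r0 + 1, n_rows - 1)  # clamp at the last row
    c1 = min(c0 + 1, n_cols - 1)  # clamp at the last column
    fr, fc = row - r0, col - c0   # fractional offsets within the cell
    top = grid[r0][c0] * (1 - fc) + grid[r0][c1] * fc
    bottom = grid[r1][c0] * (1 - fc) + grid[r1][c1] * fc
    return top * (1 - fr) + bottom * fr


# Toy 2 x 2 grid with made-up values.
toy = [[0.0, 10.0],
       [20.0, 30.0]]
center_value = bilinear(toy, 0.5, 0.5)
print(center_value)
```

Each output cell of a resampled raster would be obtained by evaluating such a function at the position of the coarse cell's center within the fine grid.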


Table 3. Comparisons of variation between bioclimatic variables, measured as raster value standard deviation. All new rasters are more variable than their corresponding WorldClim 2 layers. Increases in standard deviation range from 1.4x to 3.4x. Calculations may appear inaccurate due to rounding.


4.3 Remote sensing variable

The NDVI data represent vegetation quality and density based on two merged satellite images, both in March of their respective years. Although this is only an instantaneous representation of NDVI, we expect it to correlate strongly with the spatial pattern of vegetation density throughout the year. Certain plant species shed and regenerate their leaves during specific months ranging from winter through mid-summer, but Hong Kong's woody vegetation is overall evergreen (Dudgeon and Corlett, 1994), so seasonal changes in NDVI are not expected to be drastic. NDVI values above 0.4 include Hong Kong's densest forests, while unvegetated or urbanized areas are well below 0.1. The densest vegetation (>0.4 NDVI) in Hong Kong tends to be on slopes between 100 and 400 m elevation (Fig. 8), and is distributed between Hong Kong Island, Lantau Island, and the New Territories. The verdant mangrove forests, at sea level, are an exception. The patchy distribution of high-density vegetation likely reflects the effects of historical deforestation. The largest patches are found on the southeastern slopes of Tai To Yan in the New Territories. The relative distribution of NDVI classes along Hong Kong's elevational gradient is shown in Fig. 8. Future work could determine to what extent NDVI changes over time, in response to seasonality or recent weather. The limiting factor is the availability of data of adequate temporal resolution, as many satellite images of Hong Kong are obscured by cloud cover or degraded by poor air quality.

Figure 8. NDVI class composition over Hong Kong's elevational range. The majority of land area near sea level is below NDVI 0.1, while Hong Kong's highest-elevation areas are between 0.1 and 0.2, indicating short vegetation. The elevation range with proportionally the most dense vegetation (0.4 to 0.5 NDVI) is 300 to 400 m.
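A tabulation like the one summarized above can be sketched in a few lines. NDVI is computed with the standard definition, (NIR − Red) / (NIR + Red). The 0.1-wide class edges, the 100 m elevation bands, the function names, and the pixel values below are illustrative assumptions rather than the authors' processing chain.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of the NDVI classes (assumed 0.1-wide bins for this sketch).
EDGES = [0.1, 0.2, 0.3, 0.4, 0.5]
LABELS = ["<0.1", "0.1-0.2", "0.2-0.3", "0.3-0.4", "0.4-0.5", ">=0.5"]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return None  # undefined where both bands are zero
    return (nir - red) / total


def ndvi_class(value):
    """Map an NDVI value to its class label."""
    return LABELS[bisect_right(EDGES, value)]


def class_composition(pixels, band_width=100):
    """Count pixels per (elevation band, NDVI class).

    `pixels` is an iterable of (nir, red, elevation_m) tuples.
    """
    counts = Counter()
    for nir, red, elevation in pixels:
        value = ndvi(nir, red)
        if value is None:
            continue
        band = int(elevation // band_width) * band_width
        counts[(band, ndvi_class(value))] += 1
    return counts


# Made-up reflectance and elevation values for three hypothetical pixels.
pixels = [(30, 28, 5), (40, 30, 920), (50, 20, 350)]
composition = class_composition(pixels)
for key in sorted(composition):
    print(key, composition[key])
```

Dividing each count by the number of pixels in its elevation band would give the proportional composition that a stacked chart of this kind displays.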

4.4 Value and utility

These new data will benefit environmental research, and specifically SDM studies, in two main ways. First, they will enable finer-scale analyses than previously possible. For SDM, this means improved detection of climatic microrefugia (Meineri and Hylander, 2017) and the ability to differentiate between human-altered habitat and natural areas. Rampant development and a shifting climate make this knowledge of local species persistence more important than ever. This is especially relevant in Hong Kong, where topography varies dramatically and where urban areas form a complex mosaic with undeveloped expanses.

Second, we provide a diverse array of rasters derived from multiple independent data sources, but in a single resolution and format to facilitate further analysis and synthesis. For SDM, these layers have distinct advantages over datasets that only contain climate data. Diverse predictors, including topographic characteristics, have been shown to produce more accurate SDM results than climate data alone, for example when predicting the spread of invasive species in new ranges (Peterson and Nakazawa, 2008). However, benefits of non-climate data may only be evident in finer-scale SDMs (Luoto et al., 2007).

More broadly, such high-quality, diverse geographic data are especially uncommon in tropical regions, where improved knowledge for environmental research and biological conservation is most needed. According to Rapoport's rule, tropical species are more likely to have smaller distributions (Stevens, 1989), and therefore future execution of local SDM studies to understand their ranges is particularly important.

4.5 Limitations and next steps

Here we outline how shortfalls of the presented data may be improved in the future. First, though we inferred Hong Kong's pattern of urban development from impervious surface data, this is less than ideal because in addition to concrete, bare soil and rock are sensed as impervious. Also, it cannot differentiate dense urban cores of high rises from large paved areas. For climate modeling, an urbanicity measure that considers building height or population density at a 30 m or finer scale could be preferable.

Second, while our temperature rasters should accurately represent air temperature in open areas, they do not reflect the high spatial variation in temperature found in urban microclimates. For example, although the manned Kowloon HKO weather station is inside a densely populated area, as pointed out by Nichol and To (2012), it is still in a small parklike area surrounded by trees and therefore is not representative of the most densely urbanized areas of Hong Kong. Other stations in urban areas are similarly near green spaces or otherwise open areas. Higher-resolution (e.g., 5 or 1 m) studies of urban thermal distributions would strongly benefit from analysis of wind patterns, building height, thermal pollution, and other factors (e.g., Shi et al., 2018). Therefore granular, ground-level temperatures in urban areas are likely substantially different from the broader air temperature values our models provide.

Similar to other climate interpolation studies, bias in the physical locations of automatic weather stations may be of concern. Weather stations are often intentionally placed in flat, open areas with the goal of measuring weather that is relevant to a broad geographic area, rather than locations that may experience unique local climate. It may be for this reason that slope × aspect was infrequently useful for model construction, as few stations are on steep slopes. Elevational distribution of stations may also be a source of bias; although a weather station operates at the highest point in Hong Kong (Tai Mo Shan, 955 m), there are only two other stations above 600 m.

Finally, while we used cross-validation to measure the spatial predictive ability of the climate models, this method is only able to test models against locations where weather stations are present; validation based on an independently collected dataset would be ideal. One common validation method is to use weather data loggers placed across elevational and land-use gradients (Meineri and Hylander, 2017). Such an approach would allow for explicit testing and comparing of predictiveness of climate products for different areas of Hong Kong.
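To make the limitation above concrete, the sketch below runs a leave-one-out cross-validation over a set of stations. Two assumptions apply: the leave-one-out scheme is one common form of cross-validation and is not necessarily the one used in this study, and the one-predictor least-squares line (temperature against elevation) is a simple stand-in for the interpolation models described earlier. Station values are made up.

```python
from math import sqrt


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(xs, ys):
    """Leave-one-out cross-validated root-mean-square error.

    Each station is withheld in turn, the model is refit on the remaining
    stations, and the withheld observation is predicted.
    """
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        errors.append(ys[i] - (slope * xs[i] + intercept))
    return sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical station elevations (m) and mean temperatures (degrees C).
elevations = [5.0, 60.0, 150.0, 400.0, 600.0, 950.0]
temperatures = [23.4, 23.0, 22.1, 20.9, 19.6, 17.5]
print(f"LOOCV RMSE = {loocv_rmse(elevations, temperatures):.2f} degrees C")
```

Every prediction here is evaluated at a station location, so the error estimate says nothing about places unlike any station, such as steep slopes or dense urban cores. Independently placed data loggers would address that gap.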

Important gaps in Hong Kong geographic data remain. Projections of future climate scenarios could complement historical data to enable predictions of biodiversity change. Additional variables like cloud cover and solar radiation would especially benefit studies of photosynthetic taxa. A discrete classification of habitat type would be useful for ecological research, and quality soil type data are lacking. Availability of such data for Hong Kong would complement the findings of this project, which significantly advances our understanding of geographic heterogeneity in this complex tropical region.

5 Data availability

GeoTIFF raster and shapefile documents (Morgan and Guénard, 2018) can be downloaded from figshare: A document in the repository includes file names, descriptions, and summary statistics for all provided rasters. Individual monthly rasters for each of the 10 climate variables are available as a compressed zip file.

6 Conclusions

This diverse set of 30 m resolution topography, climate, and remote sensing data includes the first published interpolation of long-term climate averages specific to Hong Kong. Our findings suggest that global interpolated climate datasets are limited by their resolution and underestimate local climate variability. Therefore the availability of such local data will remain critically important for the foreseeable future. These new data will allow for a new generation of studies in Hong Kong and enable connections between environmental data and biotic patterns at a much finer scale than previously possible. Aside from clear uses in conservation, ecological, and biogeographic research, we also expect this freely accessible dataset to be broadly applicable for many sectors, including tourism, hydrology, recreation, agriculture, mapmaking, and real estate.

Appendix A: Glossary of variable definitions
Maximum temperature (tmax): the highest temperature observed within a month
Mean daily maximum temperature (mtmax): the mean of all daily high temperatures within a month
Mean daily temperature (tmean): the mean of all temperatures within a month
Mean daily minimum temperature (mtmin): the mean of all daily low temperatures within a month
Minimum temperature (tmin): the lowest temperature observed within a month
Mean dew point (dewp): the mean of all dew point observations within a month
Mean relative humidity (humid): the mean of all relative humidity observations within a month
Mean wind speed (windsp): the mean of all wind speed observations within a month
Mean air pressure (press): the mean of all air pressure observations within a month
Rainfall (prec): the total of all rain recorded within a month
Relative elevation: the difference in elevation between the pixel of interest and the lowest pixel within a given radius
Distance to coast: geometric distance between the pixel of interest and the nearest oceanic coastline
Water proximity: percent of area that is terrestrial within a given radius of the pixel of interest
NDVI: Normalized Difference Vegetation Index
Urbanicity: measure of area that is impervious surface within a given radius of the pixel of interest
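Several of the definitions above (relative elevation, water proximity, urbanicity) are neighborhood statistics computed within a radius of each pixel. The following is a minimal sketch of two of them on a small grid. It assumes the radius is expressed in cell units, that a cell counts as inside the neighborhood when its center is within the radius, and that cells beyond the grid edge are ignored. The function names and grids are hypothetical.

```python
def neighbors(grid, row, col, radius):
    """Yield values of cells whose centers lie within `radius` cells of (row, col)."""
    n_rows, n_cols = len(grid), len(grid[0])
    reach = int(radius)
    for r in range(max(0, row - reach), min(n_rows, row + reach + 1)):
        for c in range(max(0, col - reach), min(n_cols, col + reach + 1)):
            if (r - row) ** 2 + (c - col) ** 2 <= radius ** 2:
                yield grid[r][c]


def relative_elevation(dem, row, col, radius):
    """Elevation of the pixel minus the lowest elevation within the radius."""
    return dem[row][col] - min(neighbors(dem, row, col, radius))


def percent_land(land_mask, row, col, radius):
    """Percent of cells within the radius that are terrestrial (mask value 1)."""
    cells = list(neighbors(land_mask, row, col, radius))
    return 100.0 * sum(cells) / len(cells)


# Made-up 3 x 3 elevation grid (m) and land mask (1 = land, 0 = water).
dem = [[0, 10, 20],
       [5, 50, 30],
       [0, 40, 60]]
land = [[0, 1, 1],
        [0, 1, 1],
        [0, 1, 1]]

print(relative_elevation(dem, 1, 1, 1))
print(percent_land(land, 1, 1, 1))
```

Urbanicity as defined above would follow the same pattern as `percent_land`, with an impervious-surface raster in place of the land mask.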

The supplement related to this article is available online at:

Author contributions

BM acquired initial data, conducted modeling, and prepared the dataset. BM and BG prepared the manuscript.

Competing interests

The authors declare that they have no conflict of interest.


Acknowledgements

This project would not have been possible without the Hong Kong Observatory, which works tirelessly to maintain its weather station network and ensure the resulting data are accessible. We also thank Eric Meineri for comments and advice while planning our analyses.

Financial support

This research has been supported by the Ocean Park Conservation Foundation (grant no. OPCFHK 2016/17-OT16).

Review statement

This paper was edited by Giulio G. R. Iovine and reviewed by six anonymous referees.


References

Aalto, J., Riihimäki, H., Meineri, E., Hylander, K., and Luoto, M.: Revealing topoclimatic heterogeneity using meteorological station data, Int. J. Climatol., 37, 544–556, 2017.

Agriculture, Fisheries and Conservation Department: The Government of the Hong Kong Special Administrative Region: Annual Report 2016–2017, Country and Marine Parks, available at: (last access: 20 September 2018), 2017.

Ashcroft, M. B. and Gollan, J. R.: Fine-resolution (25 m) topoclimatic grids of near-surface (5 cm) extreme temperatures and humidities across various habitats in a large (200 × 300 km) and diverse region, Int. J. Climatol., 32, 2134–2148, 2012.

Bennie, J. J., Wiltshire, A. J., Joyce, A. N., Clark, D., Lloyd, A. R., Adamson, J., Parr, T., Baxter, R., and Huntley, B.: Characterising inter-annual variation in the spatial pattern of thermal microclimate in a UK upland using a combined empirical–physical model, Agr. Forest Meteorol., 150, 12–19, 2010.

Brown de Colstoun, E. C., Huang, C., Wang, P., Tilton, J. C., Tan, B., Phillips, J., Niemczura, S., Ling, P.-Y., and Wolfe, R. E.: Global Man-made Impervious Surface (GMIS) Dataset from Landsat, NASA Socioeconomic Data and Applications Center (SEDAC), Palisades, NY, USA, 2017.

Cheng, W. and Bonebrake, T. C.: Conservation effectiveness of protected areas for Hong Kong butterflies declines under climate change, J. Insect Conserv., 21, 599–606, 2017.

Connor, T., Hull, V., Viña, A., Shortridge, A., Tang, Y., Zhang, J., Wang, F., and Liu, J.: Effects of grain size and niche breadth on species distribution modeling, Ecography, 41, 1270–1282, 2018.

Cristiano, E., ten Veldhuis, M.-C., and van de Giesen, N.: Spatial and temporal variability of rainfall and their effects on hydrological response in urban areas – a review, Hydrol. Earth Syst. Sci., 21, 3859–3878, 2017.

Dobesch, H., Dumolard, P., and Dyras, I. (Eds.): Spatial interpolation for climate data: the use of GIS in climatology and meteorology, John Wiley and Sons, 2007.

Dobrowski, S. Z.: A climatic basis for microrefugia: the influence of terrain on climate, Glob. Change Biol., 17, 1022–1035, 2010.

Dudgeon, D. and Corlett, R.: Hills and streams: an ecology of Hong Kong, Hong Kong University Press, Hong Kong, 1994.

Fick, S. E. and Hijmans, R. J.: WorldClim 2: new 1 km spatial resolution climate surfaces for global land areas, Int. J. Climatol., 37, 4302–4315, 2017.

Franklin, J., Davis, F. W., Ikegami, M., Syphard, A. D., Flint, L. E., Flint, A. L., and Hannah, L.: Modeling plant species distributions under future climates: how fine scale do climate projections need to be?, Glob. Change Biol., 19, 473–483, 2013.

Giridharan, R., Ganesan, S., and Lau, S. S. Y.: Daytime urban heat island effect in high-rise and high-density residential developments in Hong Kong, Energ. Buildings, 36, 525–534, 2004.

Guisan, A., Graham, C. H., Elith, J., Huettmann, F., and NCEAS Species Distribution Modelling Group: Sensitivity of predictive species distribution models to change in grain size, Divers. Distrib., 13, 332–340, 2007.

Heiblum, R. H., Koren, I., and Altaratz, O.: Analyzing coastal precipitation using TRMM observations, Atmos. Chem. Phys., 11, 13201–13217, 2011.

Hijmans, R. J.: Cross-validation of species distribution models: removing spatial sorting bias and calibration with a null model, Ecology, 93, 679–688, 2012.

Hijmans, R. J.: Package 'raster'. Geographic Data Analysis and Modeling. R package version 2.8-19, available at: (last access: 20 September 2018), 2017.

Hijmans, R. J., Phillips, S., Leathwick, J., and Elith, J.: Package 'dismo'. Species Distribution Modeling. R package version 1.1-4, available at: (last access: 20 September 2018), 2017.

Hong Kong Observatory:, last access: 20 September 2018.

Hu, Z., Hu, Q., Zhang, C., Chen, X., and Li, Q.: Evaluation of reanalysis, spatially interpolated and satellite remotely sensed precipitation data sets in central Asia, J. Geophys. Res.-Atmos., 121, 5648–5663, 2016.

Karger, D. N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R. W., Zimmerman, N. E., Linder, H. P., and Kessler, M.: Climatologies at high resolution for the earth's land surface areas, Scientific Data, 4, 170122, 2017.

Lands Department: The Government of the Hong Kong Special Administrative Region: Digital Terrain Model (DTM), available at: (last access: 15 March 2018), 2017.

Lassueur, T., Joost, S., and Randin, C. F.: Very high resolution digital elevation models: Do they improve models of plant species distribution?, Ecol. Model., 198, 139–153, 2006.

Lau, S. S. and Zhang, Q.: Genesis of a vertical city in Hong Kong, Int. J. High-Rise Build., 4, 117–125, 2015.

Levin, S. A.: The problem of pattern and scale in ecology: the Robert H. MacArthur award lecture, Ecology, 73, 1943–1967, 1992.

Liu, L. and Zhang, Y.: Urban heat island analysis using the Landsat TM data and ASTER data: A case study in Hong Kong, Remote Sens., 3, 1535–1552, 2011.

Lookingbill, T. R. and Urban, D. L.: Spatial estimation of air temperature differences for landscape-scale studies in montane environments, Agr. Forest Meteorol., 114, 141–151, 2003.

Luoto, M., Virkkala, R., and Heikkinen, R. K.: The role of land cover in bioclimatic models depends on spatial resolution, Global Ecol. Biogeogr., 16, 34–42, 2007.

Marafa, L. M. and Chau, K. C.: Effect of hill fire on upland soil in Hong Kong, Forest Ecol. Manag., 120, 97–104, 1999.

Meineri, E. and Hylander, K.: Fine-grain, large-domain climate models based on climate station and comprehensive topographic information improve microrefugia detection, Ecography, 40, 1003–1013, 2017.

Mertes, K. and Jetz, W.: Disentangling scale dependencies in species environmental niches and distributions, Ecography, 41, 1604–1615, 2018.

Morgan, B. and Guénard, B.: Hong Kong climate, vegetation, and topography rasters, Figshare, 2018.

Naimi, B., Hamm, N., Groen, T. A., Skidmore, A. K., and Toxopeus, A. G.: Where is positional uncertainty a problem for species distribution modelling?, Ecography, 37, 191–203, 2014.

New, M., Lister, D., Hulme, M., and Makin, I.: A high-resolution data set of surface climate over global land areas, Clim. Res., 21, 1–24, 2002.

Nezer, O., Bar-David, S., Gueta, T., and Carmel, Y.: High-resolution species-distribution model based on systematic sampling and indirect observations, Biodivers. Conserv., 26, 421–437, 2017.

Ng, E.: Wind and heat environment in densely built urban areas in Hong Kong, Glob. Environ. Res., 13, 169–178, available at: (last access: 15 March 2018), 2009.

Nichol, J., Hang, T. P., and Ng, E.: Temperature projection in a tropical city using remote sensing and dynamic modeling, Clim. Dynam., 42, 2921–2929, 2014.

Nichol, J. E. and To, P. H.: Temporal characteristics of thermal satellite images for urban heat stress and heat island mapping, ISPRS J. Photogramm., 74, 153–162, 2012.

Nychka, D., Furrer, R., Paige, J., and Sain, S.: Package 'fields' Tools for spatial data. R package version 9.6, available at: (last access: 3 September 2019), 2017.

Oke, T. R.: The energetic basis of the urban heat island, Q. J. Roy. Meteor. Soc., 108, 1–24, 1982.

OpenStreetMap:, last access: 13 February 2018.

Paiva, R. C. D., Buarque, D. C., Clarke, R. T., Collischonn, W., and Allasia, D. G.: Reduced precipitation over large water bodies in the Brazilian Amazon shown from TRMM data, Geophys. Res. Lett., 38, L04406, 2011.

Pepin, N., Bradley, R. S., Diaz, H. F., Baraer, M., Caceres, E. B., Forsythe, N., Fowler, H., Greenwood, G., Hashmi, M. Z., Liu, X. D., Miller, J. R., Ning, L., Ohmura, A., Palazzi, E., Rangwala, I., Schöner, W., Severskiy, I., Shahgedanova, M., Wang, M. B., Williamson, S. N., and Yang, D. Q.: Elevation-dependent warming in mountain regions of the world, Nat. Clim. Change, 5, 424–430, 2015.

Peterson, A. T. and Nakazawa, Y.: Environmental data sets matter in ecological niche modelling: an example with Solenopsis invicta and Solenopsis richteri, Global Ecol. Biogeogr., 17, 135–144, 2008.

Peterson, A. T., Soberón, J., Pearson, R. G., Anderson, R. P., Martínez-Meyer, E., Nakamura, M., and Araújo, M. B.: Ecological niches and geographic distributions, Princeton University Press, Princeton, NJ, USA, 2011.

Pettorelli, N., Vik, J. O., Mysterud, A., Gaillard, J. M., Tucker, C. J., and Stenseth, N. C.: Using the satellite-derived NDVI to assess ecological responses to environmental change, Trends. Ecol. Evol., 20, 503–510, 2005.

Pradervand, J. N., Dubuis, A., Pellissier, L., Guisan, A., and Randin, C.: Very high resolution environmental predictors in species distribution models: Moving beyond topography?, Prog. Phys. Geog., 38, 79–96, 2014.

QGIS Development Team: QGIS Geographic Information System, Open Source Geospatial Foundation Project, available at:, last access: 4 March 2018.

R Core Team: R: A language and environment for statistical computing, available at: (last access: 28 January 2017), 2016.

RStudio Team: RStudio: integrated development for R, RStudio, Inc., Boston, MA, 42, available at: (last access: 28 January 2017), 2015.

Seo, C., Thorne, J. H., Hannah, L., and Thuiller, W.: Scale effects in species distribution models: implications for conservation planning under climate change, Biol. Letters, 5, 39–43, 2009.

Shi, Y., Katzschner, L., and Ng, E.: Modelling the fine-scale spatiotemporal pattern of urban heat island effect using land use regression approach in a megacity, Sci. Total Environ., 618, 891–904, 2018.

Stevens, G.: The latitudinal gradient in geographical range: how so many species coexist in the tropics, Amer. Nat., 133, 240–256, 1989.

U.S. Geological Survey: Landsat-8, 2018.

Vega, G. C., Pertierra, L. R., and Olalla-Tárraga, M. Á.: MERRAclim, a high-resolution global dataset of remotely sensed bioclimatic variables for ecological modelling, Scientific Data, 4, 170078, 2017.

Wahba, G.: How to Smooth Curves and Surfaces with Splines and Cross-Validation, Ft. Belvoir: Defense Technical Information Center, 1979.

Wang, R., Ren, C., Xu, Y., Lau, K. K. L., and Shi, Y.: Mapping the local climate zones of urban areas by GIS-based and WUDAPT methods: A case study of Hong Kong, Urban Clim., 24, 567–576, 2018.

Zhuang, X. Y. and Corlett, R. T.: Forest and forest succession in Hong Kong, China, J. Trop. Ecol., 13, 857–866, 1997.

Short summary
Hong Kong is poised to become a model region for understanding the effects of urbanization, biotic invasions, and protected areas in the tropics. However, until now there have been few suitable GIS layers to address these issues on a landscape scale. This set of 30 m resolution vegetation, topography, and interpolated climate rasters will enable a new generation of spatial studies in Hong Kong. Compared to global datasets, these local models consistently indicate greater climatic heterogeneity.