A History of Open Weather in New Zealand (HOWNZ): an open access 1-km resolution monthly 1910-2019 time-series of interpolated temperature and rainfall grids with associated uncertainty

Long time-series of weather grids are fundamental to understanding how weather affects environmental or ecological patterns and processes such as plant distributions, plant and animal phenology, wildfires, and hydrology. Ideally such weather grids should be openly available and be associated with uncertainties so that users can understand any data 10 quality issues. We present a History of Open Weather in New Zealand (HOWNZ) that uses climatological aided natural neighbour interpolation to provide monthly 1-km resolution grids of total rainfall, mean air temperature, mean daily maximum air temperature, and mean daily minimum air temperature across New Zealand from 1910 to 2019. HOWNZ matches the best available temporal extent and spatial resolution of any open weather grids that include New Zealand, and is unique in providing associated spatial uncertainty in appropriate units of measurement. The HOWNZ weather and 15 uncertainty grids capture the dynamic spatial and temporal nature of the monthly weather variables and the uncertainty associated with the interpolation. We also demonstrate how to quantify and visualise temporal trends across New Zealand that recognise the temporal and spatial variation of uncertainties in the HOWNZ data. The HOWNZ data is openly available at https://doi.org/10.7931/zmvz-xf30 (Etherington et al., 2021). 20


Introduction
Climatologies such as WorldClim2 (Fick and Hijmans, 2017) and Chelsa (Karger et al., 2017) provide spatial grids of climatic variables such as temperature and rainfall that have underpinned thousands of environmental and ecological studies.
These climatologies represent long-term ≈ 30-year climate averages, but long time-series of weather grids are also highly desirable as they can be correlated with long-term environmental or ecological data to explore how weather influences such 25 processes. For example, monthly time-series data of weather conditions can be used to improve the understanding of plant distributions (Stewart et al., 2021), plant and animal phenology (Gordo and Sanz, 2005), wildfire histories (Girardin and Wotton, 2009), and hydrology (Remesan et al., 2014). sharing data, source code, and knowledge provides exciting opportunities for scientific discovery (de Vos et al., 2020). In New Zealand there are limited open access options for national-scale weather grids (Table 1), and none of these meet all the currently optimal criteria amongst these open access options of spatial (≤ 1 km 2 ) and temporal (monthly) resolution over the last century. Producing such historical weather data is challenging because weather station data are often sparse, especially in remote areas and for earlier time periods. However, this challenge should not preclude the creation of historical weather 35 data, if we accept that we cannot be consistently successful for all locations and dates and we prioritise providing weather data with associated uncertainty data.
Much emphasis has been placed on quantifying (Foley, 2010) and visualising (Retchless and Brewer, 2016) the uncertainty of future climates. Quantifying and visualising uncertainty is also critical for the judicious use of historic weather data, as 40 even when following open science practices that involve sharing data and code it can still be difficult to ensure that an enduser is aware of and understands the quality of the data (de Vos et al., 2020). Historical climate and weather data are often used predictively, but these predictions must recognise the underlying uncertainties so that their reliability can be understood and communicated. The uncertainty (or reliability) of weather and climate spatial data is often quantified with metrics such as the root-mean-square error (RMSE) or the mean absolute error (MAE) derived from cross-validation that iteratively 45 excludes each weather station and then interpolates a value for the weather station using the remaining data (Willmott and Matsuura, 2006). However, these are global metrics that describe uncertainty with a single spatially averaged value and provide no information about how the uncertainty will vary across space (Zhang and Goodchild, 2002). When estimates of spatial uncertainty accompany climate and weather data, they typically simply indicate areas that are reliable (Abatzoglou et al., 2018;Harris et al., 2020b), which limits how uncertainty can be incorporated into analyses as estimates from individual 50 cells are not matched with an uncertainty in the units of measurement.
To facilitate understanding of how long-term weather patterns have been changing and potentially affecting environmental and ecological processes in New Zealand, we have produced a History of Open Weather in New Zealand (HOWNZ).
HOWNZ is a new openly available history of temperature and rainfall weather in New Zealand that matches the best 55 available spatial and temporal extent and resolution of any currently available open data and is unique in providing associated spatial uncertainty grids in appropriate units of measurement (Table 1).

Weather and climatology data
Monthly weather station data are freely available from New Zealand's National Climate Database (NIWA, 2020). For New 60 Zealand's three main islands and their associated near-shore islands we compiled all reliable data with sub-kilometre spatial https://doi.org/10.5194/essd-2021-303 precision from 1900 to 2019 for monthly statistics of: total rainfall, mean air temperature, mean daily maximum air temperature, and mean daily minimum air temperature. The amount of data available in any given month varied through time, across space, and between weather variables, but there were some consistencies. There was much more rainfall data than temperature data, there was generally more data in recent times (with a peak around 1980), and there was always less 65 data in mountainous interiors of both islands and in the more remote southerly and westerly regions of New Zealand ( Figure   1). Openly available New Zealand climatology grids for each weather variable giving the average weather in each month for the period 1950-1980 at 100 m grid cell resolution (McCarthy et al., 2021;Leathwick et al., 2002) were reprojected and aggregated to 1-km 2 grid cell resolution.

Interpolation with uncertainty
Climatologies have formed part of successful previous weather interpolations in New Zealand (Tait et al., 2006). Therefore, we used a climatologically aided interpolation approach by interpolating monthly weather as an anomaly from a climatic normal (Willmott and Robeson, 1995) as this technique has been successfully applied elsewhere (Abatzoglou et al., 2018;Harris et al., 2020b;Hofstra et al., 2008). The basic premise of climatologically aided interpolation is that rather than 75 directly interpolating weather station data in each month, by using an underlying long-term climatology grid for that month, monthly anomalies can be calculated as the difference between the weather station value and the climatology grid value.
These anomalies are then interpolated and added to the climatology grid to estimate how the weather in that month deviated from the climatological normal.

80
We selected natural neighbour (or Sibson) interpolation (Sibson, 1981) to interpolate the anomalies. This decision was based on evidence that natural neighbour interpolation performs well for interpolating weather data (Hofstra et al., 2008;Keller et al., 2015;Lyra et al., 2018), but more specifically because natural neighbour interpolation is: (i) an exact interpolator, meaning it will retain the original data values in the interpolated grid and will only interpolate within the range of the original data and so cannot produce wildly unrealistic interpolations; (ii) a local method, which interpolates for a 85 location only using data from that location's immediate surrounds; (iii) spatially adaptive, so automatically adapting to localised data distribution and density; (iv) not based on fitting statistical trends and so does not require large sample sizes (Etherington, 2020). These properties of natural neighbour interpolation are desirable given our interpolations will need to adapt to the increasingly sparse and irregular data that occurs further back in New Zealand history.

90
We applied a discrete (or digital) form of natural neighbour interpolation (Park et al., 2006) that simultaneously calculates uncertainty as a cross-validation error-distance field (Etherington, 2020). This interpolation method works by defining a grid of cells for which interpolated values will be calculated; data cells are given the mean value of any weather stations they contain, and all other cells are given the value of the nearest data cell (Figure 2a). The interpolated value for a grid cell is the mean of the grid cell values that are as close or closer to the interpolation cell than a data cell (Figure 2b), which, when 95 repeated across all grid cells, results in a smooth interpolation grid ( Figure 2c). Uncertainty is calculated by using the same discrete natural neighbour interpolation process to interpolate the distances to the data cells to produce natural neighbour distances ( Figure 2d) and to interpolate the cross-validated error rate, as the ratio of the cross-validated absolute error to natural neighbour distance, for each data cell ( Figure 2e). The product of the natural neighbour distances and cross-validated error rates produces a cross-validation error-distance field (Figure 2f) that yields a grid of interpolation uncertainties in the 100 same units as the interpolation and is highest in areas that are more distant from data cells and are in areas that areas that are harder to interpolate accurately.
The resulting weather grids appeared to capture the heterogeneous and dynamic nature of the monthly weather variables and the uncertainty associated with the interpolation. For example, the May total rainfalls and uncertainties ( Figure 3) illustrate 105 the dynamic nature of both the monthly weather variables and the uncertainty. There are clear differences in total rainfall in May over time and shifts in the location of the wettest regions. Our emphasis on quantifying uncertainty is justified by the uncertainty grids being extremely dynamic, with the magnitude and location of uncertainty changing dramatically through time. We interpret this dynamic uncertainty as being due to the data limitations of interpolation, as higher uncertainty occurs in locations more distant from weather data ( Figure 1). However, uncertainty can be high in regions with weather data 110 available, which we interpret as arising from the spatial variability of individual monthly weather patterns, as when spatial variability of weather patterns increases over shorter distances, interpolation becomes increasingly difficult (Etherington, 2020).

Interpolation validation 115
The MAE provides a reasonable estimate of the actual error rates for discrete natural neighbour interpolation (Etherington, 2020). Therefore, even though each monthly weather grid has a matching uncertainty grid, we also used cross-validation to calculate the MAE for each monthly interpolation to validate the method and facilitate comparisons with other weather interpolations (Willmott and Matsuura, 2006).

120
While the number of data cells (which equates closely to the number of weather stations) used during interpolation varied over time, the MAEs associated with all the weather variables remained reasonably constant around 20 mm for total rainfall and 1 °C for all temperature variables (Figure 4). This consistency suggests that the interpolation methodology was robust to reductions in available data. As might be expected, the MAEs were lowest when interpolating data during  which is the period the climatologies aiding the interpolation were created for. There was also an increase in MAE moving 125 https://doi.org/10.5194/essd-2021-303 The spatial variability in monthly uncertainty (Figure 3) explains the variable pattern of MAE over time ( Figure 4); while the 130 number and location of weather stations may vary from month to month, the weather patterns may change dramatically making some months easier to interpolate than others. This variability of uncertainty further emphasises the need to provide spatially and temporally relevant estimates of uncertainty for all interpolations, and that globally aggregated uncertainty estimates may not provide users with a full picture of the data's limitations.

Using the uncertainty data
When using interpolated weather data users need to be able to make decisions specific to their requirements (de Vos et al., 2020). Therefore, in HOWNZ we have endeavoured to match every interpolation with an individual measure of uncertainty in the relevant units. However, we recognise that it is unusual for historical weather data to be presented alongside such detailed uncertainty estimates. These uncertainty estimates provide a new analytical opportunity and challenge for potential 140 data users, so to enable data users to leverage the uncertainty data we feel it is important to demonstrate how to incorporate the uncertainty data into an analytical workflow.
A simple example of weather time-series analysis would be to use a Spearman's rank (rs) correlation (Gregory, 1978;Spearman, 1904) to detect the directionality of any trends in weather patterns over time (Girardin and Wotton, 2009). 145 To incorporate uncertainty into this process, we adopt a Monte Carlo approach in which we produce many equally possible weather histories by randomly sampling each month's weather as a random value from a uniform distribution with a range equal to the interpolation uncertainty (though limiting rainfall to a minimum of 0 mm). Many trends can then be calculated, with their distribution used to infer the reliability of the analysis. The importance of incorporating uncertainty is evident when comparing locations that have differing uncertainty levels. For a location with high uncertainty the possible weather 150 histories can vary widely around the single interpolated weather history resulting in a wide distribution of possible trends ( Figure 5a). In other instances, the trend may be stronger than the uncertainty meaning that while the strength of the trend varies its direction can be clearly established (Figure 5b). At locations where there is little uncertainty and hence minimal variation in possible weather histories the trend can be established precisely (Figure 5c).

155
If this trend analysis with uncertainty process is repeated for every location, spatial trends can be analysed. The challenge then becomes how to best visualise the spatial pattern of uncertainty. Based on guidance relating to the cartographic https://doi.org/10.5194/essd-2021-303  Retchless and Brewer, 2016), we selected a diverging colour scheme to show trends as the median rs of the possible weather histories, and a value-by-alpha approach (Roth et al., 2010) to mask locations of increasing uncertainty measured as the 5 th to 95 th percentile range of the rs of the possible weather histories. The resulting 160 map indicates that in some regions there are clear trends in weather patterns over time, but in other regions the uncertainty is too large to reliably make inferences from them ( Figure 6).
These results clearly demonstrate that it is possible to produce a long-term history of interpolated weather and emphasise the importance of quantifying the uncertainty of all interpolations. It is obviously impossible for us to give guidance on how to 165 incorporate uncertainty into all possible applications, but the Monte Carlo approach we present could be adapted to many situations. For example, the trend analysis shown here could be extended for those users interested in comparing long-term environmental data to weather data to identify associations between an environmental process and weather (Girardin and Wotton, 2009). 170

Limitations and future recommendations
Some aspects of the temporal and spatial scales of HOWNZ could be improved in subsequent refinements of this dataset.
One limitation is that the monthly temporal resolution does not capture extreme but short duration weather events. For example, in July of 1996 there was an extreme cold snap that had significant effects on vegetation in southern New Zealand (Bannister, 2003) but is not evident in our data that averages minimum temperatures over the whole month. We only had 175 access to monthly climatologies on which to base our interpolations, but if openly available weekly, or even daily, climatologies were created then it would be possible to interpolate historical weather at a finer temporal resolution to better capture extreme weather events of short durations.
While improving temporal resolution would require the creation of new climatologies, the spatial resolution could be 180 improved up to 100 m with the climatologies used here. This improvement could be beneficial in the mountainous areas of New Zealand where temperatures can vary considerably within the 1-km resolution of our grids. Future improvements in temporal and spatial resolution would benefit from a more efficient computational workflow. Our computational workflow limited our processing to a 1-km resolution, but future versions could either leverage high performance computing, or to maintain a high degree of openness could continue to use desktop computing but with discrete natural neighbour 185 interpolation leveraging the power of graphics processing units that are well suited to this method (Park et al., 2006). reliably estimate historical weather using our approach before the 20 th century, and that palaeo-environmental techniques 190 may provide a better option for longer-term weather, or perhaps more realistically, climate information (Cook et al., 2006;Duncan et al., 2010). There was a subtle increase in MAE towards the present that may be a function of a reduction in weather station data, particularly rainfall data, and perhaps a growing temporal mismatch between the 1950-1980 climatologies used to aid the interpolation (Figure 4). This indicates that continuing to use interpolation methods may become less feasible in the future. More recent climatologies could be produced to provide more temporally relevant climate 195 data, but satellite data may provide a more useful data source for more recent weather history (Funk et al., 2015). We believe our interpolation approach is most valid for the pre-satellite era, and that by producing data that span part of the satellite era we provide a useful overlap for comparative purposes that could allow for a transition between historical weather data sources.

Data availability
The resulting HOWNZ data

Conclusions
We view HOWNZ as a, rather than the, history of New Zealand weather; it would be possible to repeat the process using equally defensible quantitative methods and obtain different results. Likewise, changes in the spatial or temporal resolution 215 will result in different patterns, but there is no single best resolution as this will vary depending on the desired application.
Nevertheless, as HOWNZ matches the best available spatial and temporal extent and resolution of any currently available  (Table 1), we believe that in creating HOWNZ we have significantly improved the ability for environmental and ecological scientists in New Zealand to understand how changing weather patterns have affected various environmental and ecological processes. Even with the spatially and temporally complex patterns of uncertainty, that are 220 sometimes large, it is still possible to find consistent trends in weather ( Figure 6). We also hope our efforts to produce interpolation estimates with associated uncertainty, and examples of how to build that uncertainty into any analyses, will encourage the quantification and visualisation of uncertainty in other weather and climate data sets.

Author contributions 225
TRE, GLWP, and JMW conceived and developed the idea. TRE developed the code and performed the data processing. TRE, GLWP, and JMW wrote the manuscript.

Competing interests
The authors declare that they have no conflict of interest. 230 in South Otago and Southland, New Zealand, July 1996, N Z J Bot, 41, 555-569, 10.1080/0028825X.2003.9512869, 2003 Cook   temperature (that are similar to mean daily maximum air temperature and mean daily minimum air temperature) and total rainfall. Cells with ≥ 12 records will usually mean at least one weather station is present will records for all 12 months of the year, but in some instances many weather stations may be present resulting in tens or hundreds of records per grid cell. grid of cells (a) each cell is given the value of the nearest data cell d, (b) for any interpolation cell i the interpolated value is the mean of cell values that are as close or closer to the interpolation cell than a data cell, that (c) when repeated for all grid cells produces a natural neighbour interpolation. Natural neighbour interpolation is also used to interpolate (d) the distances to the data cells, and (e) the cross-validated error rate of each data cell. The interpolation uncertainty is then (f) the cross-validation error distance field that is the product of the natural neighbour distances and error rates.