the Creative Commons Attribution 4.0 License.
HISDAC-ES: historical settlement data compilation for Spain (1900–2020)
Johannes H. Uhl
Dominic Royé
Keith Burghardt
José A. Aldrey Vázquez
Manuel Borobio Sanchiz
Stefan Leyk
Download
- Final revised paper (published on 26 Oct 2023)
- Supplement to the final revised paper
- Preprint (discussion started on 02 Mar 2023)
- Supplement to the preprint
Interactive discussion
Status: closed
- RC1: 'Comment on essd-2023-53', Tracy Kugler, 31 Mar 2023
HISDAC-ES is a valuable dataset for the study of a variety of dynamic processes in the development of the built environment in Spain. The authors have transformed information from several cadastral datasets into a comprehensive dataset that is far easier to use and directly represents variables likely to be of interest to researchers. The long temporal extent and complete coverage of all of Spain, including urban and rural areas are particularly valuable. The authors have done a laudable job of validating their data to the extent possible, given the dearth of comparable data sources.
I have just a few relatively minor suggestions and questions on the manuscript:
- For readers unfamiliar with the intricacies of Spanish geography, it would be helpful to include a brief background section describing the unique features. This section should highlight the Basque country and Navarra, noting their locations and why they are unique. It should also mention the islands and exclaves that are part of Spain’s territory. This section would serve to orient readers when these areas are mentioned later in the manuscript.
- Section 2.1.2 mentions that a “common building function classification scheme” was applied. More details about this scheme would be helpful. What building function categories were included in the source datasets, and how were they harmonized into the common scheme? This could be addressed by a table in an appendix.
- Section 4.3 on the long-term trajectory evaluation points out that correlations are highest in the Southern region. Another striking feature of Fig. 14 is that the correlations in Madrid peak earlier in the time series than for other regions. Is this possibly also attributable to survivorship bias and more and earlier redevelopment around Madrid?
- In Fig. 18, discussed in section 4.6, the completeness of the number of dwellings attribute is notably lower than other attributes nationwide. Should users be concerned about this gap?
- In the video supplement, animations 1-3 use a different set of municipalities than animations 4-6. Why is this?
- See also the attached manuscript with minor edits for clarification in the text.
In spot checking the many available data layers, I discovered a few issues:
- In the HISDAC-ES_All_LAEA subset, the phys_dwel_sum_v1_100 and phys_dwel_mean layers.
- Also in the HISDAC-ES_All_LAEA subset, the phys_bufa_mean_v1_100 appears to have many more 0 cells than expected and to be generally inconsistent with the evol_bufa_v1_100_2020 layer. (See overlay of these two layers at the end of the attached PDF file.)
- RC2: 'Comment on essd-2023-53', Anonymous Referee #2, 25 Apr 2023
General Comments
This is a very well written paper outlining a very interesting high resolution data set on built-up areas in Spain going backwards in time to 1900 that is also rich in detail, i.e., several variables that correspond to four components related to the state and evolution of the built environment. The introduction is well written and makes a clear case for the need for such a data set. The authors have undertaken a considerable evaluation process of the data set using many different sources and acknowledge the limitations, in particular, the survivorship bias. The data are readily available with a doi and are well documented, so it was easy to download and view them. The animated gifs are a nice addition. Overall, this is a really valuable data set with many different potential applications, some of which the authors refer to in the paper. Having such a data set for all of Europe would be amazing.
Specific Comments
- Line 136, which attributes were retained with the centroids?
- Lines 138/139, what was the common building function classification scheme used? Or is this what you refer to later, i.e., residential, commercial, industrial, agricultural, public services, offices? Where would a building like a church or museum fall?
- Line 159, spatial aggregation into 100m grid – does this match the CORINE 100m grid?
- Line 190, Why did you not compare with the Copernicus Urban Atlas product? It would also have been interesting to use the Copernicus soil sealing product as an additional evaluation even if this is only possible for more recent years.
- Line 215, you mention that you compared GHS-BUILT with the World Settlement Footprint and there was good agreement, but a full evaluation with the latter product would have been useful because it performs better in rural areas than GHS-BUILT, which is what you highlight in your results section. Hence agreement in urban areas probably doesn’t reflect this better performance in rural areas.
- Line 222, you refer to Corine being at an original resolution of 30 m but this should be 100m or is this a higher resolution Spanish product that was then provided to the EEA to be harmonized into the 100 m Corine product? There is also a 30m time series product recently produced for CORINE, but you should then reference this.
Technical Corrections
- Line 59, change ‘allow to mitigate’ to ‘allow these two shortcomings to be mitigated’
- Line 62, ‘for example, (Uhl and Leyk, 2022a)’ should be ‘for example, Uhl and Leyk (2022a)’
- Line 74, change ‘on over’ to ‘of over’
- Line 85, change ‘European Union’ to ‘EU’
- Lines 113 to 116, numbering of sections described in these lines doesn’t match numbering of the actual sections, e.g., outlook is section 8
- Line 124, there is no section 2.3
- Line 130, change ‘allow accessing’ to ‘allow the building data to be accessed’
- Line 133, add ‘a’ before Web Feature Service
- Line 405, remove space before full stop
- Line 465, moves from section 4 to section 6 so no section 5
Citation: https://doi.org/10.5194/essd-2023-53-RC2
- RC3: 'Comment on essd-2023-53', Anonymous Referee #3, 26 Apr 2023
Review „HISDAC-ES: Historical Settlement Data Compilation for Spain (1900 - 2020)“
The paper is very interesting, well written and results are clearly presented and evaluated. The dataset presented in this paper, the HISDAC-ES, is a valuable contribution to several fields, from demographic studies to urban planning. I do have some general comments/questions and minor comments that I would like the authors to address.
Comments for the authors:
- What do the authors mean by “built-up intensity” (lines 25, 78, 281)? Is it the same as built-up density? I would suggest briefly defining the concept the first time it is mentioned, so there is a common understanding of it.
- Line 48: I think it is important to include in the introduction a recent published paper on the effort to homogenize European cadaster data (Milojevic-Dupont, N., Wagner, F., Nachtigall, F. et al. EUBUCCO v0.1: European building stock characteristics in a common and open database for 200+ million individual buildings. Sci Data 10, 147 (2023). https://doi.org/10.1038/s41597-023-02040-2)
- Line 96: please, include the name of the countries that were compared with available open cadaster data (see Milojevic-Dupont et al paper).
- Line 130 & 140: There are four different UTM Zones in Spain (28-31). Why did the authors not work with geodesic coordinates, WGS84, for the spatial intersection of building centroids and the grid? Besides, I wonder if the grids are created in 25830 or do the authors use an existing grid, such as the one from EEA mentioned? If the grid was created in 25830, how does this affect the size of the grids for UTM zones 28, 29 and 31?
- Line 162: “we calculated the sum and the mean of the building units (BUNITS) per dwelling (DWEL) over all buildings within a given grid cell”, I am not sure I understand what is calculated here. Building units are non-residential units, while dwellings are residential units, aren’t they? See line 152: “the number of dwellings describes the number of housing units in residential buildings, whereas the number of building units counts the number of units within non-residential buildings”.
- Figure 2: what statistics are calculated at the municipality level? The same ones as for the grid?
- Lines 209-211: Why is the evaluation performed using the municipality boundary and not using the grid? That would show better the urban-rural gradient.
- Since the authors evaluate their dataset against other datasets, I think it would be important to mention the accuracy of those datasets.
- Section 3.5: how are the statistics derived? using the grids whose centroids are within the municipality? why is it not done by using the centroids of the buildings similarly to the grid approach?
- It is unclear why the evaluation with different datasets is done by different spatial units, NUTS, municipalities, etc. and not using, for instance, the level of the datasets that is being compared to the HISDAC-ES or the grid itself.
- The created dataset is “evaluated” against the RS-derived and modeled datasets, but “compared” to the historical maps and orthophotos. Do the authors refer to two different evaluations? If yes, please clarify.
- Line 442: what does “the building density in the small, rural communities around Hornillos del Camino is similar to the densities in the center parts of the large cities” mean? That the density is high in the rural areas? Or low in the cities?
- Line 460: I am aware that the number of floors is in fact available in the cadaster from Spain. One can check this in the official web map https://www1.sedecatastro.gob.es/cartografia/mapa.aspx. The information can be obtained from the CAT files (see file: Tipo 14: Registro de Construcción – Planta, https://www.catastro.minhap.es/documentos/formatos_intercambio/catastro_fin_cat_2006.pdf). Regarding the ATOM files, based on the following document it is also available: https://www.catastro.minhap.es/webinspire/documentos/Conjuntos%20de%20datos.pdf. The field is “bu-ext2d:numberOfFloorsAboveGround” from “BuildingPart”.
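The UTM-zone question raised above (Line 130 & 140) can be made concrete with a minimal sketch. It assumes, purely for illustration, that a nominal 100 m grid is defined in EPSG:25830 (ETRS89 / UTM zone 30N, central meridian 3°W) and applied to locations outside zone 30. It uses the spherical transverse Mercator scale-factor formula, which is an approximation; the function names and the sample coordinates are hypothetical and not taken from the manuscript.

```python
import math

K0 = 0.9996                       # UTM scale factor on the central meridian
CENTRAL_MERIDIAN_ZONE_30 = -3.0   # degrees east; central meridian of UTM zone 30


def utm_scale_factor(lat_deg, lon_deg,
                     central_meridian_deg=CENTRAL_MERIDIAN_ZONE_30):
    """Approximate point scale factor (map length / ground length).

    Spherical transverse Mercator approximation:
    k = k0 / sqrt(1 - (cos(lat) * sin(dlon))^2)
    """
    dlon = math.radians(lon_deg - central_meridian_deg)
    b = math.cos(math.radians(lat_deg)) * math.sin(dlon)
    return K0 / math.sqrt(1.0 - b * b)


def ground_cell_area_m2(lat_deg, lon_deg, cell_size_m=100.0):
    """Approximate ground area covered by a nominal square grid cell.

    The projection is conformal, so the area factor is k squared.
    """
    k = utm_scale_factor(lat_deg, lon_deg)
    return (cell_size_m / k) ** 2


if __name__ == "__main__":
    # Approximate, illustrative coordinates (assumptions, not from the paper).
    samples = {
        "Madrid (zone 30)": (40.4, -3.7),
        "Palma (zone 31)": (39.6, 2.7),
        "Las Palmas (zone 28)": (28.1, -15.4),
    }
    for name, (lat, lon) in samples.items():
        k = utm_scale_factor(lat, lon)
        area = ground_cell_area_m2(lat, lon)
        print(f"{name}: k = {k:.5f}, ground area of a 100 m cell = {area:.0f} m^2")
```

Under these assumptions, cells far from the 3°W meridian cover noticeably less ground than the nominal 10,000 m², which is the effect the comment asks the authors to quantify.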
Minor comments:
- Line 37, 43: citing style. The commas are missing.
- Line 53-54: revise the commas, is an “and” missing in “, or semantic inconsistencies, incompatibilities”?
- Line 60: I suggest to use a more recent reference, for example, Milojevic-Dupont et al., (2023).
- Line 61: regarding the demographic applications of cadaster data, a recent study compared the performance of different methods and datasets, RS-derived data versus cadaster information, and the latter produced better results. HISDAC-ES could be used in many applications, for example in the field of population estimations (see: Sapena M, Kühnl M, Wurm M, Patino JE, Duque JC, Taubenböck H (2022) Empiric recommendations for population disaggregation under different data scenarios. PLoS ONE 17(9): e0274504. https://doi.org/10.1371/journal.pone.0274504)
- Line 66: I would avoid the use of “we” when describing previous studies even if they are from the authors. Line 73, for example: “Specifically, in previous work, the Zillow Transaction and Assessment Dataset was employed…”. Line 140: also for “we decided”.
- Line 94: unclosed parenthesis.
- I suggest reducing the use of “INSPIRE-conforming” when referring to the cadaster buildings, since once it is explained it is not necessary information, and without it the readability is better.
- Line 164: are the sum and the mean calculated for both, BIA and BUFA? With “respectively” it seems that the sum is for BIA and mean for BUFA.
- Line 236: please, add the level of the NUTS.
- Line 285: I would remove “surfaces” since the authors are referring to the building density, which is not a surface, and BUFA already implies surface in the building footprint.
- Lines 362-366: I wonder if INSPIRE land uses or INSPIRE building is the right term to refer to the Spanish cadaster buildings following INSPIRE.
- Line 391: typos: “HSDAC” and “sme”
- Figures:
- Figure 9: I think the maps could be improved by combining the information into one. For example: adding 3 classes, developed land in 1900, in 2020, and not developed for each region.
- “Fig.”10 = Figure 10.
- Figure 12: As I understand (b) shows the metrics per municipality aggregated by date, what is (c) showing? The global metrics for the entire country?
- Figure 13: since the authors added “columns” I would also add “rows” for the Corine classes in the caption.
- Tables:
- Table 2: “Building indoor” without capital letter. “surface name” since not all parameters are surfaces, I wonder if there is a better way to call this column.
- Table 3: I would include all the dates that are available: 1975, 1990, 2000 and 2014, instead of 1975-2014, otherwise might seem like an annual product.
- Table 4: avoid two times “digitized”.
- Appendices: I suggest to give a brief description/title to each appendix A, B, etc.
- Figure A1: I would combine the 2015 and 1990 map into one, to show better the growth and the differences between these datasets.
- Figure B2: Similar to the comment above, I think that combining this 4-time-step maps into one per city will show better the evolution.
Citation: https://doi.org/10.5194/essd-2023-53-RC3
- CC1: 'Comment on essd-2023-53', F. J. Goerlich, 26 Apr 2023
HISDAC-ES is a dataset with great potential, both for its coverage and for the period it covers (1900 - 2020). One of the major contributions is the integration of the 5 cadastres of Spain. Four of them cover only one of the 52 provinces and have -each of them- a different data model from the cadastre of the rest of Spain -which covers the remaining 48 provinces-.
In my opinion many of the details of the data models of the different cadastres should be briefly explained somewhere, since -as seems natural- the criteria guiding the elaboration of the database -functional categories or the distinction between dwellings and building units, for example- are determined by the cadastre with the largest coverage, which results in a lower representativeness of certain variables in the Basque Country and Navarre. In fact, the only totally homogeneous variable is the footprint of buildings (bufa).
The validation effort is enormous, although limited by the arguments put forward by the authors.
As described in the title, this is more a compilation than a harmonisation. The effort to include the cadastres of the Basque Country and Navarre is important, but there is still an effort to harmonise variables of the type being carried out by databases such as EUBUCCO v0.1 (https://www.nature.com/articles/s41597-023-02040-2) with the development of methodologies to complete variables (https://dx.plos.org/10.1371/journal.pone.0242010) based on urban morphology. Clearly this is outside the scope of the paper, but it represents the next step given the enormous amount of information contained in the cadastres.
Analysis of a small part of the huge amount of information provided reveals small discrepancies which, while probably not affecting the underlying trends in the data, are difficult to understand from the point of view of the user who wants to make use of the data.
The numbers below come from an attempt to generate population grids for census years since 1900, from the information provided, with a methodology similar to that used in GHS-POP. Additional details are available if required. All calculations mentioned below use the contours provided by the database, which interestingly has a lot of slivers -slivers that are not present in the boundary line database of the National Geographic Institute (Centro Nacional de Información Geográfica)-.
Minor inconsistencies in the information
- It is not true that the zonal statistics provide information on the 8,131 municipalities currently existing in Spain (section 3.5). What the analysis of this information reveals is that there is only information on 8,124 municipalities, those existing on 01/01/2018.
- Furthermore, in the "hisdac_es_municipality_stats_completeness_v1" files, there are 8,169 records, as there are 45 records -only 6 of them with buildings- which correspond to territories not belonging to municipalities -condominiums or “territories mancomunados”- all of them in Navarre. There are other territories with these characteristics in other provinces of Spain, which, however, do not appear in the database. Note that the cadastral databases also have information on municipal boundaries, which do not coincide exactly with the boundary lines of the National Geographic Institute (Centro Nacional de Información Geográfica).
- CatastRo package, mentioned in section 7, only allows the download of the Cadastre of the General Directorate of Cadastre, 48 provinces, but not of the provinces of the Basque Country and Navarre. CatastRoNav package (https://ropenspain.github.io/CatastRoNav/) can be used by R users to download data from the cadastre of Navarra. There is no such facility for the cadastres of the Basque Country.
- There are some numerical discrepancies between the raster information and that of the descriptive statistics files at the municipality level, at least for the bufa variable. In the descriptive statistics files, we always find more built-up area (bufa_sum) than in the raster files. These discrepancies are about 5% at the beginning of the period, but exceed 11% by the end of the period, which is not negligible, and has no clear explanation.
- The analysis of internal consistency (completeness of attributes in section 4.6) relies on the visual impression of figure 18, but it is likely that tables aggregated to province or regional level would be more illustrative here. These tables reveal clear problems in some variables in the cadastres of the Basque Country and Navarra, with more heterogeneity within these cadastres.
- Also, the number of floors of the building (floors) could have been used to estimate the indoor area (bia), as there is a clear complementarity between these variables in terms of missingness.
- The number of dwellings is much less representative than the other variables in the database. At the national level the percentage of buildings with no value for this variable (dwellings) is 28%. This fact contrasts with the high completeness for the variable building units (bunits). However, from my point of view, it is not clear from the text (line 151 and 152, page 6, and then line 162) how these two variables are calculated from the original information (which classifies a building according to its use and, given that, the number of dwellings and the number of building units are stated- this for the General Directorate of Cadastre).
- Note, in passing, that in the General Directorate of Cadastre, INSPIRE ATOM services, there is information on the number of floors in the “BuildingPart” files. So, this information exists generally, but in another place.
- In 1900 there are 183 municipalities without buildings (in the file of zonal statistics, in the rasters it happens only in 172 municipalities). All municipalities have built-up area (bufa) only from 1970 onwards. Numerical analyses of this style may shed more light on the survival bias, mentioned by the authors, and the quality of the data at the beginning of the 20th century. An Excel file with some of this information is attached.
- In the statistics by municipality appears the variable municipal area (muni_area_sqm). One would expect this variable to be invariant over time. However, there are some municipalities with value 0 in some years, which coincide exactly with the municipalities and years with no buildings. It is not clear where this variable comes from. In addition, this variable is superfluous, as the municipal area can be calculated from the vector layer.
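The raster-versus-statistics comparison described above for the bufa variable can be reproduced along the following lines. This is a minimal sketch, assuming that per-year totals have already been extracted from both sources; the function names are hypothetical, the input values are placeholders rather than HISDAC-ES figures, and expressing the excess relative to the raster total is an assumption, since the comment does not state the denominator.

```python
def relative_excess(stats_sum, raster_sum):
    """Percentage by which the statistics-file total exceeds the raster total."""
    if raster_sum == 0:
        raise ValueError("raster total is zero; relative excess is undefined")
    return 100.0 * (stats_sum - raster_sum) / raster_sum


def excess_by_year(stats_by_year, raster_by_year):
    """Relative excess for every year present in both sources."""
    return {
        year: relative_excess(stats_by_year[year], raster_by_year[year])
        for year in sorted(stats_by_year)
        if year in raster_by_year
    }


if __name__ == "__main__":
    # Placeholder totals in arbitrary units (NOT values from the dataset).
    stats_totals = {1900: 105.0, 2020: 111.0}
    raster_totals = {1900: 100.0, 2020: 100.0}
    for year, pct in excess_by_year(stats_totals, raster_totals).items():
        print(f"{year}: statistics file exceeds raster by {pct:.1f}%")
```

Running the same comparison per municipality rather than nationally would show whether the discrepancy is concentrated in particular areas.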
Potentially useful additional information
- It would be useful to know the date of download of the data. The General Directorate of Cadastre updates the INSPIRE Cadastre data twice a year.
- Since the code generating the information is public (https://github.com/johannesuhl/hisdac-es), it would be useful to make the original data available. Although the summary of the information is adequate, another treatment of the original data might be more suitable for certain purposes. For example, for the generation of historical population grids by dasymetric methods, it would be useful to have the built-up area (bufa) by residential use -currently this variable is only available in density format- or the building height -bia- by years and/or use.
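The dasymetric use case mentioned above can be illustrated with a minimal sketch: a municipality's census population is distributed over its grid cells in proportion to a weight such as residential built-up area. This is a generic proportional allocation, not the GHSL procedure and not the method of the HISDAC-ES authors; the function name and the example values are hypothetical.

```python
def dasymetric_allocate(population, weights):
    """Distribute a population total over cells in proportion to their weights.

    `weights` would typically be a per-cell built-up measure, for example
    residential building footprint area (an assumption for this sketch).
    """
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative")
    total = sum(weights)
    if total == 0:
        # A populated municipality with no recorded buildings in that year
        # cannot be disaggregated with this weight layer.
        raise ValueError("all weights are zero; no cell can receive population")
    return [population * w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical municipality: 1000 inhabitants, three cells with
    # residential footprint areas in square metres (illustrative values).
    cell_bufa_residential = [200.0, 600.0, 0.0]
    print(dasymetric_allocate(1000, cell_bufa_residential))
```

The zero-weight case matters in practice for early years, where some municipalities have no buildings in the data; it is raised as an error here so that such cases are handled explicitly rather than silently dropped.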
- AC1: 'Comment on essd-2023-53', Johannes Uhl, 12 Jul 2023