Mid-19th-century building structure locations in Galicia and Austrian Silesia under the Habsburg Monarchy

We produced a reconstruction of mid-19th-century building structure locations in former Galicia and Austrian Silesia (parts of the Habsburg Monarchy), which are located in present-day Czechia, Poland, and Ukraine and cover more than 80 000 km². Our reconstruction was based on a homogeneous series of detailed Second Military Survey maps (1 : 28 800) that were the result of a cadastral mapping (1 : 2880) generalization. The dataset consists of two types of building structures based on the original map legend – residential and out-buildings (mainly farm-related buildings). The dataset's accuracy was assessed quantitatively and qualitatively by using independent data sources. The dataset may serve as an important input in studying long-term socioeconomic processes and human–environmental interactions or as a valuable reference for continental settlement reconstructions. The dataset is available at https://doi.org/10.17632/md8jp9ny9z.2 (Kaim et al., 2020a).


Introduction
Although the human impact on Earth has been ongoing for millennia (Stephens et al., 2019), it has accelerated since the mid-19th century with the development of industry, transport infrastructure, and land use changes (Fischer-Kowalski et al., 2014). In many regions of Europe, this has been a time of minimal forest cover due to high use from both agriculture and industry (Gingrich et al., 2019; Jepsen et al., 2015). Although many land use reconstructions have covered this period, they have usually focused on the dominant land uses (Fuchs et al., 2013; Lieskovský et al., 2018) or, if they are global, have offered a generalized view of settlements (Hurtt et al., 2011; Klein Goldewijk et al., 2010). Detailed, large-scale historical settlement data are either missing or highly uncertain (Lieskovský et al., 2018). Only recently have large-scale, long-term, and highly accurate settlement reconstructions become available to scholars (Leyk and Uhl, 2018). As human impacts on the landscape may result in long-lasting legacies (Fuchs et al., 2016; Munteanu et al., 2017), high-quality, spatially explicit historical settlement data need to be produced and shared. In the past, housing structures impacted the appearance of invasive species (Gavier-Pizarro et al., 2010) and increased the demand for forest litter, which resulted in reduced soil carbon pools (Gimmi et al., 2013) or triggered the development and persistence of the wildland-urban interface over time (Kaim et al., 2018). These examples show that the existence of easily accessible, high-quality data on historical settlements may contribute to a better understanding of future human impacts on the environment.
In this paper, we introduce a dataset that includes more than 1.3 million building structure locations up to the mid-19th century detected in parts of what is now Poland, Ukraine, and Czechia, which were formerly parts of the Habsburg Monarchy (Austrian Empire). The dataset contains the exact locations of residential and farm buildings in a territory that covers more than 80 000 km². Our database captures the situation just before rapid industrialization (Frank, 2005), massive inter-continental migration (Praszalowicz, 2003), and profound land use changes, which were a result of societal and political changes in the region (Munteanu et al., 2014). Our database can be used as a stand-alone dataset for a variety of human-related analyses in the environmental and social sciences or as reference data for broad-scale (i.e. continental) reconstructions.

Study area
The data were collected for parts of what is now Poland, Ukraine, and Czechia that belonged to the Habsburg Monarchy (Austrian Empire) in the mid-19th century. These areas were called Austrian Silesia and Galicia at the time (Fig. 1). Austrian Silesia (with more than 80 % of its area in present-day Czechia and less than 20 % in Poland) was the small southernmost part of the Silesia region, and it remained in the Habsburg Monarchy after Silesia's division in 1742. It consisted of two historical parts, Tesin Silesia and Opava Silesia, where Opava (Troppau) was the largest city (16 608 inhabitants, inh., 1869; Bevölkerung, 1871). Galicia was an Austrian name introduced for part of the Crown of the Kingdom of Poland territory when it was annexed by Austria in 1772 (∼ 40 % of its area is in present-day Poland and the rest is in Ukraine). Galicia was one of the largest and most populated crown lands in the Austrian Empire and Austria-Hungary, and agriculture dominated its economy. Two prominent cities in Galicia were Lviv (87 109 inh., 1869) and Kraków (49 835 inh., 1869; Bevölkerung, 1871). The two regions were neighbours and were closely connected socially and economically, which makes it reasonable to present them together, especially taking into account their legacies in later decades.

Historical maps
The reconstruction was based on a homogeneous set of Second Military Survey maps, which were acquired from the Austrian State Archives in Vienna in the form of scanned .tif files (at 300 DPI). The maps for Austrian Silesia (42 map sheets) were published in the period of 1837-1841, and the maps for Galicia (412 map sheets) were published in the period of 1861-1864. One map sheet located in the northeastern part of Galicia was not available in the archive and could not be used in the study. The scale of the map is 1 : 28 800, and it was produced as a result of a generalization and update of cadastral maps (1 : 2880) for military purposes (Konias, 2000). Cadastral maps were prepared in the periods of 1824-1830 and 1833-1836 (Silesia) and 1824-1830 and 1844-1854 (Galicia). The Second Military Survey was the first empire-wide topographic mapping initiative based on a proper map projection (Affek, 2015; Skaloš et al., 2011; Timár et al., 2010). Due to the high quality of the maps, their relatively low positional errors, and their large catalogue of land use categories, they are often used in land use reconstructions for different parts of the Habsburg Empire (Feurdean et al., 2017; Kaim et al., 2016; Munteanu et al., 2015; Pavelková et al., 2016).

Building images in the Second Military Survey
Although the map scales of cadastral mapping and military mapping differ substantially, the images of the buildings on the Second Military Survey are detailed (Fig. 2). However, to assess the differences between the maps, we first systematically compared the building information presented by the Second Military Survey to that in the cadastral maps. The procedure aimed at assessing the impact of map generalization on the potential number of structures that we could acquire. We selected 10 case study areas located in different parts of Galicia (eight cases) and Austrian Silesia (two cases) that represent different landscape conditions. The selection was determined by the availability of the cadastral maps, which was a true obstacle, especially for the Galician part of the study area. Finally, we used the resources available at http://www.szukajwarchiwach.gov.pl (last access: 28 July 2020), an official website for documents stored in the Polish national archives, and at https://www.geshergalicia.org (last access: 28 July 2020), a non-profit organization that supports Jewish genealogical and historical research on Galicia. The maps presented on the website were originally stored in the national archives in Poland and Ukraine. The cadastral maps from Austrian Silesia were consulted on the website of the State Administration of Land Surveying and Cadastre of Czechia at https://archivnimapy.cuzk.cz (last access: 28 July 2020). We selected easily identifiable parts of villages and towns, counted the building structures on the cadastral maps, and compared them to the Second Military Survey maps (Table 1). We found that although the structure images in the Second Military Survey maps are very detailed, historical cartographers had to employ generalization procedures. On average, the number of buildings presented on the Second Military Survey maps was nearly 85 % of the number of buildings presented on the cadastral maps (Table 1); however, the results in towns were lower, and the results in rural areas were higher due to the generalization procedures. Despite the differences, we decided that the building structures are presented with very high quality, which makes it possible to obtain reasonably accurate structures (Fig. 2).

Geometric correction and georeferencing
Different reference data were employed to georeference the maps. In the case of the Polish part of Austrian Silesia and Galicia, Polish topographic maps from the 1970s at a scale of 1 : 25 000 were used. Maps elaborated in the Polish 1965 coordinate system based on the Pulkovo-42 reference frame were obtained as raster images transformed to the PL-1992 coordinate system based on the ETRF-89 reference frame (explanations for the terms used in this paragraph can be found in Appendix A). In the case of the Ukrainian part of Galicia, high-resolution World Imagery and DigitalGlobe imagery and Soviet military topographic maps at scales of 1 : 25 000, 1 : 50 000 and 1 : 100 000 were used. The Soviet maps elaborated in the 1942 coordinate system based on the Pulkovo-42 reference frame were transformed to the proper zone of the UTM coordinate system. In the case of the Czech part of Austrian Silesia, the local-level administrative boundaries were used as georeferenced information.
Original map sheets from the Second Military Survey were cropped along the map frame. Each cropped image was processed separately by using at least 20 control points per sheet. The points were chosen from triangulation points, historical buildings (e.g. churches), recognizable crossroads, bridges, viaducts, and local administrative boundaries. If such points were lacking, then river/stream connections were also used. Geometric correction and georeferencing to the PL-1992 or UTM coordinate system were obtained with 2nd-order polynomial transformation. For the map sheets with a low coverage along the borderland, a 1st-order polynomial transformation was applied. The total root mean square error (RMSE) for most sheets reached values between 10 and 30 m and occasionally exceeded 30 m, which indicates the level of geometric accuracy of the final dataset.
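The polynomial georeferencing step described above can be illustrated with a small least-squares sketch. This is not the software used in the study; the control points, the synthetic distortion, and the helper names (design_matrix, fit_polynomial, total_rmse) are assumptions made for illustration, and the RMSE is taken here as the root mean square of the residual distances at the control points.

```python
import numpy as np

def design_matrix(xy, order=2):
    """Polynomial terms for each control point (x, y)."""
    x, y = xy[:, 0], xy[:, 1]
    cols = [np.ones_like(x), x, y]            # 1st-order (affine) terms
    if order == 2:
        cols += [x * y, x ** 2, y ** 2]       # additional 2nd-order terms
    return np.column_stack(cols)

def fit_polynomial(src, dst, order=2):
    """Least-squares fit of a polynomial mapping image -> map coordinates."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(src, order), dst, rcond=None)
    return coeffs                              # shape: (number of terms, 2)

def total_rmse(src, dst, coeffs, order=2):
    """Root mean square of the residual distances at the control points."""
    residuals = design_matrix(src, order) @ coeffs - dst
    return float(np.sqrt(np.mean(np.sum(residuals ** 2, axis=1))))

# Hypothetical sheet: 20 control points in pixel coordinates and their
# target coordinates in metres, with 15 m of random positional noise.
rng = np.random.default_rng(0)
src = rng.uniform(0.0, 6000.0, size=(20, 2))
exact = np.column_stack([
    500000.0 + 2.4 * src[:, 0] + 1e-6 * src[:, 0] * src[:, 1],
    300000.0 - 2.4 * src[:, 1] + 2e-6 * src[:, 0] ** 2,
])
dst = exact + rng.normal(0.0, 15.0, size=exact.shape)

coeffs = fit_polynomial(src, dst, order=2)
rmse_m = total_rmse(src, dst, coeffs, order=2)
```

A 2nd-order fit needs at least six control points per sheet and a 1st-order fit at least three, so 20 points leave redundancy for estimating the residual error.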

Building structure acquisition
The maps show two main categories of buildings in different colours. Red (German Wohngebäude) indicates mainly residential buildings (but also includes some churches, monasteries, town halls, and railway stations; in total, approximately 1 % of the "red" buildings were non-residential). Black (German Wirtschaftsgebäude) includes farm or agricultural buildings (but also includes similar exceptions as those mentioned above) (Zaffauk, 1889). Although we are aware that among residential buildings, there are some non-residential structures and that among farm-related buildings, there are some craft-related structures or warehouses, we wanted to be consistent with the map content, and we decided to acquire all the structures according to these two main categories. Validating more specific information on building usage would require different methods and the consultation of local independent sources, which, given the size of the area under study, is beyond the scope of our work, in contrast to, for example, counting the number of houses (for details, see Sect. 3.2 "Completeness – reference to census data at the district level"). We present, however, some potential exceptions to show the reader what might also be found in the database (Fig. 3). The division of the two main building categories was not related to the materials used to construct the building (e.g. wood, bricks, and stones), as was presented on the cadastral maps, apart from the black structures surrounded by the red border, which mean outbuildings built of stone or brick. We used a semi-automatic, colour-based method to acquire residential buildings (Fig. 4) and a manual vectorization for farm buildings. In the first step, the training data were manually digitized for 12 randomly selected map sheets. The training polygons included two classes, namely, buildings and all other objects. Based on the training data, signatures for red, green, and blue raster bands were produced and used for map sheet classification. Classified raster images were then converted into vectors out of which the initial building structures were identified. These initial structures were then classified as buildings based on the typical size and shape of the map symbols that represent building structures on the map. The initially classified structures were filtered to remove all objects with an area of less than 65 m² and with a length-to-area ratio higher than 0.6. The threshold values used in the procedure were based on the values found on the map and partly depended on the map scale (i.e. the minimal structure size could not be lower than 0.5 mm, which corresponds to ∼ 15 m at the map scale). Only the shapes that met the criteria of a defined size and shape were further processed. The procedure described above was then performed for the other map sheets by using a loop; however, the initial training data had to be defined separately several times due to differences in the map sheets' quality in different regions. After this semi-automatic procedure, each map sheet was also verified manually to eliminate commissions (e.g. the points along roads marked with red) and omissions (e.g. missing structures in high-density housing areas); this was a time-consuming process, but it assured the high quality of the final product. As black was a widespread colour on the map, we acquired all the farm buildings through manual vectorization, as visually inspecting the errors would have been a time-consuming process. All the buildings were finally combined into one layer and attributed a function – residential (originally red) or farm (originally black). Additionally, we assigned each building a date based on when the map sheet was published. The final layer was transferred to the Lambert azimuthal equal-area (LAEA) coordinate system. The acquisition work was performed with ArcMap classification and spatial analysis tools.
https://doi.org/10.5194/essd-13-1693-2021 Earth Syst. Sci. Data, 13, 1693-1709, 2021
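The size and shape filter applied to the initially classified structures can be sketched as follows. The two thresholds and the map scale are taken from the text; the record layout, the interpretation of "length" as an outline length in metres, and the function names are assumptions made for illustration, not the ArcMap workflow itself.

```python
MAP_SCALE = 28_800         # scale denominator of the Second Military Survey
MIN_AREA_M2 = 65.0         # objects smaller than this are removed
MAX_LENGTH_TO_AREA = 0.6   # objects with a higher ratio are removed

def map_mm_to_ground_m(mm, scale=MAP_SCALE):
    """Convert a distance measured on the map (mm) to ground distance (m)."""
    return mm * scale / 1000.0

def keep_candidate(area_m2, length_m):
    """Return True if a vectorized shape passes the size and shape thresholds."""
    if area_m2 < MIN_AREA_M2:
        return False
    return (length_m / area_m2) <= MAX_LENGTH_TO_AREA

# Hypothetical candidate shapes produced by the raster-to-vector conversion.
candidates = [
    {"id": 1, "area_m2": 180.0, "length_m": 54.0},  # ratio 0.30, kept
    {"id": 2, "area_m2": 40.0, "length_m": 26.0},   # too small, removed
    {"id": 3, "area_m2": 120.0, "length_m": 96.0},  # ratio 0.80, removed
]
kept_ids = [c["id"] for c in candidates
            if keep_candidate(c["area_m2"], c["length_m"])]

# 0.5 mm on a 1 : 28 800 map is 14.4 m on the ground, i.e. roughly 15 m.
min_symbol_m = map_mm_to_ground_m(0.5)
```

Shapes that pass such a filter would still go through the manual verification described above, since colour and shape alone cannot separate buildings from, for example, red points along roads.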

Technical validation
The data presented in the paper were subject to several accuracy assessments. We assessed the acquisition accuracy and compared the data with census data at different administrative levels. Additionally, we verified the number of buildings with the textual information presented on the original map sheets in the form of building number summaries and used auxiliary data such as cadastral maps as needed.

Relation between the mapped structures and the vectorized structures
The relation between the structures presented on the Second Military Survey maps and the structures captured in our database was assessed by comparing their numbers in randomly selected, non-overlapping circles (300 m radius; area: 28.27 ha) located across the study area. First, we selected 1000 circles, verified them visually, and counted the structures found on the map. Then, we removed from the next steps of the comparison the circles in the places where there were no buildings either on the map or in the database. The final number of test circles was thus reduced to 311, which contained 4791 structures from the database. This number resulted in a 1.86 % margin of error for the entire study area with a 99 % confidence level (population size: 1 305 233).
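The reported margin of error can be reproduced with the standard formula for a proportion. The text does not state which formula was used, so the sketch below assumes the most conservative proportion (p = 0.5) and an optional finite population correction; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def margin_of_error(sample_size, population_size=None, confidence=0.99, p=0.5):
    """Margin of error for a proportion, optionally with finite population correction."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # about 2.576 for 99 %
    moe = z * math.sqrt(p * (1.0 - p) / sample_size)
    if population_size is not None:
        moe *= math.sqrt((population_size - sample_size) / (population_size - 1))
    return moe

# Values quoted in the text: 4791 sampled structures out of 1 305 233.
moe_percent = 100.0 * margin_of_error(4791, 1_305_233)

# Area of one test circle with a 300 m radius, in hectares.
circle_area_ha = math.pi * 300.0 ** 2 / 10_000.0
```

Because the sample is a very small fraction of the population, the finite population correction changes the result only in the third decimal place.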
After comparing the number of structures, we calculated the root mean square error (RMSE) and the correlations (Pearson's r) between the structures' sums on the map and in the database for all 311 test areas. The RMSE was based on the following formula (Eq. 1):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(P_i - O_i\right)^2}, \tag{1}$$

where $P_i$ signifies predicted values (vectorized building structures), $O_i$ observed values (building structures on the maps), and $n$ sample size (number of test areas).
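As a minimal sketch, the RMSE and Pearson's r for paired counts per test circle can be computed as follows. The counts are invented for illustration and are not taken from the dataset.

```python
import math

def rmse(predicted, observed):
    """Root mean square error between paired counts."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)

# Hypothetical counts in five test circles.
observed = [12, 0, 31, 7, 18]    # structures counted on the map (O_i)
predicted = [12, 1, 29, 7, 18]   # structures in the database (P_i)

error = rmse(predicted, observed)
r = pearson_r(predicted, observed)
```

The RMSE is expressed in buildings per test circle, so it is best read against the mean number of structures per circle.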
The procedure was employed for the three conditions of residential buildings, farm-related buildings, and all buildings. Additionally, the results of the accuracy assessment were represented by a confusion matrix that compared the user's accuracy, producer's accuracy, overall accuracy, the kappa coefficient of agreement, and the F score, which is a harmonic mean of the producer's and user's accuracy (Fawcett, 2006; Leyk and Uhl, 2018).
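The accuracy measures named above can all be derived from a confusion matrix. The sketch below uses an invented two-class matrix and assumes that rows hold the database (classified) classes and columns hold the map (reference) classes; neither the counts nor the layout comes from the paper.

```python
def accuracy_metrics(matrix):
    """Overall accuracy, kappa, and per-class user's/producer's accuracy and F score.

    matrix[i][j] counts objects classified as class i whose reference class is j.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]
    col_sums = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    diagonal = [matrix[i][i] for i in range(k)]

    overall = sum(diagonal) / total
    expected = sum(row_sums[i] * col_sums[i] for i in range(k)) / total ** 2
    kappa = (overall - expected) / (1.0 - expected)

    users = [diagonal[i] / row_sums[i] for i in range(k)]       # correctness of the classified objects
    producers = [diagonal[i] / col_sums[i] for i in range(k)]   # completeness against the reference
    f_scores = [2 * u * p / (u + p) for u, p in zip(users, producers)]
    return {"overall": overall, "kappa": kappa, "users": users,
            "producers": producers, "f_scores": f_scores}

# Hypothetical counts for two classes.
confusion = [
    [50, 5],
    [10, 35],
]
metrics = accuracy_metrics(confusion)
```

The F score balances errors of commission (user's accuracy) and omission (producer's accuracy), which is why it is convenient for comparing the two building classes.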
The results show that the number of buildings present on the maps was very similar to the number that we acquired. The RMSE values were equal to 1.30 for the condition with all buildings, 0.87 for the condition with only residential buildings, and 1.19 for the condition with only farm-related buildings (the mean values of the structures found in the test circles on the maps were 15.4, 10.3, and 5.1, respectively). The correlations between the structures presented on the maps and the buildings that we acquired were also very high, specifically, r = 0.999 for the total number of buildings, r = 0.998 for residential buildings, and r = 0.994 for farm-related structures (Fig. 5).
The overall accuracy for all buildings that we acquired was 95.03 %; however, it was higher for residential buildings (97.46 %) than for farm-related structures (96.66 %). Similarly, the slightly higher quality of the residential building class was supported by the F score (Table 2). The kappa coefficient for the classification procedure was 0.89.

Completeness -reference to census data at the district level
To verify the total number of houses acquired in our procedure with the independent source, we compared the number of vectorized structures with the information from the census data at the district level for the entire study area (n = 99). The censuses closest in time to the publication of the maps were organized in 1857 for Austrian Silesia (n = 23) and in 1869 for Galicia (n = 76). Although there is a time difference between the maps and the census data (∼ 18 years in Austrian Silesia and ∼ 7 years in Galicia), there was no better option to compare the number of buildings for these regions due to the timing of the censuses. Additionally, we could verify in the sources only the number of residential buildings (which account for 69 % of our structures) as the census did not contain information on farm-related structures. The respective district map with additional attribute information including the year of the census, year of map creation (the dominating value for the district unit), time difference between the map and census dates, and number of houses according to the census was attached to the dataset to help define the potential uncertainties responsible for the differences. Apart from the statistical information based on the censuses, the abovementioned layer also consists of district-level information on main road accessibility based on the Second Military Survey road network (Kaim et al., 2020b) and information on topography based on the SRTM digital elevation model (DEM; Farr et al., 2007). A full list of the attributes can be found in the "Data availability" section, and some of the variables are also presented in the form of maps in Appendix B.
The results show that the number of houses recorded in the census data and captured in our database differed, but the difference was not great. The censuses indicated 914 107 structures, whereas we acquired 897 020 buildings of a similar type for the entire study area. However, regionally, the differences were diverse. A comparison at the district level indicated that on average, we acquired 99.4 % of the houses recorded by the census data, but the differences among the districts were substantial (Fig. 6). We found that the number of vectorized residential buildings for the districts located in Austrian Silesia was usually higher than the number recorded in the census. At the same time, in Galicia, the differences were wider ranging as both overestimations and underestimations were found. In one district (Staremiasto; Ukrainian: Staryj Sambir), the number of houses that we vectorized was less than 70 % of the houses recorded by the census. Interestingly, however, when we compare all the structures that we acquired from the maps (residential and farm-related buildings together), their sum accounts for 98.9 % of the houses recorded in the census for this district. This may suggest that the building division presented on the map might have been understood in a different way by different cartographers. However, this hypothesis can only be confirmed through additional research and deeper study, which is only partly conducted within this paper (see Sect. 3.4 below). Unfortunately, the original map instructions for the Second Military Survey are not available in the archives and cannot be consulted. Using a set of uncertainty-related variables, we calculated correlations between the percentage difference in the number of houses in the database and the census and each of the following: population density, road accessibility, time difference between the map and census publication, mean elevation, and mean slope. The only statistically significant correlation (p < 0.05) was the correlation with the time difference between the map and census publication, but it was relatively low (r = 0.217).
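Whether a correlation of this size is significant can be checked with the usual t statistic for Pearson's r. The sketch below assumes that the correlation was computed over all 99 districts and takes the two-sided 5 % critical value for 97 degrees of freedom from standard t tables; both are assumptions, as the text reports only r and the significance level.

```python
import math

def t_statistic(r, n):
    """t statistic for testing whether Pearson's r differs from zero (n pairs)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

R_TIME_DIFF = 0.217        # correlation reported in the text
N_DISTRICTS = 99           # assumed sample size (number of districts)
T_CRITICAL_97_DF = 1.985   # approximate two-sided 5 % critical value, 97 d.f. (table value, assumption)

t_value = t_statistic(R_TIME_DIFF, N_DISTRICTS)
is_significant = abs(t_value) > T_CRITICAL_97_DF

# Share of the variance in the percentage difference associated with the time gap.
explained_variance = R_TIME_DIFF ** 2
```

Under these assumptions the statistic is close to 2.19, which is consistent with significance at the 5 % level, while the squared correlation shows that the time gap accounts for less than 5 % of the variance.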

Completeness -reference to map frame information
Each map sheet (approx. 15 × 15 km) of the Second Military Survey has additional textual information in the frame, where the basic statistics that are important from a military point of view are presented. The statistics, which are usually presented at the village level, include the number of houses, number of stables, and number of people and horses that could be stationed there. We used this information to verify the number of houses that we captured in the database by choosing 10 evenly distributed map sheets (2 from Austrian Silesia, 4 from the western part of Galicia, and 4 from the eastern part of Galicia; Fig. 7a) that represent different landscape conditions (e.g. lowlands, foothills, and mountainous areas) and comparing the number of vectorized building structures within them at the village level (Fig. 7b). Since the number of stables was not fully comparable with the number of farm buildings in our database, we compared only the number of houses. In some cases, the villages were split into neighbouring map sheets, and corrections, including adding or removing some buildings located within the specified villages, had to be implemented (Fig. 7c). In two cases, however, we found that the number of houses for the village was not listed, as each of the two neighbouring map sheets indicated that the information was available on the other sheet. Altogether, information from 283 towns and villages on the 10 selected map sheets was summarized.
The comparison showed that in most cases (7 out of 10), the number of houses that we captured in the database was higher than the number presented in the map frame. The differences ranged from 0.4 % to 54.7 %, with an average difference of 14 % (Table 3). A more detailed explanation of the potential reasons for this is partly presented in Sect. 3.4, where the local level analyses are presented.

Completeness -reference to census data and map sheet information at the local level
The comparisons with census data at the district level and map frame information at the map sheet level showed that although on average our database captured information on houses relatively well, local differences were substantial. To better understand the nature and potential explanations of these local differences, we present a few situations below where we address the underestimation or overestimation between our structures and the reference data (Table 4).

Underestimation
The analyses at the district level showed that in extreme cases, the number of houses that we covered in the database was more than 30 % lower than that in the census data. We chose two villages, Jaworki and Milcza, as examples to analyse in detail. In both cases, the differences were substantial; in Jaworki, we captured slightly more than 70.5 % of the number of houses in the census, and in Milcza, we captured 43 % of the number of houses in the census data. The example of Jaworki shows that the map frame information gave very similar values to those presented in the census. At the same time, the map frame provided information about the relatively low number of people who could be housed there. The number is very low when compared to that in other villages that we analysed, although some of the villages were located in similar mountainous conditions. The ethnological research performed in the village in the first half of the 20th century confirmed a very low standard of living there, even compared to the standard of living in neighbouring areas (Reinfuss, 1947). Jaworki had an unusual system of seasonal farm buildings located higher in the mountains that were inhabited by shepherds in the summer season. Our data show that in the village, the number of farm-related structures was even higher than the number of houses, which is also unusual. We hypothesize that some of the inhabited buildings were classified as farm-related on the map but were residential in reality. This could explain the difference between our data and the census data.
In Milcza, both the census data and the map frame information confirmed a much higher number of houses than we captured in the database. However, in this case, the percentage of buildings in the database was even lower than the values observed in extreme situations at the district level (Fig. 6) as we captured only 43 % of the number of buildings in the census. Since the map frame information confirmed the values from the census, which were substantially different from the values in our database, we consulted the original cadastral maps to compare them with the Second Military Survey maps. The comparison showed that although the cadastral maps (1851) indicated 99 buildings, the Second Military Survey maps (1861/1862) showed a total of only 49 buildings, including 37 residential buildings. This confirms an unprecedented level of map generalization here when compared to that in other areas (Table 1; Fig. 8). It also explains the level of underestimation that we noticed in the database when compared to other, independent sources.

Overestimation
The analysis conducted for the village of Milówka, located in the western part of the Carpathians (western Galicia), showed that our database captured a slightly higher number of structures than that indicated in the 1869 census, and it captured a substantially higher number than that indicated in the map frame summary (Table 3). In the map frame, however, Milówka also contained the hamlet of Sucha Góra, which formally belonged to the main village, although the statistics for the hamlets were kept separately in some cases, potentially for strategic reasons. The census data were published for the commune level, and only some of the hamlets were indicated separately. Adding the numbers from the main village and the hamlet together makes the difference between our database and the census versus map frame statistics substantially smaller (Table 3). Potentially, the hamlets could have been moved from one village to another over time, which in some cases, makes comparisons over longer time periods difficult (Ostafin et al., 2020).
In the same region, relatively close to Milówka (< 15 km), we also found that compared to the data in the census and map frame information, our dataset substantially overestimated the number of houses in the village of Trzebinia (our data had more than 170 % of the number listed in the 1869 census). Although we consulted cadastral maps (1844: 169 buildings, including houses) and the 1880 census data (89 houses) and verified the potential administrative boundary changes, we could not find any objective reason for such a large difference. We must bear in mind that the mid-19th century was a time of dramatic political movements, natural disasters, diseases, and famine in Galicia, which resulted in the most dramatic population decrease in over 100 years (Zamorski, 1989). It is difficult to determine whether these events were responsible for the reduction in the number of houses over such a short period of time. It is also beyond the scope of the data descriptor to explain the socioeconomic background in detail at the local level. Although the differences that we observed were on average much lower than in this extreme case, we provide this example to show potential database users that such situations are possible.

Data availability
The dataset is available at https://doi.org/10.17632/md8jp9ny9z.2 (Kaim et al., 2020a). The data are stored in an open, widely used shapefile (.shp) format, which may be opened in GIS software (including open-source software, such as QGIS). The shapefile format consists of three mandatory files (.shp, .shx, .dbf) and a set of non-mandatory files. In the case of our file, the complete set of files includes the following.
comment – if map sheet production dates were not specified, then we analysed the dates of neighbouring sheets and added the most probable period here
An additional layer that addresses the uncertainty-related attributes is also included in the dataset as a separate layer, uncertainty_metadata.shp. It contains a map of all the districts of the study area and covers a set of attributes that are helpful to any uncertainty analysis. The statistical census data for Austrian Silesia are based on the 1857 census and for Galicia, on the 1869 census. The list of attributes includes the following.
District – name of the district according to the census
Man – number of men
Women – number of women
M_and_W – number of men and women combined
census_h – number of houses according to the census
database_h – number of houses according to the database
per_of_cen – houses in the database as a percentage of houses recorded in the census
cens_date – census date
map_date – map date (if the map was produced for more than 1 year, then the date closest to the census was taken)
time_diff – number of years between the census and map publication
area_ha – the area of the district based on shapefile polygon geometry (hectares)
pop_dens – population density based on "M_and_W" and "area_ha" attributes (people/km²)
mean_dist – mean distance to the main roads – four categories of roads based on Kaim et al. (2020b)
mean_elev – mean elevation (metres above sea level based on SRTM DEM)
mean_slope – mean slope (degrees based on SRTM)
slope_rg – slope range in the district (degrees based on SRTM)
farm_b – number of farm-related structures based on the database
The files are compressed in .7z format and can be unpacked by using, for example, 7-Zip (https://www.7-zip.org/; Kaim et al., 2020a).
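Several of the attributes listed above are derived from others. The sketch below shows how they are presumably related, using the attribute names from the list and one invented district record; the exact rounding and the sign convention of time_diff in the published layer are not stated, so treating it as an absolute difference is an assumption.

```python
def derive_attributes(record):
    """Recompute the derived district attributes from the raw ones."""
    derived = dict(record)
    derived["M_and_W"] = record["Man"] + record["Women"]
    derived["per_of_cen"] = 100.0 * record["database_h"] / record["census_h"]
    derived["time_diff"] = abs(record["cens_date"] - record["map_date"])
    area_km2 = record["area_ha"] / 100.0              # 100 ha = 1 km²
    derived["pop_dens"] = derived["M_and_W"] / area_km2
    return derived

# Hypothetical district record (values invented for illustration).
district = {
    "District": "Example",
    "Man": 24_000,
    "Women": 26_000,
    "census_h": 8_000,
    "database_h": 7_600,
    "cens_date": 1869,
    "map_date": 1862,
    "area_ha": 100_000.0,
}
result = derive_attributes(district)

# Overall share for the whole study area, from the totals quoted in the text.
overall_share = 100.0 * 897_020 / 914_107
```

Note that the overall share computed from the study-area totals (about 98.1 %) differs from the 99.4 % reported as the average over districts, because an average of district percentages weights small and large districts equally.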

Conclusions
The data descriptor presents the complete coverage of the mid-19th-century building structure locations in the historical regions of Galicia and Austrian Silesia in Central Europe. The dataset covers more than 1.3 million objects, including houses and farm-related buildings. This is the first such large and detailed database in the region. The dataset is based on the Second Military Survey maps (1 : 28 800), which were the result of a cadastral mapping (1 : 2880) generalization for military purposes and thus offered a much higher level of detail than earlier (e.g. the First Military Survey, 1 : 28 800) or later (e.g. black and white, BW, editions of the Third Military Survey, 1 : 25 000) mapping sources in the area. This is also the only source of information on the number and location of farm-related structures at the time as they were not included in other, independent datasets.
The technical validation of the database showed a high level of object completeness when compared to different independent sources. Nevertheless, there were some discrepancies in the number of houses that we acquired and the number according to the census data, map frame information, and cadastral mapping. However, we attempted to explain the types and reasons for the potential differences. Considering the size of the study area and the number of structures that we acquired, local differences cannot be explained here as they go beyond the scope of the data descriptor. We hope, however, that making this dataset available and adding a set of uncertainty-related variables will enable further analysis and improve the knowledge of the differences among the datasets. Our dataset may serve as a valuable source of information not only for scientists who study the drivers and legacies of land use changes but also for scholars who study changes in the standard of living, which potentially influences decisions on migrations. Environmental scientists may also be able to use the data, especially when combining them with other land use and environmental variables. Since the time that we capture in the database shows the moment just before the important social and economic processes of industrialization and urbanization, we believe that it may also contribute to a broad range of studies on the Anthropocene.

https://doi.org/10.5194/essd-13-1693-2021 Earth Syst. Sci. Data, 13, 1693–1709, 2021

Figure 1. Study area; the lower left corner presents a small portion of the maps used in the study (source: Austrian State Archives).

Figure 3. Examples of non-residential structures marked in red on the source map (a, b) and, less common, outbuildings marked in black (c, d); (a) monastery, (b) church, (c) sheepfold, and (d) railway station. Please note that in (c), the textual information (Schloss) is related to the building marked in red (source: Austrian State Archives).

Figure 4. Semi-automatic procedure for acquiring residential building structures (originally marked in red) from historical maps.

Figure 5. Pearson's correlations r between the number of buildings shown on the maps and the number of structures acquired in the dataset for all buildings (a), residential buildings (b), and farm-related buildings (c), n = 311. The lower panel shows the location of the test areas.

Figure 6. Number of vectorized residential structures compared to the census data at the district level for the entire study area (n = 99), Austrian Silesia (n = 23), and Galicia (n = 76).

Figure 7. Verification of the building structure numbers acquired from the maps with map frame information.Location of 10 evenly distributed map sheets (a); building locations and map frame information (b); and information noting that the statistics for the selected villages are available on the neighbouring map sheet (c) (source: Austrian State Archives).

Figure B2. Uncertainty-related variables presented at the district level: mean distance to the main road (a), mean elevation (b), and mean slope (c).

Table 1. Comparison of the number of building structures presented on the cadastral maps (1 :

Table 3. Comparison of the number of houses with the statistics presented in the map frames (number of houses according to the map frame = 100 %).

Table 4. Number of houses captured in our database related to the census data and map frame information: examples of underestimation and overestimation. * Summary of two cadastral villages, Jaworki I Theil and Jaworki II Theil; ** houses for Milówka and Sucha Góra also covered.
Figure 8. Comparison of a cadastral map (1851) and Second Military Survey map (1861/1862) that indicates a very high level of map generalization (source: National Archive in Przemyśl, Austrian State Archives).