the Creative Commons Attribution 4.0 License.
the Creative Commons Attribution 4.0 License.
Countrywide Digital Surface Models and Vegetation Height Models from Historical Aerial Images
Abstract. Historical aerial images, captured by film cameras in the previous century, are valuable resources for quantifying Earth’s surface and landscape changes over time. In the post-war period, these images were often acquired to create topographic maps, resulting in the acquisition of large-scale aerial photographs with stereo coverage. Photogrammetric techniques applied to these stereo images enable the extraction of 3D information to reconstruct digital surface models (DSMs) and orthoimages.
Here, we present a highly automated photogrammetric approach for generating countrywide DSMs of Switzerland, at a 1 m resolution, from approximately 40,000 scanned aerial stereo images acquired between 1979 and 2006, with known exterior and interior orientation. We derived four countrywide DSMs for the epochs 1979–1985, 1985–1991, 1991–1998, and 1998–2006. From the DSMs, we generated corresponding countrywide vegetation height models (VHMs). We assessed the quality of the historical DSMs at the country scale and within six representative study sites, evaluating the vertical accuracy and the completeness of image-matching across different land cover types.
Mean completeness ranged from 64 % for ‘glacial and perpetual snow’ to 98 % for ‘sealed surfaces’, with a value of 93 % for the ‘closed forest’ class. Across Switzerland, the median elevation accuracy of the historical DSMs compared with a reference digital terrain model (DTM) on sealed surface points ranged from 0.28 to 0.53 m, with a normalised median absolute deviation (NMAD) of around 1 m and a maximum root mean square error (RMSE) of 3.90 m. The same analysis between geodetic points and historical DSMs showed higher accuracies, with median values of ≤ 0.05 m and an NMAD < 1 m.
The VHMs generated in this study enabled the detection of major changes in forest areas due to windstorm damage, forest dynamics, and growth. This work demonstrates the feasibility of generating accurate, very high-resolution DSM time series (spanning three decades) and VHMs from historical aerial images of the entire surface of Switzerland in a highly automated manner. The VHMs are already being used to estimate countrywide biomass changes. The countrywide DSMs and VHMs for the four epochs, along with auxiliary data, are available online at https://doi.org/10.16904/envidat.528 (Marty et al., 2024) and can be used to quantify long-term elevation changes and related processes across different surfaces.
- Preprint
                                        (13643 KB) 
- Metadata XML
- BibTeX
- EndNote
Status: closed
- RC1: 'Comment on essd-2024-428', Anonymous Referee #1, 01 Feb 2025
- 
                     RC2:  'Comment on essd-2024-428', Anonymous Referee #2, 06 May 2025
            
            
            
            
                        This paper is about the usage of historic aerial images of entire Swiss country to derive digital 
 surface models (DSMs) which are then used to produce vegetation height models.
 Four epochs are considered ranging from 1979 to 2006. The images are
 used with existing orientation data. The focus of the paper lies on the accuracy analysis
 of the derived DSMs using a reference DTM, geodetic check points and manual stereoscopic
 measurements.
 General comments:The paper is well organized and well written. The topic is interesting, showing what problems 
 and which accuracies one might expected when using historic aerial images to derive
 3D information.All in all the content is good. However, a few details are missing 
 from my point of view. I raise concerns regarding the sign of differences and, in
 particular, the statistical analysis (although intended to be robust), which is not done
 in an optimal way (see major detailed comments).
 Also a clear statement regarding the relation between the obtained accuracies and the GSD
 of the images should be given.
 Therefore the major revision.Major detailed comments: - Sign of differences 
 On row 196 you say: "The elevation differences were calculated consistently as the
 reference data minus the historical DSMs."
 --> If there is no meaning attached to A or B, then the differences "A - B" and "B - A"
 are equally good. However, if one of A and B is a "reference", then the reference
 should always be subtracted, because the reference defines where the "origin" should lie.
 Thus if X is the data and the reference is X0 the difference should be computed as X - X0.
 So, if the reference is zero, you end up getting your original data X - X0 = X - 0 = X and
 not -X (as in your case).
 In your case the reference is a terrain model; thus even more motivating to compute the
 differences as: historic_DSM minus reference_DTM. Thus your differences can easily be compared
 with potential object heights (causing these differences), because the signs fit. And the sign
 of a non-zero offset (mean or median) is easily understandable; thus in a vegetated area a
 positive offset can be expected.Luckily the effect on your paper is not big: 
 In your statistics RMS, NMAD and STD will not change; only the mean and median will change. The color
 codings will only swap their colors.
 Still I think it is crucial that the differences are correctly computed as X - X0.
 - Robust statistics
 It is good that you consider the possibility of outliers and that you want to remove them and their
 influence. However, the current approach is suboptimal.In order to remove outliers from your data you must know the distribution (meaning 
 its type and the values of the defining parameters). In many
 cases the normal distribution is a good assumption for the type.If you would know in beforehand the mean and std of the normal distribution behind your corrupted data, 
 then the outliers could be filtered out by [mean +- 3*std] (because > 99.7% of the distribution would be
 inside this interval).However, if you do not know the parameter 
 values (i.e. mean and std), then it is a bad idea to compute mean and std from the data
 that are corrupted by outliers, because mean and std would be computed totally wrong. Instead, these parameters
 need to be computed by robust methods from these corrupted data.Under the assumption of normal distribution the median is a robust estimate for the mean, and the NMAD 
 is a robust estimate for the std. Consequently, outliers from these corrupted data should be detected
 by considering the interval [median +- 3*NMAD]. Often, the validity of (a single!) normal distribution
 is not fully met, then a larger interval should be chosen e.g. +- 5*NMAD.Remark: Your statement on row 242 is not correct: "The NMAD is defined as 1.4826*MAD (mean absolute deviation)." 
 The MAD is the median of absolute deviations. And these deviations are computed with respect to the median of the
 data. There is no "mean" used at all when computing the NMAD, since you are confronted with outliers.
 https://en.wikipedia.org/wiki/Median_absolute_deviationThere are several locations in your paper where you apply the outlier removal wrong by considering 
 the interval [mean +- 3*std]; e.g. in Tab. 2 and Fig 7: For epoch 4 (geodetic points) you obtain
 median = 0.00 and NMAD = 0.98 m. If the assumption of normal distribution is correct, then the RMS
 of this distribution should also be close to 1 m. However, >>after<< outlier removal you obtain
 RMS = 3.33 m.
 The histogram in Fig 7 (bottom, right) clearly shows that almost all data is within the interval +- 5 m.
 Thus an outlier removal based on the interval [median +- 5*NMAD] would do the trick.
 You, however, picked [mean +- 3*std] where the (unreported) std from the corrupted data will be somewhat in the range of
 the reported RMS of 3.33 m and consequently you accept everything on the interval of roughly +-10 m.Another example are the sealed surfaces in Tab. 3, where median = 0.19 m, NMAD = 0.63 m and RMS (after 
 outlier removal) = 1.24 m also do not fit together since under the normal distribution assumption the
 RMS should be much closer to sqrt(median^2 + NMAD^2).My last remark on this behalf concerns the vertical correction on page 9, row 167: 
 "Outliers were filtered out based on mean ± 1 standard deviation".
 From my outline above it should become obvious why this is a very odd method for outlier removal.
 The choice of "+- 1 std" is statistically unjustified (since this interval corresponds to only 68%).
 The reason why you chose it, is quite likely due to the outliers in your data which yield a wrong large std
 that alone (i.e. 1*std instead of 3*std) is sufficient for your task.It is further odd, that you compute the mean >>before<< outlier removal (and use it for this), but afterwards 
 you compute the "median of the remaining elevation differences" (row 168).
 In case of robust statistics it should always be the other way: First compute the median and afterwards
 compute the mean.
 Since the median is quite robust, it does not benefit much from outlier removal (at least if the outlier
 percentage is not extreme). Thus I could imagine, that the entire computation of the correction surface could be
 done by simply computing the median of the original (corrupted) data.
 - Relation to GSD
 The nice thing about photogrammetry is that the accuracy of the result can be linked to the GSD.
 Of course, the accuracy will depend on the object type: rough surfaces like forests will be worse than
 smooth objects. Currently, this link is not emphasized at all. One would expect that
 the accuracy of certain well defined objects (e.g. smooth surfaces) should be around one GSD.
 Maybe the age of the images can play some role here, but on the other hand the images are not from WWII and
 were captured with high-end photogrammetric cameras back in the day. Also the accuracy of the reference
 data will have some effect.You are deriving DSMs with grid width three times the GSD. So the accuracy expectation above may not be 
 obtainable everywhere when comparing height models. However, smooth horizontal terrain regions should
 not be affected by the grid width. Thus reporting the obtained accuracies there (as multiples of the GSD)
 in the conclusion chapter would show the optimum accuracy. This value also would ease the comparison
 with other publications that use different data/methods.Actually, most of this is already done (but just not emphasized); e.g. Fig. 8(a) clearly shows that the 
 NMAD of the horizontal regions is way below 1 m. A quick and dirty estimate from the figure makes one
 think the NMAD is around 0.7 m, which would correspond to 2*GSD, but I am sure you can report proper
 numbers.Minor detailed comments: row 87: "Representative study sites of approximately 210 km2" 
 --> in total or is this the area of each site?r89: "consists of approximately 40,000 panchromatic images" 
 --> it would be interesting to report the number of images per epoch. The histograms
 in Fig. 2 only allow for a rough estimate.Also, it would be helpful to report the format of the swisstopo tiles. I was not able to 
 find the format using Google (in acceptable time). A rough estimation makes me think
 it is 17.5 x 12 km².Some information on the orientation quality would be helpful. So far the only figure in 
 this respect is "resulted in accurate horizontal positioning but biases in
 the Z direction, ranging from 1 to 5 m" in row 123. How accurate is the horizontal positioning?
 Perhaps swisstopo provides information like the reference standard deviation of the
 respective aerial triangulations (comparable to mean reprojection error)?
 At least, the dedicated publication on the orientation of
 epoch 2 should provide some values, that can be cited.r140: "two strategies were adopted, namely urban and low contrast." 
 --> the reader might wonder why an "urban" strategy is used, although the main focus
 are forested areas. I think it has to do with the anticipated height changes, which are
 similar in urban and vegetation area, but a sentence in this respect would be good.
 Actually, an information comes later (r148), but it would be good if this would right after
 introducing the strategies.r154: "this correction was also applied to the DSM from epoch 2, resulting in neglectable 
 changes due to the already precise horizontal and vertical orientation."
 --> It is interesting that epoch 2 fits much better from the very start. Probably, the
 ground control information used for the AT of epoch 2 (Heisig & Simmen, 2021) is also based
 on the very same reference DTM like the here presented correction strategy?r161: "which were generalized using the median height" 
 --> Perhaps, "merged" is a better term than "generalized"? Also, did all the overlapping
 DSMs from the individual stereo pairs have the same raster layout - meaning same grid width (yes, 1 m)
 but also same offset (?). If not, some accuracy might get lost during the necessary interpolation for
 the median-based fusion.r166-r170: 
 The description of the correction work flow is not entirely clear to me.
 Usually, moving windows have an odd number of rows and columns, because this allows to center the window
 in the central pixel, analyze the data within the window and store the outcome in that central pixel.
 You, however, use a 50 x 50 moving window. How do you place it on the data raster?
 How does one get to the number "1200" for the regular grid cells per map sheet?Fig.4: "Point classified as sealed surface and grass/herb" --> "sealed surface or grass/herb" r180: The word "existing" in "existing objects" might be confusing. Either drop it or 
 restrict it to the historical epochs, so that it can not be confused with meaning "existing today".r185: "nDSM raster cells with values > 60 m were considered outliers" 
 --> buildings in larger cities may not be larger than 60 m?r187: "To achieve this, a binary non-vegetation mask (0 = non-vegetation and 1 = vegetation) was created" 
 --> Wouldn't setting non-vegetation to NaN be a better choice in order to keep proper vegetation
 differences, which are exactly 0? i.e. after the multiplication with this mask (row 192) a pixel with value == 0
 can not be told apart from a proper vegetation pixel with difference == 0 and a masked non-vegetation pixel.r190: "To retain vegetation close to buildings and streets in the VHMs, single trees 
 and hedges from the TLM were buffered by 15 m"
 --> Following this 15m-buffering, some parts of buildings may end up as being considered as vegetation?r220: "For each forest plot, 25 elevation measurements were extracted on the same stereo pair 
 as used for generating the historical DSM"
 --> But the historic DSM in the end is created from a median based fusion, where the DSM from this "single" stereo
 pair is no longer "available". So, manual image stereo
 measurements from one image pair are not really compared with automatic heights from the very same image pair?r224: Regarding "completeness": A differentiation regarding how a pixel got its height is not performed, meaning 
 from building strategy or from low contrast strategy or from median-based fusion?Fig 6: I am confused by the total number of grid points reported for the six classes. They add up in total 
 to 226606 points (=6101+43654+77963+13488+74065+11335). However (if I am not mistaken), the six considered
 tiles of size 210 km² would result only in 6 * 210 / 0.1^2 = 126000 points, where the spacing of the points is
 100 m (according to row 201). How can this huge difference be explained?r281 "and an NMAD of approximately 1.5 m." 
 --> The respective number in table 2 is actually 1.41 m. So rounding would be 1.4 m, although I'd prefer to
 cite the exact number (i.e. 1.41). (This may seem a bit nitpicking, but given the many numbers in different
 figures and tables it would be helpful to keep the very same digits in order to help the reader pricely located quoted values.)Table 2: Interestingly, the smallest error values result for epoch 1. Which appears a bit surprising, considering 
 that this is the oldest epoch, thus being effect by long storage harm, etc. the most (maybe even being captured by
 the least advanced camera?). Any explanations for this? Maybe epoch 1 has better GSD or better forward/side overlap?
 But maybe the picture changes after adopting the robust statistics mentioned in the beginning?Fig. 8: Please, add (°) to the two aspect captions. r341 "between epoch 4 and 2022 (Fig. 10)." 
 --> Do you mean a certain part in area 4? then perhaps add the location along the "length", i.e. from 370 m to 870 m.Fig. 11. The heading of the center figure says "1994", thus coming from epoch 3 not epoch 2? r382: The completeness of the image-matching shows a steady increase over the four epochs (Fig. A2), particularly 
 in areas classified as grasses and herbs.
 --> Is this increase really so dramatic, and worth mentioning? Some values actually decrease. Or do you mean with
 respect to epoch 1 (which, however, is not included in the referenced Fig. A2).r383: "This improvement is strongly related to advancements in camera technology ..." 
 --> Since the error values for epoch 1 are the smallest in Tab. 2 (currently), this claim stands on weak feet or
 at least needs some clarification.r386: "at sealed surface points shows metric to sub-metric accuracy with an uncertainty" 
 --> Somewhere here it should be repeated that the GSD is around 35 cm and that the DSMs were generated with 1 m
 grid width.
 I am not a native speaker, but I am not sure, if the term "sub-metric" exists. Perhaps, "meter to sub-meter" is more correct.r405: "The larger bias in historical data might be due to the more demanding visual detection of objects during 
 manual stereo measurements on panchromatic imagery and the coarser resolution of the images."
 --> Actually, the considered historical images in your study have a GSD of approx. 0.35 m, which (nominally) is
 better then the 0.5 m of the mentioned ADS images.r412: "the low quality of scanning film negatives" 
 --> Just the wording is maybe odd: The scanning itself is not the problem, but the quality of the original films.r422: "vertical correction of about 20,000 DSMs" 
 --> Maybe add "(stereo footprints)".Citation: https://doi.org/10.5194/essd-2024-428-RC2 
- 
                     AC1:  'Comment on essd-2024-428', Mauro Marty, 02 Jul 2025
                        
                                
                        
            
            
            
            
                        Dear Topic Editor Alexander Gruber, Dear reviewers We herewith submit a point by point reply for the manuscript essd-2024-428 "Countrywide Digital Surface Models and Vegetation Height Models from Historical Aerial Images". We thank the reviewers for their valuable comments and suggestions. We are convinced that these revisions, as detailed in the point-by-point reply, address all the comments and questions raised by the reviewers and significantly improves the quality of our study. Sincerely Mauro Marty, on behalf of the author team 
Status: closed
- 
                     RC1:  'Comment on essd-2024-428', Anonymous Referee #1, 01 Feb 2025
            
            
            
            
                        Interesting work although it would benefit to have some more technical details to boost understanding and replicability inn other areas, such as: - how were the DSM generated in the various epochs? I mean, not sw-wise...for the epoch 1979–1985, did you generate DSM for each year and then average it to have a DSM for the epoch? - in the photogrmmetric processing, how did you handle the information of the fiducial marks? - how did you produce the VHM from the available DSM? how was the filtering done? did you use an available DTM? if so, which one? - how did you assess the quality of the historical DSMs? You mention "evaluating the vertical accuracy and the completeness of image-matching" but what was the ground truth? The reported % are very optimistic Citation: https://doi.org/10.5194/essd-2024-428-RC1 
- 
                     RC2:  'Comment on essd-2024-428', Anonymous Referee #2, 06 May 2025
            
            
            
            
                        This paper is about the usage of historic aerial images of entire Swiss country to derive digital 
 surface models (DSMs) which are then used to produce vegetation height models.
 Four epochs are considered ranging from 1979 to 2006. The images are
 used with existing orientation data. The focus of the paper lies on the accuracy analysis
 of the derived DSMs using a reference DTM, geodetic check points and manual stereoscopic
 measurements.
 General comments:The paper is well organized and well written. The topic is interesting, showing what problems 
 and which accuracies one might expected when using historic aerial images to derive
 3D information.All in all the content is good. However, a few details are missing 
 from my point of view. I raise concerns regarding the sign of differences and, in
 particular, the statistical analysis (although intended to be robust), which is not done
 in an optimal way (see major detailed comments).
 Also a clear statement regarding the relation between the obtained accuracies and the GSD
 of the images should be given.
 Therefore the major revision.Major detailed comments: - Sign of differences 
 On row 196 you say: "The elevation differences were calculated consistently as the
 reference data minus the historical DSMs."
 --> If there is no meaning attached to A or B, then the differences "A - B" and "B - A"
 are equally good. However, if one of A and B is a "reference", then the reference
 should always be subtracted, because the reference defines where the "origin" should lie.
 Thus if X is the data and the reference is X0 the difference should be computed as X - X0.
 So, if the reference is zero, you end up getting your original data X - X0 = X - 0 = X and
 not -X (as in your case).
 In your case the reference is a terrain model; thus even more motivating to compute the
 differences as: historic_DSM minus reference_DTM. Thus your differences can easily be compared
 with potential object heights (causing these differences), because the signs fit. And the sign
 of a non-zero offset (mean or median) is easily understandable; thus in a vegetated area a
 positive offset can be expected.Luckily the effect on your paper is not big: 
 In your statistics RMS, NMAD and STD will not change; only the mean and median will change. The color
 codings will only swap their colors.
 Still I think it is crucial that the differences are correctly computed as X - X0.
 - Robust statistics
 It is good that you consider the possibility of outliers and that you want to remove them and their
 influence. However, the current approach is suboptimal.In order to remove outliers from your data you must know the distribution (meaning 
 its type and the values of the defining parameters). In many
 cases the normal distribution is a good assumption for the type.If you would know in beforehand the mean and std of the normal distribution behind your corrupted data, 
 then the outliers could be filtered out by [mean +- 3*std] (because > 99.7% of the distribution would be
 inside this interval).However, if you do not know the parameter 
 values (i.e. mean and std), then it is a bad idea to compute mean and std from the data
 that are corrupted by outliers, because mean and std would be computed totally wrong. Instead, these parameters
 need to be computed by robust methods from these corrupted data.Under the assumption of normal distribution the median is a robust estimate for the mean, and the NMAD 
 is a robust estimate for the std. Consequently, outliers from these corrupted data should be detected
 by considering the interval [median +- 3*NMAD]. Often, the validity of (a single!) normal distribution
 is not fully met, then a larger interval should be chosen e.g. +- 5*NMAD.Remark: Your statement on row 242 is not correct: "The NMAD is defined as 1.4826*MAD (mean absolute deviation)." 
 The MAD is the median of absolute deviations. And these deviations are computed with respect to the median of the
 data. There is no "mean" used at all when computing the NMAD, since you are confronted with outliers.
 https://en.wikipedia.org/wiki/Median_absolute_deviationThere are several locations in your paper where you apply the outlier removal wrong by considering 
 the interval [mean +- 3*std]; e.g. in Tab. 2 and Fig 7: For epoch 4 (geodetic points) you obtain
 median = 0.00 and NMAD = 0.98 m. If the assumption of normal distribution is correct, then the RMS
 of this distribution should also be close to 1 m. However, >>after<< outlier removal you obtain
 RMS = 3.33 m.
 The histogram in Fig 7 (bottom, right) clearly shows that almost all data is within the interval +- 5 m.
 Thus an outlier removal based on the interval [median +- 5*NMAD] would do the trick.
 You, however, picked [mean +- 3*std] where the (unreported) std from the corrupted data will be somewhat in the range of
 the reported RMS of 3.33 m and consequently you accept everything on the interval of roughly +-10 m.Another example are the sealed surfaces in Tab. 3, where median = 0.19 m, NMAD = 0.63 m and RMS (after 
 outlier removal) = 1.24 m also do not fit together since under the normal distribution assumption the
 RMS should be much closer to sqrt(median^2 + NMAD^2).My last remark on this behalf concerns the vertical correction on page 9, row 167: 
 "Outliers were filtered out based on mean ± 1 standard deviation".
 From my outline above it should become obvious why this is a very odd method for outlier removal.
 The choice of "+- 1 std" is statistically unjustified (since this interval corresponds to only 68%).
 The reason why you chose it, is quite likely due to the outliers in your data which yield a wrong large std
 that alone (i.e. 1*std instead of 3*std) is sufficient for your task.It is further odd, that you compute the mean >>before<< outlier removal (and use it for this), but afterwards 
 you compute the "median of the remaining elevation differences" (row 168).
 In case of robust statistics it should always be the other way: First compute the median and afterwards
 compute the mean.
 Since the median is quite robust, it does not benefit much from outlier removal (at least if the outlier
 percentage is not extreme). Thus I could imagine, that the entire computation of the correction surface could be
 done by simply computing the median of the original (corrupted) data.
 - Relation to GSD
 The nice thing about photogrammetry is that the accuracy of the result can be linked to the GSD.
 Of course, the accuracy will depend on the object type: rough surfaces like forests will be worse than
 smooth objects. Currently, this link is not emphasized at all. One would expect that
 the accuracy of certain well defined objects (e.g. smooth surfaces) should be around one GSD.
 Maybe the age of the images can play some role here, but on the other hand the images are not from WWII and
 were captured with high-end photogrammetric cameras back in the day. Also the accuracy of the reference
 data will have some effect.You are deriving DSMs with grid width three times the GSD. So the accuracy expectation above may not be 
 obtainable everywhere when comparing height models. However, smooth horizontal terrain regions should
 not be affected by the grid width. Thus reporting the obtained accuracies there (as multiples of the GSD)
 in the conclusion chapter would show the optimum accuracy. This value also would ease the comparison
 with other publications that use different data/methods.Actually, most of this is already done (but just not emphasized); e.g. Fig. 8(a) clearly shows that the 
 NMAD of the horizontal regions is way below 1 m. A quick and dirty estimate from the figure makes one
 think the NMAD is around 0.7 m, which would correspond to 2*GSD, but I am sure you can report proper
 numbers.Minor detailed comments: row 87: "Representative study sites of approximately 210 km2" 
 --> in total or is this the area of each site?r89: "consists of approximately 40,000 panchromatic images" 
 --> it would be interesting to report the number of images per epoch. The histograms
 in Fig. 2 only allow for a rough estimate.Also, it would be helpful to report the format of the swisstopo tiles. I was not able to 
 find the format using Google (in acceptable time). A rough estimation makes me think
 it is 17.5 x 12 km².Some information on the orientation quality would be helpful. So far the only figure in 
 this respect is "resulted in accurate horizontal positioning but biases in
 the Z direction, ranging from 1 to 5 m" in row 123. How accurate is the horizontal positioning?
 Perhaps swisstopo provides information like the reference standard deviation of the
 respective aerial triangulations (comparable to mean reprojection error)?
 At least, the dedicated publication on the orientation of
 epoch 2 should provide some values, that can be cited.r140: "two strategies were adopted, namely urban and low contrast." 
 --> the reader might wonder why an "urban" strategy is used, although the main focus
 are forested areas. I think it has to do with the anticipated height changes, which are
 similar in urban and vegetation area, but a sentence in this respect would be good.
 Actually, an information comes later (r148), but it would be good if this would right after
 introducing the strategies.r154: "this correction was also applied to the DSM from epoch 2, resulting in neglectable 
 changes due to the already precise horizontal and vertical orientation."
 --> It is interesting that epoch 2 fits much better from the very start. Probably, the
 ground control information used for the AT of epoch 2 (Heisig & Simmen, 2021) is also based
 on the very same reference DTM like the here presented correction strategy?r161: "which were generalized using the median height" 
 --> Perhaps, "merged" is a better term than "generalized"? Also, did all the overlapping
 DSMs from the individual stereo pairs have the same raster layout - meaning same grid width (yes, 1 m)
 but also same offset (?). If not, some accuracy might get lost during the necessary interpolation for
 the median-based fusion.r166-r170: 
 The description of the correction work flow is not entirely clear to me.
 Usually, moving windows have an odd number of rows and columns, because this allows to center the window
 in the central pixel, analyze the data within the window and store the outcome in that central pixel.
 You, however, use a 50 x 50 moving window. How do you place it on the data raster?
 How does one get to the number "1200" for the regular grid cells per map sheet?Fig.4: "Point classified as sealed surface and grass/herb" --> "sealed surface or grass/herb" r180: The word "existing" in "existing objects" might be confusing. Either drop it or 
 restrict it to the historical epochs, so that it can not be confused with meaning "existing today".r185: "nDSM raster cells with values > 60 m were considered outliers" 
 --> buildings in larger cities may not be larger than 60 m?r187: "To achieve this, a binary non-vegetation mask (0 = non-vegetation and 1 = vegetation) was created" 
 --> Wouldn't setting non-vegetation to NaN be a better choice in order to keep proper vegetation
 differences, which are exactly 0? i.e. after the multiplication with this mask (row 192) a pixel with value == 0
 can not be told apart from a proper vegetation pixel with difference == 0 and a masked non-vegetation pixel.r190: "To retain vegetation close to buildings and streets in the VHMs, single trees 
 and hedges from the TLM were buffered by 15 m"
 --> Following this 15m-buffering, some parts of buildings may end up as being considered as vegetation?r220: "For each forest plot, 25 elevation measurements were extracted on the same stereo pair 
 as used for generating the historical DSM"
 --> But the historic DSM in the end is created from a median based fusion, where the DSM from this "single" stereo
 pair is no longer "available". So, manual image stereo
 measurements from one image pair are not really compared with automatic heights from the very same image pair?r224: Regarding "completeness": A differentiation regarding how a pixel got its height is not performed, meaning 
 from building strategy or from low contrast strategy or from median-based fusion?Fig 6: I am confused by the total number of grid points reported for the six classes. They add up in total 
 to 226606 points (=6101+43654+77963+13488+74065+11335). However (if I am not mistaken), the six considered
 tiles of size 210 km² would result only in 6 * 210 / 0.1^2 = 126000 points, where the spacing of the points is
 100 m (according to row 201). How can this huge difference be explained?r281 "and an NMAD of approximately 1.5 m." 
 --> The respective number in table 2 is actually 1.41 m. So rounding would be 1.4 m, although I'd prefer to
 cite the exact number (i.e. 1.41). (This may seem a bit nitpicking, but given the many numbers in different
 figures and tables it would be helpful to keep the very same digits in order to help the reader pricely located quoted values.)Table 2: Interestingly, the smallest error values result for epoch 1. Which appears a bit surprising, considering 
 that this is the oldest epoch, thus being effect by long storage harm, etc. the most (maybe even being captured by
 the least advanced camera?). Any explanations for this? Maybe epoch 1 has better GSD or better forward/side overlap?
 But maybe the picture changes after adopting the robust statistics mentioned in the beginning?Fig. 8: Please, add (°) to the two aspect captions. r341 "between epoch 4 and 2022 (Fig. 10)." 
 --> Do you mean a certain part in area 4? then perhaps add the location along the "length", i.e. from 370 m to 870 m.Fig. 11. The heading of the center figure says "1994", thus coming from epoch 3 not epoch 2? r382: The completeness of the image-matching shows a steady increase over the four epochs (Fig. A2), particularly 
 in areas classified as grasses and herbs.
 --> Is this increase really so dramatic, and worth mentioning? Some values actually decrease. Or do you mean with
 respect to epoch 1 (which, however, is not included in the referenced Fig. A2).r383: "This improvement is strongly related to advancements in camera technology ..." 
 --> Since the error values for epoch 1 are the smallest in Tab. 2 (currently), this claim stands on weak feet or
 at least needs some clarification.r386: "at sealed surface points shows metric to sub-metric accuracy with an uncertainty" 
 --> Somewhere here it should be repeated that the GSD is around 35 cm and that the DSMs were generated with 1 m
 grid width.
 I am not a native speaker, but I am not sure, if the term "sub-metric" exists. Perhaps, "meter to sub-meter" is more correct.r405: "The larger bias in historical data might be due to the more demanding visual detection of objects during 
 manual stereo measurements on panchromatic imagery and the coarser resolution of the images."
 --> Actually, the considered historical images in your study have a GSD of approx. 0.35 m, which (nominally) is
 better then the 0.5 m of the mentioned ADS images.r412: "the low quality of scanning film negatives" 
 --> Just the wording is maybe odd: The scanning itself is not the problem, but the quality of the original films.r422: "vertical correction of about 20,000 DSMs" 
 --> Maybe add "(stereo footprints)".Citation: https://doi.org/10.5194/essd-2024-428-RC2 
- 
                     AC1:  'Comment on essd-2024-428', Mauro Marty, 02 Jul 2025
                        
                                
                        
            
            
            
            
                        Dear Topic Editor Alexander Gruber, Dear reviewers We herewith submit a point by point reply for the manuscript essd-2024-428 "Countrywide Digital Surface Models and Vegetation Height Models from Historical Aerial Images". We thank the reviewers for their valuable comments and suggestions. We are convinced that these revisions, as detailed in the point-by-point reply, address all the comments and questions raised by the reviewers and significantly improves the quality of our study. Sincerely Mauro Marty, on behalf of the author team 
Data sets
Historical Vegetation Height Model NFI Mauro Marty, Livia Piermattei, Lars T. Waser, and Christian Ginzler https://doi.org/10.16904/envidat.528
Viewed
| HTML | XML | Total | BibTeX | EndNote | |
|---|---|---|---|---|---|
| 770 | 126 | 30 | 926 | 24 | 39 | 
- HTML: 770
- PDF: 126
- XML: 30
- Total: 926
- BibTeX: 24
- EndNote: 39
Viewed (geographical distribution)
| Country | # | Views | % | 
|---|
| Total: | 0 | 
| HTML: | 0 | 
| PDF: | 0 | 
| XML: | 0 | 
- 1
 
                         
                         
                         
                         
                 
                 
                 
                 
                
Interesting work although it would benefit to have some more technical details to boost understanding and replicability inn other areas, such as:
- how were the DSM generated in the various epochs? I mean, not sw-wise...for the epoch 1979–1985, did you generate DSM for each year and then average it to have a DSM for the epoch?
- in the photogrmmetric processing, how did you handle the information of the fiducial marks?
- how did you produce the VHM from the available DSM? how was the filtering done? did you use an available DTM? if so, which one?
- how did you assess the quality of the historical DSMs? You mention "evaluating the vertical accuracy and the completeness of image-matching" but what was the ground truth? The reported % are very optimistic