This work is distributed under the Creative Commons Attribution 4.0 License.
A hyperspectral and multi-angular synthetic dataset for algorithm development in waters of varying trophic levels and optical complexity
Abstract. This data paper outlines the development and structure of a synthetic dataset (SD) within the optical domain, encompassing inherent and apparent optical properties (IOPs-AOPs) alongside the associated optically active constituents (OACs). The bio-optical modeling benefited from knowledge and data accumulated over the past three decades, resulting in a comprehensive dataset of in situ IOPs that spans diverse water typologies and enabled the imposition of rigorous quality standards. Consequently, the bio-optical relationships delineated herein represent valuable contributions to the field.
Employing the Hydrolight scalar radiative transfer equation solver, we generated above-surface and submarine light fields across the specified spectral range at a "true" hyperspectral resolution (1 nm), extending into the ultraviolet down to 350 nm, thereby facilitating algorithm development and assessment for present and forthcoming hyperspectral satellite missions. A condensed version of the dataset tailored to twelve Sentinel-3 OLCI bands (400 nm to 753 nm) was also produced. Derived AOPs encompass an array of above- and below-surface reflectances, diffuse attenuation coefficients, and average cosines.
The dataset is distributed in 5000 files, each file encapsulating a specific IOP scenario, ensuring sufficient data volume for each water type represented. A unique feature of our dataset lies in the calculation of AOPs across the complete range of solar and viewing zenith and azimuthal angles as per the Hydrolight default quads, amounting to 1300 angular combinations. This comprehensive directional coverage supports studies of signal directionality, which previously lacked sufficient reference data. The dataset is publicly available for anonymous retrieval via the FAIR repository Zenodo at https://doi.org/10.5281/zenodo.11637178 (Pitarch and Brando, 2024).
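For readers planning to exploit the multi-angular coverage, a common access pattern is a nearest-neighbour lookup on the discretised geometry. The sketch below is a hypothetical illustration only: the 10- and 15-degree grids and all names are assumptions for demonstration, not the dataset's actual quad layout (which follows the Hydrolight defaults described in the paper); the lookup logic, however, applies to any discretised sun/view geometry.

```python
import numpy as np

def nearest_geometry(sun_grid, view_grid, azi_grid, sun, view, azi):
    """Return indices of the grid geometry closest to a requested
    (sun zenith, viewing zenith, relative azimuth) triple, in degrees."""
    i = int(np.argmin(np.abs(np.asarray(sun_grid) - sun)))
    j = int(np.argmin(np.abs(np.asarray(view_grid) - view)))
    k = int(np.argmin(np.abs(np.asarray(azi_grid) - azi)))
    return i, j, k

# Hypothetical angle grids, for illustration only.
sun_grid = np.arange(0, 90, 10)      # 0, 10, ..., 80 degrees
view_grid = np.arange(0, 90, 10)
azi_grid = np.arange(0, 181, 15)     # 0, 15, ..., 180 degrees

i, j, k = nearest_geometry(sun_grid, view_grid, azi_grid, 32.0, 41.0, 97.0)
print(sun_grid[i], view_grid[j], azi_grid[k])  # closest available geometry
```

The returned indices can then be used to select the matching AOP record from a loaded file, whatever its actual storage format.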
Status: final response (author comments only)
RC1: 'Comment on essd-2024-295', Anonymous Referee #1, 16 Sep 2024
CC3: 'Reply on RC1', Juan Ignacio Gossn, 23 Sep 2024
I find the following statement from the reviewer unclear: "the dataset should have been validated on actual data". What does "actual data" mean here? If the reviewer means "in situ" data, it turns out that this simulated dataset relies on empirical relations that were fitted to in situ data, and it is evident that this dataset outperforms the pre-existing published simulated ones in terms of the established empirical relations between measured IOPs that are input to Hydrolight; this is very well detailed in Sections 1.2 and 1.3. The authors are clear about which in situ data they used to establish such relations, and they do compare the simulated datasets to those data to check the performance of the established empirical relations (e.g. Fig. 4 and Fig. 5). Naturally, more in situ data and better RT may come (although "optical closure" will never be achieved definitively), but it is clear that the dataset presented here is a substantial improvement over pre-existing simulated datasets, and there is no clear reason why the reviewer should reject its publication. RT could be refined in the future (e.g. a Hydrolight version including polarization, or a larger in situ dataset on which to base the empirical relations), and then a surpassing dataset could be produced. In the meantime, we cannot expect perfection: this is clearly the most sophisticated dataset we have, substantial progress has been demonstrated, and publishing it would be a great service to the science community that relies on simulated bio-optical datasets for algorithm development.
Citation: https://doi.org/10.5194/essd-2024-295-CC3
RC4: 'Reply on CC3', Anonymous Referee #1, 30 Sep 2024
Dear Juan Ignacio Gossn,
Thanks for your comment. Perhaps you followed the reply to Curtis Mobley's comment, where the interest of such a synthetic data set was discussed. Following that comment, I proposed to handle the manuscript with a "major revision" before potential publication, considering the necessary validation of the simulated outcomes (also requested by the referee David McKee). I foresee this validation exercise as a great asset for further use of the provided data set (algorithm development, use of AI techniques, ...). Moreover, I found the manuscript of interest, but closer to a "research paper" than a "data paper".
I also apologize if I was unclear about the term "actual data" for the validation exercise. Let me put it another way: (i) in situ measurements of inherent and apparent optical properties are available from several data sets; (ii) statistical treatments as well as physical/mathematical assumptions have been made to model the inherent optical properties (IOP); (iii) those modeled IOP are used as input to a radiative transfer solver (Hydrolight); (iv) the outcomes of Hydrolight are provided in terms of several apparent optical properties (AOP), given the input IOP. The proposed validation exercise is to compare (for a few exemplary cases) the simulated AOP with those from the in situ data set. This could be done for the remote sensing reflectance by plotting the Hydrolight-computed values along with the measured reflectance (and we could surely understand any non-perfect match, but it is better to see it). It is common scientific practice to validate theoretical calculations (with an objective method) before providing them as a reference data set. As a referee, I am pushing in this direction to reinforce the manuscript and its potential impact on future research, not to slow down any publication effort.
Best regards.
Citation: https://doi.org/10.5194/essd-2024-295-RC4
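The match-up comparison the referee outlines in steps (i)-(iv) is typically quantified with standard validation statistics. The sketch below is a generic illustration, not taken from the manuscript: given simulated and measured remote-sensing reflectance on a common wavelength grid, it computes the mean bias and the median absolute percentage difference (MAPD), two metrics commonly used in ocean-colour validation. The toy spectra are invented for demonstration.

```python
import numpy as np

def matchup_stats(rrs_sim, rrs_meas):
    """Mean bias and median absolute percentage difference (MAPD)
    between simulated and measured Rrs spectra on the same grid."""
    sim = np.asarray(rrs_sim, dtype=float)
    meas = np.asarray(rrs_meas, dtype=float)
    bias = float(np.mean(sim - meas))
    mapd = float(np.median(np.abs(sim - meas) / np.abs(meas)) * 100.0)
    return bias, mapd

# Toy Rrs spectra (sr^-1) on an arbitrary wavelength grid, illustration only.
rrs_meas = np.array([0.0040, 0.0050, 0.0035, 0.0010])
rrs_sim = np.array([0.0044, 0.0050, 0.0035, 0.0011])

bias, mapd = matchup_stats(rrs_sim, rrs_meas)
print(f"bias = {bias:.2e} sr^-1, MAPD = {mapd:.1f}%")
```

The same two numbers, computed per water type, would give a compact summary of the validation exercise the referee requests.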
AC2: 'Reply on CC3', Jaime Pitarch, 04 Oct 2024
Dear Juan,
We appreciate that you took the time to evaluate our manuscript.
As an EUMETSAT scientist, you are aware of the importance of having a reference dataset for algorithm calibration and validation. Also, in the framework of EUMETSAT’s efforts to provide a bidirectional correction of OLCI data, we believe that this dataset is going to be of help.
We appreciate the insightful comment on the effort we took to justify the relationships we used, and on the comparisons to independent data that we already show. Indeed, we believe that, compared to previous datasets, our work makes a great leap forward.
You also make the point that, with the given data and analytical tools at hand, this is the best effort we could make; but that does not prevent us from seeking improvements in the future as more data become accessible and the radiative transfer tool is updated with beneficial new features (e.g., improved modelling of sea-surface transmission and reflection, improved characterization of the sky radiance, and inclusion of polarization).
Best regards.
Citation: https://doi.org/10.5194/essd-2024-295-AC2
AC4: 'Reply on RC1', Jaime Pitarch, 16 Oct 2024
CC1: 'Comment on essd-2024-295', Curtis Mobley, 16 Sep 2024
I just ran across this paper. I have not examined the synthetic data set itself (5000 files!!), but it appears to be a major contribution to the community developing algorithms for retrieval of environmental parameters from remotely sensed or in-water data sets. Given the absence of comprehensive data sets containing a wide range of measured data on IOPs, radiometric quantities (radiances and irradiances), and AOPs, the science community is forced to rely on synthetic data sets created by models such as HydroLight. Although HydroLight does not include polarization (its primary weakness), it is nevertheless the most widely used "industry standard" code for computation of in-water and water-leaving radiances and derived quantities. Having available a free and easily accessible HydroLight-computed data set covering the ranges of IOPs, solar zenith angles, viewing directions, etc. encountered in nature gives a common data set for use in developing and comparing many types of algorithms. I expect that this data set will find a great deal of usage worldwide, and I recommend publication of this paper. -- Regards, Curtis Mobley (Full disclosure: although I developed HydroLight decades ago, I am now fully retired and have no personal or financial interest in HydroLight.)
Citation: https://doi.org/10.5194/essd-2024-295-CC1
RC2: 'Reply on CC1', Anonymous Referee #1, 17 Sep 2024
Dear Curtis Mobley,
Thanks for this insightful comment; I am pleased to see that the discussion tool is alive.
In my review, I did not criticize the radiative transfer solver (Hydrolight), and I completely follow you on the need for reference radiative transfer outcomes. Nevertheless, such reference computations must be validated (at least for a few cases) and the assumptions made in the "single scattering" properties investigated. This effort deserves a proper "research paper" before the computations are provided as is through a "data paper". It could be of interest to provide an "optical closure" for a few cases and to investigate the anisotropy behavior of the water-leaving radiance for a series of bulk single scattering albedos and scattering phase functions of the water column. But I have to recognize that this is beyond the scope of a data paper.
I personally think that it could be misleading to work on the 5000 files provided here without insight into the representativeness of those computations. To conclude, I asked for rejection because I had no means to evaluate the quality/realism of the computed values, and I felt it unethical to provide a synthetic data set as a community reference without thorough validation. But I would like to give full consideration to your comment, and I will be happy to ask for "major revision" provided there is a clear validation for some typical water types and an assessment of the effects of the scattering phase function on the water-leaving signal anisotropy.
Best regards
Citation: https://doi.org/10.5194/essd-2024-295-RC2
AC1: 'Reply on CC1', Jaime Pitarch, 04 Oct 2024
Dear Curt,
Your kind words on our new synthetic dataset are very much appreciated.
You clearly highlight the point that the absence of comprehensive datasets containing a wide range of IOPs and derived AOPs (and angular ranges), with known uncertainties, triggers the development of synthetic datasets. In fact, our synthetic dataset covers very wide data ranges.
To be sure that the derived data are meaningful, the bio-optical modelling in this paper has been constrained by a greater amount of data than any previous dataset. Independent verifications show trends that are highly consistent with empirical data.
Moreover, as you know very well, the Hydrolight code delivers AOPs that are accepted as "error free" and are the reference for studying consistency between matchup IOP-AOP datasets, the so-called closure. In such a case, it is the in situ AOP that is examined against the "truth", which is the Hydrolight simulation.
Regarding directionality, the multi-angular plots show trends that are consistent with expected patterns. This is very relevant to the bidirectional problem.
To further increase confidence in the dataset, you will see an updated version of the manuscript showing some cross-relationships involving reflectance, compared to equivalent plots or fitting curves derived from in situ data.
Thank you again and best regards.
Citation: https://doi.org/10.5194/essd-2024-295-AC1
CC2: 'Comment on essd-2024-295', Giuseppe Zibordi, 21 Sep 2024
The manuscript by Pitarch and Brando proposes a comprehensive data set of IOP and AOP values from radiative transfer simulations and related parameterisations, relying on previous scientific investigations and high-quality in situ data. The newly proposed data set definitively shows advances with respect to its various predecessors, and because of this it deserves to be supported by the proposed manuscript.
I only have three comments I would like to convey to the authors.
The first is quite minor and refers to terminology: the term ‘synthetic’ should be replaced by the more appropriate ‘simulated’.
The second concerns the statement qualifying the simulated data as not affected by errors. This is quite questionable: simulated data can only provide an 'interpretation' of the ‘truth’ based on a number of input parameters and modelling solutions. Regardless of the RT solution, the input parameters may not capture the actual ‘truth’.
The last comment is the most relevant one. It is commendable that the data set is provided at 1-nm spectral resolution. However, it is questionable whether the simulated data can actually capture 1-nm spectral variations. This appears to be confirmed by the aggressive smoothing applied to the experimental aph values. This limitation should be acknowledged.
Giuseppe Zibordi
Citation: https://doi.org/10.5194/essd-2024-295-CC2
AC3: 'Reply on CC2', Jaime Pitarch, 14 Oct 2024
RC3: 'Comment on essd-2024-295', David McKee, 22 Sep 2024
A hyperspectral and multi-angular synthetic dataset for algorithm development in waters of varying trophic levels and optical complexity
Jaime Pitarch, Vittorio Ernesto Brando
Comments
This paper presents a new synthetic data set linking apparent and inherent optical properties based on a very substantial set of radiative transfer simulations that are intended to provide comprehensive representation of optical water types found in nature. The purpose is to support ocean colour algorithm development and there is specific effort made to cover a wide range of sun sensor geometries, high spectral resolution and other important features. I am generally supportive of the effort and believe that the ambition of the work is significant. However, there are a couple of areas where I feel there are issues that might be either addressed or at least acknowledged before publication goes forward.
Limitations of measured data sets: One of the key themes of the paper is an ambition to better replicate the true range of variability found in nature. This is particularly emphasised with respect to oligotrophic waters, which are reasonably claimed to be relatively under-sampled. In several sections, the authors point to existing field data sets and attempt to replicate all of the observed variability. Whilst this appears sensible on first inspection, I believe there is an underlying issue that needs to be considered. Essentially this boils down to the quality of field data. Any measurement is going to be subject to uncertainty, and in many (most?) cases this uncertainty will become more significant as signal levels become smaller. This has been explored in some papers, e.g. ref 1. Examples from the current manuscript that I think need to be considered include Figures 6 and 9, which both show apparently very strong variations in spectral slopes that just happen to coincide with signal levels dropping to very low levels. Is this real variability, or is it the result of poor quality fits caused by limited data quality when signals are very low? Does it make sense to reproduce this level of variability in a synthetic data set if it is effectively noise and therefore potentially misleading? I think this at least needs to be considered. There are also known issues with aspects of filter pad absorption measurements (Ref 2: pathlength amplification and baseline correction; the latter can also be an issue for CDOM absorption) that are not discussed but that could lead to significant discrepancies in observed data sets. These issues are effectively being baked into the training of this synthetic data set. The description of how these data were measured is lacking detail, and I think there is scope to at least mention that there may be issues of this nature.
Oligotrophic under-sampling: The authors make a significant play on extending coverage of oligotrophic waters that have been historically under-sampled. Whilst this is true, it remains the case that these waters have been sampled. I am concerned that Figure 9 appears to show at least a full order of magnitude of additional CDOM (ag440) range that has never been observed, even with Ultrapath CDOM sampling. I am perfectly happy to criticise measurement quality (see above), but I am a bit concerned about the justification for effectively inventing an additional decade of variability in this parameter. It is possible that community measurements have a lower limit that inhibits resolution of lower signals, but it is also potentially true that there is a background level of dissolved organic absorption that is a natural feature. I am not convinced that this aspect of the data set is as reliable as the paper currently suggests. Again, a more careful discussion of its potential merit or otherwise would be advisable, I think.
Parameterisation: The paper takes considerable effort to describe and justify the construction of the bio-optical model and other aspects which go into parameterising the Hydrolight runs. Inevitably there are decisions that need to be made and options discarded as a result. This is fine, but in several cases here various decisions are presented as inevitable when in fact alternative options could have been chosen. I would not ask for these decisions to be reversed or for models to be reworked; that would be unfair. However, I think it is possible for the authors to recognise that alternatives would be available and might also be legitimate options. For example, they have opted to use a version of the Hydrolight input generation where they calculate backscattering from backscattering ratios applied to scattering coefficients, rather than directly inputting backscattering SIOPs. I can point to a small number of papers where efforts have been made to directly estimate these parameters (refs 3 and 4) and which would have provided alternative options that could be considered. Again, I would like to emphasise that I am not looking for more work to be done here, just for a slightly less emphatic description of what is possible and available (or not), taking into account material that is not hard to find in the literature.
Validation: The synthetic data set produces hyperspectral remote sensing reflectance spectra that may be of great value for algorithm development. However, it is unclear how representative the simulated spectra actually are. The discussion of the outputs very rapidly branches off into cluster analysis and consideration of geometric effects, but there is no real analysis of how representative the spectra are of natural distributions. I would like to see a comparison with existing measured data sets to get a sense of where there are overlaps and divergences that may or may not be of interest when considering its value as an allegedly global data set. I would emphasise that I have no trouble with the quality of the simulated reflectance spectra per se: Hydrolight will produce essentially the right reflectance spectrum for whatever conditions you tell it to work with. However, the value of this synthetic data set is very much in its ability to cover the range of naturally occurring variation, and I would like to see harder evidence that it does this, e.g. for turbid coastal waters as well as more open coastal and oceanic conditions.
Final comment: I have pointed to four references that are all from my own work. I am very uncomfortable doing this and I am NOT looking for these to be specifically referred to. They do, however, represent the basis for where my opinions have been shaped on these matters and where I believe we might have some philosophical differences that are not, however, insurmountable. I would be more comfortable with a slightly less emphatic version of the paper that provides the reader with clear explanations of the decisions that were taken, but that notes that alternative options could have been taken in at least some cases. I genuinely think the authors need to carefully consider the rationale for reproducing all of the observed variability, including measurement uncertainties, some of which are very significant indeed. Ultimately I would be unlikely to use this synthetic data set as I would struggle to accept some of the decisions that have gone into producing it, but I can imagine it being welcomed by a significant part of the community more or less as is. As with all of these things, the expression caveat emptor pertains. I hope that these comments will help to encourage a slightly less emphatic description of the data set and encourage potential users to be mindful of where the limitations might still be found.
References
- McKee, D., R. Röttgers, G. Neukermans, V. Sanjuan Calzado, C. Trees, M. Ampolo-Rella, C. Neil, and A. Cunningham: Impact of measurement uncertainties on determination of chlorophyll-specific absorption coefficient for marine phytoplankton, J. Geophys. Res. Oceans, 119, 9013–9025, doi:10.1002/2014JC009909, 2014.
- Lefering, I., R. Röttgers, R. Weeks, D. Connor, C. Utschig, K. Heymann, and D. McKee: Improved determination of particulate absorption from combined filter pad and PSICAM measurements, Opt. Express, 24, 24805–24823, 2016.
- Bengil, F., D. McKee, S. T. Beşiktepe, V. S. Calzado, and C. Trees: A bio-optical model for integration into ecosystem models for the Ligurian Sea, Prog. Oceanogr., 149, 1–15, 2016.
- Lo Prejato, M., and D. McKee: Optical constituent concentrations and uncertainties obtained for Case 1 and 2 waters from a spectral deconvolution model applied to in situ IOPs and radiometry, Earth Space Sci., 10, e2022EA002815, doi:10.1029/2022EA002815, 2023.
Citation: https://doi.org/10.5194/essd-2024-295-RC3
AC5: 'Reply on RC3', Jaime Pitarch, 16 Oct 2024
Viewed
| HTML | PDF | XML | Total | BibTeX | EndNote |
|---|---|---|---|---|---|
| 423 | 83 | 140 | 646 | 14 | 15 |