A Dataset for Investigating Socio-ecological Changes in Arctic Fjords
by Schlegel, R. W., and J.-P. Gattuso
Summary:
This manuscript attempts to provide a summary and description of the first version of an inter-disciplinary observational data base of some (7) European fjords within the EU project FACE-IT. The fjords were selected to represent different stages of accelerated climate warming and its impacts on the fjords as far as it pertains to their oceanography, their ice cover (be it land or sea ice), selected biogeochemistry observations and socio-economic aspects. The manuscript provides some information about where the data were obtained, touches on a few re-formatting issues and reflects briefly on the potential the data base may have.
Upfront comment:
In your reply to the previous review you write: "That being said, a live version of the dataset will be directly benefiting from your comments, and is available via a user interface that may be accessed via the WP1 website by clicking the ‘Data access’ tab." I can click but nothing happens. There is a time-out and I am not able to connect.
Data Comments
- I could download the summary file from PANGAEA but was not able to download the all_files.zip version. I needed to go via the HTML representation of the data summary webpage to finally download the *.csv files.
- I find the order of the columns confusing. Any interested user wants to have the time and geolocation within the first columns, followed by the data. Any source information, date of data set access, comments, limitations, constraints whatsoever are better to be shown at the end, i.e. the rightmost columns.
- Did you check how tools such as cdo, IDL, Python and the like handle "NA" instead of a numeric missing value, e.g. -99.9?
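To illustrate the last question for one of the tools mentioned, here is a minimal sketch of how the pandas library in Python treats the two conventions. The two-row excerpt and its column names are hypothetical and not taken from the data base; cdo, IDL and other tools may behave differently and are not covered here.

```python
import io
import pandas as pd

# Hypothetical two-row excerpt; column names are illustrative only.
csv_text = (
    "date,value_na,value_sentinel\n"
    "2020-01-01,NA,-99.9\n"
    "2020-01-02,1.5,2.5\n"
)

# With default settings pandas reads the string "NA" as missing,
# but it keeps -99.9 as an ordinary number.
default = pd.read_csv(io.StringIO(csv_text))

# A numeric sentinel has to be declared explicitly by each user.
declared = pd.read_csv(io.StringIO(csv_text), na_values=["-99.9"])

n_missing_default = default.isna().sum().to_dict()
n_missing_declared = declared.isna().sum().to_dict()
print(n_missing_default)
print(n_missing_declared)
```

Whichever convention is chosen, it would help to state it explicitly in the metadata so that users of other tools know what to declare.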
Cryosphere data:
- contains variables which do not come with a latitude / longitude information. This complicates automated (FAIR) usage.
- The file has a "date/time" column but it also has time as a dimension along the rows in the form of the "sea ice cover whatsoever proportions" per month and year. I am not convinced that it is useful to do it this way. You could consider putting these statistical measures into a separate .csv file.
- Naming given in the topmost row is unclear. What is "EsEs", what "EsEs acc"? Where is the difference between "Ice cover [%]" and "Ice cov [%]"? What are "sea ice cover (mean proportion)" and "Sea ice cover (sd proportion)"? Why is it "sea ice cover" in those columns but only "ice cover" in the others? This is inconsistent. What is "IP"? Where is the need to have more than 4 quantities describing the areal sea ice cover of the respective fjord? This is rather confusing and should have been solved in the process of amalgamation. An "ice conc [tenths]" is simply "Sea ice concentration [%]" but the values are 0, 10, 20, ... 100.
- Please provide all quantities listed in that row with a unit. If there is no unit because the quantity is dimensionless, write either [1] or [none]. Writing [1] seems more universal to me.
- I don't find any column describing flags or data quality issues.
- The latitude/longitude columns of Kohler, J. & Moholdt, G. (2021) Mass balance of Svalbard glaciers [Dataset]. Environmental monitoring of Svalbard and Jan Mayen (MOSJ). Accessed: 2022-04-26. have very strange values when looking at the data set in Excel; the same applies to the next and the last data set. When using other tools to view the data this problem does not occur.
- Data of column "ice cover [%]" are occasionally shown as a date, e.g. Line 15: 29. Apr in Excel. The same applies to other quantities. When using other tools to view the data this problem does not occur.
- I did not open the PHYSICAL file because with 2.5 Gbyte it is a bit large to browse through and to work with. But I looked into the socio-economic file and into the chemical oceanography. What strikes me in the latter is that there are 3 different pCO2 variables or quantities listed and it is not clear to me why. Also, why do we have separate columns for NO2 and NO3 but then a combined NO2 + NO3 one?
- I remember from the text of the ESSD manuscript that you needed to change variables and/or units in the chemistry part, if I am not mistaken. I could not see any column that tells me whether the data I see are original data with original units or whether these have been created and/or modified from the original data, and if so by which process and/or routine. One way to do that could be an extra column that denotes whether a value is original or modified by a simple 1/0 flag.
- I guess what would be really helpful for any kind of user is if you could link the content in the .csv files - in terms of the names of the quantities given - much better to your metadata database by, for instance, avoiding using acronyms but using the same full name and terms that you used in the running text of either the ESSD manuscript or the parts of the FACE-IT web page.
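The suggestions above about column order and a 1/0 modification flag could look roughly like the following sketch. All column names, values and the unit conversion are hypothetical examples chosen for illustration; they are not taken from the data base.

```python
import pandas as pd

# Hypothetical records; names, values and units are illustrative only.
df = pd.DataFrame({
    "source": ["dataset A", "dataset B"],
    "date": ["2015-04-01", "2015-04-02"],
    "lat": [78.9, 78.9],
    "lon": [11.9, 11.9],
    "variable": ["sea ice thickness", "sea ice thickness"],
    "value": [35.0, 0.4],
    "unit": ["cm", "m"],
})

# Keep the original value and unit, and flag rows that get changed
# (1 = modified from the original, 0 = untouched).
needs_conversion = df["unit"] == "cm"
df["modified"] = needs_conversion.astype(int)
df["original_value"] = df["value"]
df["original_unit"] = df["unit"]
df.loc[needs_conversion, "value"] = df.loc[needs_conversion, "value"] / 100.0
df.loc[needs_conversion, "unit"] = "m"

# Time and location first, then the data, then provenance columns last.
ordered = df[["date", "lat", "lon", "variable", "value", "unit",
              "modified", "original_value", "original_unit", "source"]]
print(ordered)
```

With the original value and unit retained next to the flag, a user could undo any conversion without going back to the source.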
General Comments
GC0: I have serious concerns about the way the data files are constructed and also about the way the content is structured and explained (see my comments above). Also, from the comments of the reviewers on the previous submission plus the way the data base is constructed, I am not convinced that this data base is in a form and has the maturity that it should already be advertised with an ESSD paper. In my opinion, the authors should wait until the last version of the data base is ready and uploaded to PANGAEA. At that point I can imagine that the majority of the open points will have been answered and any inconsistencies (see above but see also below) mitigated, and that both the data base and its description (content, processing steps involved and other important scientific and formal aspects) will be mature enough to warrant a manuscript in ESSD. I just don't think that data and manuscript are ready for publication.
GC1: I have concerns with motivating this manuscript and the data base with FAIR principles. These are, in my opinion, not strictly followed with the data base created. The Interoperability and the Reusability are not sufficiently well taken into account (see also GC2).
GC2: The manuscript repeatedly mentions that data were modified before they were amalgamated in the data base. This modification ranges from changing units over computing temporal averages to simply discarding data because of doubts about their validity. All these processing steps seem to go without adequate documentation of what has been done and why. Criteria used to, e.g., discard data are not described. Neither the manuscript nor the data base provides sufficiently transparent information about these steps; QC flags appear not to be provided along with the data set.
GC3: Even though the authors are speaking of an amalgamation one can find 4 different quantities for basically the same variable (here sea ice concentration) or 3 different versions of the pCO2. This is not sufficiently well motivated and/or solved. A potential user might easily get lost, not knowing which of the 4 such data sets they should use.
GC4: The description of categories and how they were defined needs to be improved. The description of why the drivers selected were selected and the linkage between these drivers and the quantities could be improved. Whether drivers is the correct term instead of using indicators could be discussed.
GC5: I find several statements that are not correct and need to be re-written.
Please see my specific comments for more details.
Specific Comments
Abstract in general: Reading the abstract immediately posed two questions that I would like to be informed of when reading an abstract.
- 1) You write about "in-situ" data. While I can imagine from the response to the reviewer's comments document what the quantities are, it is important that you tell the reader which kind of data you are dealing with in your data paper.
- 2) You write about "the key drivers" of change in Arctic fjords but you remain generic here. Please be more specific so that the reader can connect the quantities included in your data set with these drivers.
L43-56: This paragraph lacks a clearer statement about the focus of this manuscript and why the focus is on (which?) in-situ observations. You are using the term "monitoring of the polar North". What comes immediately to my mind here are remote sensing techniques that at least allow to monitor (aka regularly observe) a suite of geophysical quantities that describe the status of the system. To these belong snow and land ice cover, snow thickness and/or snow water equivalent, melt onset, end and duration of the melt period and of course various sea ice and ocean surface parameters such as sea ice concentration, freeze-up and break-up, land-fast sea ice coverage, to mention a few. All these you don't touch in your manuscript according to the introduction. While this is of course ok you need to motivate this more clearly. From the introduction I also have difficulties to understand the scope you want to cover with your amalgamated data set; it is not sufficiently clear whether you are targeting quantities from ocean physics and biogeochemistry, whether you include meteorological measurements, whether you want to include biology and so forth. Hence the scope is unknown and leaves the reader in a vacuum about what to expect. I recommend to change this accordingly.
A second issue I would like to bring up is your notion of the FAIR principles. I am wondering whether you could elaborate more on that already in the introduction - simply because this has many aspects. When I am thinking about the kind of data that you aim to merge in your data set, then the most pressing issue that comes into my mind is metadata ... metadata that accurately describe how which measurements were carried out with which instrument for which period at which specific locations; metadata that describe limitations and errors of these measurements; metadata that provide information about how data were processed and/or analyzed to reach a more useful level of maturity - including modifications such as computing daily and/or monthly means, changing units or deeming data as not reliable. While a collection of air temperature measurements would be sufficiently difficult in this respect already (different manufacturers, different shielding of the instrument, different locations and environmental conditions of placement and so forth), I can see substantially more complexity when - what I assume at this point - you are trying to merge observations across disciplines, instrument types, scales and so forth. Therefore, I recommend that you raise the flag of this complexity in a more clear way already in the introduction. A reader would like to know upfront whether you are aware of this complexity and whether they could expect a real benefit from using your merged data set instead of contacting individuals.
L78-84: "The structure .... transects."
- What you write here sounds very ambitious - simply because you are bridging different disciplines. Especially the overarching aim of "conversion of the different units and methodologies into a project-wide standard" seems not credible to me at the moment.
- Apart from that I am wondering whether, when organising the system into "category - driver - variable", you took a look at which categorization is out there in the different communities. It might be a good idea to take a look at the work of WMO-GCOS panels AOPC, OOPC and TOPC and to also consider definitions of quantities or variables made elsewhere such as the so-called "essential climate variables" or the so-called "essential ocean variables". Another aspect of this categorization could also be how easily the categorized data could be taken up by other communities such as the ocean modeling community or the community dealing with biogeochemical processes and/or modeling of mitigation and adaptation measures.
L108: "the databases detailed below" --> please be more specific and refer to the exact place where you list the databases. Is this by chance Table 2? If so then how many databases are subsumed under "Others"?
Figure 1:
- I suggest changing "sea ice cover" to "sea ice cover duration" because this is the quantity that is shown here in the form of a trend.
- You denote the time period covered by the SST data but not the time period covered by the MASIE data set. Please add this information.
- I am not sure I understand your motivation to show negative trends in ice cover duration in olive / greenish tone and not also in red. You could simply use an inverted version of the color table used for the SST trends.
- While I can understand that the size of the study sites differs and you cannot plot them to scale, I suggest that you add a bar and/or scale that tells the reader what the dimension of each region actually is.
- Is there a reason why the box showing the Storfjorden is truncated and does not show its northern part?
Table 1:
- I have to admit that this is a quite diverse and with that in itself inconsistent collection of what you name "drivers". Also the choice of your "categories" is not straightforward. Before I go into details let me recommend that you write the full name of the category or drivers into the table cells and not in the caption and provide in addition in parenthesis the short name or abbreviation you are using henceforth, e.g. "socio-ecological (soc)". I note that you don't need a short name for some of the drivers anyways, e.g. "tourism", "fisheries".
- Let me begin with "categories". soc is ok and stands on its own. However, chem and bio seem to overlap each other and cryo and phys overlap strongly and it is not clear why you did not put these into one category named, e.g., "physical environmental conditions" because all these belong to physics and/or have physical units.
- Concerning "drivers" the first question I have is: "Why drivers"? Could also be "indicators". While you tried to explain why the selection of these "drivers" was not easy I did not get the rationale behind it. Looking at Table A1 (see my comments there as well, please) also is not very enlightening in this regard. I guess what this needs is to tie these drivers closer to the concept used for ECVs or even better EOVs (see one of my previous comments). Even though I can understand that your queries of the data portals possibly provided different results depending on the variable name used (no surprise), it seems not to be of advantage to bring this indefiniteness into your structure. I have issues with: runoff as a driver under "cryo" --> runoff is not a part of the cryosphere // sea temp --> is too vague. Is this sea and lake water temperature at all depths including the surface? // Same question for salinity // "light" seems to contain what the project's community deemed as important for the socio-economic context and for biology. But it is not just PAR that is of relevance but the full spectrum of the radiation, including the longwave radiation and the overall shortwave radiation budget - which I am sure is accessible via the data sets you searched. Hence my problem why "light" and not "radiation"? // "prim prod" --> in the water? in/on the glacier ice from algae? in the sea ice? on land? // Same for biomass. I believe the readability and uptake of this data set description paper would increase a lot by being more specific here and by perhaps clarifying even more that you focus on the fjord WATERS and not on the fjord eco-system as a whole - which is at least my understanding at the moment.
- Let me also note that you mix different granularities. For instance "sea water temperature" is a physical variable in itself with a clearly associated unit. It is one quantity. The same applies to glacier mass balance and terrestrial runoff. Sea ice cover, however, is in itself a collection of at least 5 different physical quantities (concentration, thickness, type, age, surface state, snow thickness on top, motion, surface temperature, surface albedo, ...). Hence, there are "vague" drivers such as "sea ice cover" and there are well defined drivers such as "sea water temperature". This is - in my opinion - not solved in a credible, easy to understand way in this ESSD manuscript.
L153-154: "When data are available at ..." --> This I don't understand. If you in fact did so before creating your data set then you were actually modifying the majority of your data (about close to 7 000 000 of the 7 564 441 data points). Was this your intention? If you did so: How did you do it? Which tools did you use? How did you take into account data quality information provided along with the original sub-daily data? How did you deal with missing data? Did you document what you did in the respective metadata of your product?
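To make the questions above concrete: a daily average that remains traceable would, in my understanding, carry at least the number of valid sub-daily values and their spread. Below is a minimal sketch using pandas; the record, the column names and the choice of statistics are assumptions for illustration and do not describe what was actually done for the data base.

```python
import pandas as pd

# Hypothetical sub-daily temperature record with one missing value.
raw = pd.DataFrame({
    "time": pd.to_datetime(["2019-07-01 00:00", "2019-07-01 12:00",
                            "2019-07-02 00:00", "2019-07-02 12:00"]),
    "temp": [4.0, 6.0, 5.0, None],
})

# Daily mean together with the number of valid values and their
# standard deviation, so a user can judge how each mean was formed.
daily = (
    raw.set_index("time")["temp"]
       .resample("D")
       .agg(["mean", "count", "std"])
       .rename(columns={"mean": "temp_mean", "count": "n_obs", "std": "temp_sd"})
)
print(daily)
```

Such a table does not let a user recover the native sub-daily data, but it documents how many values entered each mean and whether missing data played a role.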
Figure 2. I am not sure this way of showing how the data set is composed of data from the different categories is useful. Also panel B seems to be truncated.
May I suggest that, since you wrote already about the dominance of drivers "sea water temperature" and "salinity" in the text, you keep these out of this figure and focus on the contributions of the remaining half a million data points?
Table 2: I note that the total count for some fjords is not equal to the sum resulting from the columns of the different data sources given in the right hand side of the table.
L193: Does this "840" relate to the number of more than 1500 data sets found? I note that this number is smaller than the sum of all data sets provided in column "PANGAEA" in the table above (1142)
L219 / Section 2.3.5: After reading this section I also know which additional centers / organizations you tried to use for your data collection. Ideally, all data sets of the kind that are relevant for you have been included into the data bases you queried. I am wondering whether you attempted to look into i) project oriented data bases (national funding agencies and/or EU data portals) and ii) into international project data web pages or portals with content of researchers outside the EU - namely of Canada and the U.S.?
L232: Not clear, what are "direct glacier mass balance data"?
L271: I assume that the socio-economic data fall under the type of data "in situ". Is this a credible approach that goes along with the definition of what an "in situ" observation (!) is?
L276-282: As mentioned above in an earlier comment already: This computing of averages of the data - apparently over time but also over spatial intervals - is a serious modification of the data set, changing its nature and changing its statistics. How FAIR, i.e. reproducible, are the data now? How easily would an interested user be able to invert the averaging process to work with the native data?
I am wondering about the "<0.1% of the space dedicated to the other 12 drivers" because about half a million data sets (or data points) of the total of over 7 550 000 seem to describe these other 12 drivers.
L287-289: "This was necessary ... process." --> I am not sure everybody understands what you did in these cases. Would you mind to provide an example in the text?
L291-296: This paragraph describes another, in part quite serious modification of the data set which contradicts the concept of a simple amalgamation. Instead of removing apparently implausible data one could simply have added a component to the QC flag which states that your (!) post-processing and/or QC check of this particular data caused doubts in its validity. However, I would be rather hesitant to simply remove data of observations in an environment which you yourself claim as one of the fastest changing ones in the Arctic realm, paving the way for unexpected, extreme observations that apparently fall out of the known range.
L298-305: This paragraph contains confusing statements and I suggest to remove it completely. It is not needed - except perhaps the information that data from the socio-economic sector are often collected on an annual basis only and therefore the count (and not the values) of data sets out of this category is comparably low and is possibly connected with the release date of the statistical numbers in the form of a report or assessment or yearbook.
L331-334 and the paragraph beginning in L342: "Measurements of winter ice cover ... sites. ... Comparisons of in situ sea ice cover across ..." --> These statements contradict notions from data centers such as the NSIDC, the AWI, met.no or other data centres. Sea ice observations - especially of the sea ice concentration (which is the basis for the MASIE data you employed) - are available daily based on satellite remote sensing techniques. The nature of this quantity, being a fraction of the ocean surface covered with sea ice, makes in-situ measurements, i.e. at a specific location, not overly useful - hence the remote sensing approach. I note that, while for the scale of some of your fjords satellite sea ice concentration data would be limited in quality, they exist for a number of the fjords you are investigating. In any case ice charts of the Danish and/or Norwegian ice services provide this information. Hence what I recommend is that you reformulate your statement given here because as written it is not correct.
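The alternative of flagging instead of removing could be as simple as the following sketch. The salinity values and the plausible range are assumed examples only, not thresholds taken from the manuscript, and the flag convention is hypothetical.

```python
import pandas as pd

# Hypothetical salinity values; the plausible range is an assumed example.
obs = pd.DataFrame({"salinity": [34.8, 0.2, 61.0, 33.1]})
PLAUSIBLE_MIN, PLAUSIBLE_MAX = 0.0, 42.0

# 0 = passed the range check, 1 = doubtful according to this post-processing.
outside = (obs["salinity"] < PLAUSIBLE_MIN) | (obs["salinity"] > PLAUSIBLE_MAX)
obs["qc_flag"] = outside.astype(int)

# Every observation is kept; users who want the filtered view
# can derive it from the flag themselves.
filtered = obs[obs["qc_flag"] == 0]
print(obs)
```

This keeps unexpected or extreme values available to users who want to examine them, while still communicating the doubts raised by the check.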
I can see in the next paragraph that you included the MASIE product and derived a sea ice cover (concentration?) product from that. Certainly, such an approach can only be a surrogate for a real sea ice concentration product, e.g. from OSI SAF (OSI-450) or the UHH-IFREMER ASI algorithm sea ice concentration product, because MASIE is kind of a classification using sea ice concentration data above a threshold of 40% to define a high-resolution extent product. It has its limitations. In short, I recommend that you clearly state how you backwards computed the sea ice concentration from the MASIE product and what the potential uncertainties are.
- There are several sea ice thickness products available for the Arctic Ocean - hence your notion is not correct. What is true, however, is that it is notoriously difficult to derive sea ice thickness from satellite remote sensing data in general and in fjords in particular. Therefore, within the fjords there will possibly only be very few data - most likely only for the Storfjorden.
Again I would like to point out to you that Danish and Norwegian ice services provide information about the ice cover around Svalbard and Greenland.
What is so bad including remote sensing observations based products into your data base? Especially sea surface temperature data (see your Figure 1) are of good quality and provide daily information about the SST also within fjords for periods without sea ice cover.
L368-371: What is your motivation to average over all pixels found in the respective fjords? Wouldn't it - in view of the in-situ observations - be more meaningful to provide SST time series for individual grid cells? The relative weight of in-situ sea water temperature observations included in the database cannot be the reason.
L387: "so it was necessary ... convert data as necessary." --> Here is another location in your publication where I stumble over modifications of the data without a clearly written description of how the modification was done. Are all the steps done included in the metadata of the data set? Since you rely on the FAIR principles I am wondering how easily a user would be able to revert your conversions if need be. I am also wondering whether the QC flags (are there any?) contain the required information.
- Apart from that I am indeed wondering whether all aspects that are in some way or the other related to which processing / flagging / correction steps should not find their way into the Methods section. And these should then be expanded so that the reader understands better the various steps carried out.
L406-411: I note that you complemented your data set with satellite remote sensing data of both sea ice concentration and sea surface temperature. At the OB.DAAC you will find other satellite products available since 2000 of, e.g., the Chl a concentration - among other products of the same kind. I am wondering whether you should not consider to include such data as well.
L435-440: I am wondering whether you at all considered getting into contact with Indigenous people to include information based on their traveling, hunting, and other life habits in this data base. Wouldn't this be a very interesting supplement to the rather one-dimensional view of tourism and fisheries that is dominated by industrialized nations?
L496: "oceanography" --> I could be wrong, but I have the impression that now I read for the first time that you (all the time) have exclusively been writing about physical OCEANOGRAPHY (and not just physics or physical drivers); same applies to chemical.
L530 / Table A1:
I took a look at this table and have to admit that I don't understand what you mean by "cleaned names". I find this table in parts very confusing because it uses abbreviations that are not fully explained and it contains examples of - from a climate researcher's point of view - multiple variable names for the same physical (or other) quantity. The best example of this is the sea ice cover. I also note that apparently you did not make an effort to use SI units as much as possible where deemed useful, e.g. in case of the sea ice thickness which should have unit "m" but is listed as having unit "cm".
Typos / Editorial Comments
L10: "GLODAP" --> Please explain what this is.
L43: "Schlegel et al, 2023" --> this self-citation of a more fjord related work with a broader scope should be replaced by a publication that is to the point of the statement given in the sentence.
L142/143: "After a couple ..." Please check this sentence. It reads a bit strange.
L145: "direction" --> "directions" ?
L149: I note that this number 1565 does not match the sum of data sets found in the table below (1259).
L231: "as.csv files" --> "as .csv files"
L257: "however;the final dataset" --> "however, the final dataset"
L324: What is a "total site"?
L395: Why do you now come back to "FAIR" and stress it here. I suggest to remove that.
L444/445: What do you mean by "in a course analysis"? Was this work done in the context of a lecture?
L454. "accordingly.Finally ..." --> "accordingly. Finally ..."
Supplement:
L59: "downard" --> "downward"