the Creative Commons Attribution 4.0 License.
Global 30-m seamless data cube (2000–2022) of land surface reflectance generated from Landsat-5,7,8,9 and MODIS Terra constellations
Abstract. The Landsat series constitutes an unparalleled repository of multi-decadal Earth observations, serving as a cornerstone in global environmental monitoring. However, the inconsistent coverage of Landsat data due to its long revisit intervals and frequent cloud cover poses significant challenges to land monitoring over large geographical extents. In this study, we developed a full-chain processing framework for the multi-sensor data fusion of Landsat-5, 7, 8, 9 and MODIS Terra surface reflectance products. Based on this framework, a global, 30-m resolution, daily Seamless Data Cube (SDC) of land surface reflectance was generated, spanning from 2000 to 2022. A thorough evaluation of the SDC was undertaken using a leave-one-out approach and a cross-comparison with NASA’s Harmonized Landsat and Sentinel-2 (HLS) products. The leave-one-out validation at 425 global test sites assessed the agreement between the SDC and actual Landsat surface reflectance values (not used as input), revealing an overall Mean Absolute Error (MAE) of 0.014 (the valid range of surface reflectance values is 0–1). The cross-comparison with the HLS products at 22 Military Grid Reference System (MGRS) tiles revealed an overall Mean Absolute Deviation (MAD) of 0.017 with L30 (Landsat-8-based 30-m HLS product) and a MAD of 0.021 with S30 (Sentinel-2-based 30-m HLS product). Moreover, experimental results underscore the advantages of employing the SDC for global land cover classification, achieving a sizable improvement in overall accuracy (2.4 %–11.3 %) over that obtained using Landsat composite and interpolated datasets. A web-based interface has been developed for researchers to freely access the SDC dataset, which is available at https://doi.org/10.12436/SDC30.26.20240506 (Chen et al., 2024).
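As a rough illustration of the leave-one-out metric quoted in the abstract, the sketch below computes a mean absolute error between reconstructed reflectance values and withheld reference values. The function name, the use of `None` for masked pixels, and the sample numbers are assumptions made for this example; they are not taken from the manuscript or its validation dataset.

```python
def mean_absolute_error(predicted, reference):
    """MAE over paired reflectance samples.

    Pairs in which either value is None (e.g. a masked or cloudy pixel)
    are skipped. Both inputs must have the same length.
    """
    if len(predicted) != len(reference):
        raise ValueError("inputs must have the same length")
    pairs = [
        (p, r)
        for p, r in zip(predicted, reference)
        if p is not None and r is not None
    ]
    if not pairs:
        raise ValueError("no valid pairs to compare")
    return sum(abs(p - r) for p, r in pairs) / len(pairs)


# Hypothetical values on the 0-1 reflectance scale (not real SDC data).
fused = [0.10, 0.22, None, 0.35]       # reconstructed values
withheld = [0.12, 0.20, 0.30, 0.36]    # reference values left out of the input
mae = mean_absolute_error(fused, withheld)
```

Because reflectance is bounded to 0–1, an MAE on this scale can be read directly as an absolute reflectance difference.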
Status: closed
-
CC1: 'Comment on essd-2024-178', Diana Jones, 29 Jun 2024
The DOI link to the shared dataset is not accessible.
Citation: https://doi.org/10.5194/essd-2024-178-CC1 -
CC2: 'Reply on CC1', Jie Wang, 30 Jun 2024
Thank you for your interest in our data.
I tried to access the data via https://doi.org/10.12436/SDC30.26.20240506 and it was redirected to https://data-starcloud.pcl.ac.cn/resource/26 correctly.
The web page contains a Resource Description of the data. Please find out how to access data from the given instructions. Thank you!
Citation: https://doi.org/10.5194/essd-2024-178-CC2 -
CC3: 'Reply on CC2', Diana Jones, 02 Jul 2024
Thank you for updating the data access URL. However, I still have two questions.
First, the public SDC covers only a very small area of the world. It appears that there is no data available for the entire United States.
Second, the time frequency of the public SDC does not seem to be daily; instead, it appears to be every four days or even eight days.
Citation: https://doi.org/10.5194/essd-2024-178-CC3 -
AC1: 'Reply on CC3', Shuang Chen, 02 Jul 2024
Thank you for your comment. We have provided a web-based interface that allows users to access and customize the SDC for any location (except Antarctica) and with any temporal frequency (up to daily). Detailed information and instructions can be found at the following webpage: https://doi.org/10.12436/SDC30.26.20240506 or https://data-starcloud.pcl.ac.cn/resource/26. Please search the key words "Test Account" and "Instructions". If you have any other questions, please feel free to contact us.
Citation: https://doi.org/10.5194/essd-2024-178-AC1 -
CC4: 'Reply on AC1', Diana Jones, 03 Jul 2024
Thanks for your explanation.
1. As you suggested, I used the test account you provided and attempted to submit an order for data download. However, I was unable to do so successfully. The website displayed the message "Failed to write order to OBS." I tried various locations worldwide, but all attempts were unsuccessful. Based on this situation, I am uncertain whether the data actually exists. This seems to contradict the aims and scope of the ESSD journal. At the very least, the manuscript should not be published or reviewed further until all the claimed data are fully disclosed. Since the data results and authenticity cannot be verified, only theoretical and methodological discussions can be carried out at this stage.
2. The manuscript states that the accuracy of the SDC dataset has an MAE of 0.014. From the perspective of review and publication, the complete test dataset should be disclosed for public verification by third parties. This is a common and recognized practice in the field of image classification and segmentation; otherwise, the authenticity of the accuracy result cannot be confirmed. Additionally, considering the issues mentioned in point 1, if most of the data do not exist, how was the accuracy verified? This raises a significant concern. Furthermore, I noticed that the website also provides land cover data by year. However, the manuscript appears to only provide an OA indicator. Which year's accuracy is this? Is using OA alone sufficient to verify the accuracy of such a large dataset?
3. If the data are to be made publicly available for free, it would be best to provide direct download links (similar to practices by USGS, ESA, and other remote sensing data providers) to facilitate user access. Currently, it seems that most of the data are inaccessible.
Citation: https://doi.org/10.5194/essd-2024-178-CC4 -
AC2: 'Reply on CC4', Shuang Chen, 04 Jul 2024
Thank you for your valuable feedback.
1. We have found that the "Standard" mode cannot process multiple order submissions simultaneously, which triggered the error "Failed to XXX". We apologize for the inconvenience, and we are working on solving this issue and improving the serving capability of our system. Please select the "Quick" order mode, which is more stable at this stage.
The data volume of a global daily SDC from 2000 to 2022 would exceed 22 PB. There is not yet a cost-effective approach to storing such a vast amount of data. Therefore, we use a distributed storage system as a cached data pool to store the most recently accessed SDC data. Users can access the SDC for their AOI through the provided interface. If the requested SDC data are not available in the cached data pool, the system will generate the SDC from raw data in real time and then return the generated SDC data to users.
Considering the large data volume and complex data storage mechanisms, we are not able to provide direct download links. The order-download mode is also a common practice, as adopted by ESA’s EO Browser and USGS’s EarthExplorer platforms. We apologize for the inconvenience and will continue to work on improving the usability of our system.
2. As described in our manuscript, we conducted a leave-one-out validation on the reconstructed SDC dataset, which reveals an overall MAE of 0.014. We have uploaded the related test dataset in the Download section of our website. The data used in the cross-comparison with NASA’s HLS dataset are also available.
This manuscript focuses on the development and validation of the SDC dataset, and the land cover data provided on our website were generated to test the capability of SDC for global applications. The Overall Accuracy (OA) indicator mentioned in Table 9 is the result of the comparative experiment, and it is not related to the provided land cover data files. We briefly mentioned the land cover data in the Data Availability section. Other researchers can download these data for reference if they are interested. We will improve our mapping framework and formally publish complete land cover products in our future work.
3. We have provided a web-based interface that allows users to access and customize the SDC for any location (except Antarctica) and with any temporal frequency (up to daily). Detailed information and instructions can be found at the following webpage: https://doi.org/10.12436/SDC30.26.20240506 or https://data-starcloud.pcl.ac.cn/resource/26. Large-scale SDC data processing requires far more computing capacity than this web-based interface can provide. If users need large-scale SDC for their research, please feel free to contact us for cooperation. We have a computing cluster that can efficiently reconstruct global-scale SDC data for subsequent applications.
Citation: https://doi.org/10.5194/essd-2024-178-AC2 -
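The cached data pool described in the reply above (serve from cache when possible, otherwise generate on demand and store the result) can be sketched roughly as follows. This is only an illustration of the general pattern, not the authors' system: the class name, the `generate` callback, the tile/date key, and the least-recently-used eviction policy are all assumptions made for the example.

```python
from collections import OrderedDict


class CachedCubeStore:
    """Toy cache-aside store: return cached data or generate it on demand."""

    def __init__(self, generate, capacity=2):
        self._generate = generate      # hypothetical on-demand reconstruction
        self._capacity = capacity      # maximum number of cached entries
        self._pool = OrderedDict()     # ordered from least to most recently used
        self.generated = 0             # how many times generation was triggered

    def get(self, tile, date):
        key = (tile, date)
        if key in self._pool:
            # Cache hit: mark the entry as most recently used.
            self._pool.move_to_end(key)
            return self._pool[key]
        # Cache miss: generate from raw data, then store the result.
        data = self._generate(tile, date)
        self.generated += 1
        self._pool[key] = data
        if len(self._pool) > self._capacity:
            # Assumed policy: evict the least recently used entry.
            self._pool.popitem(last=False)
        return data


def fake_generate(tile, date):
    # Stand-in for the expensive reconstruction step.
    return f"cube:{tile}:{date}"


store = CachedCubeStore(fake_generate, capacity=2)
store.get("20MQB", "2000-01-01")   # miss, generated
store.get("20MQB", "2000-01-01")   # hit, served from the pool
store.get("20MQB", "2000-01-02")   # miss, generated
store.get("20MQB", "2000-01-03")   # miss, evicts 2000-01-01
store.get("20MQB", "2000-01-01")   # miss again, regenerated
```

In such a design, the latency a user sees depends on whether the requested area and dates are already in the pool, which may explain some of the variable response times reported in this thread.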
CC5: 'Reply on AC2', Delmar Wilson, 05 Jul 2024
1. Before I raised concerns, the DOI provided by the authors for the SDC dataset was invalid, and no validation data was supplied. After my inquiry, the authors updated the data access URL and uploaded the validation data. However, when I attempted to download the validation data, the system indicated "No access permission." This makes me question whether the authors genuinely intend to make the dataset public. Sharing data publicly is a fundamental requirement for articles published in the ESSD journal. Providing validation data is also essential for assessing accuracy; without it, the dataset's authenticity cannot be verified.
2. Issues related to data storage and website functionality are the authors' responsibility. These are not my primary concerns and are unrelated to the data's availability. Avoiding the provision of a complete dataset due to its large size is unacceptable. Unless there is a reasonable third-party verification, we are justified in questioning the data's authenticity. If the current storage mechanism and website cannot meet the requirements, resulting in the dataset not being publicly available, then this paper should be retracted. The paper can be resubmitted once these issues are resolved and the complete dataset is provided.
3. Although this dataset closely resembles existing data products from USGS and ESA, such as the HLS dataset, I am still interested in some of the unique characteristics claimed in this paper. I hope the authors can adopt a sincere and transparent attitude to promptly release the complete dataset in an accessible and verifiable manner, in accordance with ESSD journal publication requirements and data openness regulations. This will allow third-party validation and oversight to alleviate concerns regarding data authenticity and accuracy. I believe this approach will better advance the field. I am also willing to offer any assistance and support within my capabilities to facilitate the successful publication and application of this paper.
Citation: https://doi.org/10.5194/essd-2024-178-CC5 -
AC3: 'Reply on CC5', Shuang Chen, 09 Jul 2024
Dear Delmar Wilson,
Thank you for your comment.
I am sure that I have not received any comment or message from you before. Or do you mean that the previous comments posted by Diana Jones are also from you? In any case, I will respond to all your concerns here.
- The complete message you received should be “No access permission. Please login in first.” Please try logging in with the provided test account. You will then be able to download the validation data as many times as you need. If it still does not work, try logging out and then logging in again.
- The DOI was registered on May 6, 2024. We have not made, and are not able to make, any modifications to the DOI since that time. We tested and verified the accessibility of this DOI beforehand. We do not know why you could not access the DOI at that time; you may want to check your network connection.
- As described in the manuscript, the SDC dataset for any location (except Antarctica) and any time (2000-2022) is available via the provided interface on our website, which meets the data access requirements of ESSD. If you have any questions about accessing the SDC dataset, please feel free to contact us at any time. We are willing to provide any assistance and information needed.
- Before the submission of this manuscript, I had made the validation data available on another website (sdc.iearth.com, as described in the manuscript). After receiving your request, I made a copy of these validation data and uploaded it to the project website, so that users can find it more easily.
- Thanks for your interest in this research work. We are working on improving our system so that it can better serve all users within the community. We are grateful for your willingness to offer assistance in improving the data access system. Could you provide us with your contact information? We can then discuss it in detail.
Citation: https://doi.org/10.5194/essd-2024-178-AC3
-
CC6: 'Comment on essd-2024-178', Diana Jones, 10 Jul 2024
The previous conversation was too long, so I am starting a new comment for easier reading.
1. I have tried logging in with the provided test account multiple times, but I still cannot download the validation data. Either the system does not respond at all, or it displays a "Download failed" message. Have the authors successfully downloaded the validation data using the method you described?
2. The authors have provided many different URLs for data access, which is very confusing. Additionally, the data access system provided by the authors is user-unfriendly and highly unstable. Throughout the download process, you may encounter various issues at each step. These problems significantly increase the difficulty of obtaining the data and may even render the data inaccessible. I strongly recommend that other readers participate in testing the downloadability and accessibility of the data.
3. I do not believe that the accessibility and transparency of the data in this manuscript comply with the ESSD journal's data policy.
For example, the ESSD journal requires that,
"upon submission, all data must be directly accessible through link(s) in the manuscript;"
This means that upon manuscript submission to the journal, all data supporting the research results must be directly accessible via clickable links, without additional steps or permissions.
However, I do not think your data meets this criterion. As mentioned earlier, even after you submitted the manuscript, and even up to now, I have encountered many issues accessing the data website and downloading the data. For instance, the system frequently displays messages such as "Failed to write order to OBS," "Download failed," and "Failed to get the file under the directory."
Additionally, the ESSD journal also requires that,
"upon submission, authors must certify some form of fully anonymous review access, directly at the chosen repository (i.e., no registration, name, email, or other information is required of reviewers as they access the data);"
This means that data should be accessible without requiring registration or providing any information. However, your current data access method does not comply with this policy. In your system, it is necessary to register and log in to access the data. Otherwise, it prompts "No access permission. Please log in first." Sometimes, even after logging in, the data cannot be accessed or downloaded.
4. I am willing to offer assistance but would like to know the specific progress and timeline for the system improvements. I hope the authors can take a candid approach and make substantial improvements to the system. If there are any bottlenecks or difficulties, we can work together to resolve them. However, the authors' responses so far have consistently denied the existence of these objective issues and have not resulted in any substantive improvements.
Citation: https://doi.org/10.5194/essd-2024-178-CC6 -
CC7: 'Reply on CC6', Yamal Vivian, 11 Jul 2024
I tried to access the data following provided instructions, and successfully downloaded the data.
Citation: https://doi.org/10.5194/essd-2024-178-CC7 -
CC12: 'Reply on CC7', Diana Jones, 14 Jul 2024
I'm referring to the validation dataset. I still haven't been able to successfully download it. Have you managed to download the validation dataset?
Citation: https://doi.org/10.5194/essd-2024-178-CC12
-
CC8: 'Reply on CC6', Ethan Carter, 11 Jul 2024
I encountered a similar issue. I also failed to download the data, and the validation data download was unresponsive.
Citation: https://doi.org/10.5194/essd-2024-178-CC8 -
CC9: 'Reply on CC6', Congcong Li, 12 Jul 2024
I successfully downloaded the data using the test account and the instructions provided.
-
CC11: 'Reply on CC9', Diana Jones, 14 Jul 2024
I'm referring to the validation dataset. I still haven't been able to successfully download it. Have you managed to download the validation dataset?
Citation: https://doi.org/10.5194/essd-2024-178-CC11 -
AC6: 'Reply on CC11', Shuang Chen, 04 Aug 2024
Thank you for your feedback and for bringing this issue to our attention. The system may take longer to respond due to the large file size of the experimental data. We apologize for any inconvenience this has caused.
We have also uploaded the experimental data to OneDrive. You can access it through this URL: https://1drv.ms/u/s!AvA96w8h5Q9sgtEBEk9J5D0nKa5tsA?e=TdNRk7.
If you continue to experience issues, please let us know, and we will assist you further.
Citation: https://doi.org/10.5194/essd-2024-178-AC6
-
RC1: 'Comment on essd-2024-178', X. Zhu, 12 Jul 2024
General comments
This paper presents the development of a global 30-m seamless data cube by fusing Landsat and MODIS data, building on the authors' previous methods. This work is highly valuable for future applications requiring fine-resolution time series data. Despite the numerous algorithms developed for fusing Landsat and MODIS data or filling gaps in Landsat data, no global datasets generated by these technologies are currently available. However, the paper has several issues that need to be addressed.
Specific comments
- Page 3, Line 70: For Landsat interpolation methods, there are techniques that do not require numerous clear-sky observations. For instance, the nearest similar pixel interpolator method only needs one clear-sky observation.
- Introduction: Before the last paragraph, the authors should discuss the current research gap. Specifically, they should explain why a full-chain processing framework and such fused data are necessary. Additionally, they should outline the challenges users face when using current methods to produce data independently.
- Page 11, Figure 3: When performing harmonization, was Landsat upscaled to the resolution of MODIS? Figure 3 suggests that both datasets have the same resolution. A linear transformation model was used; how do the authors address the issue when the linear model is not statistically significant?
- Major Differences Between uROBOT and ROBOT Models: What are the primary differences between the uROBOT model and the previously developed ROBOT model by the authors?
- Accuracy of the Time Series Model in Eq. 6: The accuracy of the time series model in Equation 6 could be affected by the time interval of the data. In cloudy regions, if the data is too sparse, is the result reliable? How do the authors address diverse changes when the data is sparse?
- Eq. 7: The coefficients from MODIS are used for Landsat. This approach may be acceptable if the land is homogeneous, but it lacks a clear mechanism for complex landscapes. Landsat pixels have different temporal dependencies compared to coarse pixels. If this issue affects the reliability of the final product, the end users should be notified.
- Eq. 9: The equation redistributes the residual to handle land cover changes, but the residual is at a coarse scale. How do the authors address changes at the fine pixel scale?
- Page 15, Line 355: The three accuracy metrics mentioned cannot adequately assess the spatial context preserved by the fused images. Some spatial metrics, such as edge features (see examples in https://doi.org/10.1016/j.rse.2022.113002), should be presented.
- Tables 4-6: Is the accuracy assessment conducted for each site? The tables only show the mean value. What is the range of these indices?
- Figure 14: The RMSD values in Figure 14 are on a different scale compared to Tables 4-6.
- Table 9: It would be beneficial to include a figure showing examples where only the Seamless Data Cube (SDC) can accurately classify the pixels, whereas other data cannot.
Citation: https://doi.org/10.5194/essd-2024-178-RC1 -
AC4: 'Reply on RC1', Shuang Chen, 01 Aug 2024
Dear X. Zhu,
Thank you very much for the time you spent on our manuscript and for the useful comments. In the attached file, we provide responses to your comments and suggestions and point to the changes made in the revised manuscript.
Regards,
The authors
-
CC10: 'Comment on essd-2024-178', James Anderson, 12 Jul 2024
Many thanks to the authors for providing such a dataset. However, I have some concerns regarding its quality.
1. The image quality in many areas seems to be poor, as shown in the example below. Severe cloud contamination and strong spatial stitching effects were observed.
2. In many regions, the reconstructed time-series images appear unable to reflect the real changes in land surface. The two images below, taken on day 1 and day 86 of the year 2000, illustrate this issue. Despite being nearly three months apart, they show almost identical cloud and cloud shadow distributions. Besides, the cloud distribution appears nearly the same for every day between these two timestamps.
Citation: https://doi.org/10.5194/essd-2024-178-CC10 -
AC7: 'Reply on CC10', Shuang Chen, 04 Aug 2024
Thank you for your valuable feedback.
The issue you highlighted is primarily due to undetected residual clouds in Landsat imagery. We have informed readers about the impacts of atmospheric correction and cloud detection algorithms in Section 5.3 of the manuscript. The specific MGRS tile 20MQB is located in Brazil, a region frequently affected by cloud cover, with input data from the year 2000. The atmospheric correction algorithm LEDAPS for Landsat TM/ETM+ does not perform as effectively as the LaSRC for Landsat OLI (Vermote et al., 2018). Moreover, there are increased cloud omission errors in Landsat TM/ETM+ observations due to the absence of a cirrus band (Zhu et al., 2015a). Overall, the data quality of the SDC dataset is significantly better in most other regions and years.
Vermote, E., Roger, J.C., Franch, B., Skakun, S., 2018. LaSRC (Land Surface Reflectance Code): Overview, application and validation using MODIS, VIIRS, LANDSAT and Sentinel 2 data’s, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. Presented at the IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, IEEE, Valencia, pp. 8173–8176. https://doi.org/10.1109/IGARSS.2018.8517622
Zhu, Z., Wang, S., Woodcock, C.E., 2015a. Improvement and expansion of the Fmask algorithm: cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sensing of Environment 159, 269–277. https://doi.org/10.1016/j.rse.2014.12.014
Citation: https://doi.org/10.5194/essd-2024-178-AC7
-
CC13: 'Comment on essd-2024-178', Emma Thompson, 18 Jul 2024
The web-based interface for the data archive is showing a black screen.
Citation: https://doi.org/10.5194/essd-2024-178-CC13 -
CC14: 'Comment on essd-2024-178', Isabella Miller, 20 Jul 2024
After carefully examining the validation data provided by the authors, I found significant transparency issues. The data is presented as NumPy arrays but lacks temporal or spatial coordinates. Additionally, there is insufficient explanation regarding the format and structure of the validation data, making it difficult to understand its content and usage. I am unsure if this approach complies with standard practices. As it stands, this method of data disclosure seems to hinder other researchers from effectively verifying the results. Therefore, I recommend temporarily retracting the paper until the transparency of the data is improved.
Citation: https://doi.org/10.5194/essd-2024-178-CC14 -
RC2: 'Comment on essd-2024-178', Anonymous Referee #2, 26 Jul 2024
This submission aims at building a daily 30-m Landsat data record (2000-2022) based on Landsat 5,7,8,9 and MODIS for gap filling. The authors made sure that the data are pre-processed (with AC and BRDF) and harmonized. The main pillars of the data are that they are daily at 30 m and span 20+ years. While this dataset will certainly find many applications and might be valuable to the community, and I applaud the authors for taking on this challenge, in my opinion the description of this dataset is exaggerated (e.g., the daily component) and the authors do not provide evidence that these are actual daily data. You cannot use 8-(16-)-day data to prove that your dataset is actually daily; for that, you would need daily reference data. I think this is one of the areas that the authors did not work through, and the claim that your dataset resembles actual daily data is false. Maybe it would have made sense to focus on regular 8-day composites. Maybe the problem of generating daily data from 8-/16-day Landsat does not have a solution. Certainly, MODIS can help in certain regions, but I have huge doubts about its applicability globally given the discrepancies in spatial resolution. Furthermore, even for MODIS, the majority of users use 8-day or 16-day composites, even though MODIS acquires twice per day (Terra/Aqua). This is done because of clouds and because spatial resolution degrades as the viewing angle increases (the angle grows over days 1-15 of the cycle, with nadir viewing only every 16 days). There are also some very dubious choices (e.g., moving the grid). From the CC comments, one can see obvious artifacts. That is understandable: in a global product you can always find errors. But, as mentioned above, this is not a real daily product, and you did not provide evidence to claim it. I suggest that the authors substantially re-work this, give good thought to what real problem they are trying to solve, and provide a tangible solution that can be validated.
Issues:
1. "It is noteworthy that our adopted grid slightly deviates from the Sentinel-2 grid. Since the original Landsat coordinate system exhibits a half-pixel (15 meters) offset relative to the Sentinel-2 grid, we expanded and shifted the original MGRS grid by 15 meters in each direction to align with the Landsat coordinate system."
First, there is no shift; the difference lies in what is used for referencing a pixel value: the center in Landsat and the upper-left (UL) corner in S2 (in the grid) (the latter is also what GDAL uses by default). It was a poor decision to shift the grid. If you selected MGRS as the coordinate grid, you should have re-projected Landsat (like HLS) into MGRS. Such a shift can cause artifacts when comparing to the HLS data.
2. "Therefore, our approach aims at building multiple transformation models for each MGRS tile and each spectral band separately."
Building models per MGRS tile might introduce issues regarding spatial consistency. Did you check the impact of such an approach on the overlapping areas (between tiles)? How consistent are the models temporally in reflecting land cover changes?
3. I'm very skeptical about the applicability of MODIS gap-filling for Landsat on a global scale. First, almost all existing approaches, including yours, do not account for changes in spatial resolution with different viewing angles. In reality, a 500-m pixel can actually degrade to as coarse as 2 km; see https://doi.org/10.1109/TGRS.2016.2604214
So, your assumption "the basic assumption of uROBOT is that the MODIS image 𝐶 can be accurately approximated by a linear combination of other similar MODIS images in the input timeseries data" is only valid under the condition that spatial resolution is invariant. That's not the case, especially in cloud-prone regions. Another issue is that this will not work in areas of small agricultural fields, e.g., Africa, SE Asia, etc.
I have seen multiple examples in which a bare-ground field between two vegetated fields is brightened in MODIS when the resolution decreases. Using a MODIS-based vegetation signal for Landsat will then introduce huge errors. Therefore, a study must be conducted to explore how spatial resolution impacts restoration.
4. Section 5.3 must be re-written, as it does not show limitations but rather what has not been done. There should be paragraphs on snow and on coastal regions and water, as AC algorithms do not work best there (especially for snow, as retrieval of aerosols is extremely difficult there). Furthermore, this product is probably not applicable to detecting rapid (daily) changes in land cover such as construction, as it will depend on actual acquisitions (and not on blended daily values). It will probably not allow detection of "daily" burned or harvested fields, because the signal will change in 2-3 days, nor of landslides, iceberg movement, or fire propagation: anything that truly changes every hour or day. Again, in reality your daily product will not allow (please, prove me wrong!) detection of these events on a daily basis, which requires true daily data.
Citation: https://doi.org/10.5194/essd-2024-178-RC2 -
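The pixel-referencing point raised in Issue 1 can be illustrated with a small sketch. It assumes, following the referee's description, that one product reports a pixel's location by its center and the other by its upper-left corner, on a north-up projected grid with 30-m square pixels. The function name and the coordinates are hypothetical and chosen only for illustration.

```python
PIXEL_SIZE = 30.0  # metres; the 30-m pixel size discussed in this thread


def center_to_upper_left(x_center, y_center, pixel_size=PIXEL_SIZE):
    """Convert a pixel-center coordinate to the pixel's upper-left corner.

    Assumes a north-up projected grid (x increases eastwards, y increases
    northwards), so the upper-left corner lies half a pixel to the west
    and half a pixel to the north of the center.
    """
    half = pixel_size / 2.0
    return x_center - half, y_center + half


# Hypothetical UTM-like coordinates for a single pixel.
ul_x, ul_y = center_to_upper_left(500015.0, 4199985.0)
# The same pixel, referenced by its corner instead of its center:
# (500000.0, 4200000.0)
```

Under these assumptions, the two conventions differ by exactly 15 m in each axis for 30-m pixels, which matches the half-pixel offset quoted from the manuscript. Whether this should be handled by shifting the grid or by re-projection is the question the referee raises.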
AC5: 'Reply on RC2', Shuang Chen, 01 Aug 2024
Dear anonymous Reviewer#2,
Thank you very much for the time you spent on our manuscript and for the useful comments. In the attached file, we provide responses to your comments and suggestions and point to the changes made in the revised manuscript.
Regards,
The authors
-
AC5: 'Reply on RC2', Shuang Chen, 01 Aug 2024
Status: closed
-
CC1: 'Comment on essd-2024-178', Diana Jones, 29 Jun 2024
The DOI link to the shared dataset is not accessible.
Citation: https://doi.org/10.5194/essd-2024-178-CC1 -
CC2: 'Reply on CC1', Jie Wang, 30 Jun 2024
Thank you for your interest in our data.
I tried to access the data via https://doi.org/10.12436/SDC30.26.20240506 and it was redirected to https://data-starcloud.pcl.ac.cn/resource/26 correctly.
The web page contains a Resource Description of the data. Please find out how to access data from the given instructions. Thank you!
Citation: https://doi.org/10.5194/essd-2024-178-CC2 -
CC3: 'Reply on CC2', Diana Jones, 02 Jul 2024
Thank you for updating the data access URL. However, I still have two questions.
First, the public SDC covers only a very small area of the world. It appears that there is no data available for the entire United States.
Second, the time frequency of the public SDC does not seem to be daily; instead, it appears to be every four days or even eight days.
Citation: https://doi.org/10.5194/essd-2024-178-CC3 -
AC1: 'Reply on CC3', Shuang Chen, 02 Jul 2024
Thank you for your comment. We have provided a web-based interface that allows users to access and customize the SDC for any location (except Antarctica) and with any temporal frequency (up to daily). Detailed information and instructions can be found at the following webpage: https://doi.org/10.12436/SDC30.26.20240506 or https://data-starcloud.pcl.ac.cn/resource/26. Please search the key words "Test Account" and "Instructions". If you have any other questions, please feel free to contact us.
Citation: https://doi.org/10.5194/essd-2024-178-AC1 -
CC4: 'Reply on AC1', Diana Jones, 03 Jul 2024
Thanks for your explanation.
1. As you suggested, I used the test account you provided and attempted to submit an order for data download. However, I was unable to do so successfully. The website displayed the message "Failed to write order to OBS." I tried various locations worldwide, but all attempts were unsuccessful. Based on this situation, I am uncertain whether the data actually exists. This seems to contradict the aims and scope of the ESSD journal. At the very least, the manuscript should not be published or reviewed further until all the claimed data are fully disclosed. Since the data results and authenticity cannot be verified, only theoretical and methodological discussions can be carried out at this stage.
2. The manuscript states that the accuracy of the SDC dataset has an MAE of 0.014. From the perspective of review and publication, the complete test dataset should be disclosed for public verification by third parties. This is a common and recognized practice in the field of image classification and segmentation; otherwise, the authenticity of the accuracy result cannot be confirmed. Additionally, considering the issues mentioned in point 1, if most of the data do not exist, how was the accuracy verified? This raises a significant concern. Furthermore, I noticed that the website also provides land cover data by year. However, the manuscript appears to only provide an OA indicator. Which year's accuracy is this? Is using OA alone sufficient to verify the accuracy of such a large dataset?
3. If the data are to be made publicly available for free, it would be best to provide direct download links (similar to practices by USGS, ESA, and other remote sensing data providers) to facilitate user access. Currently, it seems that most of the data are inaccessible.
Citation: https://doi.org/10.5194/essd-2024-178-CC4 -
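The third-party check requested in point 2 above is simple to state once paired arrays are disclosed. The following is a minimal sketch, assuming the reconstructed and held-out Landsat reflectance values are available as two equally long sequences; the function name and the sample numbers are illustrative only and are not taken from the SDC test set.

```python
import math


def mean_absolute_error(predicted, reference):
    """MAE over paired reflectance values, skipping pairs with a missing value."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    diffs = [
        abs(p - r)
        for p, r in zip(predicted, reference)
        if not (math.isnan(p) or math.isnan(r))  # ignore masked (NaN) pixels
    ]
    if not diffs:
        raise ValueError("no valid pairs to compare")
    return sum(diffs) / len(diffs)


# Made-up numbers purely for illustration, not values from the SDC validation.
reconstructed = [0.10, 0.25, 0.40, float("nan")]
held_out = [0.12, 0.24, 0.43, 0.30]
mae = mean_absolute_error(reconstructed, held_out)
```

Because surface reflectance lies in the range 0–1, an MAE computed this way is directly comparable to the value reported in the manuscript.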
AC2: 'Reply on CC4', Shuang Chen, 04 Jul 2024
Thank you for your valuable feedback.
1. We have found that the "Standard" mode cannot process multiple order submissions simultaneously, which triggered the error "Failed to XXX". We apologize for the inconvenience, and we are working on solving this issue and improving the serving capability of our system. Please select the "Quick" order mode, which is more stable at this stage.
The data volume of a global daily SDC from 2000 to 2022 would exceed 22 PB. There is not yet a cost-effective approach to store such a vast amount of data. Therefore, we use a distributed storage system as a cached data pool to store the most recently accessed SDC data. Users can access the SDC for their AOI through the provided interface. If the requested SDC data are not available in the cached data pool, the system will generate the SDC from raw data in real time and then return the generated SDC data to users.
Considering the large data volume and complex data storage mechanisms, we are not able to provide direct download links. The order-download mode is also a common practice as adopted by ESA’s EO Browser and USGS’s EarthExplorer platforms. We apologize for the inconvenience and will continuously work on improving the usability of our system.
2. As described in our manuscript, we conducted a leave-one-out validation on the reconstructed SDC dataset, which reveals an overall MAE of 0.014. We have uploaded the related test dataset to the Download section of our website. The data used in the cross-comparison with NASA’s HLS dataset are also available.
This manuscript focuses on the development and validation of the SDC dataset, and the land cover data provided on our website were generated to test the capability of SDC for global applications. The Overall Accuracy (OA) indicator mentioned in Table 9 is the result of the comparative experiment, and it is not related to the provided land cover data files. We briefly mentioned the land cover data in the Data Availability section. Other researchers can download these data for reference if they are interested. We will improve our mapping framework and formally publish complete land cover products in future work.
3. We have provided a web-based interface that allows users to access and customize the SDC for any location (except Antarctica) and with any temporal frequency (up to daily). Detailed information and instructions can be found at the following webpage: https://doi.org/10.12436/SDC30.26.20240506 or https://data-starcloud.pcl.ac.cn/resource/26. Large-scale SDC data processing requires computing capacity that exceeds the capability of this web-based interface. If users need large-scale SDC data for their research, please feel free to contact us about cooperation. We have a computing cluster that can efficiently reconstruct global-scale SDC data for subsequent applications.
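The cached-pool behaviour described under point 1 (serve recently accessed data, generate on a miss) can be sketched as below. This is a minimal illustration under stated assumptions, not the actual system: the class name, the least-recently-used eviction rule, the capacity, and the `fake_generate` stand-in for the reconstruction pipeline are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class CachedDataPool:
    """Tiny least-recently-used pool: serve cached items, generate on a miss."""

    def __init__(self, generate: Callable[[Hashable], bytes], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._generate = generate  # hypothetical on-demand reconstruction
        self._capacity = capacity  # maximum number of cached requests
        self._pool: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.misses = 0  # how often generation was triggered

    def get(self, key: Hashable) -> bytes:
        if key in self._pool:
            self._pool.move_to_end(key)  # mark as most recently accessed
            return self._pool[key]
        self.misses += 1
        data = self._generate(key)  # cache miss: build from raw inputs
        self._pool[key] = data
        if len(self._pool) > self._capacity:
            self._pool.popitem(last=False)  # evict the least recently accessed
        return data


def fake_generate(key):
    # Stand-in for the real fusion pipeline, which is not described here.
    tile, date = key
    return f"SDC[{tile},{date}]".encode()


pool = CachedDataPool(fake_generate, capacity=2)
first = pool.get(("20MQB", "2000-01-01"))  # miss: generated on demand
again = pool.get(("20MQB", "2000-01-01"))  # hit: served from the pool
```

A design of this kind trades storage for latency: the first request for an area and date pays the full reconstruction cost, while repeated requests are served from the pool until the entry is evicted.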
Citation: https://doi.org/10.5194/essd-2024-178-AC2 -
CC5: 'Reply on AC2', Delmar Wilson, 05 Jul 2024
1. Before I raised concerns, the DOI provided by the authors for the SDC dataset was invalid, and no validation data was supplied. After my inquiry, the authors updated the data access URL and uploaded the validation data. However, when I attempted to download the validation data, the system indicated "No access permission." This makes me question whether the authors genuinely intend to make the dataset public. Sharing data publicly is a fundamental requirement for articles published in the ESSD journal. Providing validation data is also essential for assessing accuracy; without it, the dataset's authenticity cannot be verified.
2. Issues related to data storage and website functionality are the authors' responsibility. These are not my primary concerns and are unrelated to the data's availability. Avoiding the provision of a complete dataset due to its large size is unacceptable. Unless there is a reasonable third-party verification, we are justified in questioning the data's authenticity. If the current storage mechanism and website cannot meet the requirements, resulting in the dataset not being publicly available, then this paper should be retracted. The paper can be resubmitted once these issues are resolved and the complete dataset is provided.
3. Although this dataset closely resembles existing data products from USGS and ESA, such as the HLS dataset, I am still interested in some of the unique characteristics claimed in this paper. I hope the authors can adopt a sincere and transparent attitude to promptly release the complete dataset in an accessible and verifiable manner, in accordance with ESSD journal publication requirements and data openness regulations. This will allow third-party validation and oversight to alleviate concerns regarding data authenticity and accuracy. I believe this approach will better advance the field. I am also willing to offer any assistance and support within my capabilities to facilitate the successful publication and application of this paper.
Citation: https://doi.org/10.5194/essd-2024-178-CC5 -
AC3: 'Reply on CC5', Shuang Chen, 09 Jul 2024
Dear Delmar Wilson,
Thank you for your comment.
I am sure that I have not received any comment/message from you before. Or, do you mean that the previous comments posted by Diana Jones are also from you? Anyway, I will respond to all your concerns here.
- The complete message you received should be “No access permission. Please login in first.” So, maybe you can log in using the provided test account. Then you will be able to download the validation data as many times as you need. If it still doesn’t work, try to log out and then log in again.
- The DOI was registered on May 6, 2024. We have not made, and are not able to make, any modifications to the DOI since that time. We tested and verified the accessibility of this DOI beforehand. We don’t know why you could not access the DOI at that time; perhaps you can check your network conditions.
- As described in the manuscript, the SDC dataset for any location (except Antarctica) and any time (2000-2022) is available via the provided interface on our website, which meets the data access requirements of ESSD. If you have any questions about accessing the SDC dataset, please feel free to contact us at any time. We are willing to provide any assistance and information needed.
- Before the submission of this manuscript, I had made the validation data available on another website (sdc.iearth.com, as described in the manuscript). After receiving your request, I made a copy of these validation data and uploaded it to the project website, so that it can be found by users more easily.
- Thanks for your interest in this research work. We are working on improving our system so that it can better serve all users within the community. We are grateful for your willingness to offer assistance in improving the data access system. Could you provide us with your contact information? We can then discuss it in detail.
Citation: https://doi.org/10.5194/essd-2024-178-AC3
CC6: 'Comment on essd-2024-178', Diana Jones, 10 Jul 2024
The previous conversation was too long, so I am starting a new comment for easier reading.
1. I have tried logging in with the provided test account multiple times, but I still cannot download the validation data. Either the system does not respond at all, or it displays a "Download failed" message. Have the authors successfully downloaded the validation data using the method you described?
2. The authors have provided many different URLs for data access, which is very confusing. Additionally, the data access system provided by the authors is user-unfriendly and highly unstable. Throughout the download process, you may encounter various issues at each step. These problems significantly increase the difficulty of obtaining the data and may even render the data inaccessible. I strongly recommend that other readers participate in testing the downloadability and accessibility of the data.
3. I do not believe that the accessibility and transparency of the data in this manuscript comply with the ESSD journal's data policy.
For example, the ESSD journal requires that,
"upon submission, all data must be directly accessible through link(s) in the manuscript;"
This means that upon manuscript submission to the journal, all data supporting the research results must be directly accessible via clickable links, without additional steps or permissions.
However, I do not think your data meets this criterion. As mentioned earlier, even after you submitted the manuscript, and even up to now, I have encountered many issues accessing the data website and downloading the data. For instance, the system frequently displays messages such as "Failed to write order to OBS," "Download failed," and "Failed to get the file under the directory."
Additionally, the ESSD journal also requires that,
"upon submission, authors must certify some form of fully anonymous review access, directly at the chosen repository (i.e., no registration, name, email, or other information is required of reviewers as they access the data);"
This means that data should be accessible without requiring registration or providing any information. However, your current data access method does not comply with this policy. In your system, it is necessary to register and log in to access the data. Otherwise, it prompts "No access permission. Please log in first." Sometimes, even after logging in, the data cannot be accessed or downloaded.
4. I am willing to offer assistance but would like to know the specific progress and timeline for the system improvements. I hope the authors can take a candid approach to make substantial improvements to the system. If there are any bottlenecks or difficulties, we can work together to resolve them. However, the authors' responses so far have consistently denied the existence of these objective issues and have not resulted in any substantive improvements.
Citation: https://doi.org/10.5194/essd-2024-178-CC6 -
CC7: 'Reply on CC6', Yamal Vivian, 11 Jul 2024
I tried to access the data following provided instructions, and successfully downloaded the data.
Citation: https://doi.org/10.5194/essd-2024-178-CC7 -
CC12: 'Reply on CC7', Diana Jones, 14 Jul 2024
I'm referring to the validation dataset. I still haven't been able to successfully download it. Have you managed to download the validation dataset?
Citation: https://doi.org/10.5194/essd-2024-178-CC12
CC8: 'Reply on CC6', Ethan Carter, 11 Jul 2024
I encountered a similar issue. I also failed to download the data, and the validation data download was unresponsive.
Citation: https://doi.org/10.5194/essd-2024-178-CC8 -
CC9: 'Reply on CC6', Congcong Li, 12 Jul 2024
I successfully downloaded the data using the test account and the instructions provided.
-
CC11: 'Reply on CC9', Diana Jones, 14 Jul 2024
I'm referring to the validation dataset. I still haven't been able to successfully download it. Have you managed to download the validation dataset?
Citation: https://doi.org/10.5194/essd-2024-178-CC11 -
AC6: 'Reply on CC11', Shuang Chen, 04 Aug 2024
Thank you for your feedback and for bringing this issue to our attention. The system may take longer to respond due to the large file size of the experimental data. We apologize for any inconvenience this has caused.
We have also uploaded the experimental data to OneDrive; you can access it through this URL: https://1drv.ms/u/s!AvA96w8h5Q9sgtEBEk9J5D0nKa5tsA?e=TdNRk7.
If you continue to experience issues, please let us know, and we will assist you further.
Citation: https://doi.org/10.5194/essd-2024-178-AC6
RC1: 'Comment on essd-2024-178', X. Zhu, 12 Jul 2024
General comments
This paper presents the development of a global 30-m seamless data cube by fusing Landsat and MODIS data, building on the authors' previous methods. This work is highly valuable for future applications requiring fine-resolution time series data. Despite the numerous algorithms developed for fusing Landsat and MODIS data or filling gaps in Landsat data, no global datasets generated by these technologies are currently available. However, the paper has several issues that need to be addressed.
Specific comments
- Page 3, Line 70: For Landsat interpolation methods, there are techniques that do not require numerous clear-sky observations. For instance, the nearest similar pixel interpolator method only needs one clear-sky observation.
- Introduction: Before the last paragraph, the authors should discuss the current research gap. Specifically, they should explain why a full-chain processing framework and such fused data are necessary. Additionally, they should outline the challenges users face when using current methods to produce data independently.
- Page 11, Figure 3: When performing harmonization, was Landsat upscaled to the resolution of MODIS? Figure 3 suggests that both datasets have the same resolution. A linear transformation model was used; how do the authors address the issue when the linear model is not statistically significant?
- Major Differences Between uROBOT and ROBOT Models: What are the primary differences between the uROBOT model and the previously developed ROBOT model by the authors?
- Accuracy of the Time Series Model in Eq. 6: The accuracy of the time series model in Equation 6 could be affected by the time interval of the data. In cloudy regions, if the data is too sparse, is the result reliable? How do the authors address diverse changes when the data is sparse?
- Eq. 7: The coefficients from MODIS are used for Landsat. This approach may be acceptable if the land is homogeneous, but it lacks a clear mechanism for complex landscapes. Landsat pixels have different temporal dependencies compared to coarse pixels. If this issue affects the reliability of the final product, the end users should be notified.
- Eq. 9: The equation redistributes the residual to handle land cover changes, but the residual is at a coarse scale. How do the authors address changes at the fine pixel scale?
- Page 15, Line 355: The three accuracy metrics mentioned cannot adequately assess the spatial context preserved by the fused images. Some spatial metrics, such as edge features (see examples in https://doi.org/10.1016/j.rse.2022.113002), should be presented.
- Tables 4-6: Is the accuracy assessment conducted for each site? The tables only show the mean value. What is the range of these indices?
- Figure 14: The RMSD values in Figure 14 are on a different scale compared to Tables 4-6.
- Table 9: It would be beneficial to include a figure showing examples where only the Seamless Data Cube (SDC) can accurately classify the pixels, whereas other data cannot.
Citation: https://doi.org/10.5194/essd-2024-178-RC1 -
AC4: 'Reply on RC1', Shuang Chen, 01 Aug 2024
Dear X. Zhu,
Thank you very much for the time you spent on our manuscript and for the useful comments. In the attached file, we provide responses to your comments/suggestions and point to the changes made in the revised manuscript.
Regards,
The authors
-
CC10: 'Comment on essd-2024-178', James Anderson, 12 Jul 2024
Many thanks to the authors for providing such a dataset. However, I have some concerns regarding its quality.
1. The image quality in many areas seems to be poor, as shown in the example below. Severe cloud contamination and strong spatial stitching effects were observed.
2. In many regions, the reconstructed time-series images appear unable to reflect the real changes in land surface. The two images below, taken on day 1 and day 86 of the year 2000, illustrate this issue. Despite being nearly three months apart, they show almost identical cloud and cloud shadow distributions. Besides, the cloud distribution appears nearly the same for every day between these two timestamps.
Citation: https://doi.org/10.5194/essd-2024-178-CC10 -
AC7: 'Reply on CC10', Shuang Chen, 04 Aug 2024
Thank you for your valuable feedback.
The issue you highlighted is primarily due to undetected residual clouds in Landsat imagery. We have informed readers about the impacts of atmospheric correction and cloud detection algorithms in Section 5.3 of the manuscript. The specific MGRS tile 20MQB is located in Brazil, a region frequently affected by cloud cover, with input data from the year 2000. The atmospheric correction algorithm LEDAPS for Landsat TM/ETM+ does not perform as effectively as the LaSRC for Landsat OLI (Vermote et al., 2018). Moreover, there are increased cloud omission errors in Landsat TM/ETM+ observations due to the absence of a cirrus band (Zhu et al., 2015a). Overall, the data quality of the SDC dataset is significantly better in most other regions and years.
Vermote, E., Roger, J.C., Franch, B., Skakun, S., 2018. LaSRC (Land Surface Reflectance Code): Overview, application and validation using MODIS, VIIRS, LANDSAT and Sentinel 2 data’s, in: IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium. Presented at the IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, IEEE, Valencia, pp. 8173–8176. https://doi.org/10.1109/IGARSS.2018.8517622
Zhu, Z., Wang, S., Woodcock, C.E., 2015a. Improvement and expansion of the Fmask algorithm: cloud, cloud shadow, and snow detection for Landsats 4–7, 8, and Sentinel 2 images. Remote Sensing of Environment 159, 269–277. https://doi.org/10.1016/j.rse.2014.12.014
Citation: https://doi.org/10.5194/essd-2024-178-AC7
CC13: 'Comment on essd-2024-178', Emma Thompson, 18 Jul 2024
The web-based interface for the data archive is showing a black screen.
Citation: https://doi.org/10.5194/essd-2024-178-CC13 -
CC14: 'Comment on essd-2024-178', Isabella Miller, 20 Jul 2024
After carefully examining the validation data provided by the authors, I found significant transparency issues. The data is presented as NumPy arrays but lacks temporal or spatial coordinates. Additionally, there is insufficient explanation regarding the format and structure of the validation data, making it difficult to understand its content and usage. I am unsure if this approach complies with standard practices. As it stands, this method of data disclosure seems to hinder other researchers from effectively verifying the results. Therefore, I recommend temporarily retracting the paper until the transparency of the data is improved.
Citation: https://doi.org/10.5194/essd-2024-178-CC14 -
RC2: 'Comment on essd-2024-178', Anonymous Referee #2, 26 Jul 2024
This submission aims at building a daily 30-m Landsat data record (2000-2022) based on Landsat 5, 7, 8, 9, with MODIS used for gap filling. The authors made sure that the data are pre-processed (with AC and BRDF) and harmonized. The main pillars of the dataset are that it is daily, at 30 m, and spans 20+ years. This dataset will certainly find a lot of applications and might be valuable to the community, and I applaud the authors for taking on this challenge. In my opinion, however, the description of this dataset is exaggerated (e.g., the daily component), and the authors do not provide evidence that this is actual daily data. You cannot use 8- or 16-day data to prove that your dataset is truly daily; for that you need daily reference data. I think this is one of the areas that the authors did not work through, and the claim that your dataset resembles actual daily data is false. Maybe it would have made sense to focus on regular 8-day composites. Maybe the problem of generating daily data from 8-/16-day Landsat does not have a solution. Certainly, MODIS can help in certain regions, but I have huge doubts about its applicability globally given the discrepancies in spatial resolution. Furthermore, even for MODIS, the majority of users use 8-day or 16-day composites, given that MODIS acquires twice per day (Terra/Aqua). This is done because of clouds and because spatial resolution decreases with viewing angle (which increases over days 1-15 of the cycle, with nadir only every 16 days). There are also some very dubious choices (e.g., moving the grid). From the CC comments, one can see obvious artifacts; that is understandable, since in a global product you can always find errors. But, as mentioned above, it is not a real daily product, and you did not provide evidence to claim it. I suggest that the authors substantially rework this, give good thought to what real problem you are trying to solve, and provide a tangible solution that can be validated.
Issues:
1. "It is noteworthy that our adopted grid slightly deviates from the Sentinel-2 grid. Since the original Landsat coordinate system exhibits a half-pixel (15 meters) offset relative to the Sentinel-2 grid, we expanded and shifted the original MGRS grid by 15 meters in each direction to align with the Landsat coordinate system."
First, there is no shift; the difference lies in which point is used to reference a pixel value: the center in Landsat and the upper-left (UL) corner in S2 (in the grid), which is also the GDAL default. It was a poor decision to shift the grid. Instead, having selected MGRS as the coordinate grid, you should have re-projected Landsat (like HLS) into MGRS. Such a shift can cause artifacts when comparing to the HLS data.
2. "Therefore, our approach aims at building multiple transformation models for each MGRS tile and each spectral band separately."
Building models per MGRS tile might introduce issues with spatial consistency. Did you check the impact of such an approach on the overlapping areas (between tiles)? How consistent are the models temporally in reflecting land cover changes?
3. I'm very skeptical about the applicability of MODIS gap-filling for Landsat on a global scale. First, almost all existing approaches, including yours, do not account for changes in spatial resolution with different view angles. In reality, a 500-m pixel can actually degrade to a 2-km one; see https://doi.org/10.1109/TGRS.2016.2604214
So, your assumption "the basic assumption of uROBOT is that the MODIS image 𝐶 can be accurately approximated by a linear combination of other similar MODIS images in the input timeseries data" is only valid under the condition that spatial resolution is invariant. That's not the case, especially in cloud-prone regions. Another issue is that this will not work in areas of small agricultural fields, e.g., Africa, SE Asia, etc.
I have seen multiple examples where a bare ground field between two vegetated fields is brightened in MODIS when the resolution decreases. Using the MODIS-based vegetation signal for Landsat will then introduce huge errors. Therefore, a study must be conducted to explore how spatial resolution impacts restoration.
4. Section 5.3 must be re-written, as it does not show limitations but rather what has not been done. There should be paragraphs on snow, coastal regions, and water, as AC algorithms do not work best there (especially for snow, as retrieval of aerosols is extremely difficult there). Furthermore, this product is probably not applicable to detecting rapid (daily) changes in land cover such as construction, as it will depend on actual acquisitions (and not the blended daily data). It will probably not allow detection of "daily" burned or harvested fields, because the signal will change in 2-3 days, nor of landslides, iceberg movement, or fire propagation: anything that truly changes every hour or day. Again, in reality your daily product will not allow (please, prove me wrong!) detection of these events on a daily basis, which requires true daily data.
Citation: https://doi.org/10.5194/essd-2024-178-RC2 -
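The arithmetic behind the pixel-referencing argument in issue 1 can be sketched as follows. This only illustrates how a center-based and a corner-based coordinate for the same 30-m, north-up pixel differ by 15 m; it does not settle whether the Landsat and Sentinel-2 grids themselves coincide. The function names and the UTM coordinates are hypothetical.

```python
PIXEL_SIZE = 30.0  # metres, the nominal Landsat pixel size


def center_from_upper_left(ul_x, ul_y, pixel_size=PIXEL_SIZE):
    """Center of a north-up pixel given its upper-left corner (projected metres)."""
    return ul_x + pixel_size / 2.0, ul_y - pixel_size / 2.0


def upper_left_from_center(cx, cy, pixel_size=PIXEL_SIZE):
    """Upper-left corner of a north-up pixel given its center (projected metres)."""
    return cx - pixel_size / 2.0, cy + pixel_size / 2.0


# Hypothetical UTM coordinates of one pixel's upper-left corner.
ul = (600000.0, 9000000.0)
center = center_from_upper_left(*ul)

# Reading the center coordinate as if it were a corner coordinate makes the
# same pixel appear displaced by half a pixel in each axis.
apparent_offset = (center[0] - ul[0], center[1] - ul[1])
```

If two products describe the same pixel footprint with different conventions, converting one convention to the other removes the apparent offset; if the footprints genuinely differ, resampling is needed instead.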
AC5: 'Reply on RC2', Shuang Chen, 01 Aug 2024
Dear anonymous Reviewer#2,
Thank you very much for the time you spent on our manuscript and for the useful comments. In the attached file, we provide responses to your comments/suggestions and point to the changes made in the revised manuscript.
Regards,
The authors
Data sets
Global 30-m seamless data cube (2000-2022) of land surface reflectance generated from Landsat-5,7,8,9 and MODIS Terra constellations S. Chen et al. https://doi.org/10.12436/SDC30.26.20240506