the Creative Commons Attribution 4.0 License.
Digital soil mapping of lithium in Australia
Budiman Minasny
Alex McBratney
Patrice de Caritat
John Wilford
Download
- Final revised paper (published on 14 Jun 2023)
- Supplement to the final revised paper
- Preprint (discussion started on 13 Jan 2023)
Interactive discussion
Status: closed
- RC1: 'Comment on essd-2022-418', Anonymous Referee #1, 08 Feb 2023
General comments
This manuscript proposes to create a digital soil map of Australia based on a series of environmental covariates acquired at distinct spatial resolutions. Modeling was achieved with a machine learning algorithm trained on a large available soil geochemistry dataset and evaluated on another independent soil dataset. While the premise of the work seems very interesting and the results provided can be a major contribution to mineral exploration in Australia, I have some concerns regarding the representativeness of the input data after resampling as well as of the external validation dataset. The correlation between input and predicted variables is also not very convincing. Some key literature works could help to give another dimension to the work, especially in the discussion. I would also like to see some general comments on the possibility of conducting similar approaches in other regions of the globe. I will try to explain each of these issues in more detail, hoping to help the authors to improve their work.
Specific comments
Main comments
- It should be clear from the beginning of the abstract which other types of input data (besides geochemical data) were used for modeling.
- One of my main concerns is related to model validation since the authors claim the success of the method proposed based on the external validation dataset. Is NAGS representative of all of Australia? In my view, soil characteristics and the Li content of the soils will vary throughout the country, and model performance will vary accordingly. Moreover, the NGSA and NAGS correspond to different sampling campaigns, with different collection dates and sampling densities. Despite the leveling employed by the authors, how can you confirm that the NAGS is suitable for the validation of the model trained on another dataset? Are the results obtained with the NAGS comparable to the out-of-bag validation using the NGSA data?
- Resampling of the data: the authors have resampled data acquired at 30 m or 90 m spatial resolution to a final resolution of 3 km. How can you ensure important information is not being lost with the resampling?
- Remote sensing has limited penetration depth (0-10cm). How can you correlate the remote sensing variables with the BOS dataset?
Rao, K.S., Chandra, G. & Narasimha Rao, P.V. Study on penetration depth and its dependence on frequency, soil moisture, texture and temperature in the context of microwave remote sensing. J Indian Soc Remote Sens 16, 7–19 (1988). https://doi.org/10.1007/BF03014300
- Figure 3: all variables show correlations below 0.3, which is considered by many authors as a negligible correlation. Taking this into account, how can you create a reliable model and consequently digital soil map?
- What is the advantage of this method compared with traditional interpolation approaches (IDW, kriging)? I am aware that a comparison with other methods is beyond the scope of this manuscript, but the authors could comment on the advantages/disadvantages compared to previous works (if available), considering the NGSA dataset.
- Similarly, what is the advantage of the produced maps (Figure 6) when compared with the maps of Figure 1? Are any regions highlighted by the proposed method that were not highlighted in Figure 1? This information would be important for the readers to assess whether the method you proposed is of interest.
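The resampling concern raised above can be made concrete with a small sketch. It is not the authors' workflow: it assumes simple block averaging (the resampling method actually used is not stated in this review), works on a hypothetical 4 x 4 toy grid, and the names `block_mean` and `retained_variance` are illustrative only. It reports how much of the fine-grid variance survives aggregation; going from 30 m to 3 km would correspond to a factor of 100 rather than 2.

```python
from statistics import mean, pvariance


def block_mean(grid, factor):
    """Aggregate a 2-D list to a coarser grid by averaging factor x factor blocks.

    Rows and columns that do not fill a complete block are dropped.
    """
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows - rows % factor, factor):
        coarse_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(mean(block))
        coarse.append(coarse_row)
    return coarse


def retained_variance(grid, factor):
    """Fraction of the fine-grid (population) variance left after block averaging."""
    fine = [v for row in grid for v in row]
    coarse = [v for row in block_mean(grid, factor) for v in row]
    return pvariance(coarse) / pvariance(fine)


# Hypothetical toy covariate: a checkerboard (purely fine-scale signal)
# added to a left-to-right step (coarse-scale signal).
toy = [[(i + j) % 2 + (10 if j >= 2 else 0) for j in range(4)] for i in range(4)]

print(block_mean(toy, 2))         # the checkerboard is averaged away
print(retained_variance(toy, 2))  # close to 1: most variance here is coarse-scale
```

In this toy case almost all variance is retained because the dominant signal is coarse. A covariate whose variability sits mostly below the block size would show a much smaller ratio, which is the situation the comment asks the authors to rule out.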
Minor comments
Introduction
- Lines 56-60: the authors explain that Li extraction from brines is in the form of Li-chloride. However, it would also be important to clarify that Li-carbonate is not directly extracted from Li-minerals, but the Li metal instead.
- Lines 61-76: a brief description of the behavior of Li in soils is presented and previous works on soil geochemistry are presented. Other works related to this topic are also worth mentioning:
Luecke, W. (1984). Soil Geochemistry above a Lithium Pegmatite Dyke at Aclare, Southeast Ireland. Irish Journal of Earth Sciences, 6(2), 205–211. http://www.jstor.org/stable/30002472
Steiner, B. (2018). Using Tellus stream sediment geochemistry to fingerprint regional geology and mineralisation systems in Southeast Ireland. Irish Journal of Earth Sciences, 36, 45-61. doi: 10.3318/ijes.2018.36.45.
- Lines 82-89: a summary of mineral prospectivity mapping is made, but other recent prospectivity mapping studies are missing:
Parsa, M. (2021). A data augmentation approach to XGboost-based mineral potential mapping: An example of carbonate-hosted ZnPb mineral systems of Western Iran. Journal of geochemical exploration, 228, 106811. doi: https://doi.org/10.1016/j.gexplo.2021.106811.
von der Heyden, B. P., Todd, C., Mayne, M. J., & Doggart, S. (2023). Zipf rank analysis highlights the exploration potential for Lithium-Caesium-Tantalum -type pegmatites in the Northern Cape, South Africa. Journal of African Earth Sciences, 197, 104769. doi: https://doi.org/10.1016/j.jafrearsci.2022.104769.
- Lines 92-96: a short literature review on the use of remote sensing for Li pegmatite identification is presented. Some of these works could be replaced by more recent studies:
Cardoso-Fernandes, J., Teodoro, A. C., Lima, A., & Roda-Robles, E. (2020). Semi-Automatization of Support Vector Machines to Map Lithium (Li) Bearing Pegmatites. Remote Sensing, 12(14), 2319. doi: 10.3390/rs12142319.
Morsli, Y., Zerhouni, Y., Maimouni, S., Alikouss, S., Kadir, H., & Baroudi, Z. (2021). Pegmatite mapping using spectroradiometry and ASTER data (Zenaga, Central Anti-Atlas, Morocco). Journal of African Earth Sciences, 177, 104153. doi: https://doi.org/10.1016/j.jafrearsci.2021.104153.
Booysen, R., Lorenz, S., Thiele, S. T., Fuchsloch, W. C., Marais, T., Nex, P. A. M., & Gloaguen, R. (2022). Accurate hyperspectral imaging of mineralised outcrops: An example from lithium-bearing pegmatites at Uis, Namibia. Remote Sensing of Environment, 269, 112790. doi: https://doi.org/10.1016/j.rse.2021.112790.
Materials and methods
- Lines 143-145: “levelling method were utilized using the standards Certified Reference Materials (Main and Champion, 2022). In short, a correction factor based on the CRM measurements from the two datasets is calculated and applied as multiplier to relevel the data”. Can we see a figure showing this leveling process?
- Table 1: Spatial resolution of Landsat data is 30 m and not 25 m.
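The leveling step quoted above (a CRM-based correction factor applied as a multiplier) could look roughly like the sketch below. The exact formula used by Main and Champion (2022) is not given in this discussion, so the ratio of mean CRM values is an assumption, and all numbers and function names are hypothetical.

```python
from statistics import mean


def correction_factor(crm_reference, crm_target):
    """Multiplier that brings the target dataset onto the reference scale.

    Assumption: the factor is the ratio of the mean CRM measurement in the
    reference dataset to the mean CRM measurement in the target dataset.
    """
    return mean(crm_reference) / mean(crm_target)


def relevel(values, factor):
    """Apply the multiplicative correction to every sample value."""
    return [v * factor for v in values]


# Hypothetical Li measurements (mg/kg) of the same CRM in the two surveys.
crm_ngsa = [20.0, 22.0, 21.0]   # reference dataset
crm_nags = [17.0, 18.0, 17.5]   # dataset to be releveled

factor = correction_factor(crm_ngsa, crm_nags)
nags_li = [10.0, 5.0, 30.0]     # hypothetical sample concentrations
print(factor)
print(relevel(nags_li, factor))
```

A scatter plot of CRM values before and after applying the factor would be one way to provide the figure requested in this comment.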
Results and discussion
- Line 239: “concentrations ranging from 0.1 – 67.4 and 0.1 – 56 mg kg-1, for TOS and BOS respectively”. These values do not seem to match Figure 2. Please revise.
- Line 242: “mean Li concentration”. Which is the mean for the TOS and BOS datasets? Right now, you are comparing the median and the mean.
- Lines 261-262: “Despite other studies (Robinson et al., 2018; Kashin, 2019) reporting strong correlations between Li and Mg, and other elements elsewhere”. Extensive work exists on Li correlations with other elements, covering both stream sediment samples and a large pegmatite dataset for the Iberian Peninsula. Please consider comparing your results with other works.
Cardoso-Fernandes, J., Lima, J., Lima, A., Roda-Robles, E., Köhler, M., Schaefer, S., Barth, A., Knobloch, A., Gonçalves, M. A., Gonçalves, F., & Teodoro, A. C. (2022). Stream sediment analysis for Lithium (Li) exploration in the Douro region (Portugal): A comparative study of the spatial interpolation and catchment basin approaches. Journal of geochemical exploration, 236, 106978. doi: https://doi.org/10.1016/j.gexplo.2022.106978.
- Section 3.1.1: Please explain which threshold was used to consider a strong/moderate correlation.
- Lines 267-271: Please consider key works on the alteration of Li minerals to clays:
London, D., & Burt, D. M. (1982). Chemical models for lithium aluminosilicate stabilities in pegmatites and granites. American Mineralogist, 67(5-6), 494-509.
Quensel, P. (1937). Minerals of the Varuträsk Pegmatite. Geologiska Föreningen i Stockholm Förhandlingar, 59(2), 150-156. doi: 10.1080/11035893709444939.
Quensel, P. (1938). Minerals of the Varuträsk Pegmatite. Geologiska Föreningen i Stockholm Förhandlingar, 60(2), 201-215. doi: 10.1080/11035893809444995.
- Line 283: “Landsat bands 3, 5 and 6 had stronger negative correlations (r = -0.14 to -0.16)”. Please note that -0.15 and -0.17 represent stronger negative correlations than -0.14 and -0.16. Moreover, the graph scale is not the same for TOS and BOS in Figure 3. That is why the bars seem bigger for the BOS data when the values are smaller in absolute value.
- Lines 306-307: there are some studies on the spectral behavior of Li minerals and in some cases cross-analysis with the Li content:
Cardoso-Fernandes, J., Silva, J., Perrotta, M. M., Lima, A., Teodoro, A. C., Ribeiro, M. A., Dias, F., Barrès, O., Cauzid, J., & Roda-Robles, E. (2021). Interpretation of the Reflectance Spectra of Lithium (Li) Minerals and Pegmatites: A Case Study for Mineralogical and Lithological Identification in the Fregeneda–Almendra Area. Remote Sensing, 13(18), 3688. doi: 10.3390/rs13183688.
- Lines 334-335: “the model separates out prediction values based on its spectral response of vegetation”. I do not understand. Didn't you use the bare soil dataset where the vegetation influence was removed? Please comment.
- Line 343: “Landsat bands 2 and 6, and temperature range also affect model conditions”. Again, remote sensing data has low penetration depth, therefore the correlation with the BOS dataset should be low. How do you explain these results?
- Lines 356-357: “the model developed here to have a higher concentration of soil Li, especially for the BOS model”. However, this is the model with a higher standard deviation. Please comment.
- Figure 7 is just a zoom of Figure 6, not bringing new information. I would prefer a comparison between the predicted contents in the validation area with both NGSA and NAGS measured values.
Technical corrections
- Figure 1: Can you improve the quality of Figure 1? Is it possible to display the Li concentration with a ramp color to aid visualization? Can you separate the two maps into subfigures A (TOS) and B (BOS)?
- Figure 2: in the histogram on the left we don't see values > 40 mg/kg. Also, can we see the histogram for the NAGS dataset?
- Figure 3: the graph bars go further than the X-axis. Please correct this issue.
- Line 301: “Higher accuracy was observed in TOS”. Higher, but still low. Please consider revising the sentence.
- Table 2: the metric values presented for the external validation do not match the values mentioned in the text. Please revise.
- Figure 8: there are no units on the Y-axis.
- Please revise the use of acronyms throughout the manuscript.
Please see the attached pdf file (edited version of the original file) with some minor corrections/suggestions and yellow highlights that need to be addressed carefully.
- AC1: 'Reply on RC1', Wartini Ng, 05 May 2023
- RC2: 'Comment on essd-2022-418', Anonymous Referee #2, 04 Apr 2023
This manuscript presents results from a study designed to develop predictive maps of the Li concentration in soil of Australia. The calibration data set used was the Li data generated during the National Geochemical Survey of Australia. In addition, several environmental covariates were used such as annual precipitation, annual evaporation, airborne radiometric data, etc. The predictive Li maps were then compared to a validation data set from the Northern Australia Geochemical Survey. The paper is relatively well-written and well organized. Unfortunately, the results of the study were disappointing in that the correlation between observed Li in the validation data set and the predicted Li values from the model was relatively poor. The authors have recognized several limitations from their study, but have neglected to discuss what I think is an important issue: the nature of the Li data from the NGSA.
The authors should emphasize that they are using aqua-regia extractable Li data from the NGSA in this study. Aqua-regia digestion only extracts a portion of the total Li found in soil. I am not sure what fraction of the total that may be, but it depends on the extraction parameters (e.g., temperature of extraction, length of time the soil material is left in contact with the aqua regia) as well as soil mineralogy. Lithium in clay minerals may be extracted, but I am not sure that Li in spodumene will be released by aqua regia extraction. The national-scale soil geochemical survey of the conterminous United States (Smith et al., 2019, complete reference given below) used a 4-acid extraction that is a much more vigorous extraction than aqua regia and should give a good estimate of the total Li content in soil. For this US study, three samples were collected at each site (4,857 sites): soil from a depth of 0-5 cm, a composite of the soil A horizon, and a sample of the top 20 cm of the soil C horizon. The results show a range of <1-300 mg/kg (median 20 mg/kg) for the 0-5 cm sample; a range of <1-315 mg/kg (median 20 mg/kg) for the soil A horizon; and a range of <1-280 mg/kg (median 24 mg/kg) for the soil C horizon. These concentrations are considerably higher than the aqua regia extraction data for Australia. So one might ask if the results of the current study would be different if total Li data were used instead of the aqua-regia-extractable data. Another question is whether a weaker extraction that only released “plant available” Li might be more likely to give better results.
Despite the somewhat disappointing results of this study, I feel the paper should be published to demonstrate a step forward in the development of machine learning in generating predictive geochemical maps. The authors should also perhaps do a better job of recognizing the importance of the National Geochemical Survey of Australia whose data gave them the opportunity to conduct the current study. Perhaps a recommendation might be to recognize the need to conduct higher density national- and international-scale geochemical surveys and to add additional parameters to these studies (e.g., quantitative mineralogy) that would aid in future studies such as these authors have conducted.
Reference for Smith et al. (2019):
Smith, D.B., Solano, F., Woodruff, L.G., Cannon, W.F. and Ellefsen, K.J. (2019). Geochemical and Mineralogical Maps, with Interpretation, for Soils of the Conterminous United States. United States Geological Survey Scientific Investigations Report, 2017-5118, https://pubs.usgs.gov/sir/2017/5118/index.html.
I have made a few specific suggestions and editorial comments below:
1. Lines 16-17: The authors begin this sentence by stating that “soil samples were collected.” Then later in the sentence, they refer to “catchment outlet sediments.” Are they saying that all the catchment outlet sediments sampled during the National Geochemical Survey of Australia can also be considered to be “soils?” This is explained later in the text (lines 121-124); however, it probably deserves a sentence in the abstract to clarify this issue. An alternative would be to avoid the use of “catchment outlet sediments” in the abstract.
2. Lines 24-25. Note this sentence: “The map shows high Li concentration around existing mines and other potentially anomalous Li areas.” It seems a bit strange to this reviewer to say there are high Li concentrations around potentially anomalous Li areas. If there are high Li values, then the area is, by definition, anomalous. Perhaps the sentence is constructed in this manner because the map to which the authors refer is predictive and it would require collecting physical samples from the “potentially anomalous” areas to confirm if they were actually anomalous.
3. Line 43: Change “was the second” to “is second”; change “the first” to “first”
4. Line 44: Change “economic resource” to “economic resources.” Change “According to recent survey” to “According to a recent survey”
5. Line 47: Change “Li is hosted mainly spodumene” to “Li is hosted mainly in spodumene” Change “while” to “whereas”
6. Line 142: Note this sentence: “Furthermore, these samples were collected at different times and /or laboratories.” Do you mean that multiple laboratories were used to analyze the samples? It is unclear what exactly is meant here.
7. Lines 142-145. In my opinion, there should be a more specific discussion about the leveling of these two data sets. What Certified Reference Material was used? Was it Till-1 as mentioned previously for the NGSA data? How about a simple plot of NGSA Li concentrations versus NAGS Li concentrations to give the reader a better idea of data comparability? I just do not think there is sufficient information given here.
8. Lines 148-149. Change “, that contributes” to “that contribute”
9. Line 178. Change “map” to “mapping” or just omit the word “map”
10. Lines 179-180. I think there is a word missing between “measurements” and “soil” in line 179. Perhaps it should be “measurements on soil?”
11. Line 238. I would suggest using “aqua-regia-extractable Li concentrations” instead of just saying “Li concentrations” here and anywhere else in the text. This lets the reader know that you are using data derived from a partial extraction (aqua regia) and not the total Li content of the soil. Another option would be to have a sentence early in the text to state that for the remainder of the paper, any reference to Li concentrations is understood to mean aqua-regia-extractable Li unless otherwise noted.
12. Line 242. Here, again, I would suggest using “aqua-regia extractable Li concentrations” when referring to the Negrel et al. (2019) publication. The European study also used an aqua regia extraction in the determination of Li concentrations, so their data should be comparable to the Australian data.
13. Line 243. I do not know what data Schrauzer (2002) used for obtaining the estimated range of 7-200 mg/kg for a world background Li concentration. If you have that information, I would suggest including it in the manuscript. However, Hu and Gao (2008, complete reference given below) estimated that the average concentration of Li in the upper continental crust is 41 mg/kg. This is higher than even the US study where the median Li content was about 20 mg/kg using a total extraction method. I think a brief discussion about reported Li concentration being totally dependent on the extraction used would be useful in the text.
Reference for Hu and Gao (2008):
Hu, Z., and Gao, S., 2008. Upper crustal abundances of trace elements—A revision and update. Chemical Geology 253 (3-4), 205–221.
14. Lines 250-254. The concentration ranges of the various sized circles in Figure 1 make it difficult for the reader to see where the higher concentrations of Li occur. I suggest that you show a range and median for the samples in each of the areas discussed in the text (i.e., Cape York, Goldfields-Esperance, etc.).
15. Lines 55, 265, and 380. The authors cite “Foregs (2006)”. However, there is no Foregs (2006) in the References. I think the authors have this listed as “Geochemical Atlas of Europe” in the References. The correct citation in the text should be “De Vos, Tarvainen, et al. (2006).” Then the complete reference should be as follows:
De Vos, W., Tarvainen, T., Salminen, R., Reeder, S., De Vivo, B., Demetriades, A., Pirc, S., Batista, M.J., Marsina, K., Ottesen, R.T., O’Connor, P.J., Bidovec, M., Lima, A., Siewers, U., Smith, B., Taylor, H., Shaw, R., Salpeteur, I., Gregorauskiene, V., Halamic, J., Slaninka, I., Lax, K., Gravesen, P., Birke, M., Breward, N., Ander, E.L., Jordan, G., Duris, M., Klein, P., Locutura, J., Bel-lan, A., Pasieczna, A., Lis, J., Mazreku, A., Gilucis, A., Heitzmann, P., Klaver, G. and Petersell, V. (2006). Geochemical Atlas of Europe. Part 2 – Interpretation of geochemical maps, Additional Tables, Figures, Maps and related publications. Geological Survey of Finland, Espoo, Finland, p. 225-228. http://weppi.gtk.fi/publ/foregsatlas/text/Li.pdf
16. Line 315. Here again the authors refer to “releveling” the NAGS data sets. As stated in comment #7, it would be useful to discuss this leveling process in a bit more detail.
17. Line 423. Delete “with anomalous Li concentration” at the end of this sentence.
Citation: https://doi.org/10.5194/essd-2022-418-RC2
- AC2: 'Reply on RC2', Wartini Ng, 05 May 2023
- RC3: 'Comment on essd-2022-418', Anonymous Referee #3, 27 Apr 2023
This study aims to predict and map lithium (Li) concentration in soil across Australia using a digital soil mapping framework and environmental covariates. The model was developed using a Cubist regression tree algorithm and validated on an independent Northern Australia Geochemical Survey dataset, showing good prediction for the top depth. The importance of variables indicates that Landsat 30+ Barest Earth bands and gamma radiometric dose have a strong impact on Li prediction.
My overall impression is that, despite the not really convincing predictive power and an out-of-sample verification that needs to be extended, the authors rigorously planned their work and did their best. For example, the set of statistics used to evaluate prediction performance was chosen wisely and the methodology seems appropriate (although I have questions about that), and the MS is quite worthy of being published in ESSD after the questions of all reviewers are answered. Here I underline that in my review I primarily evaluated the work from a methodological aspect.
Major comments
- I am curious why the Cubist model was chosen. There is no literature review on which other approach could have been used in this particular exercise. Did you check other tree-based machine learning algorithms, for example Random Forest? Random Forest can capture complex non-linear relationships between input variables and output variables by creating multiple decision trees and combining them, whereas Cubist uses linear models to estimate the output values for each leaf node of the decision trees, which may not be able to capture complex non-linear relationships.
I work in isotope hydrology, and before conducting the prediction of isoscapes we conducted thorough research on which approach would be most suitable for the task, keeping in mind the number of predictors, the drivers of the parameter, etc. Such a comparison would need to be:
- cited
- conducted and placed in the supplement
- or published in another study.
See for example: https://doi.org/10.1016/j.jhydrol.2023.129129. In addition, a flowchart should also be added to the MS, possibly in the supplement, to help reproduce the steps taken.
- How was the preprocessing conducted with respect to outliers and extreme values? See for example the last two paragraphs of Sect. 2.1 in https://doi.org/10.1016/j.jhydrol.2023.129129 . Did you check the outliers in the input data? I'm not sure how the Cubist model can handle them, as it uses linear models to estimate output values for each leaf node of the decision trees. Outliers can have a large impact on the estimated output values of the linear models, which can lead to inaccurate predictions.
- A more detailed description of the metrics used is required, since each of them accounts for a different kind of error. E.g. Lin’s CCC measures both the correlation and the bias between the measured and predicted values: it provides a measure of the strength of the linear relationship between the two value sets, while accounting for the magnitude of the differences.
In addition, references should be included, e.g. Lin, 1989 https://www.jstor.org/stable/2532051
- The argument in L412-413 is acceptable, but isn’t there a possibility to validate the results with data from other regions or conduct a pilot study elsewhere? For a study in a journal such as ESSD (upper 1 percentile in SJR) it would be expected to provide an even broader validation scheme, or give an extensive explanation of why this is not possible.
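To illustrate the point about metrics above, here is a minimal sketch of Lin's concordance correlation coefficient following the definition in Lin (1989), with 1/n moments: CCC = 2 s_xy / (s_x^2 + s_y^2 + (mean_x - mean_y)^2). The function names and the toy numbers are illustrative, not taken from the manuscript.

```python
from math import sqrt
from statistics import mean


def _moments(x, y):
    """Return means, population variances and population covariance of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return mx, my, sxx, syy, sxy


def pearson_r(x, y):
    """Pearson correlation: sensitive to linear association only."""
    _, _, sxx, syy, sxy = _moments(x, y)
    return sxy / sqrt(sxx * syy)


def lins_ccc(x, y):
    """Lin's CCC: penalises both scatter and systematic shift from the 1:1 line."""
    mx, my, sxx, syy, sxy = _moments(x, y)
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


# Hypothetical example: predictions are perfectly correlated with the
# observations but carry a constant bias of +2.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [v + 2.0 for v in observed]

print(pearson_r(observed, predicted))  # 1.0: the bias is invisible to r
print(lins_ccc(observed, predicted))   # 0.5: the bias lowers concordance
```

The contrast between the two printed values is why the different metrics should be described separately: a model can score well on correlation while still being biased.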
Minor comments
- It might be more appropriate to categorize the predictors according to which ones were considered static (do not change over time) and which ones were considered dynamic (can change over time).
- It was not discussed earlier whether a linear relationship (measured by Pearson r) is required, or whether a nonlinear relationship is expected between the predictors and Li content. Please elaborate on this.
- L262: What were these correlation values for Al, B, Fe…? A table should be included, e.g. in the supplement.
- The significance values should be reported and all the statistics in APA style. https://www.socscistatistics.com/tutorials/correlation/default.aspx
- L407: This limitation is very important and must be mentioned in the abstract; in addition, it could even be incorporated into the title.
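The question above about linear versus nonlinear relationships can be illustrated by comparing Pearson r with a rank-based (Spearman) correlation. This is a sketch on hypothetical data, not an analysis of the NGSA covariates, and the helper names are illustrative.

```python
from math import exp, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical predictor with a monotonic but strongly nonlinear response.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [exp(v) for v in x]

print(pearson(x, y))   # below 1: the relationship is not linear
print(spearman(x, y))  # 1.0: the relationship is perfectly monotonic
```

A low Pearson r therefore does not by itself rule out a useful predictor for a tree-based model, which is one reason to state explicitly which kind of relationship is expected.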
Miscellaneous
- Add spaces before and after mathematical operators.
- L20 and all other places: use superscript for measurement units, e.g. kg⁻¹
- Variables should be in italics.
- 8. A more detailed description is needed on the boxplots explaining what is in the figure, see e.g. caption of Fig. 3 in https://doi.org/10.1016/j.jhydrol.2022.128925
Citation: https://doi.org/10.5194/essd-2022-418-RC3
- AC3: 'Reply on RC3', Wartini Ng, 05 May 2023