Unlike some other well-known challenges such as facial recognition, where
machine learning and inversion algorithms are widely developed, the
geosciences suffer from a lack of large, labelled data sets that can be used
to validate or train robust machine learning and inversion schemes. Publicly
available 3D geological models are far too restricted in both number and the
range of geological scenarios to serve these purposes. With reference to inverting geophysical data, this problem is further exacerbated because in most cases real geophysical observations result from unknown 3D geology, and synthetic test data sets are often not particularly geological or
geologically diverse. To overcome these limitations, we have used the Noddy
modelling platform to generate 1 million models, which represent the first
publicly accessible massive training set for 3D geology and resulting
gravity and magnetic data sets.

Although they have become the focus of intense research activity in recent
times, with more papers published in the 5 years prior to 2018 than all
years before that combined, machine learning (ML) techniques applied to
geoscience problems date back to the middle of the last century (see Van
der Baan and Jutten, 2000, and Dramsch, 2020, for reviews). ML applications
relate to a whole range of geological and geophysical problems, but many of
these studies face common challenges due to the nature of geoscientific data
sets. Karpatne et al. (2017) summarise the principal challenges as follows: spatio-temporal structure; high dimensionality; small sample size; paucity of ground truth; multi-resolution data; and noise, incompleteness and uncertainty in data. We address each of these challenges in this study.

Implicit geological modelling is based on the calculation of scalar fields that can be iso-surfaced to retrieve stratigraphy and structure, as opposed to earlier methods that were CAD-like or based on the interpolation of data points. Recent advances in implicit modelling allow extensive geology model suites to be generated by perturbing the data inputs to the model (Caumon, 2010; Cherpeau et al., 2010; Jessell et al., 2010; Wellmann et al., 2010; Wellmann and Regenauer-Lieb, 2012; Lindsay et al., 2012, 2013a, b, 2014; Wellmann et al., 2014, 2017; Pakyuz-Charrier et al., 2018a, b, 2019) as part of studies that characterised 3D model uncertainty; however, since they use a single model as the starting point for the stochastic simulations, these works do not provide a broad exploration of the range of geological geometries and relationships found in nature. Work on automating modelling workflows may allow us to explore the model uncertainty space more efficiently (Jessell et al., 2021).

In this study, we have created a massive open-access resource consisting of 1 million three-dimensional geological models using the Noddy modelling package (Jessell, 1981; Jessell and Valenta, 1996). Each model is provided as the input file that defines its kinematic history, together with the resulting voxel model and the gravity and magnetic forward-modelled responses. The models are classified by the sequence of their deformation histories, thus addressing a temporal paucity of ground truth. This resource is provided to anyone who would like to train an ML algorithm to understand 3D geology and the resulting potential-field response or to anyone wishing to test the robustness of their geophysical inversion techniques. Guo et al. (2022) used the same modelling engine to produce more than 3 million models spanning a more restricted range of parameters to train a convolutional neural network to estimate 3D geometries from magnetic images. In this study we aim to provide a much broader range of possible geological scenarios as the starting point for a more general exploration of the geological model space.

The Noddy software has been used in the past for a range of studies due to its ease in producing “reasonable-looking” geological models at low design and computational cost. A precursor to this study used around 100 manually specified models to train geologists in the interpretation of regional geophysical data sets by providing a range of 3D geological models and their geophysical responses (Jessell, 2002). Similarly, Clark et al. (2004) developed a suite of ore deposit models and their potential-field responses. The automation of model generation using Noddy was first explored using a genetic algorithm approach to modifying parameters as a way of inverting potential-field geophysical data, specifically gravity and magnetics (Farrell et al., 1996). Wellmann et al. (2016) developed a modern Python interface to Noddy to allow stochastic variations of the input parameters to be analysed in a Bayesian framework. Finally, Thiele et al. (2016a, b) used this ability to investigate the sensitivity of spatial and temporal relationships to variations in input parameters.

In this study we draw upon the ease of generating stochastic model suites to build a publicly accessible database of 1 million 3D geological models and their gravity and magnetic responses.

The Noddy package (Jessell, 1981; Jessell and Valenta, 1996) provides a
simple framework for building generic 3D geological models and calculating
the resulting gravity and magnetic responses for a given set of
petrophysical properties. The 3D model is defined by superimposing
user-defined kinematic events that represent idealised geological events,
namely base stratigraphy (STRAT), folds (FOLD), faults (FAULT),
unconformities (UNC), dykes (DYKE), plugs (PLUG), shear zones (SHEAR-ZONE)
and tilts (TILT), which can be superimposed in any order, except for STRAT,
which can only occur once and has to be the first event. The 3D geological
models are calculated by superimposing these events in the specified order (an Intel® Xeon® Gold 6254 CPU at 3.10 GHz was used in this study), and the scheme produces “geologically plausible” models that may occur in nature. Given that the
final 3D model depends on the user's choice of geological history, Noddy
can be thought of as a kinematic, semantic, implicit modelling scheme.
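The kinematic scheme can be illustrated with a minimal sketch (our own simplification, not Noddy's actual implementation): to evaluate a voxel, its position is restored by undoing each event in reverse chronological order, and the restored depth is looked up in the base stratigraphy. The tilt and fold functions and all parameter values here are illustrative stand-ins.

```python
import numpy as np

def strat_unit(z, thickness=150.0, n_units=4):
    """Look up the stratigraphic unit of a restored point from its depth."""
    return int(np.clip(z // thickness, 0, n_units - 1))

def untilt(p, plunge_deg=20.0):
    """Inverse of a tilt event: rotate the point back about the x axis."""
    a = np.radians(-plunge_deg)
    rot = np.array([[1, 0, 0],
                    [0, np.cos(a), -np.sin(a)],
                    [0, np.sin(a),  np.cos(a)]])
    return rot @ p

def unfold(p, wavelength=800.0, amplitude=60.0):
    """Inverse of an upright sinusoidal fold: remove the vertical displacement."""
    q = p.copy()
    q[2] -= amplitude * np.sin(2 * np.pi * p[0] / wavelength)
    return q

def evaluate(p, inverse_history):
    """Un-deform a point through the history (youngest event undone first),
    then read the unit from the base stratigraphy."""
    for inverse_event in reversed(inverse_history):
        p = inverse_event(p)
    return strat_unit(p[2])

# STRAT is implicit; a TILT followed by a FOLD, given by their inverses:
unit = evaluate(np.array([100.0, 200.0, 300.0]), [untilt, unfold])
print(unit)  # → 1
```

Because each voxel is evaluated independently from the event history, the scheme scales trivially to large voxel grids and to arbitrary event sequences.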

As opposed to Wellmann et al. (2016), Thiele et al. (2016) and Guo et al. (2022), who used a Python wrapper to generate stochastic model suites, in this study we have modified the C code itself to simplify use by third parties, although the philosophy of model generation closely follows and extends these earlier studies. The most significant difference is that we have added petrophysical variations by randomly selecting from a set of stratigraphic groups; see the next section.

Figure 1 shows one example model set for a STRAT–TILT–DYKE–UNC–FOLD history, consisting of a 3D visualisation looking from the NE of the voxel model, with some units rendered transparent for clarity; the top surface of the model; an E–W section at the northern face of the model looking from the south; a N–S section on the western face of the model looking from the east; and the resulting gravity and magnetic fields.

Example model set for a STRAT–TILT–DYKE–UNC–FOLD sequence.

In this section we describe the choices and range of values for the
parameters that we have allowed to vary for our 1-million-model suite. We
recognise that Noddy allows other modes of deformation that we have not used here. The selection of these parameters is based on assessing
the range of parameter values that will produce suites of models that we
believe will help in and not hinder addressing the challenges cited in the
Introduction. For example, we limited the size of the plugs so
that a single plug could not replace the geology of the entire volume of
interest. In the discussion, we refer to additional event parameters that
could be activated in future studies. We limited the study to five
deformation events, starting with an initial horizontal stratigraphy which
is always followed by tilting of the geology. The following three events are
drawn randomly and independently from the event list comprising folds, faults, unconformities, dykes, plugs, shear zones and tilts. The likelihoods of folds, faults and shear zones are double those of the other events because we found, based on a qualitative assessment, that they had a bigger impact on the overall 3D geology, and hence we wished to sample more of these events. With seven event types drawn independently for each of the three random slots, this gives 7 × 7 × 7 = 343 possible event sequences.
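The weighted drawing of event histories described above can be sketched as follows; the 2 : 1 weighting ratio is taken from the description, but the exact scheme in the modified C code may differ.

```python
import random

EVENTS = ["FOLD", "FAULT", "UNC", "DYKE", "PLUG", "SHEAR-ZONE", "TILT"]
# Folds, faults and shear zones are twice as likely as the other events.
WEIGHTS = [2, 2, 1, 1, 1, 2, 1]

def draw_history(rng=random):
    """STRAT then TILT are fixed; three further events are drawn
    independently with the weights above."""
    tail = rng.choices(EVENTS, weights=WEIGHTS, k=3)
    return ["STRAT", "TILT"] + tail

print(draw_history())  # e.g. ['STRAT', 'TILT', 'FAULT', 'FOLD', 'DYKE']
```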

The initial stratigraphy, as well as new, above-unconformity stratigraphies, is defined to randomly have between two and five units to keep the systems relatively simple, but this could of course be increased if desired. The lithology of each unit in a stratigraphy is chosen to be coherent with the specific event and other units in the same sequence so that we do not, for example, mix high-grade metamorphic lithologies and un-metamorphosed mudstones in the same stratigraphic series (Table 2), nor do we assign the petrophysical properties of a sandstone to an intrusive plug. Once a lithology is chosen, the density and magnetic susceptibility are randomly sampled from a table defining the Gaussian distribution of properties (linear for density, log-linear for magnetic susceptibility) for that rock type. In the case of densities this may result in occasional negative values; however since the gravity field is only sensitive to density contrasts, this does not invalidate the calculation. Some rock types have bimodal petrophysical properties to reflect real-world empirical observations, so we draw from a Gaussian mixture in these cases. The petrophysical data are drawn from aggregated statistics (mean and standard deviation of one or two peaks) of the approximately 13 500-sample British Columbia petrophysical database (Geoscience BC, 2008).
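The petrophysical sampling can be sketched as follows. The distribution parameters shown are illustrative placeholders, not values from the British Columbia database, and the symmetric bimodal offset is an assumption.

```python
import random

def sample_density(mean, sd, rng=random):
    """Density is drawn from a (linear) Gaussian; occasional negative
    values are tolerated since gravity responds to density contrasts."""
    return rng.gauss(mean, sd)

def sample_susceptibility(log_mean, log_sd, bimodal=False, offset=0.0, rng=random):
    """Susceptibility is log-linear: a Gaussian is drawn in log10 space.
    Bimodal lithologies draw from a two-component Gaussian mixture whose
    means are offset symmetrically (the offset value is illustrative)."""
    mu = log_mean
    if bimodal:
        mu += offset if rng.random() < 0.5 else -offset
    return 10.0 ** rng.gauss(mu, log_sd)

# Illustrative values only, not taken from the British Columbia database:
rho = sample_density(2670.0, 150.0)                              # kg m-3
k = sample_susceptibility(-3.0, 0.5, bimodal=True, offset=1.5)   # SI
```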

The parameters which can be varied for each type of event, together with the range of these parameters, are shown in Table 1. These parameters can be grouped by the shape, position, scale and orientation of the events, and a five-stage deformation history requires the random selection of a minimum of 23 parameters for a STRAT–TILT–TILT–TILT–TILT model and up to 69 parameters for a STRAT–TILT–UNC–UNC–UNC model where each stratigraphy has five units. Apart from the petrophysical parameters, all other parameters are randomly sampled from a uniform distribution.

Free parameters with their allowable ranges for each event.

Simplified petrophysical values derived from British Columbia
database (Geoscience BC, 2008). Values are randomly sampled from Gaussian
distributions defined by the mean and standard deviation of density and log
magnetic susceptibility. For lithologies with bimodal magnetic
susceptibilities (flag of 1), mixed sampling is based on offsetting the means
by

Any subset of the geology can be calculated for any sub-volume of an
infinite Cartesian space using Noddy, but we limit ourselves to a fixed model volume and voxel resolution for all models in this suite.
Geophysical forward models were calculated using a Fourier domain formulation using reflective padding to minimise (but not remove) boundary effects. The forward gravity and magnetic field calculations assume a flat top surface with a 100 m sensor elevation above this surface and the Earth's magnetic field with vertical inclination, zero declination and an intensity of 50 000 nT.
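Noddy's forward operators are not reproduced here, but the role of reflective padding in a Fourier-domain calculation can be illustrated with a related operation, upward continuation of a gridded field to a sensor elevation. The grid and anomaly below are synthetic stand-ins.

```python
import numpy as np

def upward_continue(field, dx, height):
    """Continue a gridded potential field upward by `height` metres using
    the Fourier-domain operator exp(-|k| h). Reflective padding reduces
    (but does not remove) wrap-around edge effects."""
    ny, nx = field.shape
    pad_y, pad_x = ny // 2, nx // 2
    padded = np.pad(field, ((pad_y, pad_y), (pad_x, pad_x)), mode="reflect")
    ky = np.fft.fftfreq(padded.shape[0], d=dx) * 2 * np.pi
    kx = np.fft.fftfreq(padded.shape[1], d=dx) * 2 * np.pi
    kk = np.sqrt(kx[None, :] ** 2 + ky[:, None] ** 2)
    spec = np.fft.fft2(padded) * np.exp(-kk * height)
    out = np.fft.ifft2(spec).real
    return out[pad_y:pad_y + ny, pad_x:pad_x + nx]  # crop back to input size

# A synthetic Gaussian anomaly continued up to a 100 m sensor height:
x = np.linspace(-1000, 1000, 64)
g = np.exp(-(x[None, :] ** 2 + x[:, None] ** 2) / (2 * 300.0 ** 2))
g_up = upward_continue(g, dx=x[1] - x[0], height=100.0)
```

The attenuation factor exp(-|k| h) damps short wavelengths most strongly, which is why the sensor elevation smooths the computed gravity and magnetic images.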

All 7 × 7 × 7 = 343 possible event sequences are represented in the model suite.

Example models: 100 randomly selected models drawn from the 1-million-model suite.

The logic of generating millions of Noddy models was first applied by Guo et al. (2022), who built a massive 3D model training set and used it to invert real-world magnetic data. That study used a model suite consisting of
only FOLD, FAULT and TILT events and only one of each to predict 3D geology
using a convolutional neural network (CNN). This approach corresponds to a use
case where prior geological knowledge of the local geological history has
been used to limit the model search space, and formal expert elicitation
could provide an important precursor step to support the generation of
sensible and tractable problems. In addition to the CNN training
demonstrated by Guo et al. (2022), we can envisage three broad categories of
studies that could build upon the 3D model database we present here.

A study of geophysical image variability using a simple 2D correlation or maximal information coefficient between pairs of images from different histories would be illuminating. Do we have images which are the same as each other (or at least very similar and within the noise tolerance of the geophysical fields) but belong to very different histories? If such pairs exist, the ambiguity of the histories can be examined, and we then know where to expect poor performance from ML techniques which rely on easily discriminated images. The systems of equations characterising geophysical inverse problems often have non-unique solutions. If we use only magnetic data or only gravity data for inversion, we face this non-uniqueness directly. However, because we have both gravity data and magnetic data, we can extract features from multi-source heterogeneous data simultaneously and then classify or regress them after feature fusion, which could greatly reduce the influence of non-uniqueness. Having a large set of models will also allow models to be clustered according to their geophysical response, identifying subsets of geological models that are geophysically equivalent and cannot be distinguished using geophysical data. Analysing the diversity of such subsets will give an estimate of the severity of non-uniqueness and allow the derivation of posterior statistical indicators conditioned by geological plausibility.
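A first pass at such a similarity screen might look like the following sketch, which flags near-duplicate image pairs by simple 2D (Pearson) correlation; the threshold value and in-memory image handling are assumptions for illustration.

```python
import numpy as np

def image_correlation(a, b):
    """Pearson correlation between two gridded geophysical images,
    a cheap first-pass similarity measure before any history comparison."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float((a * b).mean())

def near_duplicates(images, threshold=0.99):
    """Return index pairs whose images are (almost) indistinguishable;
    such pairs with different histories would flag non-uniqueness."""
    pairs = []
    for i in range(len(images)):
        for j in range(i + 1, len(images)):
            if image_correlation(images[i], images[j]) >= threshold:
                pairs.append((i, j))
    return pairs
```

For the full 1-million-model suite the quadratic pairwise loop would need to be replaced by an approximate nearest-neighbour search, but the screening logic is the same.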

In this study we have produced an ML training data set that attempts to address six recognised limitations of applying ML to geoscientific data sets, namely spatio-temporal structure; high dimensionality; small sample size; paucity of ground truth; multi-resolution data; and noise, incompleteness and uncertainty in data. Contrary to usual practice, the generation of this comprehensive suite of geological models did not depend on manual labelling of data: we relied solely on geoscientific theory and principles while remaining computationally efficient. While realistic-looking suites of geological models have been generated using generative adversarial networks (Zhang et al., 2019), these generally represent a limited range of geological scenarios and lack extensive training samples.

Noddy is by design a spatio-temporal modelling engine that uses a geological history to generate a model. Simple variations in the ordering of three events following two fixed events (STRAT and TILT), even with fixed parameters, quickly demonstrate the importance of relative time ordering to final model geometry (Fig. 3). While Noddy is limited to simple sequential events, in nature geological processes can be coeval (such as syn-depositional faulting) or partially overlapping, resulting in complex spatio-temporal relationships (Thiele et al., 2016a). Nonetheless, re-ordering only sequential events still produces a vast array of plausible geometries and indicates the enormity of the model space and the necessity of efficient methods to explore it.

Four possible 3D geological models with the same base stratigraphy (STRAT) followed by five events using four of the possible different event ordering sequences.

We have limited ourselves to five deformation events in this study and no more than five units in any one stratigraphy. These decisions were based on a desire to “keep it simple” whilst still allowing a great variety of models to be built. We recognise that these are somewhat arbitrary choices. We could have allowed truly random, arbitrarily complex 3D histories, leading to models with, for example, nine phases of folding; however the utility of over-complicating the system is not clear, as such complexity would rarely if ever be discernible in natural systems. Similarly, we limited the parameter ranges of each deformation event, again on the basis that the ranges chosen produced more interesting models. For example, there seemed little value in folds with very large wavelengths or very low amplitudes: they are equivalent to small translations of the geology and would translate in the geophysical measurements into a regional trend, which is often approximated and removed from the measurements.

Noddy is capable of predicting continuous variations in petrophysical properties, including variably deformed magnetic remanence vectors and the anisotropy of susceptibility, or densities that vary away from structures to simulate alteration patterns; however we decided to limit this study to simple litho-controlled petrophysics whilst recognising the interest of studying more complex discrete–continuous systems. The indexed models could also be reused with different, simpler petrophysical variations, such as keeping constant values for each rock type. Each model comes with the history file used to generate the model, and this provides the full label for that model so that if additional information, such as the number of units in a series, is considered to be important, this can be easily extracted from the file.

The total number of models sounds impressive; however, once we divide that number by the 343 different event sequences, we are left with between 905 and 8245 models per sequence, which, whilst still large, is by no means exhaustive. There is no fundamental problem with building 10 or 100 million models, and if this is found to be necessary to provide useful ML training data sets, we can certainly do so at the expense of increased computation time: these models were built in around a week on a computer using 20 processor cores. We can also try to apply a metric, such as model topology, to analyse how well sampled the model space is. Thiele et al. (2016b) analysed the topology of stochastically generated Noddy models and found that, for small perturbations around a starting model, the number of new topologies dropped off rapidly after around 100 models. In our case we are not making small perturbations, so we could expect to require more models before the rate of production of new topologies decays, and topology is only one possible metric for comparing models.
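The spread from 905 to 8245 models per sequence is consistent with the doubled event weights: under a 2 : 1 weighting, expected counts per sequence range from 1000 to 8000, which the observed extremes fluctuate around. A short check:

```python
from itertools import product

# Event weights as described above: folds, faults and shear zones doubled.
weights = {"FOLD": 2, "FAULT": 2, "SHEAR-ZONE": 2,
           "UNC": 1, "DYKE": 1, "PLUG": 1, "TILT": 1}
total = sum(weights.values())  # 10

# Expected model count for each three-event sequence out of 1 million draws;
# integer arithmetic is exact here since 1 000 000 is divisible by 10^3.
n_models = 1_000_000
expected = {seq: n_models * weights[seq[0]] * weights[seq[1]] * weights[seq[2]] // total**3
            for seq in product(weights, repeat=3)}

print(len(expected))           # 343 distinct three-event sequences
print(min(expected.values()))  # 1000: all-singly-weighted sequences
print(max(expected.values()))  # 8000: all-doubly-weighted sequences
```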

The primary goal of this study was to build a large data set providing a wide range of possible models for use in training ML systems and for testing more traditional geophysical inversion systems. The models here, whilst simpler than the large test models mentioned earlier, represent to our knowledge the largest suite of 3D geological models with resulting potential-field data and tectonic histories. This utility applies equally well to classical geophysical inversion codes, which have traditionally been tested on only a handful of synthetic models prior to being applied to real-world data, for which no ground truth is available.

To use this suite of models as the starting point for inversion of real-world data sets (as has been pioneered by Guo et al., 2022), we can envisage the introduction of expert elicitation methods to meaningfully constrain the model output space while acknowledging our inherent uncertainty regarding the model input space. As a probabilistic encoder of expert knowledge, formal elicitation procedures (O'Hagan et al., 2006) have contributed greatly to physical domain sciences where complex models are essential to our understanding of the underlying processes. From climatology, meteorology and oceanography (Kennedy et al., 2008) to geology and geostatistics (Walker and Curtis, 2014, and Lark et al., 2015) to hydrodynamics and engineering (Astfalck et al., 2018, 2019), the central role of expert elicitation is being increasingly recognised. The complexity and parameterisations of geophysical models, as well as the expert knowledge that resides within the geophysical community, suggest this domain should be no different. It is worth noting that the choice of parameter bounds used to define the 1-million-model suite in this article is itself an informal expression of expert elicitation.

Once a targeted structure is reasonably well characterised, the approach taken by Guo et al. (2021) of thoroughly exploring a narrow search space becomes possible. Unfortunately, in many parts of the world there is no outcrop available due to tens to hundreds of metres of cover. In these scenarios, it makes sense to start with a broader search for possible 3D models that may match the observed gravity or magnetic response, given their inherent ambiguity. We can imagine a hierarchical approach where a subset of the 1 million models is identified as possible causative structures, and then these are accepted or rejected based on the geologist's prior knowledge, and the accepted models are then used as the basis for a focussed parameter exploration. In addition, within the 1-million-model suite, it is currently possible to filter the models based on event ordering, and with minor modifications to the code, it would be possible to filter by any parameter, such as fold wavelength.
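The event-ordering filter can be sketched as follows; the sequence labels are constructed here purely for illustration (the archive itself is organised as 343 .tar files, one per ordering), so the representation is an assumption.

```python
from itertools import product

EVENTS = ["FOLD", "FAULT", "UNC", "DYKE", "PLUG", "SHEAR-ZONE", "TILT"]

# Every sequence starts with the fixed STRAT and TILT events,
# followed by three randomly drawn events.
sequences = [("STRAT", "TILT") + tail for tail in product(EVENTS, repeat=3)]

def filter_by_event(seqs, event, position=None):
    """Keep sequences containing `event`, optionally at a fixed position
    (0-based, counting from the first random slot)."""
    if position is None:
        return [s for s in seqs if event in s[2:]]
    return [s for s in seqs if s[2 + position] == event]

print(len(sequences))                                        # 343
print(len(filter_by_event(sequences, "FOLD", position=0)))   # 49
print(len(filter_by_event(sequences, "FOLD")))               # 127
```

Filtering by continuous parameters such as fold wavelength would instead require parsing the history file shipped with each model.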

In the future we may need a better representation of the “real-world” 3D model space, for example by activating the additional event parameters mentioned above, broadening the parameter ranges or introducing more complex petrophysical behaviours.

A DOI provides access to the following:

the source code (C language) for the version of Noddy adapted to producing random models;

a readme.md file with a link to the Windows version of the Noddy software, plus a link to 343 .tar files, one for each event history ordering of the model suite;

a Jupyter notebook (Python code) for sampling from and unpacking the models;

a link in the same readme.md file to the equivalent

This study represents our first steps in producing geologically reasonable training sets for ML and geophysical inversion applications. We have used Noddy to generate a very large, open-access 1-million-model set of 3D geology and resulting gravity and magnetic models as ML training sets. These training sets can also be used as test cases for gravity and/or magnetic inversions. The work presented here may be a first step to overcoming some of the fundamental limitations of applying these techniques to natural geoscientific data sets.

MJ wrote the original Noddy software and its modifications, ran the experiments, and wrote the Python software for visualising the models. JG and YL were involved in conceptualisation and manuscript preparation. ML, JG and GP were involved in the conceptualisation, as well as in co-writing the Introduction and Discussion sections of the paper. VO, RS and EC were involved in developing and co-writing the Introduction and Discussion sections of the manuscript.

The contact author has declared that neither they nor their co-authors have any competing interests.

Publisher's note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

We acknowledge the support from the Loop consortia. This is MinEx CRC Document 2021/52. We would like to thank AARNet for supporting this work by hosting the 500 GB model suite at CloudStor.

This research has been supported by the Australian Research Council (grant nos. LP170100985, DE190100431 and IC190100031), the Mineral Exploration Cooperative Research Centre and the National Natural Science Foundation of China (grant no. 41671404).

This paper was edited by Jens Klump and reviewed by Jiajia Sun and one anonymous referee.