I just got back from the EAGE, the 79th annual European conference and exhibition for Geoscience and Engineering. But what I actually want to tell you about is what I did on the weekend before the conference started. A few of us from Teradata joined a slightly more recent tradition – the Subsurface Hackathon organised by Agile Scientific.
Agile Scientific – AKA Matt and Evan – have been punching way above their weight for years, promoting open source and DIY alternatives to the closed, expensive and often old-fashioned commercial software solutions available for geoscience workflows. Lately they have been teaching geoscientists to code in Python and challenging them to try machine learning. So, unsurprisingly, the theme of this year’s hackathon was machine learning (ML).
Now, some of us at Teradata – including myself – have been a bit cynical about the current hype surrounding machine learning in general, and Deep Learning in particular – especially when applied to scientific or industrial domains.
I often say that just because we can do machine learning – easily, with the many libraries and toolkits available these days – it doesn’t mean we should. In geoscience, our data is actually pretty sparse. You could argue that to perform adequate machine learning for geoscience we would need at least as much data as Google has images of cats, for example. What we do have is real-world physics controlling the relationships between our data. So is machine learning really the most appropriate approach?
I’m not suggesting there isn’t a place for machine learning. Before the hackathon, I believed that once the data had been quality controlled and features had been engineered from it, those features could be successfully fed into machine learning algorithms to generate useful insights. However, the old ‘garbage in, garbage out’ law still applies, even in the near-magical world of ML. You can’t machine learn everything.
Continually try to prove yourself wrong
But you know what? Deciding not to try something because you think it will fail is not smart thinking. It isn’t agile thinking either. Smart businesses – and smart entrepreneurs for that matter – test their theories as soon as they catch themselves making any assumption about viability. They set up a short project to try to prove the opposite of what they instinctively believe. In other words, as a smart business or smart entrepreneur you continually try to prove yourself wrong.
Think about it: it’s much cheaper and less risky to discover up front that your assumption is wrong. The alternative is to continue along in blissful ignorance, only realising your error after your business idea has been side-swiped by someone who didn’t hold your built-in biases – likely after a serious investment of time and money.
And so we took part in the ML hackathon. When I saw teams at the hackathon taking something of a naive approach, throwing raw data into generic algorithms provided by Google and others, I was initially put off. But we watched with open minds what other teams were trying, and we learned some things.
Standard image-feature extraction techniques that find faces in photos can actually find (obvious) traps and (obvious) faults in (synthetic) seismic images. They can do that with only a couple of days’ training, and only a few hundred data sets.
Open machine learning libraries allowed a team to train a neural net to create (simple) geological models from (synthetic) seismic, and to create (synthetic) seismic from (simple) geological models. They can do that with only a couple of days’ training, and a few data sets.
So did throwing raw data at openly available ML algorithms completely fail? No, it didn’t. But can it replace human interpreters today? Well, no. Not after only two days of training, and certainly not by only looking at a couple of hundred data sets.
But in time, with enough data, maybe machine learning could replace human interpreters. Outside the hackathon environment time is much less of a constraint, so we could easily solve the problem of limited training time. But limited training data? That could be the rub.
In time… maybe machine learning could replace human interpreters
If we want to be able to detect changes in reservoirs via seismic data as easily as Google can find pictures of cats, it follows that we’re going to need to train our models with as many seismic images as Google has pictures of cats.
I just found over 2 billion cat images on Google in 0.81 seconds. That equates to an awful lot of seismic images.
So how are we going to arrange that? For the supermajors, who have access to vast amounts of data and can do this work themselves, maybe this could be viable. But for most of us – including universities and research institutes – it will be very difficult to take part in the model training experiment without some major changes to how we share data.
More fundamentally – should we be using supervised ML image processing techniques on seismic data at all? The hackathon teams chose this as an easy-entry, ‘low-hanging fruit’ approach to using existing ML libraries on subsurface data. They replicated the old workflow of interpreting seismic sections visually and detecting structural features.
How can machine learning deliver an adequate answer if even the experts can’t agree?
Remember the old joke about the five geoscientists looking at the same seismic section? How many interpretations will there be? At least six, right? That implies supervised ML might not be the best approach: how can machine learning deliver an adequate answer if even the experts can’t agree on the interpretation? Perhaps the way forward is to stop turning seismic survey data into images for humans to visually interpret. Would a better route be to apply machine learning to (as raw as possible) measurement data?
As is often the way these days, there doesn’t seem to be one single answer – just a load more questions… I’m off to look at some more cat pics while I think about it.