The Power of Processes
dunnhumby builds long-term relationships with clients, and this is in part due to the links between our clients and our data scientists. It’s not just the science itself that delivers value; it’s the people behind the science, and their ability to provide understandable, explainable, stable results.
Data science is an industry largely defined by extremes of scale. As data scientists, it’s our job to take vast quantities of information and distil them down into insights and recommendations that can be understood and acted upon by our clients.
Often those recommendations can seem very ‘small’ against the data from which they originated. For instance, a decision to discount a product by 5% can feel slight compared with the complex, million-transaction analysis required to reach it.
Complexity isn’t the only problem with scale, though; it also tends to engender hyperbole. Big problems require big solutions, after all. And, when the problem happens to be gigantic volumes of data, what better answer could there be than machine learning (ML)?
It’s very easy to get caught up in the hype around artificial intelligence (AI) and ML, and understandably so. The idea that we can take reams of unstructured information, feed it into a bank of supercomputers, and sit back while the hard work gets done for us is undeniably appealing. In grocery retail, where information volumes are especially massive, trends shift by the day, and even small snippets of data can provide huge clues about the future, that appeal is greater still.
I spend a lot of time with potential clients, many of whom have invested over the past five years in developing in-house data science teams of their own. Many are genuinely bought into the benefits of becoming a data- and science-orientated organisation, but have been underwhelmed by the progress they have made. ‘Chasing curiosities’, ‘slow’, and ‘inconsistent and unexplainable results’ are all phrases I have heard.
The heart of their frustrations often comes down to investing in the wrong order. To be clear, I genuinely believe that machine learning is vital to what we do as data scientists. The algorithms and applications we use to process data are smarter and more sophisticated than they have ever been, and the amount of raw computational power we have at our disposal is continually growing. All of this helps us crunch data faster than we’ve ever been able to before.
At the same time, all this power means very little if we aren’t able to set it against the right framework. No matter how advanced the AI, nor how mighty the ML, we still need to provide the right environment for it to work effectively.
This is why, above all else, when it comes to data science, I believe in the power of processes.
A cupboard of ingredients
Imagine for a second that we’re not retail customer scientists using customer data. Imagine instead that we’re bakers, baking cakes from a vast cupboard of ingredients.
In the past, we’d bake using time-honoured recipes and techniques, passed down through generations of bakers and honed over time. The existence of AI and ML should make bakers’ lives easier: applying them to baking could uncover new combinations of ingredients, techniques and presentations that we’d not previously considered, precisely because we’d become overly reliant on our tried and tested methods.
However, to get what we truly need from ML and AI, we can’t just let them loose in the ingredients cupboard and expect a delicious cake to appear automatically at the end. What we’d actually get is thousands of different cakes, all of which would need to be tasted, followed by a great deal of effort to work out what made a particular cake a success.
Moreover, every time we do this we’ll get a completely new set of cakes, looking very different from the last. Some of the ingredients will differ, due to anything from seasonal availability to room temperature, producing an entirely different batch. While people buying our cakes are happy with a small amount of variety, they expect at least some consistency of flavour and texture.
Fortunately, there’s a way to get the most out of the latest techniques while still building on experience. As bakers, we can put a lot of work into laying out the ingredients in advance and applying specific machines to specific parts of the process. We’d have one machine for whisking, another for stirring, another for kneading, and so on. This lets us get the best results out of each machine: each is specialised to its task, and the overall process requires less labour and finishes faster.
Data science works in much the same way. You can throw the most sophisticated algorithms at your information but, without the right processes, the answers you’ll get will never be as good as they could be.
Our data scientists at dunnhumby carefully apply different models and techniques to each phase of the analysis. That way, we deliver the variability and innovation our clients expect, while also being able to reproduce findings and achieve consistent, explainable results.
This means a lot of time goes into dunnhumby’s data engineering, coding guidelines, global code lines, and quality assurance processes. That may sound dull, but our experience shows these processes are vital to achieving high-quality, consistent results.
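The idea of fixed, specialised steps can be sketched as a tiny pipeline. This is purely an illustration of the principle, not dunnhumby’s actual tooling: each hypothetical step does one job, the steps run in a fixed order, and a fixed seed makes the ‘model’ step reproducible, so repeated runs on the same data give identical results.

```python
import random

def clean(rows):
    # Specialised step 1: drop malformed records
    return [r for r in rows if r is not None]

def transform(rows):
    # Specialised step 2: a stand-in for feature engineering
    return [r / 100.0 for r in rows]

def model(rows, seed=42):
    # Specialised step 3: a seeded 'model' step.
    # Same inputs + same seed -> same output, every run.
    rng = random.Random(seed)
    return [round(x + rng.gauss(0, 0.01), 4) for x in rows]

def pipeline(rows):
    # The process: fixed steps, fixed order
    for step in (clean, transform, model):
        rows = step(rows)
    return rows

data = [50, None, 75, 120]
run1 = pipeline(data)
run2 = pipeline(data)
assert run1 == run2  # reproducible: identical results on repeated runs
```

The point of the sketch is that consistency comes from the structure around the model, not from the model alone: swap any single step for a better one and the rest of the process, and its guarantees, stay intact.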
A guiding hand
Does this mean that we can only follow an existing recipe? That the processes we use to analyse data must be written in stone? Absolutely not. Just as an experienced baker can take a simple recipe and turn it into something even better, great data scientists can improve upon and refine their processes in order to get better, smarter answers. Moreover, they can ask better, smarter questions.
In both cases, though, the ability to do so comes down to experience: the time you’ve spent immersed in the subject, and the confidence that comes from building on a well-established base.
Processes can be adapted and improved upon, but only once we have an acute understanding of how and why those changes will impact the end result.
AI and ML are wonderful innovations. Their continued evolution is helping the data science industry to provide richer, deeper insights to our clients, and we shouldn’t underestimate the value they bring to our work.
At the same time, AI and ML are only tools. Using them effectively means knowing not just how they work, but where they sit in that much bigger process too.