Podcast | The Evolution of Customer Data Science

With David Clements, Sandra Stanley
13 September '22

Dave Clements (00:12): Hello everyone, and welcome to our next episode of the dunnhumby Customer First podcast. In today’s episode, we are going to try and demystify the world of data science, use of artificial intelligence and machine learning in retail. Sometimes it can feel like there’s a lot of buzzwords, but it’s not always clear what approach is right for your needs and how to practically apply it to everyday use cases.

Dave Clements (00:37): My name’s Dave Clements. I’m the Retail Director at dunnhumby, and today I’m delighted to be joined by Sandie Stanley, our Chief Data Science Officer at dunnhumby, who’s been a pioneer in this space for many years. Welcome Sandie.

Sandra Stanley (00:49): Thank you for having me, Dave. It’s a pleasure to be talking to you today.


Defining data science

Dave Clements (00:53): So let’s start, Sandie, with your view, an explanation of what data science is? Have you got a particular explanation or description you use? And also, I’d be interested to hear how you’ve seen it evolve a bit over time in the retail space?

Sandra Stanley (01:07): Thanks, Dave. Start with easy questions, right? I think the term data science was probably first used back in the 60s, but I don’t think it really gained popularity until 2010 or so. Perhaps the highlight for me was in 2012 when Harvard University said a Data Scientist was the sexiest job of the 21st century.

Dave Clements (01:25): Wow.

Sandra Stanley (01:26): But I can definitely see the foundations of data science from when I joined dunnhumby in the data analytics field with my stats background in 1999. And for me, data science is really about extracting value from data to drive decision-making. People often talk about it in terms of combining business domain expertise with programming skills and a knowledge of maths and stats, and obviously lots of data, but it’s really about driving those decisions.

Dave Clements (01:51): I like that simplicity of it’s just about extracting value from data to drive decisions. That’s really clear and simple. And how have you seen things evolve?

Sandra Stanley (01:59): Actually in my mind, the real change has been both the explosion of data available and the compute power. So data science started with statistics, and it’s evolved to include concepts and practices like artificial intelligence and machine learning, but its foundation is still those statistics. It’s the data and the technology that have really changed for me.

Sandra Stanley (02:19): When I joined back in 1999, we used to receive tapes of data and it’d be weekly if we were lucky. They were often biked over from whichever retailer we were working with, and we were right at the start of big data with Tesco Clubcard then. We had brilliant rich data giving us that longitudinal view of customers, but we didn’t have the computing power or the budgets to be able to store and process all those transactions.

Sandra Stanley (02:42): So we would work on 1% samples of data. We’d build segmentations, predictive models on those samples, and then we’d spend time selecting those features one by one because the compute couldn’t handle us just pulling loads of data in, and then we’d have to find ways to extrapolate those to customers.
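
As an illustration of the sample-then-extrapolate approach described here, the following is a minimal sketch and not dunnhumby's actual method. It assumes customers are assigned to a 1% sample by hashing their ID (so the same customers stay in the sample over time, preserving the longitudinal view) and that a total measured on the sample is scaled up by the inverse of the sampling rate. The function names `in_sample` and `extrapolate_total`, the customer IDs, and the spend figures are all hypothetical and synthetic.

```python
import hashlib
import random

SAMPLE_RATE = 0.01  # the 1% sample mentioned in the conversation


def in_sample(customer_id: str, rate: float = SAMPLE_RATE) -> bool:
    """Deterministically assign a customer to the sample by hashing their ID."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # roughly uniform value in [0, 1)
    return bucket < rate


def extrapolate_total(sample_values, rate: float = SAMPLE_RATE) -> float:
    """Scale a total measured on the sample up to the full customer base."""
    return sum(sample_values) / rate


# Synthetic weekly spend for a hypothetical base of 200,000 customers.
rng = random.Random(42)
spend = {f"cust-{i}": round(rng.uniform(5, 150), 2) for i in range(200_000)}

# Work only on the sampled customers, then extrapolate to everyone.
sample = {cid: value for cid, value in spend.items() if in_sample(cid)}
estimated_total = extrapolate_total(sample.values())
true_total = sum(spend.values())

print(f"sampled customers: {len(sample)}")
print(f"estimated total spend: {estimated_total:,.0f}")
print(f"actual total spend:    {true_total:,.0f}")
```

Sampling at the customer level rather than the transaction level is a design choice in this sketch: it keeps each sampled customer's full history intact, which is what segmentations and predictive models built on the sample would need.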

Dave Clements (02:59): I guess the power and the speed of computation has really changed the way that you can use much bigger data sources, much richer data sources, but actually much more powerful processing to get to the answers much quicker in a much better way.

Sandra Stanley (03:14): Absolutely. And it’s that frequency of data as well. So today we receive data continuously throughout the day, and then we have that compute power to be able to bring multiple sources together. And also, our technology now can handle many more features going into those models and can optimise and loop through many more times. I think I probably spent the first 10 years of my career or so showing retailers, CPGs and financial services companies, and many others, how to really use data to understand their customers, create segments.

Sandra Stanley (03:44): We’d often have segment managers within retailers or other businesses. They would be looking at how do they create strategies for those segments and how do they create personalised communications around those segments. Whereas obviously today, it’s much more around hyper personalisation as we like to say.

Dave Clements (04:01): Yeah.

Sandra Stanley (04:01): And that’s much more about one-to-one. Of course, you and I remember the early days of the Tesco Clubcard statements where we used to go to 15,000,000 customers with 12,000,000 variations, but all of that was done by hand. It was all rules-based. Every line was coded. The optimisation was done by an analyst or a scientist who was running around, looping through those and rerunning the code. Whereas today, that’s all done by machines which can loop through and optimise much more fully.


Personalisation and the age of quantum computing

Dave Clements (04:29): Yeah. So that shift to hyper personalisation, individuals being able to really provide the right recommendations on what a consumer could do next or a shopper can do next, have really evolved over that time.

Sandra Stanley (04:45): Yeah. And of course, beyond personalisation, we’ve also seen a much greater use in rapid forecasting and optimisation problems, how you make a change to a range or a price and get an immediate forecast. And that facilitates much more dynamic decision-making.

Dave Clements (04:57): So yeah, working out all those, as you say, forecasts that probably take into account a lot of understanding of the whole category and cannibalisation and all of these things.

Sandra Stanley (05:07): And we look forward to the age of quantum computing and being able to optimise the whole store. And that’s one of the areas we’re researching at the moment, but I guess there’s lots of new areas of machine learning that’s all been made possible by the advances in technology and data, computer vision that’s enabling the automated stores where you can check out or natural language processing to codify customer feedback. Those are areas that have come much more recently with that advancement in computing power.

Dave Clements (05:36): Yeah, amazing. The amount of change and development over all those years. And I certainly remember some of those early Tesco days when we were starting to really move from segmentations into personalised offers for every individual.


Building a successful data science team

Dave Clements (05:51): As you think about how you’ve built and managed data science teams over those years, are there any particular things that have been key to success? What are your tips around how to keep up with the change and pace of computing developments and build a good data science team? What are your top tips there?

Sandra Stanley (06:10): Oh gosh, where to start again? So, I think we’ve all read the stats about the number of failed data science projects and big data projects. I think accepted wisdom probably says that about 80% of data science projects don’t actually end up being implemented, and I think actually why that is and the first thing I really say is it’s really important to be close to the business. It’s really important that the data science teams understand the problems that the business are trying to solve, to be sat with the decision-makers that are going to be using the science, to understand the process in which that they’re making those decisions and really be able to do that.

Sandra Stanley (06:44): I think data science teams that have kind of siloed themselves away with some intellectual problems are some of those whose work struggles to really see the light of day.

Dave Clements (06:52): They’re really embedded within the organisation, close to the business users and really close to what their challenges are to really make sure they understand them well.

Sandra Stanley (07:02): Yeah, really understanding. And for me actually, the second one that goes with that is around the agile mindset. And I don’t really mean agile in terms of the technology or the techniques around scrums and standups, but really around an agile mindset, which is very difficult for a data scientist. Data scientists like logical flows. But actually, if I think about the agile mindset, it really is about being flexible, responsive, and adaptable to the needs.

Sandra Stanley (07:28): Data scientists again, would like to have the luxury of time to build their models, assemble their features and the data and that before building their models, but businesses expect faster results. So data scientists really need to make sure that they’re being flexible and adaptable and adapting as they go along.

Dave Clements (07:43): Yeah. So actually responding to… You might not always have the opportunity to have the perfect model. Sometimes you may have to have a model that’s good enough to answer a certain part of the question. Sometimes you might need much more accuracy.

Sandra Stanley (07:55): Yeah. And actually, we will often talk about pulling in a heuristic solution first. So a kind of more rules-based heuristic solution before then actually building on top of that. So you can prove the value early, which actually would bring me onto my third tip, which is about measuring value. And I think one of the biggest things that I’ve learned is the importance of estimating value before you even start a project. So by understanding the business really well and being close to what the business is trying to do, then you can actually make estimates around what’s the additional value you’ll be able to add.

Sandra Stanley (08:26): And then when you do that, that helps you to get the resource that you need and it helps you to get the time that you need. Of course, it’s really important that when you finish, you do then measure that value and that you do communicate how much value because that’s what will enable you to be able to move on to the next thing. We can be a bit too fast to want to move on to the next challenge and not pause to really understand have we delivered the value that we intended?

Dave Clements (08:49): And presumably it’s not just value necessarily in terms of increased sales or improved customer penetration. There are many ways to measure the type of value, I presume.

Sandra Stanley (08:57): No. And it’s often productivity and efficiency gains that you are really trying to aim for as well. I guess actually, the other key thing I would really say is that what’s really important to data science teams and to data scientists is that they’re continuously learning. The field moves so fast. I just talked about some of the changes, but if you think about the new techniques and technologies and concepts that are being introduced frequently and the community that there is around data science, it’s really important that we give our data scientists the time and the tools to be able to learn and to be able to experiment. And it’s what keeps them motivated. One of the key ways that you can keep your data science team motivated is by giving them the new challenges and the new tools.


Why data projects fail

Dave Clements (09:36): So is that about also sort of saying to your data science team, “Look, take some time out to learn new techniques, to go and access various communities, data science communities, networks, to find out new techniques, to bring them in, to spend time researching, not just actively using the same old approaches every time”?

Sandra Stanley (09:56): Absolutely. And I think there are a number of professions that talk about continuous personal development and having to put in a certain number of CPD hours or days each year. And I think there’s a lot to be learned from that within data science and making sure we’re building it into people’s work and days rather than it being something that feels like an add-on afterwards.

Sandra Stanley (10:16): Dave, let me be cheeky and ask you a question. You’ve been in the industry for a long time. I’d love to know what you think are some of the reasons why data science projects fail and what some of the biggest challenges would be?

Dave Clements (10:26): Well, I’ve always thought one of the best examples of when things work well is when you do blend, as I call it, art and science, the art of retail and the science of data and analytics. And I think you really need to be working closely hand-in-hand with your data science teams, I’ve found, really bringing them in at a very early stage when you are actually analysing the key challenge or the problem that you’re trying to face, so that they’re really embedded, as you were saying, right from the start, with access to the whole challenge, and constantly involved so that they’re part of shaping: how are we going to build the right solution? How are we going to tackle this problem?

Dave Clements (11:05): So I think one of the critical things for me is blending those things together, blending the retail knowhow, how can you practically apply something, and the data science and the analytics which is going to surface lots of new insights. And it’s also going to surface recommendations and how easy they are to apply. So you’ve got to blend them together, if that makes sense.

Sandra Stanley (11:25): Oh, absolutely. That commercial acumen and the art and science, for sure.


A much greater demand from retailers and CPGs to utilise our data science

Dave Clements (11:30): So you mentioned then about how it works well in retail. As you look at some of the retailers and consumer goods companies that you and the teams are working with today, what are some of the biggest needs currently that data science is trying to tackle that you think are very top of mind? Is it all about inflation and how are we solving that? And is data science helping there, or are there many broad topics that are very much top of mind with retailers and CPGs at the moment?

Sandra Stanley (11:58): Wow, you are absolutely spot on that inflation’s a massive concern for retailers and CPGs at the moment. I think inflation is at a generational high in many parts of the world and what that means for retailers and CPGs is they’re having to find ways to deal with increases in product and operational costs. Meanwhile, obviously customers are having to reassess their spending and their habits.

Sandra Stanley (12:18): And then on top of that, we have the challenges around supply and demand of commodities, whether it’s oil or grain and that’s impacting supply chains. So we see a much greater demand from retailers and CPGs to utilise our data science for assortment optimisation, price investment, and private brands particularly. With assortment, we see a demand to help rationalise range, focus on the products that are truly important to customers and focus the scarce resources where they’re really needed.

Sandra Stanley (12:42): Actually, the science that we are using is the same science that we used at the start of the COVID pandemic when there was, again, a real challenge in making sure that products were available for customers. Therefore thinking about customer need states, the customer priorities and actually, what was substitutable was really important to kind of get to that core minimum range that was needed for retailers.

Sandra Stanley (13:02): With inflation as well, I mentioned price investment, and that’s about ensuring we really understand which lines we need to protect and which lines can take on some of that inflationary burden. We’ve got well-tested demand models and an understanding of price elasticity at dunnhumby that have been used across retailers for many, many years, and that’s helping us to work with our partner retailers to help them make the right decisions for customers.

Sandra Stanley (13:23): We also obviously understand which customers are being hit hardest and need the most support. And so we can do that either through those core KVI lines, key value lines that we talk about, or actually through the personalised comms.
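
To make the idea of price elasticity concrete, here is a minimal sketch of a textbook constant-elasticity demand model, where elasticity is the slope of ln(quantity) against ln(price). This is a generic illustration and not dunnhumby's demand model, which the conversation does not describe in detail. The function names, prices, and quantities are hypothetical; the data is generated synthetically with a known elasticity of -1.8 so the estimate can be checked.

```python
import math


def estimate_elasticity(prices, quantities):
    """Estimate price elasticity as the least-squares slope of ln(q) on ln(p)."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    return covariance / variance


def predicted_volume_change(elasticity, price_change):
    """Fractional change in volume for a fractional price change,
    assuming elasticity stays constant over the price range."""
    return (1 + price_change) ** elasticity - 1


# Synthetic observations for one hypothetical product line.
prices = [1.00, 1.10, 1.20, 1.30, 1.50]
quantities = [1000 * p ** -1.8 for p in prices]

elasticity = estimate_elasticity(prices, quantities)
change = predicted_volume_change(elasticity, 0.05)  # a 5% price rise

print(f"estimated elasticity: {elasticity:.2f}")
print(f"predicted volume change for +5% price: {change:.1%}")
```

Under this simplified model, lines with an elasticity close to zero lose little volume when the price rises, so they are candidates to absorb some inflation, while highly elastic lines are the ones a retailer would more likely protect.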


Using data science for current or predictive analysis…

Dave Clements (13:35): And when you are doing this work at the moment, is it increasingly analysing the current state of things or is it also doing a lot of work on predictive of where things are headed? How’s that evolving?

Sandra Stanley (13:46): And I think that’s where some of the customer behaviour analysis is really coming in. So we’re really understanding which customers are being impacted and how that’s moving through. So you always see the first customers that have been impacted, and that will have an impact on things like private brands and what you need to be doing about private brands, but then you see the future waves of customers that are being impacted. So we are doing a lot to forecast and to keep track of those trends in customers.

Sandra Stanley (14:10): But I have to say, alongside inflation, almost to make things worse, we’ve seen changing cost models for retailers over the last few years. We’ve all seen the move with customers moving online. And again, you and I Dave know that that has a lower margin. And actually also, the rise of the tech first retailers, those retailers that are going in with technology solutions from the beginning without the legacy systems that many of the big traditional retailers have.

Sandra Stanley (14:34): And I think this is really interesting because I think it moves us away from the early years of using data to influence decision-making, where we were kind of going in and we were talking to decision-makers around what the data was showing, into technology solutions that are actually augmenting or even automating that decision-making. So how do we move from sequential category reviews, for example, where we would hold the hands of a category director or a buyer through a review process, to actually running in parallel, where everybody’s got the recommendations that they need and they’re putting their sense check on a recommendation rather than taking what’s coming out of the data and then adding it to their own thoughts? There’s a real change in the operating model around that.

Dave Clements (15:12): Yeah. I can imagine, a lot more automation of decision-making and how that can really improve the operating processes.

Sandra Stanley (15:19): I think one of the really interesting things about CPGs is that they’re really focused on building the right data foundations at the moment. Unlike retailers, they’re really reliant on third parties to understand and activate against their customers. They don’t have that final customer insight and interaction, and that means that they’re assembling data sources and putting a lot of effort into ensuring the trust and the quality of that data before they can then get into enhancing it. And so, I think it’s really interesting that actually, they’re almost in a slightly different place in some of those foundational aspects around bringing that data together.

Sandra Stanley (15:51): Also, I think one of the things that you and I are passionate about is sustainability. We’re hearing perhaps a little bit less around sustainability with some of the inflation pressures, but we are still hearing the big CPGs particularly wanting to progress against their sustainability goals. We don’t yet have a clear set of metrics for the sustainability of a product, but actually, the big CPGs are tracking a variety of metrics through their supply chain and even using techniques like computer vision and satellite imaging to see the impact on deforestation of their products and their supply chain.

Dave Clements (16:25): Yeah. No, as you say, health and sustainability are here, and they’re going to be continuing for months and years ahead. So it’s really important to keep that research, keep that analysis, keep those decisions front and centre at the moment.


Exciting new innovations in the field of data science

Dave Clements (16:38): Now, I know you are passionate about innovation, Sandie, and we were sitting on a judging jury of our own dunnhumby Labs Challenge last week when some of our colleagues from around the world were pitching their latest ideas to us. Can you share some of your favourite innovations that you are particularly excited about at the moment in the field of data science?

Sandra Stanley (16:57): Yeah. I loved being part of the innovation challenge and to hear the many great ideas across the business. And actually, one that really stood out for me was the use of data science to tailor the design of communications. I remember working with a fledgling e-commerce marketplace 20 years ago, creating test and learn frameworks, and this felt like a very modern approach. But if I think about data science more broadly, I think there are a few areas that really interest me personally.

Sandra Stanley (17:22): So I love the blend of behavioural economics with data science. In machine learning and data science we assume that consumers make rational choices: if we put a lower price, they’ll buy the lower price. But actually that’s not how real customers think. And so there’s lots of work in consumer psychology around the different models of, well, how consumers make choices.

Sandra Stanley (17:42): And we’ve got some really exciting work with an American university at the moment, looking at predicting that consumer choice based on the product shown to them on the shelf or on the virtual shelf, and how we can factor in that relative thinking into our demand model. So some great kind of bringing of the two disciplines together, that’s really exciting.

Sandra Stanley (17:59): I also think hyper personalisation, and this one’s for me as a consumer as well, in that I think there’s hopefully a much greater adoption and a continued development around hyper personalisation. We’ve got some exciting innovations around predicting the relevancy of new products or isolating incremental response. I’d like to see us continue to improve those algorithms, though, so that it is not based on my recent search history, which I’ve already made a decision on, but actually starts to look at what’s most helpful for me for the next decision I’m going to make. And I think that’s something we’re really focused on at dunnhumby.


Delivering personalisation…beyond just the offer or the discount

Dave Clements (18:31): And on that one when you’re talking about some of the innovations there, is it also going beyond… Because sometimes a lot of the personalisation is really all about just a better offer or a better discount, is it also really getting into the right content and the right messaging and the right themes and messages to be engaging consumers with?

Sandra Stanley (18:53): Absolutely. And that was what was exciting for me around the innovation challenge one that came through: actually, that wasn’t really about the offer or the discount. That was much more around how you personalise the message and what are the right words for me, or what are the right pictures for me. And it kind of felt like I was straying into Minority Report, actually, thinking about that. You and I would see something totally different. And I think there’s a lot of future in the use of science to actually tailor that whole content. And as I say, think about what’s the next message that’s most helpful to me rather than necessarily what have I seen so far and therefore, what would I get?

Sandra Stanley (19:25): I have to say, I think coming from all of those automated decisions, it does create something much more around ethics, the ethics around some of the models that we’re creating, and making sure that we’re inclusive. As machines and technology play more of a role in that decision-making, there’s a much greater need to make sure that we don’t have bias in our data. We need to be thinking about any unintended consequences that we might have from, maybe, stretch spend offers, where we’re looking to not necessarily target single-person older households and so on, and those consequences need to be really thought through in the model.

Sandra Stanley (20:00): One more that really, really excites me. I like to term it small data, and this one’s potentially really interesting. If we think about the last 20 years, it’s all been about the explosion of big data. It’s all been about more and more data, but I think we’ve got storage and cloud costs now starting to go up, and we’ve got consumer consent and things like GDPR here in Europe actually restricting some of our usage of data.

Sandra Stanley (20:27): We’ve also got the sustainability challenge where I think it’s estimated that data storage and processing will account for something like 20% of the global electricity consumption by 2030, which I think is actually encouraging us to think about much more around what’s the minimum amount of data that we need. What data do we absolutely need to store? How do we go back to some of the principles of 20 years ago that is around being much more selective around the data that goes in?

Dave Clements (20:52): Thinking about using only the data you really need and that’s going to drive the most important part of the results, not having access to lots and lots of data. So a bit of a shift again, or a challenge to ask: look, do you have the minimal amount of data you need to make the right decision?

Sandra Stanley (21:16): Yeah. It feels like a bit back to the future. It feels a bit like going in reverse, back to where it all started.

Dave Clements (21:18): Well, it’s been fantastic to talk with you Sandie, and hear everything from how things have developed, what are some of the key factors of success as we talked about being close to the business and agility, and then some of those areas of innovation that are fascinating from linking the core data science analytics with behavioural psychology and much more real-time personalisation and thinking about the relevant messages.

Dave Clements (21:48): So we look forward to seeing how those new innovations develop and come to market, but thank you very much for sharing those today with us Sandie.

Sandra Stanley (21:58): Thanks, Dave. It’s definitely an exciting field.

Dave Clements (21:59): Well, thanks everyone for listening. I hope you found the discussion useful as well, whether you are a retailer or a brand looking into how you are building your data science capability. We’d love to hear your thoughts on the subject. Feel free to contact myself or Sandie at dunnhumby.com, and join us again soon for our next Customer First podcast. And remember, you can access all our podcasts on a variety of different subjects impacting retailers and brands at Customer First Radio on Spotify or on our dunnhumby.com website. Thanks everyone.