Let’s talk about baskets.
When we look into a customer’s basket, we can tell a lot about them. Those observations can range from the simple, like what they’ve bought, where, and when, to the more complex such as why they came to the store in the first place and connections between the different items they bought. With the right approach, we can even predict what they might buy next. Either way, the better we understand their baskets, the better we understand customers too.
At dunnhumby, that understanding starts with “Trip Missions”. These are basket-level segmentations that tie multiple need states together in order to understand a customer’s main reason for visiting a store. Trip Missions are key, because they can help retailers understand business performance and the effectiveness of their pricing, assortment, and store experience, as well as giving them the ability to tailor messaging to specific missions.
In order to create these segmentations, we also have to find out what those different need states are. The only way to do that, of course, is to dive deep into customer baskets, look at what they’re buying, and conduct the analysis required to single out need states and combine them into meaningful missions.
The good news is that there’s no shortage of baskets to peer into. Retailers around the world process billions of baskets every year, and dunnhumby Shop – one of our key insight and analytics tools – analyses well over 20bn baskets per annum on its own. The bad news is that turning that data into something that helps us understand common behaviours across millions of customers is no simple task.
While dunnhumby already has numerous ways of approaching that challenge, we’re always on the lookout for smarter and more sophisticated techniques as well. That’s where topic modelling comes in.
Now, topic modelling isn’t something that’s specific to either the grocery industry or dunnhumby. It’s a statistical technique that’s most commonly used for finding “hidden” topics that can be used to describe a collection of documents. Topic modelling is frequently used in conjunction with automated text-mining, allowing large collections of documents to be understood in a systematic and approachable way.
Topic modelling is also a form of “unsupervised learning”, an approach that applies machine learning processes to the analysis of untagged or unlabelled datasets. The primary benefit of unsupervised learning is that it helps us find commonalities and significant differences within a dataset, all without a human needing to guide things.
Because of that, and while it might not have originated within the grocery industry, topic modelling has clear relevance – particularly when we apply it to the analytical conundrum outlined above. Just as it can be used to find common themes across millions of documents, topic modelling can also be used to identify “basket topics”, those need states satisfied by a certain set of products.
In many ways, the exact same principles that apply topic modelling within text-mining can also be used during basket analysis. Rather than reinventing the wheel, we simply need to substitute those aspects that are specific to text-mining in this process with those that relate to grocery.
Take a look at the diagram above, for instance. Here, we’re using the same modelling techniques that we would during text-mining to sort our four different “documents” (or baskets) into different “topics”. Rather than just being a group of seemingly disconnected products, each basket is now aligned to a bigger theme. That’s incredibly powerful when you extrapolate it across hundreds of thousands, or even millions, of baskets.
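To make that substitution concrete, here is a minimal sketch of how baskets can stand in for documents and products for words. The basket contents and the helper names (`build_vocabulary`, `basket_to_counts`) are hypothetical and purely illustrative; they are not taken from any dunnhumby tool.

```python
from collections import Counter

# Hypothetical toy baskets; product names are illustrative only.
baskets = [
    ["kibble", "chew toy", "kibble"],
    ["pasta", "tomatoes", "onions"],
    ["kibble", "training treats"],
]

def build_vocabulary(baskets):
    """Map each distinct product to a column index (the 'word' vocabulary)."""
    products = sorted({item for basket in baskets for item in basket})
    return {product: idx for idx, product in enumerate(products)}

def basket_to_counts(basket, vocab):
    """Turn one basket (a 'document') into a vector of product counts."""
    counts = Counter(basket)
    # Dicts preserve insertion order, so columns follow the vocabulary order.
    return [counts.get(product, 0) for product in vocab]

vocab = build_vocabulary(baskets)
matrix = [basket_to_counts(basket, vocab) for basket in baskets]
```

Each row of `matrix` is the basket equivalent of a bag-of-words vector, which is the kind of input a topic model typically expects.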
The next question, of course, is where these topics themselves come from. Naturally, while there’s an academic answer to that question, the best way to think about it from a grocery perspective is in terms of customer needs. Essentially, topics are created around groups of products that are frequently bought together and fulfil a specific customer need. A “dog products” topic might contain kibble, wet food, chew toys, and training treats, for instance.
Let’s just wind back to our original goal here. As mentioned towards the beginning of this article, what we really want to understand is Trip Missions – basket-level segmentations that help us understand why customers are coming to store. Topics are incredibly useful in this context, because they can be used to identify common Trip Missions across different baskets through the process of clustering.
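As a rough illustration of that clustering step, the sketch below groups baskets by their topic mixtures. It assumes plain k-means with a deliberately simple initialisation; the article does not name the clustering method used in practice, and the topic mixtures and the `kmeans` helper here are made-up examples rather than real outputs.

```python
import math

def kmeans(points, k, n_iters=20):
    """Plain k-means; centroids start at the first k points for simplicity."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iters):
        # Assign each basket to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical topic mixtures per basket: (cooking, breakfast, pet care).
basket_topic_mix = [
    [0.90, 0.05, 0.05],
    [0.05, 0.05, 0.90],
    [0.80, 0.15, 0.05],
    [0.10, 0.10, 0.80],
]
labels, centroids = kmeans(basket_topic_mix, k=2)
# labels -> [0, 1, 0, 1]
```

Baskets that share a label have similar topic mixtures, so each cluster is a candidate mission that an analyst would then review and name.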
Again, a diagram is useful here. In the example below, we have a collection of overarching Trip Missions and supporting sub- (or low-level) missions. Each of those sub-missions has been identified by looking at topic clusters. These then feed up into that high-level mission; in the first column, for instance, “Scratch Cooking”, “Breakfast”, “Cooking on a Budget”, and “Quick Meals & Snacks” combine to form the “Food at Home” Trip Mission.
Understanding topics – and how they ultimately shape customer Trip Missions – is an immensely useful capability for a retailer to have. Whether it’s trend analysis as part of KPI reporting, a better understanding of assortment and category flow, or being able to bundle key items around seasonal events, Trip Missions can be used in a multitude of ways to refine and improve performance.
Ultimately, topic modelling provides an efficient and effective way of understanding data at the kind of scale required within grocery retail. As mentioned above, drawing consistent insights from billions of baskets is no easy task. Unsupervised learning techniques like topic modelling help us understand what customer baskets are telling us, without us needing to add our own assumptions about their behaviours into the mix.
Topic modelling gives retailers the ability to understand shopper missions with greater certainty, ensuring that they can respond with the right tactics across everything from pricing and assortment through to media and customer service.