How much data is too much? Obviously, the answer to that question depends on the context in which it’s being asked. When we look at enterprise data specifically though, one thing becomes clear quite quickly – more isn’t always better.
Over the past few years, a growing number of organisations have begun falling into the trap of believing that the more data they collect, the better insight they’ll end up with. Consequently, they end up hoarding all kinds of data in the hope that it will – somehow – help them steal a march on the competition. Compounding that is the presence of certain technology vendors, who are only too happy to support those ambitions.
The exact shape that this “support” takes can vary wildly, from data-on-demand and data workbenches through to collaborative clouds. Once the shopping spree is over, however, buyers tend to find themselves struggling to integrate those technologies effectively or realise a return that comes anywhere close to the vision that they were originally sold on.
One of the fundamental problems here is the (usually) cloud-based nature of that technology. If data storage capacity were unlimited and computation cheap, data hoarding could deliver at least some value. With most cloud services billed on a pay-per-use basis, though, costs mount in line with the volume of data that is being stored, accessed, and analysed.
That’s just part of the problem, too. The more sources and nodes that organisations add to their base layer of data, the more that management and quality control issues tend to arise. As a result, accuracy is undermined, representativeness suffers, computation time increases, and the ability to take advantage of new and emerging opportunities typically falls away.
To prevent this from happening, organisations need a strategic approach to data enrichment – one that systematically brings in only that information which is most relevant to the task at hand.
In their 1985 paper “Of strategies, deliberate and emergent”, business scholars Henry Mintzberg and James Waters wrote about the nature of strategic choice. In it, they identified two strategic philosophies, the eponymous “deliberate” and “emergent” approaches.
In Mintzberg and Waters’ eyes, a deliberate strategy can be defined as one that arises from the conscious, thoughtful, and organised decisions taken by a business after the rigorous analysis of data. An emergent strategy comes from spontaneous actions and initiatives, often as an outcome of innovation or in response to unexpected opportunities. Compared to a deliberate strategy, an emergent strategy offers a higher degree of flexibility, allowing organisations to adjust goals and pursue opportunities as they emerge.
While it may sound like these are opposing approaches, that’s not really the case. If we think about an organisation’s strategy as a linear process, then the deliberate aspect of that is simply the original intent: what was “meant” to happen. The emergent part then takes into account any new factors and issues that have come to light along the way, and how the organisation responded. They’re two parts of the same journey.
Why does this matter? Because, rather than choosing between deliberate and emergent when it comes to the enrichment of data, I think it’s more important to find the right balance between the two. To be truly successful with their accumulation and management practices, organisations need to be able to blend deliberate and emergent strategies and understand how to employ each when it is most relevant.
As with my opening question, context plays a key role here; what works for one organisation won’t necessarily do so for another. That said, I do believe that there are at least some universal guidelines that can be applied in order to ensure the greatest likelihood of success. With that in mind, here is a five-step model for the creation of a well-balanced data enrichment strategy.
A deliberate data strategy is the rock on which everything else should be built. No matter how agile they may wish to be long term, organisations still need to establish a Single-Source-of-Truth (SSOT) hub that builds up a reserve of ethically and legally sourced data over time.
While I might have characterised data hoarding as a problem above, that’s not the case when it comes to data that is purposefully and selectively curated. Add “related” nodes of relevant data with specific use cases, ensuring overall fit with the SSOT hub for scaling later.
Interrupt and weed out any input data bias during the enrichment and integration phases. For example, bringing in rich demographic data that covers only a fraction of the total customer population could add more features to the data model, but it could also introduce bias.
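One simple way to spot this kind of coverage bias is to compare the mix of customers who received the enrichment against the mix of the full customer base. The sketch below is illustrative only: the `segment` field, the customer records, and the function names are all hypothetical, and a real check would use whatever attributes matter to the model in question.

```python
from collections import Counter


def segment_shares(customers, key):
    """Return each segment's share of the given customer list."""
    counts = Counter(c[key] for c in customers)
    total = sum(counts.values())
    return {segment: n / total for segment, n in counts.items()}


def coverage_gaps(all_customers, enriched_ids, key="segment"):
    """Compare the segment mix of enriched customers with the full base.

    A positive value means the segment is over-represented in the
    enriched data; a negative value means it is under-represented.
    """
    enriched = [c for c in all_customers if c["id"] in enriched_ids]
    base = segment_shares(all_customers, key)
    covered = segment_shares(enriched, key)
    return {s: covered.get(s, 0.0) - share for s, share in base.items()}


# Hypothetical data: ten customers, demographic enrichment available for four.
customers = (
    [{"id": i, "segment": "urban"} for i in range(5)]
    + [{"id": i, "segment": "rural"} for i in range(5, 10)]
)
enriched_ids = {0, 1, 2, 5}

gaps = coverage_gaps(customers, enriched_ids)
print(gaps)
```

In this made-up case the base is split evenly between urban and rural customers, while three of the four enriched records are urban, so any feature built on the new data would lean towards urban behaviour.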
Conduct preliminary data investigations and experiments on thin slices of representative sample data. After testing your models and removing any biases, scale the underlying data to make your solutions ready for production.
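A thin slice is only useful if it looks like the whole. One common way to achieve that, assumed here for illustration and not the only option, is stratified sampling: drawing the same fraction from each group so that the sample keeps the proportions of the full dataset. The `region` field and the 10% fraction below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum so the slice mirrors the whole."""
    rng = random.Random(seed)  # fixed seed keeps the slice reproducible
    strata = defaultdict(list)
    for record in records:
        strata[record[key]].append(record)

    sample = []
    for group in strata.values():
        # Keep at least one record per stratum so small groups are not lost.
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical dataset: 80 records in the north, 20 in the south.
records = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]
thin_slice = stratified_sample(records, key="region", fraction=0.1)
print(len(thin_slice))
```

Here the slice holds eight northern and two southern records, preserving the 80/20 split of the full data.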
Not all data will be useful – and even information that is useful initially can age out of your models. Establish strong bottom-up controls that detect and purge redundant data, ensuring that you can optimise your cloud storage and compute costs while improving the accuracy of your analyses.
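As a rough illustration of what such a control might look like, the sketch below flags records that are exact duplicates of one already seen, or that have not been used within a set period. The field names, the sample rows, and the 365-day threshold are all assumptions made for the example; real retention rules will depend on the organisation and its legal obligations.

```python
from datetime import date, timedelta


def find_purge_candidates(records, today, max_age_days=365):
    """Flag rows that duplicate an earlier row or have not been used recently."""
    seen = set()
    purge = []
    for record in records:
        fingerprint = (record["customer_id"], record["attribute"], record["value"])
        stale = today - record["last_used"] > timedelta(days=max_age_days)
        if fingerprint in seen or stale:
            purge.append(record["row_id"])
        else:
            seen.add(fingerprint)
    return purge


# Hypothetical rows: row 2 duplicates row 1, and row 3 has aged out.
rows = [
    {"row_id": 1, "customer_id": 1, "attribute": "postcode", "value": "AB1",
     "last_used": date(2023, 12, 1)},
    {"row_id": 2, "customer_id": 1, "attribute": "postcode", "value": "AB1",
     "last_used": date(2023, 12, 15)},
    {"row_id": 3, "customer_id": 2, "attribute": "age_band", "value": "25-34",
     "last_used": date(2021, 6, 1)},
    {"row_id": 4, "customer_id": 3, "attribute": "age_band", "value": "35-44",
     "last_used": date(2023, 6, 1)},
]

candidates = find_purge_candidates(rows, today=date(2024, 1, 1))
print(candidates)
```

A check like this only produces a list of candidates; whether to delete, archive, or keep them remains a decision for whoever owns the data.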
Executed well, this approach will deliver myriad benefits – not least:
More than anything, though, a balanced approach to data enrichment gives organisations the ability to build a sustainable first-party data asset that can be scaled and managed economically. Therein lies the path to that all-important return on investment – and genuine competitive advantage as well.