New year, new ideas. 2025 is upon us, and with it the chance to make a few speculative predictions about what the next 12 months might hold. From retail media to data, our subject matter experts have been hard at work trying to narrow down the biggest trends for the year ahead.
This time round, it’s the turn of Ross Williams—Lead Data Scientist—who’s laid down what he believes will be five of 2025’s most significant AI trends. Let’s dive in.
We end 2024 with dramatic hearsay leaking out of the world’s top AI labs. Whisper it quietly, but has the transformer scaling paradigm finally plateaued?
Scaling is the phenomenon by which throwing more data and more computing power into the mix has consistently produced better and better Large Language Models (LLMs). Simple as it might sound, this is by no means a trivial or guaranteed result—and the approach has driven Generative AI’s (GenAI) spectacular success over many years (and a pace of improvement that has been hard to keep up with!).
What would it mean if the scaling paradigm is coming to an end, though? What if bigger and better models were no longer always just around the corner? To achieve the ongoing improvements that we’ve grown used to, researchers would have to find new directions in which to take the field.
The trends we consider below mesh well with this backdrop and consider different paths to value—ones that aren’t just focused on “bigger and better”. And who knows, maybe scaling will continue to be the golden goose after all.
The unprecedented success of LLMs comes down to the transformer architecture and the magic of scaling. As these pioneer models potentially plateau (see above), it could be time for smaller, niche models to take centre stage. An added twist here could see companies leveraging their proprietary data sources to create GenAI-based approaches that are laser-focused on their domain of expertise.
This could take many forms: deploying RAG pipelines, or fine-tuning a model on a company’s secret-sauce documentation and code, for instance. Or, perhaps, it could involve a step away from natural language entirely and see different types of data thrown at transformer models instead.
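To make the RAG idea a little more concrete, here is a minimal sketch of the retrieval step that sits in front of the language model. It assumes a toy bag-of-words similarity in place of a learned embedding model, and the documents, the `retrieve` and `build_prompt` helpers, and the prompt wording are all hypothetical illustrations rather than any particular product's pipeline.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=1):
    """Return the k documents most similar to the query (toy similarity, an assumption)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        documents,
        key=lambda d: cosine(q, Counter(tokenize(d))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=1):
    """Prepend the retrieved context to the question (hypothetical prompt template)."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical in-house documents standing in for proprietary data.
DOCS = [
    "Promotions on own-brand pasta run every second week.",
    "Loyalty card points expire after twelve months.",
    "Store opening hours change on public holidays.",
]
```

Calling `build_prompt("When do loyalty points expire?", DOCS)` places the loyalty-card sentence in the context, and that prompt is what would then be sent to a model. A real pipeline would likely swap the word-count similarity for dense embeddings and a vector index, but the shape of the flow is the same.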
There are a few reasons as to why this is a powerful direction for GenAI to move in. Smaller models designed for specific tasks not only unlock new capabilities for companies, but are also cheaper to train and deploy. The lower computational cost is good news for the bottom line and for the environment, too.
Efficient deployment and operation of these models is another growing concern for any company working in this area, as are data privacy and security. Self-hosting compact, specialised models, rather than relying on the ever-larger third-party generalist monsters, sidesteps fears over what information can or can’t be fed to a foundation model.
At dunnhumby we are particularly excited by our work on basket transformers, where we train transformer-based models on product-basket data, enabling a new type of retail foundation model.
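As a rough illustration of what "training on product-basket data" can mean in practice, the sketch below treats each basket as a short sequence of product tokens, in the same way a sentence is a sequence of word tokens. This is an assumption made for illustration only, not a description of dunnhumby's actual models; the product names, the `build_vocab` and `encode` helpers, and the special tokens are all hypothetical.

```python
PAD, UNK = "<pad>", "<unk>"  # hypothetical special tokens


def build_vocab(baskets):
    """Assign an integer ID to every product seen in the training baskets."""
    vocab = {PAD: 0, UNK: 1}
    for basket in baskets:
        for product in basket:
            vocab.setdefault(product, len(vocab))
    return vocab


def encode(basket, vocab, max_len):
    """Map a basket to a fixed-length list of token IDs, truncating or padding."""
    ids = [vocab.get(product, vocab[UNK]) for product in basket[:max_len]]
    return ids + [vocab[PAD]] * (max_len - len(ids))


# Hypothetical baskets standing in for real transaction data.
BASKETS = [
    ["milk", "bread", "eggs"],
    ["bread", "butter"],
]
VOCAB = build_vocab(BASKETS)
```

With this vocabulary, `encode(["bread", "butter"], VOCAB, 4)` gives `[3, 5, 0, 0]`, and an unseen product such as "caviar" maps to the unknown-token ID 1. Sequences like these are the kind of input a transformer could be trained on, for example by masking one product and asking the model to predict it.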
Today, data scientists typically build models that consume a single type of data. That could be forecasting models that purely use tabular data, for instance, or sentiment analysis of social media posts based on text data. In the future, models that natively combine disparate types of data and leverage them together—“multimodal models”—will be more prevalent.
One type of multimodal model that is already well-established is text-to-image generation, but this example is only the tip of the iceberg when it comes to multimodality. It’s hard not to believe that the trend here will be towards “everything-to-vec”, where any conceivable type of input will end up being fed into increasingly multimodal models. Text, image, audio, video, sensor, tabular data, and more will all be on the table.
These Frankenstein’s monster models will unlock not just new abilities, but increasingly enable more natural and context-aware interactions with AI. That said, I’d be lying if I claimed I haven’t rolled my eyes at some of the storyboard narratives conjured up by consultants, with users asking their phone to update and run predictive models!
What do these multimodal models get you in a retail setting? The first layer of benefits to be unlocked may be achieving higher performance on our traditional data science problems. You don’t need to know how these complicated neural networks work under the hood to appreciate a key conceptual benefit: different types of data carry different information. Often, machine learning tasks see boosts in performance when you consider several inputs that are uncorrelated with each other, that is, inputs that reflect different aspects of the same real-world problem.
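A stylised calculation shows why uncorrelated inputs help. Suppose, purely as an assumption for illustration, that each data source gives an equally noisy estimate of the same quantity, with variance `sigma2`, and that every pair of sources has the same correlation `rho`. The variance of their simple average is then `sigma2 * (1 + (n - 1) * rho) / n`. The function name and parameters below are hypothetical.

```python
def variance_of_average(sigma2, rho, n=2):
    """Variance of the equal-weight average of n signals.

    Assumes each signal has variance sigma2 and every pair of signals
    has the same correlation rho (a simplifying assumption).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma2 * (1 + (n - 1) * rho) / n
```

With two sources and unit variance, the averaged noise is 0.5 when the sources are uncorrelated, 0.75 when their correlation is 0.5, and 1.0 (no improvement at all) when they are perfectly correlated. Real multimodal models combine inputs in far richer ways than averaging, but the same intuition applies: a second source only helps to the extent that it tells you something the first did not.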
Of course, multimodal models won’t just boost performance on current retail tasks, but will also open up problem solving of the sort that can’t be done today. Consider generating a media banner for a product based on a text description: that’s one use case already happening today. But why not take this a step further? A multimodal approach could allow you to create a product-centric media banner for a specific customer, all based on their purchase history or website touchpoints. An era of hyper-personalisation beckons.
A well-publicised weakness of the current generation of LLMs is their capacity to hallucinate and spew erroneous output. At their worst, these models can return convincing answers containing subtle errors, which destroys rather than enhances a user’s productivity.
As such, the top AI labs have strived to create complementary approaches that have a more logical grounding, providing stronger problem-solving abilities. Interestingly, this goal of a reasoning model has been tackled from some quite different directions.
In July 2024, Google DeepMind used a neuro-symbolic approach to create a reasoning model capable of a silver-medal performance at the International Mathematical Olympiad. The capabilities of this model are quite distinct from those of an LLM: it was able to write a 96-step proof to solve a tricky geometry problem. More recently, OpenAI introduced their o1 model family. Again, the goal here is a model with advanced reasoning and chain-of-thought ability.
Whatever the route to problem-solving models, they offer big upsides. If training-time scaling has plateaued, it makes sense to push performance along other dimensions. In the case of the o1 models, for example, performance can be scaled with “thinking time”. Another intriguing use case would deploy a reasoning model as an independent smart tutor to boost LLM training, allowing automated, logically astute marking of LLMs’ (occasionally feverish) responses.
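One simple, publicly known way to trade extra inference-time compute for accuracy is to sample several answers and take a majority vote. The sketch below is not a description of how the o1 models work internally; it is a toy model that assumes each sample is independently correct with probability `p` and that the final answer is right whenever correct samples form a strict majority. The function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(p, n):
    """Probability that more than half of n independent samples are correct.

    p is the chance that any single sample is correct; n must be odd so
    that a strict majority always exists (both are modelling assumptions).
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )
```

For a model that is right 60% of the time on a single attempt, voting over 3 samples lifts accuracy to roughly 0.648 and voting over 5 samples to roughly 0.683. The gain only appears when single-sample accuracy is above 50%; below that, more votes make things worse, which is one reason real systems rely on more sophisticated forms of reasoning than plain voting.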
What can we expect in 2025 for quantum computing, an esoteric field that edges slowly towards business impact? 2024 has seen mixed fortunes for quantum startups. While the best have thrived, some companies that went public during the Covid-tech IPO frenzy are now nothing more than penny stocks.
Alongside this drama, though, there have been undeniably impressive achievements, particularly on the hardware front, as the field abandons any remaining belief that small, noisy devices can create advantage. Rather, the quantum community has set its sights on larger, error-corrected devices. These are technically (much) harder to achieve, yes, but come with a guaranteed payoff of unlocking currently impossible computations.
Since this blog is about AI, I should also note that quantum computing’s impact on fields like GenAI may be limited for a long time (though the converse is not true; AI will likely help the development of quantum computing a lot).
One of the first areas where we will see quantum advantage in industry will likely be optimisation problems, and you can check out our work with Durham University on this front.