Agile: does it work for data science teams?

What is Agile?

Agile is a well-established approach to project management and software engineering. The core principles include having small iterations with continuous feedback and improvements. This allows teams to deliver value to clients faster while ensuring that what they are delivering meets the client’s needs.

There are many Agile methods, each with its own set of tools and techniques. Two of the most popular are Scrum and Kanban. With Scrum, teams are organised with specific roles and responsibilities, and the team commits to delivering a set of tasks in a fixed time interval (a sprint). This is particularly useful when there is a specific deadline to be met. Kanban takes a visual approach to managing the team’s workload: teams focus on reducing Work in Progress (WIP) by finishing current tasks before starting new ones. This is achieved using a Kanban board with a continuous workflow structure.
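To make the WIP idea concrete, here is a minimal Python sketch of a Kanban board that refuses to start a new task once the "In progress" column is full. The class name, column names, task names and the limit of two are all hypothetical choices for illustration, not a description of any particular tool or of how dunnhumby teams configure their boards.

```python
from dataclasses import dataclass, field


class WipLimitExceeded(Exception):
    """Raised when the in-progress column is already at its WIP limit."""


@dataclass
class KanbanBoard:
    # Hypothetical limit and column names; real boards vary by team.
    wip_limit: int = 2
    columns: dict = field(
        default_factory=lambda: {"To do": [], "In progress": [], "Done": []}
    )

    def add(self, task: str) -> None:
        """Put a new task in the backlog column."""
        self.columns["To do"].append(task)

    def start(self, task: str) -> None:
        """Move a task to 'In progress', unless the WIP limit is reached."""
        if len(self.columns["In progress"]) >= self.wip_limit:
            raise WipLimitExceeded(f"Finish a task before starting '{task}'")
        self.columns["To do"].remove(task)
        self.columns["In progress"].append(task)

    def finish(self, task: str) -> None:
        """Move a task from 'In progress' to 'Done', freeing a WIP slot."""
        self.columns["In progress"].remove(task)
        self.columns["Done"].append(task)


board = KanbanBoard(wip_limit=2)
for task in ["EDA", "Feature build", "Model baseline"]:
    board.add(task)

board.start("EDA")
board.start("Feature build")

# A third task cannot start while two are already in progress.
try:
    board.start("Model baseline")
    blocked = False
except WipLimitExceeded:
    blocked = True

# Finishing one task frees a slot for the next.
board.finish("EDA")
board.start("Model baseline")
```

In this sketch the limit is enforced by raising an error; in practice a team may treat the limit as a prompt for discussion rather than a hard rule.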

At dunnhumby, we have adopted Agile ways of working in several of our Science teams. These teams are cross-functional, with data scientists, data science engineers, data engineers, product owners and internal stakeholders. The Agile methodology that is followed depends on the project, but the aim is always to find an approach that works well for the team. If certain aspects are not working, retrospective sessions, where the team discuss what has worked well and what has worked poorly, are the perfect opportunity to improve the process.

How does data science work differ from software engineering?

At its heart, data science is all about using data to create actionable insight. It is part software engineering, part research and part innovation. In a traditional Agile framework for software engineering, the expected deliverable after each iteration is working software. However, in data science this is not necessarily the case. Instead, the deliverable could be the results of an exploratory data analysis, a collection of metrics or even evidence that the proposed approach is infeasible. The important point for Science teams to consider when adopting Agile methodologies is that these ‘measurable deliverables’ can be reviewed and feedback can be given on them.

Agile for Science teams

As a data science manager at dunnhumby, I have had the opportunity to try out a hybrid Agile method called Scrumban with my science team and have learned what works well and what doesn’t work as well.

What works well?

Clearly defined tasks

A common pitfall of data science projects is a lack of clarity about tasks and their expected deliverables, which can lead to scope creep or to tasks overrunning because they pursue avenues that are ultimately fruitless. With an Agile approach, each task is defined with a clear deliverable and timeline. This gives clarity both to the data scientist performing the work and to the wider team, who know what will be accomplished during the task. Additionally, regularly completing tasks can be very motivating compared to making incremental progress within a very large project.

Regular demos

Demo sessions are opportunities to share significant results and milestones with the wider team. Regular demos keep the whole team engaged in the work being delivered and are great opportunities for cross-functional learning and ideation. Unlike traditional data science projects, where internal stakeholders may only be involved at a monthly or quarterly cadence, demos give the opportunity for regular feedback and course correction (when necessary). Not everyone needs to demo at every session; a demo is only needed once a significant piece of work or milestone has been completed.

Retrospectives

Retrospective sessions are times for the team to regularly reflect on what has worked well, what hasn’t worked well and why. They offer a chance to make changes to the current process in order to improve the team’s ways of working. This is a big step forward from traditional data science projects where there is no formalised feedback system for data scientists to openly discuss ideas and approaches to better the team.

What doesn’t work as well?

Scoping can be very hard

Due to the uncertain nature of many data science projects, it can be very hard to accurately estimate the effort required. Before starting a project, it may not be clear where the pain points will lie. Some seemingly straightforward tasks may expand greatly in scope after initial investigations whereas other complicated-looking problems may be solved with a simple solution. Therefore, care must be taken when scoping tasks and it is common to have to rescope and/or create more sub-tasks once the work is underway.

Allowing space for innovation

With each task having a clear deliverable and timeline, data scientists could easily be overly focussed on finishing their tasks quickly rather than spending time exploring alternative approaches. While this may lead to rapid delivery, it may miss out on innovative solutions that could have large future impacts. Therefore, depending on the project, care should be taken to allow space for innovative thinking, allowing the data to dictate the approach.

Summary

Applying Agile approaches to Science teams can have great benefits. For example, clearly defined data science tasks, regular demos and retrospectives drive clarity, engagement and ownership within teams. However, accurately scoping tasks can be difficult, and the innovative, uncertain nature of data science means certain projects may not fit as easily within Agile frameworks as those which are closer to software engineering in nature.

