Source Files

Real-world data to put your theory into practice

Here at dunnhumby, we understand the importance of great data and the analysts who make sense of it. Uncovering patterns, predicting trends, validating theories — insight gained through analysing customer data is the foundation of our business and key to the success of every one of our clients.

But more than that, we just really love data. We love connecting the dots. We love the human stories data can help you tell. And we love the people who love data as much as we do. That’s why we created Source Files, a platform for sharing real-world datasets where fellow data geeks, from professors to students to data scientists, can easily access rich data sources. Whether you’re teaching a course, completing a class project, testing an algorithm, or running a hackathon, Source Files is the place to go to put your theory into practice.

Contact us

For questions, comments or assistance, e-mail us.

Breakfast at the Frat

Containing sales and promotion info on pretzels, frozen pizza, boxed cereal, and mouthwash, gathered from a sample of stores over 156 weeks, this dataset supports time series analyses in areas such as promotional effectiveness and price sensitivity.

Breakfast at the Frat: A Time Series Analysis

What’s inside?

  • Sales and promotion info on the top five products from each of the top three brands within four selected categories: mouthwash, pretzels, frozen pizza, and boxed cereal, gathered from a sample of stores over 156 weeks
  • Unit sales, households, visits, and spend data by product, store, and week
  • Base Price and Shelf Price, to determine a product’s discount, if any
  • Promotional support details (e.g. sale tag, in-store display), if applicable
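Comparing the two prices is enough to derive a product's discount. Below is a minimal sketch in Python. It assumes the discount is the percentage by which Shelf Price falls below Base Price. The function name and sample rows are hypothetical and do not come from the dataset.

```python
def discount_pct(base_price: float, shelf_price: float) -> float:
    """Percentage by which shelf_price falls below base_price.

    Assumption: base_price is the regular price and shelf_price is the price
    actually charged that week. Returns 0.0 when there is no discount.
    """
    if base_price <= 0:
        raise ValueError("base_price must be positive")
    return max(0.0, (base_price - shelf_price) / base_price * 100)

# Hypothetical rows: (product, week, base_price, shelf_price)
rows = [
    ("pretzels_a", 1, 3.00, 3.00),
    ("pretzels_a", 2, 3.00, 2.25),
]
for product, week, base, shelf in rows:
    print(product, week, round(discount_pct(base, shelf), 1))
```

A shelf price at or above the base price is treated as "no discount" instead of a negative value.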

What’s it for?

This dataset is designed to facilitate time series analyses, including:

  • Price sensitivity analysis
  • Promotional effectiveness analysis
  • Comparing/contrasting results across products, categories, store groupings, or geographies
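One common approach to price sensitivity estimates a constant price elasticity: the slope of log unit sales regressed on log price. The dataset does not prescribe this method. The sketch below uses plain ordinary least squares with no other controls. `price_elasticity` is a hypothetical helper. A real analysis would likely also account for promotions, seasonality, and store effects.

```python
import math

def price_elasticity(prices, units):
    """OLS slope of log(units) on log(price).

    Requires at least two paired observations with positive prices and units.
    Weeks with zero sales must be handled separately, because log(0) is undefined.
    """
    if len(prices) != len(units) or len(prices) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(p) for p in prices]
    ys = [math.log(u) for u in units]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("price never varies; elasticity is undefined")
    return sxy / sxx

# Synthetic data where units = 1000 / price**2
print(price_elasticity([1.0, 2.0, 4.0], [1000.0, 250.0, 62.5]))
```

For the synthetic data the slope is -2, up to floating-point error. A negative slope means unit sales fall as price rises. A magnitude above 1 is conventionally read as price-elastic.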

Who’s it from?

This dataset was developed with the assistance of:

Steven Lugauer
Assistant Professor of Economics,
Notre Dame


Steven Buechler
Professor and Department Chair,
Department of Applied and Computational Mathematics and Statistics,
Notre Dame


Timothy Gilbride
Associate Professor of Marketing,
Notre Dame

How should I use it?

Check back soon for example exercises, case studies, and other helpful info from our professor partners at Notre Dame.

Download

Let's Get Sort-of-Real

The data's not real, but there sure is a lot of it. With 300M+ at-till transactions over 117 weeks, we’ve replicated the typical patterns found in real in-store sales data to help curious data scientists test their techniques and algorithms in a very real way.

Let’s Get Sort-of-Real: Dummy Data to Test Techniques and Algorithms

What's inside?

By the numbers

  • 117: Weeks of dummy at-till transaction data
  • 300M: Total number of transactions
  • 47M: Total number of baskets
  • 400,000: Average number of baskets per week
  • 2.6M: Average number of transactions per week
  • ~500,000: Number of distinct customers
  • ~5,000: Number of distinct products
  • ~760: Number of distinct stores
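The weekly averages follow directly from the totals. A quick arithmetic check is below. It uses the rounded totals listed above, so the results are approximate.

```python
weeks = 117
transactions = 300_000_000  # listed as 300M (rounded)
baskets = 47_000_000        # listed as 47M (rounded)

# Average baskets per week: about 401,709, listed as 400,000
print(f"{baskets / weeks:,.0f} baskets per week")
# Average transactions per week: about 2.56M, listed as 2.6M
print(f"{transactions / weeks / 1e6:.1f}M transactions per week")
```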

What's it for?

We've replicated the typical patterns found in real in-store data to help data scientists test their techniques and algorithms in a (nearly) real-world environment.

A note on download times

Please remember, you're dealing with Big Data! Large file sizes can result in download times of five minutes or more. Please be patient.

Samples available

  • Data preview
  • 2,000 randomly selected baskets from a two-week period
  • All transactions for a randomly selected sample of 5,000 customers
  • All transactions for a randomly selected sample of 50,000 customers

Full dataset

Ready to get real? Grab the full 4.3GB dataset below (in nine ~500MB files, for your downloading convenience).

1  |  2  |  3  |  4  |  5  |  6  |  7  |  8  |  9

User guide

Downloading the full dataset? You'll want to check out our handy User Guide too.

Carbo-Loading

Carbo-Loading contains household-level transactions over a period of two years from four categories: Pasta, Pasta Sauce, Syrup, and Pancake Mix. These categories were chosen so that interactions between the categories can be detected and studied.

Carbo-Loading: A Relational Database

What’s inside?

  • Household-level transactions over a period of two years from four categories: Pasta, Pasta Sauce, Syrup, and Pancake Mix

What’s it for?

  • Classroom projects and case studies
  • Understanding the process required to mine data
  • Learning how to merge data tables and aggregate data
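Merging and aggregating usually means two steps: join a transaction table to a product lookup table on a shared key, then sum. Below is a minimal sketch using only the Python standard library. The table layouts, the `upc` key, and the column names `dollar_sales` and `commodity` are assumptions for illustration. Check the dataset's own documentation for the real table and column names.

```python
from collections import defaultdict

# Hypothetical extracts of two tables (names and columns are assumptions)
transactions = [
    {"household": 1, "upc": "A1", "dollar_sales": 2.50},
    {"household": 1, "upc": "B7", "dollar_sales": 1.99},
    {"household": 2, "upc": "A1", "dollar_sales": 5.00},
]
products = {
    "A1": {"commodity": "pasta"},
    "B7": {"commodity": "pasta sauce"},
}

def spend_by_commodity(transactions, products):
    """Inner-join transactions to products on UPC, then sum sales per commodity."""
    totals = defaultdict(float)
    for row in transactions:
        product = products.get(row["upc"])
        if product is None:  # unmatched UPC is dropped, as in an inner join
            continue
        totals[product["commodity"]] += row["dollar_sales"]
    return dict(totals)

print(spend_by_commodity(transactions, products))
```

The same join-then-aggregate pattern carries over directly to SQL (`JOIN ... GROUP BY`) or to a dataframe library if one is available.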

Who's it from?

This dataset was developed with the assistance of:

Carrie Heilman
Associate Professor of Marketing,
University of Virginia

How should I use it?

Professors have had success asking students questions such as:

  • What is the household penetration of Product X? That is, out of all customers purchasing Pasta Sauce, what percent purchase Product X or Brand Z?
  • Did any customers first purchase an item or category using a coupon? If so, how many of these customers made additional purchases of the item or category?
  • In two complementary categories (e.g. Pasta and Pasta Sauce), what products, if any, are commonly purchased together?
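The first and third questions can be answered with a few set and counting operations. Below is a minimal sketch in Python. It assumes purchase records have already been reduced to hypothetical (household, basket, category, item) tuples. Household penetration is computed as a fraction; multiply by 100 for a percent. Co-purchases are counted as item pairs that share a basket.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical purchase records: (household, basket, category, item)
purchases = [
    (1, 10, "pasta sauce", "X"), (1, 10, "pasta", "spaghetti"),
    (2, 20, "pasta sauce", "Y"), (2, 20, "pasta", "penne"),
    (3, 30, "pasta sauce", "X"), (3, 30, "pasta", "spaghetti"),
    (4, 40, "pasta", "penne"),
]

def household_penetration(purchases, category, item):
    """Fraction of households buying in `category` that bought `item`."""
    category_buyers = {h for h, _, c, _ in purchases if c == category}
    item_buyers = {h for h, _, c, i in purchases if c == category and i == item}
    return len(item_buyers) / len(category_buyers) if category_buyers else 0.0

def cross_category_pairs(purchases, cat_a, cat_b):
    """Count baskets in which an item from cat_a and an item from cat_b co-occur."""
    baskets = defaultdict(lambda: {cat_a: set(), cat_b: set()})
    for _, basket, cat, item in purchases:
        if cat in (cat_a, cat_b):
            baskets[basket][cat].add(item)
    pairs = Counter()
    for items in baskets.values():
        # Every (cat_a item, cat_b item) combination in this basket counts once
        pairs.update(product(items[cat_a], items[cat_b]))
    return pairs

print(household_penetration(purchases, "pasta sauce", "X"))
print(cross_category_pairs(purchases, "pasta sauce", "pasta").most_common(1))
```

On the sample records, the penetration of item X within pasta sauce is 2/3, and the pair (X, spaghetti) appears in two baskets.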

Special considerations

Don't forget, you're dealing with Big Data! Large files may take five minutes or more to download, and importing the millions of rows they contain will require software that can handle large tables, such as R, Microsoft Excel with PowerPivot, Microsoft Access, SAS, SPSS, or a SQL database.

Download

The Complete Journey

This dataset contains household-level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer. It contains all of each household’s purchases, not just those from a limited number of categories.

The Complete Journey

What’s inside?

  • Household-level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer
  • All of a household’s purchases within the store, not just those from a limited number of categories
  • Demographics and direct marketing contact history for select households

What’s it for?

  • More advanced classroom settings
  • Academic research on the effects of direct marketing to customers

Who’s it from?

This dataset was developed with the assistance of:

Raj Venkatesan
Bank of America Research Professor of Business Administration,
University of Virginia

How should I use it?

Professors have had success asking students questions such as:

  • How many customers are spending more/less over time?
  • Which demographic factors (e.g. household size, presence of children, income) appear to affect a customer’s spend?
  • Is there evidence to suggest that direct marketing improves overall customer engagement?
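One simple way to approach the first question is to compare each household's spend before and after a cut-off week. Below is a minimal sketch. The (household, week, spend) layout and the `split_week` cut-off are hypothetical. For the comparison to be fair, the two periods should cover the same number of weeks.

```python
from collections import Counter, defaultdict

def spend_trend(rows, split_week):
    """Label each household "more", "less" or "same" by comparing total
    spend after split_week with total spend up to and including it."""
    before = defaultdict(float)
    after = defaultdict(float)
    for household, week, spend in rows:
        if week <= split_week:
            before[household] += spend
        else:
            after[household] += spend
    trend = {}
    for household in set(before) | set(after):
        diff = after[household] - before[household]
        if diff > 0:
            trend[household] = "more"
        elif diff < 0:
            trend[household] = "less"
        else:
            trend[household] = "same"
    return trend

# Hypothetical weekly spend rows: (household, week, spend)
rows = [
    (1, 10, 20.0), (1, 60, 35.0),
    (2, 15, 50.0), (2, 80, 30.0),
    (3, 5, 10.0), (3, 90, 10.0),
]
trend = spend_trend(rows, split_week=52)
print(Counter(trend.values()))
```

Counting the labels answers "how many customers"; in practice a small tolerance band around zero may be more useful than an exact "same" check.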

Special considerations

Don't forget, you're dealing with Big Data! Large files may take five minutes or more to download, and importing the millions of rows they contain will require software that can handle large tables, such as R, Microsoft Excel with PowerPivot, Microsoft Access, SAS, SPSS, or a SQL database.

Download

About dunnhumby

dunnhumby is a leading customer data science company. We analyse data and apply insights from nearly one billion shoppers across the globe to create personalised customer experiences in digital, mobile, and retail environments. Our strategic process, proprietary insights, and multichannel media capabilities build loyalty with customers to drive competitive advantage and sustained growth for clients. dunnhumby employs nearly 2,000 experts in offices throughout Europe, Asia, Africa, and the Americas and works with a prestigious group of companies including Whole Foods Market, Tesco, Monoprix, Raley’s, Meijer, Michael Kors, Coca-Cola, Procter & Gamble, and PepsiCo.

Careers at dunnhumby

We use data and science to discover what customers want, then we give it to them — but we could use your help. If you’re an analyst who shares our passion for pushing boundaries, asking why, and dreaming big, we’d love to talk.

University Programmes

Whether you’re looking for a valuable short-term experience or stepping into your first job, dunnhumby offers a number of opportunities tailored to smart, ambitious students and recent graduates. From summer internships to accelerated training and role rotation, we strive to make our analyst programmes challenging and rewarding.
