Real-world data
Here at dunnhumby, we understand the importance of great data and the analysts who make sense of it. Uncovering patterns, predicting trends, validating theories — insight gained through analysing customer data is the foundation of our business and key to the success of every one of our clients.
But more than that, we just really love data. We love connecting the dots. We love the human stories data can help you tell. And we love the people who love data as much as we do. That’s why we created Source Files, a platform for sharing real-world datasets where fellow data geeks – from professors to students to data scientists – can easily access rich data sources. Whether you’re teaching a course, completing a class project, testing an algorithm, or running a hackathon, Source Files is the place to go to put your theory into practice.
Breakfast at the Frat
What’s inside?
- Sales and promotion info on the top five products from each of the top three brands within four selected categories: mouthwash, pretzels, frozen pizza, and boxed cereal, gathered from a sample of stores over 156 weeks
- Unit sales, households, visits, and spend data by product, store, and week
- Base Price and Shelf Price, to determine a product’s discount, if any
- Promotional support details (e.g. sale tag, in-store display), if applicable
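As a rough illustration of how Base Price and Shelf Price can be combined into a discount, here is a minimal sketch. The field names (`base_price`, `shelf_price`) and the sample rows are assumptions made for the example, not the dataset's actual column names or values.

```python
def discount_pct(base_price: float, shelf_price: float) -> float:
    """Return the discount as a fraction of base price (0.0 if not discounted)."""
    if base_price <= 0:
        raise ValueError("base_price must be positive")
    # Clamp at zero so a shelf price above base price is not a negative discount.
    return max(0.0, (base_price - shelf_price) / base_price)


# Hypothetical rows: one product over two weeks.
rows = [
    {"product": "A", "week": 1, "base_price": 4.00, "shelf_price": 4.00},
    {"product": "A", "week": 2, "base_price": 4.00, "shelf_price": 3.00},
]

for row in rows:
    row["discount"] = discount_pct(row["base_price"], row["shelf_price"])
    print(row["product"], row["week"], f"{row['discount']:.0%}")
```

A discount of zero simply means the product sold at its base price that week.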
What’s it for?
This dataset is designed to facilitate time series analyses, including:
- Price sensitivity analysis
- Promotional effectiveness analysis
- Comparing/contrasting results across products, categories, store groupings, or geographies
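One common way to approach price sensitivity, though not necessarily the one intended here, is to regress the log of unit sales on the log of price and read the slope as a price elasticity. The sketch below does this with plain Python on made-up numbers; the prices, units, and the function name `log_log_slope` are all illustrative assumptions.

```python
import math


def log_log_slope(prices, units):
    """Least-squares slope of log(units) on log(price), i.e. a simple elasticity estimate."""
    xs = [math.log(p) for p in prices]
    ys = [math.log(u) for u in units]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data built so that units = 1000 * price^-2 (true elasticity of -2).
prices = [2.0, 2.5, 3.0, 3.5, 4.0]
units = [1000 * p ** -2 for p in prices]

elasticity = log_log_slope(prices, units)
print(round(elasticity, 3))
```

With real weekly data you would also want to account for promotions and seasonality, which this sketch ignores.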
How should I use it?
Check back soon for example exercises, case studies, and other helpful info from our professor partners at Notre Dame.
Download 'Breakfast at the Frat: A Time Series Analysis'
Who’s it from?
This dataset was developed with the assistance of:

Steven Buechler 
Professor and Department Chair, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame
Carbo-Loading
What’s inside?
- Household level transactions over a period of two years from four categories: Pasta, Pasta Sauce, Syrup, and Pancake Mix
What’s it for?
- Classroom projects and case studies
- Understanding the process required to mine data
- Learning how to merge data tables and aggregate data
How should I use it?
Professors have had success asking students questions such as:
- What is the household penetration of Product X? That is, out of all customers purchasing Pasta Sauce, what percent purchase Product X or Brand Z?
- Did any customers first purchase an item or category using a coupon? If so, how many of these customers made additional purchases of the item or category?
- In two complementary categories (e.g. Pasta and Pasta Sauce), what products, if any, are commonly purchased together?
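The household penetration question above can be sketched in a few lines. The record layout (`household`, `category`, `product`) and the sample transactions are assumptions for illustration; the real tables would first need to be merged to get a product's category alongside each transaction.

```python
def penetration(transactions, category, product):
    """Share of households buying `category` that also bought `product` within it."""
    category_buyers = {
        t["household"] for t in transactions if t["category"] == category
    }
    product_buyers = {
        t["household"]
        for t in transactions
        if t["category"] == category and t["product"] == product
    }
    if not category_buyers:
        return 0.0
    return len(product_buyers) / len(category_buyers)


# Hypothetical merged transaction rows.
transactions = [
    {"household": 1, "category": "Pasta Sauce", "product": "X"},
    {"household": 1, "category": "Pasta Sauce", "product": "Y"},
    {"household": 2, "category": "Pasta Sauce", "product": "Y"},
    {"household": 3, "category": "Pasta", "product": "P"},
    {"household": 4, "category": "Pasta Sauce", "product": "X"},
    {"household": 5, "category": "Pasta Sauce", "product": "Y"},
]

share = penetration(transactions, "Pasta Sauce", "X")
print(f"{share:.0%}")
```

Sets are used so that a household buying the same product several times is only counted once.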
Special considerations
Don’t forget, you’re dealing with Big Data! Large files may take five minutes or more to download, and importing the millions of rows they contain will require specialised software such as R, Microsoft Excel with PowerPivot, Microsoft Access, SAS, SPSS, or a SQL database.
Download 'Carbo-Loading: A Relational Database'
Who’s it from?
This dataset was developed with the assistance of:
The Complete Journey
What’s inside?
- Household level transactions over two years from a group of 2,500 households who are frequent shoppers at a retailer
- All of a household’s purchases within the store, not just those from a limited number of categories
- Demographics and direct marketing contact history for select households
What’s it for?
- More advanced classroom settings
- Academic research on the effects of direct marketing to customers
How should I use it?
Professors have had success asking students questions such as:
- How many customers are spending more/less over time?
- Which demographic factors (e.g. household size, presence of children, income) appear to affect customer spend?
- Is there evidence to suggest that direct marketing improves overall customer engagement?
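For the first question, one simple approach is to compare each household's total spend in an early period against a later one. The sketch below assumes hypothetical `(household, week, spend)` tuples and a split at week 52, roughly the midpoint of two years; neither reflects the dataset's real schema.

```python
from collections import defaultdict


def spend_trend(transactions, split_week):
    """Count households whose spend rose, fell, or held steady after split_week."""
    early = defaultdict(float)
    late = defaultdict(float)
    for household, week, spend in transactions:
        if week <= split_week:
            early[household] += spend
        else:
            late[household] += spend

    counts = {"more": 0, "less": 0, "same": 0}
    for household in set(early) | set(late):
        diff = late[household] - early[household]
        if diff > 0:
            counts["more"] += 1
        elif diff < 0:
            counts["less"] += 1
        else:
            counts["same"] += 1
    return counts


# Hypothetical (household, week, spend) rows.
transactions = [
    (1, 10, 20.0), (1, 60, 35.0),
    (2, 12, 50.0), (2, 70, 30.0),
    (3, 5, 10.0), (3, 80, 10.0),
]

counts = spend_trend(transactions, split_week=52)
print(counts)
```

A household with no purchases in one of the two periods is treated as spending zero in that period.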
Special considerations
Don’t forget, you’re dealing with Big Data! Large files may take five minutes or more to download, and importing the millions of rows they contain will require specialised software such as R, Microsoft Excel with PowerPivot, Microsoft Access, SAS, SPSS, or a SQL database.
Download 'The Complete Journey'
Who’s it from?
This dataset was developed with the assistance of:

Raj Venkatesan 
Bank of America Research Professor of Business Administration, University of Virginia
Let’s Get Sort-of-Real
What’s inside?
By the numbers
- 117: Weeks of dummy till-transaction data
- 300M: Total number of transactions
- 47M: Total number of baskets
- 400,000: Average number of baskets per week
- 2.6M: Average number of transactions per week
- ~500,000: Distinct number of customers
- ~5,000: Distinct number of products
- ~760: Distinct number of stores
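The weekly averages above follow from the totals, as this quick check shows. It uses only the rounded figures quoted in the list, so the results are approximate; the transactions-per-basket ratio is a derived figure, not one stated on this page.

```python
weeks = 117
total_transactions = 300_000_000
total_baskets = 47_000_000

baskets_per_week = total_baskets / weeks
transactions_per_week = total_transactions / weeks
# Derived ratio, not quoted in the list above.
transactions_per_basket = total_transactions / total_baskets

print(f"{baskets_per_week:,.0f} baskets per week")
print(f"{transactions_per_week:,.0f} transactions per week")
print(f"{transactions_per_basket:.1f} transactions per basket")
```

Both weekly figures round to the quoted averages of 400,000 and 2.6M.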
What’s it for?
We’ve replicated the typical patterns found in real in-store data to help data scientists test their techniques and algorithms in a (nearly) real-world environment.
A note on download times
Please remember, you’re dealing with Big Data! Large file sizes can result in download times of five minutes or more. Please be patient.
Samples available
- Data preview
- 2,000 baskets, randomly selected, over a period of two weeks
- All transactions for a randomly selected sample of 5,000 customers
- All transactions for a randomly selected sample of 50,000 customers
User guide
Downloading the full dataset? You’ll want to check out our handy User Guide too.
Download 'Let's Get Sort-of-Real: Data Sample'
Download 'Let's Get Sort-of-Real: Sample 2K baskets'
Download 'Let's Get Sort-of-Real: Sample 5K customers'
Download 'Let's Get Sort-of-Real: Sample 50K customers'
Full dataset
Ready to get real? Grab the full 4.3GB dataset below (in nine ~500MB files, for your downloading convenience).
Download 'Let's Get Sort-of-Real: Part One'
Download 'Let's Get Sort-of-Real: Part Two'
Download 'Let's Get Sort-of-Real: Part Three'
Download 'Let's Get Sort-of-Real: Part Four'
Download 'Let's Get Sort-of-Real: Part Five'
Download 'Let's Get Sort-of-Real: Part Six'
Download 'Let's Get Sort-of-Real: Part Seven'
Download 'Let's Get Sort-of-Real: Part Eight'
Download 'Let's Get Sort-of-Real: Part Nine'