Predict your customers’ next purchase (Part 1)

Our day-to-day purchases and their details are saved in the databases of our favorite restaurant, mall, or even coffee shop!

With the rising interest in data science and machine learning, people have been keen to find the best way to spot purchase patterns and define personal taste. Using these patterns, we can answer questions such as: when should we expect the next purchase, and how does the number of sales change in response to discounts and offers?

In this article, we’ll analyze customers’ purchases, look for insights, and build a fitting model to predict what their next product of interest might be.

We’ll use the IBM Cognos Analytics dataset for a coffee shop, which only has entries from April 2019.
You can also download the dataset from Kaggle.

The steps to our desired results are:
1- Exploratory analysis
2- Formulate our hypothesis
3- Feature engineering
4- Model selection
5- Results interpretation

Exploratory analysis:
We can use Tableau to do some exploratory analysis and create visualizations that help formulate our hypothesis.
You can find mine here

In this process, my questions were:
1- What were the rush hours of different product categories?
2- What is the gender distribution of our customers?
3- Which products performed better than others, and does their price affect this?
4- Who are our most loyal customers? (longest subscription period and most purchases)

Product categories rush hours and gender distribution

These graphs show what three of the most in-demand categories have in common and where they differ. Seeing this pattern can explain, for example, how coffee is ordered more than tea around noon even though the two have similar numbers for the rest of the day.
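If you would rather reproduce this kind of comparison in pandas than in Tableau, a minimal sketch could look like the one below. The `Hour` and `order` columns mirror the heatmap snippet in this article; the `product_category` column name and the toy rows are assumptions made up for illustration, not values from the real dataset.

```python
import pandas as pd

# Toy rows for illustration only (NOT taken from the real dataset).
# `product_category` is an assumed column name.
toy_sales = pd.DataFrame({
    "product_category": ["Coffee", "Coffee", "Tea", "Coffee", "Tea", "Bakery"],
    "Hour": [8, 12, 8, 12, 15, 8],
    "order": [1, 2, 1, 1, 2, 3],
})


def orders_by_hour(df: pd.DataFrame) -> pd.DataFrame:
    """Total orders per hour (rows) and product category (columns)."""
    return df.pivot_table(
        index="Hour",
        columns="product_category",
        values="order",
        aggfunc="sum",
        fill_value=0,  # hours with no orders for a category count as 0
    )


hourly = orders_by_hour(toy_sales)
rush_hours = hourly.idxmax()  # busiest hour for each category

print(hourly)
print(rush_hours)
```

Each column of `hourly` can then be plotted as one line to compare categories hour by hour.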

Best performing products overall

This graph shows how many times each product was ordered. Seeing that our 10 or more best-selling products are on the cheap side of the menu can change our perspective on what is selling more, and on how to measure that performance.
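One way to see why the choice of measure matters is to rank products by units sold and by revenue side by side. The sketch below is only an illustration: the column names `product`, `unit_price`, and `quantity` are assumptions, and the rows are toy values rather than figures from the dataset.

```python
import pandas as pd

# Toy rows for illustration only; all column names here are assumptions.
toy_sales = pd.DataFrame({
    "product": ["Espresso", "Espresso", "Latte", "Espresso", "Chai", "Latte"],
    "unit_price": [2.0, 2.0, 4.5, 2.0, 3.0, 4.5],
    "quantity": [1, 2, 1, 1, 1, 1],
})


def product_performance(df: pd.DataFrame) -> pd.DataFrame:
    """Units sold, unit price, and revenue per product, sorted by units sold."""
    with_revenue = df.assign(revenue=df["quantity"] * df["unit_price"])
    summary = with_revenue.groupby("product").agg(
        units_sold=("quantity", "sum"),
        unit_price=("unit_price", "mean"),
        revenue=("revenue", "sum"),
    )
    return summary.sort_values("units_sold", ascending=False)


performance = product_performance(toy_sales)
print(performance)
```

In this toy example the cheapest product sells the most units, while a pricier one brings in more revenue, so the "best performer" depends on which column you sort by.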

For the last bit of info we can extract from our data, I used Python and seaborn to make a heatmap that highlights the number of orders for every hour of each day.

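import pandas as pd
import seaborn as sns

# `sales` is the transactions DataFrame loaded from the dataset
# Combine the date and time columns into a single timestamp
sales['timestamp'] = pd.to_datetime(sales['transaction_date'] + sales['transaction_time'], format='%Y-%m-%d%H:%M:%S')
sales['Day'] = sales['timestamp'].dt.day
sales['Hour'] = sales['timestamp'].dt.hour

# Total orders for each (day, hour) cell; cells with no orders become 0
orders_by_day_hour = sales.groupby(['Day', 'Hour']).order.sum().unstack().fillna(0)

sns.set(rc={"figure.figsize": (16, 8)})
sns.heatmap(orders_by_day_hour)

Resulting heatmap of the order distribution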

Some other interesting things that I found using Python were:
Orders by generation:
generation — orders
Baby Boomers — 5876
Gen X — 5559
Older Millennials — 5345
Gen Z — 4184
Younger Millennials — 3301

Generations and their preferred serving size:
Baby Boomers — 16 oz. — 1778
Gen X — 16 oz. — 1706
Older Millennials — 16 oz. — 1665
Baby Boomers — 24 oz. — 1451
Gen Z — 16 oz. — 1280
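Summaries like the two above can be produced with a pair of `groupby` calls. This is a sketch under assumptions rather than the exact code behind the numbers: it assumes each row already carries a `generation` label and a `size` column (both names are hypothetical), reuses the `order` column from the heatmap snippet, and runs on toy rows.

```python
import pandas as pd

# Toy rows for illustration only; `generation` and `size` are assumed column names.
toy_orders = pd.DataFrame({
    "generation": ["Gen X", "Gen X", "Gen Z", "Baby Boomers", "Gen X", "Gen Z"],
    "size": ["16 oz.", "16 oz.", "24 oz.", "16 oz.", "24 oz.", "24 oz."],
    "order": [1, 2, 1, 1, 1, 1],
})

# Total orders per generation, largest first
orders_per_generation = (
    toy_orders.groupby("generation")["order"].sum().sort_values(ascending=False)
)

# Total orders per (generation, serving size) pair, largest first
size_per_generation = (
    toy_orders.groupby(["generation", "size"])["order"]
    .sum()
    .sort_values(ascending=False)
)

print(orders_per_generation)
print(size_per_generation.head(5))  # top five pairs, as in the list above
```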

Formulate our hypothesis:

Since our data has no details about discounts, new products, or events, we can only make assumptions based on the data we explored earlier.
My basic idea was that you can predict any customer’s next purchase from three or four conditions:
1- Serving size, depending on their age
2- Product category, depending on the time of day
3- Product, depending on the items most frequently bought by the same customer
4- Amount of spending, depending on whether there are discounts or it’s their birthday
(there is not enough data to verify this last condition, since our dataset only covers one month)
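To make condition 3 concrete, here is a minimal rule-based baseline that guesses a customer's next product as the one they have bought most often, falling back to the overall best seller for unseen customers. It is a sketch for intuition, not the model built in Part 2; the `customer_id` and `product` column names, the tie-breaking rule, and the toy rows are all assumptions.

```python
import pandas as pd

# Toy purchase history for illustration only; column names are assumptions.
toy_history = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "product": ["Latte", "Latte", "Chai", "Espresso", "Chai", "Chai"],
})


def most_frequent_product(history: pd.DataFrame) -> pd.Series:
    """Most frequently bought product per customer (ties broken alphabetically)."""
    counts = (
        history.groupby(["customer_id", "product"]).size().rename("n").reset_index()
    )
    counts = counts.sort_values(
        ["customer_id", "n", "product"], ascending=[True, False, True]
    )
    # After sorting, the first row per customer is their top product
    return counts.drop_duplicates("customer_id").set_index("customer_id")["product"]


def predict_next(customer_id, favorites: pd.Series, fallback: str) -> str:
    """Return the customer's favorite product, or the fallback if they are unknown."""
    return favorites.get(customer_id, fallback)


favorites = most_frequent_product(toy_history)
overall_top = toy_history["product"].mode().iloc[0]  # best seller across everyone

print(predict_next(1, favorites, overall_top))
print(predict_next(99, favorites, overall_top))
```

A baseline like this gives a reference score that any later model should beat before its extra complexity is worth keeping.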

Part 2
Where we’ll choose how to represent our hypothesis and engineer some features that can better represent our customers’ personal taste!



