Skip to the content.

Customer Purchase Analysis: Project Overview

Customer data was analyzed to increase outreach and yield higher profits for an online grocery service. This project was a part of CareerFoundry’s Data Immersion curriculum and was evaluated by my tutor and mentor.

The histogram at the top left shows the total number of orders for each day of the week. Saturday (day 0) and Sunday (day 1) are the busiest days, while Tuesday (day 3) and Wednesday (day 4) are the slowest.

The histogram at the top right shows the total number of orders for each hour of the day. 10:00am and 11:00am are the most popular times. A decline in order frequency was observed starting at 3:00pm, continuing until 4:00am the following day.

The accompanying line graph shows user spending levels for each hour of the day. The y-axis represents the average dollar amount for products purchased at certain hours of the day (x-axis). Spending peaks during early morning hours, particularly around 2:00am to 3:00am, 4:00am to 5:00am, and 6:00am to 7:00am. Conversely, spending reached its lowest point from 7:00am to 10:00am. A gradual increase in spending occurs just before 3:00pm, and a final spike in spending around 10:00pm.

I measured the count of orders for each department while segmenting users based on several criteria, including the number of dependents, age group, loyalty status, region, and family status. The findings revealed that Produce, Dairy/Eggs, and Beverages consistently emerged as the most popular departments among all users, irrespective of user criteria.

Users were classified into various loyalty status categories based on their maximum order count. The bar chart shows the total number of users by loyalty status and region. Most users are “Regular Customers”, with a maximum order count ranging from 11 to 40 orders, and reside in Region 3.

Customer regions were defined as follows:

Region 1: Maine, New Hampshire, Vermont, Massachusetts, Rhode Island, Connecticut, New York, Pennsylvania, New Jersey

Region 2: Wisconsin, Michigan, Illinois, Indiana, Ohio, North Dakota, South Dakota, Nebraska, Kansas, Minnesota, Iowa, Missouri

Region 3: Delaware, Maryland, District of Columbia, Virginia, West Virginia, North Carolina, South Carolina, Georgia, Florida, Kentucky, Tennessee, Mississippi, Alabama, Oklahoma, Texas, Arkansas, Louisiana

Region 4: Idaho, Montana, Wyoming, Nevada, Utah, Colorado, Arizona, New Mexico, Alaska, Washington, Oregon, California, Hawaii

The box-and-whisker plot gauges the dispersion of total income within each age group, helping identify which age group possessed the most significant spending power. The senior age group (ages 62 and older) have the highest total income, while young users (ages 35 and younger) have the lowest total income.

However, middle-aged parents are the largest segment within the customer demographic.

Data Cleaning & Transformation

Tools & Skills

Python Version: 3.11

Packages: pandas, NumPy, Matplotlib, seaborn, SciPy, pickle

Recommendations & Findings

The Learning Experience

Learning a new programming language and effectively applying it was a substantial learning curve throughout this project. To master Python, I engaged in thorough research and employed troubleshooting techniques, in addition to consulting my mentor for guidance.

Datasets