Dataset: 50,000 transactions, Jan-Dec 2023
Analysis Tools: Python (pandas, matplotlib, seaborn)
Client: Regional retail chain (12 locations)
Objective: Identify factors driving sales variation across stores and seasons.
Data Cleaning Process: Removed 847 duplicate entries. Imputed missing values in 3% of transaction records, using the median for numeric fields. Standardized date formats across sources. Removed outliers more than 3 standard deviations from the mean (1.2% of data).
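The four cleaning steps above can be sketched in pandas as follows. This is a minimal illustration, not the pipeline actually used: the column names (date, amount) are hypothetical, the outlier rule is assumed to apply to the transaction amount only, and date strings are assumed to be month-first where ambiguous.

```python
import pandas as pd

def clean_transactions(df, date_col="date", amount_col="amount"):
    """Sketch of the cleaning steps; column names are hypothetical."""
    # 1. Remove exact duplicate rows.
    out = df.drop_duplicates().copy()

    # 2. Impute missing numeric values with each column's median.
    num_cols = out.select_dtypes("number").columns
    out[num_cols] = out[num_cols].fillna(out[num_cols].median())

    # 3. Standardize dates: parse each value on its own so that
    #    sources with different string formats end up as one dtype.
    out[date_col] = pd.to_datetime(out[date_col].map(pd.Timestamp))

    # 4. Drop rows whose amount lies more than 3 standard deviations
    #    from the mean (assumed to be the field the rule applies to).
    mean, std = out[amount_col].mean(), out[amount_col].std()
    keep = (out[amount_col] - mean).abs() <= 3 * std
    return out[keep].reset_index(drop=True)
```

The step order follows the list above, so imputed values are included when the mean and standard deviation are computed. With a heavy-tailed amount distribution, a median-based rule may be a more robust alternative to the 3-standard-deviation cut.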
Key Findings:
Seasonal Patterns: Sales peak in December (23% above the annual average) and bottom out in February (18% below). The back-to-school season (August) shows the second-highest peak, at 15% above average.
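Percentages such as "23% above annual average" can be derived by comparing each month's total with the mean monthly total. The sketch below assumes that definition of the annual average, which this summary does not state, and uses hypothetical column names (date, amount) with the date column already parsed as datetimes.

```python
import pandas as pd

def seasonal_index(df, date_col="date", amount_col="amount"):
    """Percent deviation of each month's sales from the mean monthly total."""
    # Total sales per calendar month (1 = January, ..., 12 = December).
    monthly = df.groupby(df[date_col].dt.month)[amount_col].sum()
    # Positive values are above the average month, negative values below.
    return (monthly / monthly.mean() - 1.0) * 100.0
```

Sorting the result with sort_values(ascending=False) lists the peak months first.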
Store Performance: The top-performing store (Location D) outsells the lowest-performing store (Location H) by 210%. Correlation analysis suggests store size and parking availability explain 67% of between-store variance.
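This summary does not say how the 67% figure was obtained. One common way to arrive at a "share of variance explained" by two predictors is the R-squared of an ordinary least-squares fit of store sales on those predictors. The sketch below shows that calculation with NumPy; treating it as the method used here is an assumption, and with only 12 stores any such estimate carries wide uncertainty.

```python
import numpy as np

def r_squared(X, y):
    """R-squared of an OLS fit of y on the columns of X plus an intercept."""
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the model includes an intercept.
    design = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(residuals @ residuals)          # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())    # total variation
    return 1.0 - ss_res / ss_tot
```

Here X would hold one row per store with two columns (for example, floor area and number of parking spaces, both hypothetical measures), and y the annual sales per store.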
Product Categories: Electronics (32% of revenue, 45% margin); Apparel (28% of revenue, 52% margin); Home goods (18% of revenue, 38% margin); Grocery (22% of revenue, 12% margin).
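The category figures can be combined into a revenue-weighted blended margin and each category's share of total margin dollars. The sketch below assumes each margin is expressed as a percentage of that category's own revenue, which this summary does not state explicitly.

```python
# (revenue share, margin) per category, taken from the figures above.
CATEGORIES = {
    "Electronics": (0.32, 0.45),
    "Apparel": (0.28, 0.52),
    "Home goods": (0.18, 0.38),
    "Grocery": (0.22, 0.12),
}

def blended_margin(categories):
    """Revenue-weighted average margin across all categories."""
    return sum(share * margin for share, margin in categories.values())

def profit_shares(categories):
    """Each category's share of total margin dollars."""
    total = blended_margin(categories)
    return {name: share * margin / total
            for name, (share, margin) in categories.items()}
```

Under that assumption the blended margin works out to about 38.4%, and Grocery contributes roughly 7% of margin dollars despite accounting for 22% of revenue.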
Customer Insights: Repeat customers (2+ purchases) generate 68% of revenue but represent only 22% of unique customers. Average order value increases 40% from first to fifth purchase.
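Both customer metrics can be computed from a transaction table with one row per purchase. The sketch below is illustrative only: the column names (customer_id, date, amount) are hypothetical, and it assumes each row represents a single order.

```python
import pandas as pd

def repeat_customer_summary(df, customer_col="customer_id",
                            amount_col="amount", min_purchases=2):
    """Share of customers and of revenue attributable to repeat customers."""
    per_customer = df.groupby(customer_col)[amount_col].agg(
        purchases="count", revenue="sum")
    is_repeat = per_customer["purchases"] >= min_purchases
    repeat_revenue = per_customer.loc[is_repeat, "revenue"].sum()
    return {
        "repeat_customer_share": float(is_repeat.mean()),
        "repeat_revenue_share": float(repeat_revenue / per_customer["revenue"].sum()),
    }

def aov_by_purchase_number(df, customer_col="customer_id",
                           date_col="date", amount_col="amount"):
    """Average order value for each customer's 1st, 2nd, 3rd, ... purchase."""
    ordered = df.sort_values([customer_col, date_col])
    # Number each customer's purchases in chronological order, starting at 1.
    purchase_no = ordered.groupby(customer_col).cumcount() + 1
    return ordered.groupby(purchase_no)[amount_col].mean()
```

Comparing the first and fifth entries of aov_by_purchase_number gives the change in average order value across purchases. Later purchase numbers include only customers who kept buying, so part of any increase may reflect which customers remain rather than growth in individual spending.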
Recommendations: