Advanced Pizza Sales Analytics & Delivery Delay Prediction Using Excel, SQL, Power BI & Python

Project type

Sales Analytics and Delivery Delay Prediction

Date

2025

Role

Data Analyst

This project demonstrates a full data analysis and machine learning workflow applied to a simulated pizza delivery dataset covering 2024 to 2025. The objective was to perform structured data cleaning, business intelligence reporting, exploratory data analysis, and predictive modeling in order to derive actionable insights and improve operational efficiency.

Project Background:

In the context of increasing competition and operational complexity in the food delivery industry, this project addresses the need for data-driven decision-making. The dataset includes detailed records of pizza orders, delivery times, traffic conditions, and order characteristics. The aim was to simulate a real-world analytics pipeline that integrates statistical methods, data visualization, and machine learning to optimize business performance.

Tools and Technologies Used:

Excel: Used for initial data profiling, timestamp formatting, feature engineering (e.g., delivery gap calculation), and visual inspection using pivot tables and filters.

SQL (MySQL Workbench): Enabled advanced querying, aggregations, joins, and temporal analysis to extract business performance indicators and customer behavior trends.

Power BI: Delivered five interactive dashboards covering sales performance, delivery efficiency, customer preferences, operational KPIs, and delay analysis. These dashboards included slicers, filters, cards, and visual summaries suitable for executive reporting.

Python (Pandas, Matplotlib, Seaborn, Scikit-learn): Used for in-depth exploratory data analysis and building a machine learning model. Patterns in seasonality, traffic, and pizza complexity were analyzed, followed by model training and evaluation.
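The delivery gap feature mentioned under Excel can be sketched in Python as well. The write-up does not define the gap, so this minimal example assumes it is the actual delivery duration minus an estimated duration, in minutes. The function name, the timestamp format, and the sample values are all hypothetical.

```python
from datetime import datetime

# Assumed timestamp layout; the project's real format is not stated.
FMT = "%Y-%m-%d %H:%M"


def delivery_gap_minutes(order_time: str, delivery_time: str, estimated_minutes: float) -> float:
    """Return actual delivery duration minus the estimate, in minutes.

    A positive value means the order arrived later than estimated.
    This definition of the gap is an assumption made for illustration.
    """
    ordered = datetime.strptime(order_time, FMT)
    delivered = datetime.strptime(delivery_time, FMT)
    actual_minutes = (delivered - ordered).total_seconds() / 60
    return actual_minutes - estimated_minutes


# Made-up order: placed 19:05, delivered 19:47, estimated at 30 minutes.
gap = delivery_gap_minutes("2024-11-15 19:05", "2024-11-15 19:47", 30)
is_delayed = gap > 0
print(gap, is_delayed)  # 12.0 True
```

A binary flag such as is_delayed is the kind of target a delay classifier could be trained on, though the labelling rule actually used in the project is not described here.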

Key Insights and Findings:

Sales peaked in November and December, with evening hours recording the highest order volumes.

Delivery delays were strongly associated with peak traffic periods and complex pizza types.

Larger pizza sizes (Large and XL) dominated order preferences, especially during evenings.

Cities and times with high delay rates were identified, offering opportunities for dispatch improvements.
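Findings such as delay rates by city and time of day come down to a grouped aggregation. The sketch below shows the idea in plain Python. The records, the city names, and the 17:00 cut-off for "evening" are made up for illustration and do not come from the project dataset.

```python
from collections import defaultdict

# Toy records (hypothetical): (city, order_hour, delayed)
orders = [
    ("City A", 19, True),
    ("City A", 20, True),
    ("City A", 13, False),
    ("City B", 19, False),
    ("City B", 12, False),
    ("City B", 21, True),
    ("City C", 14, False),
    ("City C", 18, True),
]


def delay_rate_by(records, key_fn):
    """Group records with key_fn and return the share of delayed orders per group."""
    totals = defaultdict(int)
    delays = defaultdict(int)
    for record in records:
        key = key_fn(record)
        totals[key] += 1
        delays[key] += int(record[2])
    return {key: delays[key] / totals[key] for key in totals}


def period(record):
    # Assumed cut-off: orders from 17:00 onward count as evening.
    return "evening" if record[1] >= 17 else "daytime"


by_city = delay_rate_by(orders, lambda record: record[0])
by_period = delay_rate_by(orders, period)

# Print groups from highest to lowest delay rate.
for name, rate in sorted(by_city.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.0%}")
for name, rate in sorted(by_period.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {rate:.0%}")
```

The same grouping could be written as a SQL GROUP BY or a pandas groupby; plain Python is used here only to keep the example dependency-free.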

Machine Learning Component:

A Random Forest Classifier was developed to predict whether an order would experience a delay, based on operational variables such as traffic level, pizza complexity, delivery distance, and time of order.
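A minimal sketch of this kind of classifier with scikit-learn follows. It is not the project's code: the data is randomly generated, the feature encodings and the labelling rule are assumptions chosen for illustration, and the hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 1000

# Hypothetical features standing in for the operational variables.
traffic_level = rng.integers(0, 3, size=n)      # 0 = low, 1 = medium, 2 = high
pizza_complexity = rng.integers(1, 6, size=n)   # 1 = simple ... 5 = complex
distance_km = rng.uniform(0.5, 10.0, size=n)
order_hour = rng.integers(10, 24, size=n)       # 10:00 to 23:00

# Toy labelling rule (an assumption, not the project's data): an order is
# delayed when a weighted risk score plus noise crosses a threshold.
score = (
    0.8 * traffic_level
    + 0.3 * pizza_complexity
    + 0.2 * distance_km
    + rng.normal(0.0, 0.5, size=n)
)
y = (score > 3.5).astype(int)
X = np.column_stack([traffic_level, pizza_complexity, distance_km, order_hour])

# Hold out 20% of the orders, keeping the delayed share equal in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=200, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

cm = confusion_matrix(y_test, y_pred)  # rows: actual, columns: predicted
print(cm)
print(classification_report(y_test, y_pred, target_names=["on time", "delayed"]))
```

Stratifying the split matters when delayed orders are the minority class, since it keeps the evaluation set representative of the overall delay rate.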

Model Performance:

Accuracy: 99%

Precision (Delayed Orders): 1.00

Recall (Delayed Orders): 0.94

F1-Score: 0.97

Confusion Matrix: 153 true negatives, 45 true positives, 3 false negatives, and no false positives (implied by the precision of 1.00).

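On this synthetic test set, these metrics suggest the model reliably identifies delayed orders: no on-time order was flagged as delayed, while 3 of the 48 delayed orders (about 6%) were missed. A model with this profile could support early identification of delivery delays in real-time decision systems, subject to validation on live data.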
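The reported scores can be cross-checked against the confusion matrix counts. In the sketch below, the true negative, true positive, and false negative counts are taken from the text. The false positive count is not stated and is assumed to be 0, which is the only integer value consistent with a precision of 1.00.

```python
# Counts reported in the confusion matrix.
tn, tp, fn = 153, 45, 3
# Assumption: not stated in the text, but implied by precision = 1.00.
fp = 0

total = tn + tp + fn + fp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"test orders: {total}")          # 201
print(f"accuracy:    {accuracy:.2f}")   # 0.99
print(f"precision:   {precision:.2f}")  # 1.00
print(f"recall:      {recall:.2f}")     # 0.94
print(f"f1-score:    {f1:.2f}")         # 0.97
```

The rounded values match the figures listed above, so the metrics and the confusion matrix are internally consistent for a test set of 201 orders.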

Recommendations:

Integrate the trained machine learning model into the order dispatching system to flag high-risk deliveries.

Use traffic-aware assignment logic to prioritize complex or long-distance orders during off-peak hours.

Provide real-time ETA updates based on traffic predictions.

Incentivize customers to place orders during low-traffic periods through dynamic pricing or promotions.

Limitations:

Dataset is synthetic and not sourced from live delivery environments.

Lacks driver-specific data (e.g., shift patterns, experience), which may influence delivery time.

Model not yet tested in live systems; future A/B testing is recommended to validate practical deployment.

Conclusion:

This project highlights the ability to apply end-to-end data analysis and machine learning techniques across multiple tools to solve real-world business problems. It reflects proficiency in statistical reasoning, business intelligence, predictive modeling, and dashboard design. The work showcases a structured and impactful approach to turning raw data into operational intelligence, suitable for stakeholder decision-making and AI-driven process optimization.
