Training and Optimising the Accuracy of Classification Models for Churn Prediction
Developed and optimised machine learning models to predict customer churn, focusing on enhancing model accuracy through data processing and hyperparameter tuning.
Overview
As part of a machine learning project at Teesside University, Petgrave.io's data scientist was tasked with developing and optimising classification models to predict customer churn. The goal was to leverage open-source data and industry-standard machine learning techniques to deliver a robust and accurate predictive solution.
The Challenge
The project required the data scientist to:
Collect, process, clean, and visualise the data to prepare it for model training
Train, test, and optimise various classification models, including K-Nearest Neighbors (KNN), Support Vector Machines (SVM), and Random Forest
Evaluate the performance of the models and provide recommendations for further improvement
Our Approach
Data Preparation: The data scientist utilised Python, Scikit-Learn, Pandas, and NumPy to source, clean, and transform the customer churn dataset, performing extensive exploratory data analysis (EDA) to understand the structure, quality, and class balance of the data before modelling.
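The snippet below is a minimal sketch of this preparation stage, assuming a CSV export of the open-source churn data; the file name churn.csv and the column names TotalCharges and Churn are hypothetical placeholders rather than the project's actual schema.

# Minimal data-preparation sketch. "churn.csv", "TotalCharges", and "Churn"
# are hypothetical placeholders, not the project's actual dataset or schema.
import numpy as np
import pandas as pd

# Load the raw churn data
df = pd.read_csv("churn.csv")

# Basic exploratory checks: shape, types, missing values, class balance
print(df.shape)
print(df.dtypes)
print(df.isna().sum())
print(df["Churn"].value_counts(normalize=True))

# Example cleaning step: coerce a numeric column stored as text and drop rows
# that could not be parsed
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
df = df.dropna(subset=["TotalCharges"])

# Encode the binary target and one-hot encode the categorical features
df["Churn"] = (df["Churn"] == "Yes").astype(int)
X = pd.get_dummies(df.drop(columns=["Churn"]), drop_first=True)
y = df["Churn"]

Checks of this kind (data types, missing values, class balance) typically drive the cleaning and encoding decisions that the EDA stage reports.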
Model Training and Optimisation: The data scientist then trained multiple classification models, including KNN, SVM, and Random Forest, using Scikit-Learn. They employed cross-validation and hyperparameter tuning to optimise each model's accuracy.
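A representative version of this search is sketched below, assuming the feature matrix X and target y from the preparation sketch; the parameter grids and the 80/20 split are illustrative assumptions, not the values used in the original study.

# Cross-validated hyperparameter search over the three model families named in
# the project. The grids below are illustrative assumptions.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

# Hold out a stratified test set for final evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# One pipeline and parameter grid per model family; KNN and SVM are scaled
candidates = {
    "knn": (
        Pipeline([("scale", StandardScaler()), ("clf", KNeighborsClassifier())]),
        {"clf__n_neighbors": [3, 5, 11, 21]},
    ),
    "svm": (
        Pipeline([("scale", StandardScaler()), ("clf", SVC())]),
        {"clf__C": [0.1, 1, 10], "clf__kernel": ["rbf", "linear"]},
    ),
    "random_forest": (
        Pipeline([("clf", RandomForestClassifier(random_state=42))]),
        {"clf__n_estimators": [100, 300], "clf__max_depth": [None, 10, 20]},
    ),
}

# 5-fold cross-validated grid search, keeping the best estimator of each family
best_models = {}
for name, (pipe, grid) in candidates.items():
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy", n_jobs=-1)
    search.fit(X_train, y_train)
    best_models[name] = search
    print(name, search.best_params_, round(search.best_score_, 3))

Wrapping the scaler and classifier in a single pipeline keeps the scaling inside each cross-validation fold, which avoids leaking test-fold statistics into training.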
Reporting and Findings: The data scientist documented the entire process, from data preprocessing to model training and evaluation, in a comprehensive scientific report. The report included a detailed analysis of each model's performance and recommendations for further improving churn prediction.
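As a rough illustration of the evaluation reported at this stage, the sketch below scores each tuned model on the held-out test set, assuming the best_models, X_test, and y_test objects from the previous sketch; the metrics shown are standard Scikit-Learn outputs, not results from the original report.

# Held-out evaluation of each tuned model for the reporting stage
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

for name, search in best_models.items():
    y_pred = search.predict(X_test)
    print(f"--- {name} ---")
    print("accuracy:", round(accuracy_score(y_test, y_pred), 3))
    print(confusion_matrix(y_test, y_pred))
    print(classification_report(y_test, y_pred, digits=3))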
Technologies and Tools
The Petgrave.io data scientist leveraged the following technologies and tools to complete the machine learning project:
Programming Language: Python was the primary language used throughout the project.
Machine Learning Library: Scikit-Learn was the primary library used for model training, cross-validation, hyperparameter tuning, and evaluation.
Data Manipulation: Pandas and NumPy were used for data processing, cleaning, and transformation.
The Results
Developed a clean, well-structured dataset for churn prediction modelling
Trained, tested, and optimised multiple classification models, including KNN, SVM, and Random Forest
Produced a comprehensive scientific report detailing the project's methodology, findings, and recommendations