Project information
- Category: Python Skillset
- Description: Final Project Course
- Project date: June, 2021
- Project URL: Github
Breast cancer classification using feature correlation analysis and the K-Nearest Neighbour algorithm.
Breast cancer is a type of cancer that starts in the breast. It is important to understand that most breast lumps are benign, not malignant. Benign tumors are generally believed to have a regular shape, mostly round or oval, with a relatively smooth contour. Malignant tumors tend to show the opposite: an irregular shape and a rough contour. Therefore, compactness, elliptical compactness, and radial distance spectrum are extracted to reflect the complexity of the tumor contour.
Python Skillset:
- Data cleaning and preparation (NumPy, pandas, Matplotlib, etc.).
- Correlation analysis.
- Machine learning models.
Tools: Python & Google Colaboratory
Convert the [diagnosis] feature to binary values (1 = malignant, 0 = benign).
Balance the dataset between malignant and benign samples.
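The two preparation steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the toy rows, the `radius_mean` column, the `"M"`/`"B"` label strings, and the choice of random undersampling as the balancing method are all hypothetical and not taken from the project itself.

```python
import pandas as pd

# Hypothetical toy data; the real dataset, its columns, and its raw
# label values ("M"/"B") are assumptions made for this sketch.
df = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M", "B", "B", "B"],
    "radius_mean": [17.9, 12.1, 11.4, 20.6, 13.5, 12.4, 9.5],
})

# Encode the label: 1 = malignant, 0 = benign.
df["diagnosis"] = df["diagnosis"].map({"M": 1, "B": 0})

# Balance by random undersampling (an assumed method): keep as many
# rows from each class as the smaller class contains.
malignant = df[df["diagnosis"] == 1]
benign = df[df["diagnosis"] == 0]
n_min = min(len(malignant), len(benign))

balanced = pd.concat([
    malignant.sample(n=n_min, random_state=0),
    benign.sample(n=n_min, random_state=0),
]).reset_index(drop=True)

print(balanced["diagnosis"].value_counts())
```

Undersampling discards rows from the larger class; oversampling the smaller class would be another reasonable option, and the source does not say which was used.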
Tools: Python & Google Colaboratory
Compute the correlation between the [diagnosis] feature and every other feature.
Train using only the 20 features that are most highly correlated with [diagnosis].
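A minimal sketch of this feature selection, assuming Pearson correlation and ranking by absolute value (the source does not state which correlation measure was used, or whether negative correlations count as "high"). The helper name `top_correlated_features` and the synthetic columns are hypothetical.

```python
import numpy as np
import pandas as pd

def top_correlated_features(df, target="diagnosis", n=20):
    """Return the n feature names with the largest absolute
    Pearson correlation to the target column."""
    corr = df.corr()[target].drop(target)
    return corr.abs().sort_values(ascending=False).head(n).index.tolist()

# Synthetic stand-in data: one feature that tracks the label closely,
# one that tracks it loosely, and one that is pure noise.
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
toy = pd.DataFrame({
    "diagnosis": y,
    "strong": y + rng.normal(0, 0.1, 200),
    "weak": y + rng.normal(0, 2.0, 200),
    "noise": rng.normal(0, 1.0, 200),
})

selected = top_correlated_features(toy, n=2)
print(selected)
```

On the real data the call would use `n=20`, and the returned names would then be used to subset the training columns.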
Tools: Python & Google Colaboratory
Split the dataset into training, validation, and testing data.
Build a KNN model for each value of k from 1 to 10.
Choose the k value that gives the best accuracy.
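The search over k can be sketched as below, assuming scikit-learn is used. The 60/20/20 split ratio, the stratified splitting, the choice to pick k on the validation set, and the synthetic data from `make_classification` are all assumptions made for illustration, so the numbers printed here say nothing about the project's actual results.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the 20 selected features (assumption).
X, y = make_classification(n_samples=400, n_features=20, random_state=0)

# Assumed 60/20/20 split into training, validation, and testing data.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, stratify=y_tmp, random_state=0
)

# Fit one KNN model per k and record its validation accuracy.
val_scores = {}
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_scores[k] = model.score(X_val, y_val)

# Pick the k with the best validation accuracy (ties go to the smaller k),
# then report accuracy once on the held-out test data.
best_k = max(val_scores, key=val_scores.get)
final_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = final_model.score(X_test, y_test)

print(best_k, round(test_accuracy, 3))
```

Selecting k on the validation set and touching the test set only once keeps the reported test accuracy from being biased by the search.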
Using the K-Nearest Neighbor method with k=1, the prediction accuracy obtained is close to 0.912 (91.2%).
NB: The images above are just a few sample plots.