Project information

  • Category: Python Skillset
  • Description: Final Project Course
  • Project date: June 2021
  • Project URL: GitHub

Breast cancer classification using feature correlation analysis and the K-Nearest Neighbour algorithm.

Breast cancer is a type of cancer that starts in the breast. A breast lump can be benign or malignant, and it is important to understand that most breast lumps are benign. Benign tumors are generally believed to have a regular shape, mostly round or oval, with a relatively smooth contour. Malignant tumors, by contrast, tend to be irregular in shape with rough, uneven contours. Therefore, compactness, elliptical compactness, and the radial distance spectrum are extracted as features that reflect the complexity of a tumor's contour.
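To make the contour argument concrete, here is a minimal sketch that computes compactness for a closed contour given as polygon vertices. It assumes the definition used in the Wisconsin Diagnostic Breast Cancer (WDBC) feature description, perimeter^2 / area - 1.0; the page does not name its dataset or formula, and the helper names `compactness` and `radial_contour` are hypothetical. Elliptical compactness and the radial distance spectrum are not covered here.

```python
import math


def compactness(vertices):
    """Compactness of a simple closed polygon: perimeter**2 / area - 1.0.

    The formula follows the WDBC feature description (an assumption here).
    `vertices` is a list of (x, y) points listed in order around the contour.
    """
    n = len(vertices)
    perimeter = 0.0
    twice_signed_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the contour
        perimeter += math.hypot(x2 - x1, y2 - y1)
        twice_signed_area += x1 * y2 - x2 * y1  # shoelace formula term
    area = abs(twice_signed_area) / 2.0
    return perimeter ** 2 / area - 1.0


def radial_contour(n_points, radius_fn):
    """Sample a closed contour whose distance from the centre depends on the angle."""
    points = []
    for k in range(n_points):
        theta = 2.0 * math.pi * k / n_points
        r = radius_fn(theta)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points


# A smooth, round contour versus the same circle with 12 irregular bumps.
smooth = radial_contour(360, lambda t: 1.0)
irregular = radial_contour(360, lambda t: 1.0 + 0.3 * math.sin(12 * t))

print(f"smooth:    {compactness(smooth):.2f}")     # 11.57, close to 4*pi - 1
print(f"irregular: {compactness(irregular):.2f}")  # noticeably larger
```

For a perfect circle, perimeter^2 / area equals 4π, the smallest value any closed shape can reach, so any departure from a round, smooth outline pushes compactness upward.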

Python Skillset:
  1. Data cleaning and preparation (NumPy, Pandas, Matplotlib, etc.).
  2. Correlation analysis.
  3. Machine learning models.
Steps:
  1. Data cleaning and preparation
    Tools: Python & Google Colaboratory
    Convert the [diagnosis] feature to binary labels (1 = malignant, 0 = benign).
    Balance the dataset so that malignant and benign samples are equally represented.
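The two preparation steps above can be sketched with pandas as follows. This is a minimal sketch, assuming the raw [diagnosis] column holds the string labels 'M' and 'B' (as in the widely used CSV release of the WDBC data) and that balancing was done by randomly undersampling the larger class; the page does not say which balancing method was used. The toy rows and the helper names `encode_diagnosis` and `balance_by_undersampling` are hypothetical.

```python
import pandas as pd


def encode_diagnosis(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with [diagnosis] mapped to 1 = malignant ('M'), 0 = benign ('B')."""
    out = df.copy()
    out["diagnosis"] = out["diagnosis"].map({"M": 1, "B": 0})
    return out


def balance_by_undersampling(df: pd.DataFrame, label: str = "diagnosis",
                             seed: int = 42) -> pd.DataFrame:
    """Randomly drop rows of the larger class until both classes are the same size."""
    n_min = df[label].value_counts().min()
    balanced = df.groupby(label).sample(n=n_min, random_state=seed)
    # Shuffle so the classes are interleaved, then renumber the index.
    return balanced.sample(frac=1, random_state=seed).reset_index(drop=True)


# Hypothetical toy rows: two malignant and four benign samples.
raw = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "B", "M", "B"],
    "radius_mean": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

encoded = encode_diagnosis(raw)
balanced = balance_by_undersampling(encoded)
print(dict(sorted(balanced["diagnosis"].value_counts().items())))  # {0: 2, 1: 2}
```

Undersampling discards data from the larger class; oversampling the smaller class is the usual alternative when the dataset is small.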


  2. Correlation analysis
    Tools: Python & Google Colaboratory
    Compute the correlation between the [diagnosis] feature and each of the other features.
    Train using only the 20 features that are most highly correlated with [diagnosis].
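The feature-selection step can be sketched as below. It assumes Pearson correlation (the pandas `DataFrame.corr` default) and ranks features by the absolute value of the coefficient, so strongly negative correlations also count as "high"; the page states neither choice. The helper name `top_correlated_features` and the synthetic columns are hypothetical stand-ins for the real features.

```python
import numpy as np
import pandas as pd


def top_correlated_features(df: pd.DataFrame, target: str = "diagnosis",
                            k: int = 20) -> list:
    """Names of the k columns with the largest |Pearson correlation| with `target`."""
    corr_with_target = df.corr()[target].drop(target)
    return corr_with_target.abs().sort_values(ascending=False).head(k).index.tolist()


# Synthetic data: a 0/1 label plus features that depend on it to varying degrees.
rng = np.random.default_rng(0)
n = 500
y = rng.integers(0, 2, size=n)
df = pd.DataFrame({
    "diagnosis": y,
    "strong": y + rng.normal(0.0, 0.3, size=n),     # strongly positive
    "negative": -y + rng.normal(0.0, 0.5, size=n),  # strongly negative
    "weak": y + rng.normal(0.0, 3.0, size=n),       # weakly positive
    "noise": rng.normal(0.0, 1.0, size=n),          # unrelated to the label
})

print(df.corr()["diagnosis"].round(2))
print(top_correlated_features(df, k=2))
```

On the real data the same call with the default `k=20` would return the 20 column names to keep for training.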

  3. Machine learning models
    Tools: Python & Google Colaboratory
    Split the dataset into training, validation, and testing sets.
    Build KNN models with k ranging from 1 to 10.
    Choose the value of k that gives the best accuracy.

    Using the K-Nearest Neighbour method with k = 1, the prediction accuracy is approximately 0.912 (91.2%).
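An end-to-end sketch of the modelling step with scikit-learn follows. It is a minimal sketch, not the project's notebook: it uses the copy of the WDBC data bundled with scikit-learn (an assumption, since the page does not name its dataset), repeats the label encoding, undersampling, and top-20 selection so that it runs on its own, and assumes a 60/20/20 train/validation/test split. Each k from 1 to 10 is scored on the validation set, and the best k is then evaluated once on the test set. Because the data source, split, and random seeds may differ, it will not necessarily reproduce the 91.2% figure.

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# scikit-learn's bundled WDBC data encodes 0 = malignant, 1 = benign,
# so flip the target to the page's convention (1 = malignant, 0 = benign).
df = load_breast_cancer(as_frame=True).frame.rename(columns={"target": "diagnosis"})
df["diagnosis"] = 1 - df["diagnosis"]

# Balance the classes by random undersampling (assumed balancing method).
n_min = df["diagnosis"].value_counts().min()
df = df.groupby("diagnosis").sample(n=n_min, random_state=42)

# Keep the 20 features with the largest |correlation| with the label.
corr = df.corr()["diagnosis"].drop("diagnosis").abs()
features = corr.sort_values(ascending=False).head(20).index.tolist()
X, y = df[features], df["diagnosis"]

# 60 % training, 20 % validation, 20 % testing (assumed proportions).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=42)

# Fit one KNN model per k and record its validation accuracy.
val_accuracy = {}
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    val_accuracy[k] = accuracy_score(y_val, model.predict(X_val))

# max() returns the first k with the top score, so ties go to the smaller k.
best_k = max(val_accuracy, key=val_accuracy.get)
best_model = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
print(f"best k = {best_k}, test accuracy = {test_accuracy:.3f}")
```

KNN is distance-based, so features measured on large scales (such as area) dominate the distance. Standardizing the features, for example with scikit-learn's `StandardScaler` fitted on the training split only, is a common addition that the page does not mention.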



NB: The images above are just a few sample plots.