Feature selection: Principal Component Analysis

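Principal component analysis (PCA) is a mathematical
procedure that transforms a number of (possibly)
correlated attributes into a (smaller) number of
uncorrelated attributes called principal components.
Each principal component is a linear combination of
the original attributes, and the components are
ordered by how much of the data's variance they capture.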
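
To make the "correlated in, uncorrelated out" idea concrete, here is a minimal NumPy-only sketch. It uses synthetic data and hypothetical names (`X_demo`, `scores`) that are not part of the recipe below, and it computes the components with a plain SVD of the centered data, which is one common way to do PCA and not necessarily what any particular library does internally.

```python
import numpy as np

# Synthetic data (an assumption for illustration): 200 samples, 3 attributes,
# where the second attribute is strongly correlated with the first.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
X_demo = np.column_stack([
    a,
    2.0 * a + rng.normal(scale=0.1, size=200),
    rng.normal(size=200),
])

# Center the data, then take the SVD; the rows of Vt are the principal axes.
X_centered = X_demo - X_demo.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project the centered data onto the first 2 principal axes.
scores = X_centered @ Vt[:2].T

# The original attributes 0 and 1 are highly correlated...
orig_corr = np.corrcoef(X_demo, rowvar=False)[0, 1]
# ...while the covariance between the two components is numerically zero.
score_cov = np.cov(scores, rowvar=False)

print("correlation of original attributes 0 and 1:", orig_corr)
print("covariance between components 1 and 2:", score_cov[0, 1])
```

The off-diagonal covariance should come out as zero up to floating-point error, which is what "uncorrelated" means here.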

Link: Medium Article on Principal Component Analysis

This recipe includes the following topics:

  • Initialize the PCA class, keeping 3 components
  • Call fit() to fit the model with X
  • Display principal axes in feature space
  • Call transform() to project X onto the principal components


# 3. Principal Component Analysis
# import modules
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA

# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)

# convert into numpy array
pimaArr = pimaDf.values

# separate the input features (X) from the target column (Y)
# PCA is unsupervised, so Y is not used in this example
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]

# initialize PCA class
# 1. set number of components to keep to 3
# 2. call fit() to learn the principal components from X
pca = PCA(n_components=3).fit(X)

# display PCA attributes
print("Principal axes in feature space: %s" % pca.components_)
print('-'*60)

# call transform to project X onto the 3 principal components
pcaArr = pca.transform(X)

# print the first 3 rows of the reduced data
print(pcaArr[:3])
Principal axes in feature space: [[-2.02176587e-03  9.78115765e-02  1.60930503e-02  6.07566861e-02
   9.93110844e-01  1.40108085e-02  5.37167919e-04 -3.56474430e-03]
 [-2.26488861e-02 -9.72210040e-01 -1.41909330e-01  5.78614699e-02
   9.46266913e-02 -4.69729766e-02 -8.16804621e-04 -1.40168181e-01]
 [-2.24649003e-02  1.43428710e-01 -9.22467192e-01 -3.07013055e-01
   2.09773019e-02 -1.32444542e-01 -6.39983017e-04 -1.25454310e-01]]
------------------------------------------------------------
[[-75.71465491 -35.95078264  -7.26078895]
 [-82.3582676   28.90821322  -5.49667139]
 [-74.63064344 -67.90649647  19.46180812]]
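
In the output above, the first principal axis loads almost entirely on the fifth attribute (0.993 on 'test'). That is what you would expect when PCA is run on unscaled columns, since the attribute with the largest variance tends to dominate. The sketch below is a hedged illustration on synthetic data (the array `X_demo` and its column scales are made up, not the Pima data). It shows two things using documented scikit-learn attributes: how much variance each kept component explains, and that `transform()` with the default settings amounts to centering with `mean_` and projecting onto `components_`.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic stand-in for X (an assumption): 100 rows, 8 columns
# on very different scales, with column index 4 by far the largest.
rng = np.random.default_rng(42)
scales = np.array([1, 30, 20, 15, 100, 8, 0.3, 12])
X_demo = rng.normal(size=(100, 8)) * scales

pca_demo = PCA(n_components=3).fit(X_demo)

# Share of the total variance captured by each kept component.
print(pca_demo.explained_variance_ratio_)

# Index of the attribute with the largest absolute loading on the first axis.
dominant = int(np.argmax(np.abs(pca_demo.components_[0])))
print("attribute dominating the first axis:", dominant)

# transform() centers the data with the fitted mean, then projects it
# onto the principal axes stored in components_.
manual = (X_demo - pca_demo.mean_) @ pca_demo.components_.T
print(np.allclose(manual, pca_demo.transform(X_demo)))
```

If the dominance of one column is not what you want, standardizing the attributes before fitting (for example with `sklearn.preprocessing.StandardScaler`) is a common option, though whether it is appropriate depends on the data.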
