Feature selection: Recursive Feature Elimination

Recursive feature elimination (RFE) works by repeatedly fitting an external estimator and removing the weakest attributes until the requested number of features remains. The example below uses logistic regression as the estimator.
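To make the elimination loop concrete, here is a minimal sketch of the idea, not scikit-learn's actual implementation. It assumes synthetic data from make_classification rather than the Pima dataset, a hypothetical helper named manual_rfe, and squared coefficients as the importance score.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# synthetic stand-in data (assumption: not the Pima dataset used in this recipe)
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)

def manual_rfe(X, y, n_keep):
    """Hypothetical helper: drop the weakest feature until n_keep remain."""
    remaining = list(range(X.shape[1]))
    while len(remaining) > n_keep:
        # refit the estimator on the columns that are still in play
        est = LogisticRegression(solver='liblinear').fit(X[:, remaining], y)
        # score each remaining feature by its squared coefficient
        importance = (est.coef_ ** 2).sum(axis=0)
        # remove the feature with the smallest score
        del remaining[int(np.argmin(importance))]
    return remaining

kept = manual_rfe(X_demo, y_demo, n_keep=3)
print("Kept column indices:", kept)
```

The RFE class used in the recipe wraps this kind of loop and also records the order in which features were dropped, which is what ranking_ reports.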

This recipe includes the following topics:

  • Initialize external estimator: LogisticRegression class
  • Initialize RFE class with reduced output feature set to 3
  • Call fit() to run estimator and reduce features
  • Display RFE attributes such as the mask of selected features
  • Call transform() on the input data

# import modules
import pandas as pd
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)

# convert into numpy array
pimaArr = pimaDf.values

# Let's split our data into the usual features (X) and target (Y)
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]

# initialize external estimator
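# solver='liblinear' is pinned on the assumption that the output below was
# produced with the older scikit-learn default; newer releases default to 'lbfgs'
model = LogisticRegression(solver='liblinear')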

# initialize RFE class
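# 1. select LogisticRegression as estimator
# 2. set the number of features to keep to 3
#    (passed by keyword; recent scikit-learn versions reject it as a positional argument)
# 3. call fit() to run estimator and reduce features
rfe = RFE(model, n_features_to_select=3).fit(X, Y)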

# display rfe attributes
print("Selected Features: %s" % rfe.support_)
print("Feature Ranking: %s" % rfe.ranking_)

# call transform to reduce X to the selected features/columns
rfeArr = rfe.transform(X)

# print first 3 rows of output with only the best 3 features/columns
print(rfeArr[:3])

Selected Features: [ True False False False False  True  True False]
Feature Ranking: [1 2 3 5 6 1 1 4]
[[ 6.    33.6    0.627]
 [ 1.    26.6    0.351]
 [ 8.    23.3    0.672]]
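
The boolean mask is easier to read once it is mapped back to column names. The sketch below hard-codes the mask and ranking printed above, so it runs without refitting. The variable names feature_names, selected and ranked are illustrative and not part of the original recipe.

```python
# column names, mask and ranking copied from the recipe and its output above
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
support = [True, False, False, False, False, True, True, False]
ranking = [1, 2, 3, 5, 6, 1, 1, 4]

# the last column is the target, so it is not a candidate feature
feature_names = cols[:-1]

# keep only the names whose mask entry is True
selected = [name for name, keep in zip(feature_names, support) if keep]
print("Selected columns:", selected)
# Selected columns: ['preg', 'mass', 'pedi']

# list all features from best (rank 1) to worst
ranked = sorted(zip(ranking, feature_names))
for rank, name in ranked:
    print(rank, name)
```

With a fitted rfe object, rfe.support_ and rfe.ranking_ can be used in place of the hard-coded lists.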

