Regression Algorithm: Elastic Net Regression

ElasticNet Regression is a form of regularized regression that combines the properties of both Ridge regression and LASSO regression. It reduces model complexity by penalizing both the squared L2-norm (sum of squared coefficient values) and the L1-norm (sum of absolute coefficient values) of the coefficients.
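
For reference, the objective minimized by scikit-learn's ElasticNet can be written as below. The symbols are notation chosen here for illustration: w is the coefficient vector, n is the number of samples, and α and ρ correspond to the estimator's alpha and l1_ratio parameters.

```latex
\min_{w} \; \frac{1}{2n} \lVert y - Xw \rVert_2^2
  + \alpha \rho \lVert w \rVert_1
  + \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2
```

With ρ = 1 only the L1 (LASSO) penalty remains, and with ρ = 0 only the L2 (Ridge) penalty remains.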

Coefficients are the weights the model assigns to the features. When the features are on comparable scales, a larger absolute weight means the feature has more influence on the prediction.

Example: In the linear regression equation y = ax + b, ‘a’ is the coefficient of x and ‘b’ is the intercept.
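
As a small illustration of the equation above, the sketch below fits a plain linear regression to made-up data generated from y = 2x + 1 and reads the learned coefficient and intercept back from the model. The data and the variable names are assumptions for this example only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# hypothetical data generated from y = 2x + 1 (so a = 2 and b = 1)
x = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * x.ravel() + 1.0

model = LinearRegression().fit(x, y)

a = model.coef_[0]      # learned weight (coefficient) for the single feature
b = model.intercept_    # learned intercept
print("a = %.3f, b = %.3f" % (a, b))
```

On noise-free data like this, the fitted values should land very close to a = 2 and b = 1.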

Note: ElasticNet Regression is a linear machine learning (ML) algorithm, which is typically simpler and faster to train than non-linear algorithms.

This recipe includes the following topics:

  • Load the Boston house price regression dataset from GitHub
  • Split columns into the usual feature columns (X) and target/prediction column (Y)
  • Split data using KFold() class with kFold:10, seed:7
  • Instantiate a regression model (ElasticNet)
  • Set scoring parameter to ‘neg_mean_squared_error’
  • Call cross_val_score() to run cross validation
  • Calculate Mean Squared Error from scores returned by cross_val_score()

Caveat: cross_val_score() follows the convention that a higher score is better. MSE works the other way around: the smallest value is best. scikit-learn therefore provides the ‘neg_mean_squared_error’ scorer, which returns the negated MSE so that higher is still better. As a result, the reported score is negative even though MSE itself can never be negative.
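
To make the sign convention concrete, here is a minimal sketch that compares the scorer's output with MSE computed by hand for each fold. It uses synthetic data from make_regression as a stand-in (an assumption, not the housing dataset), and the names X_demo, y_demo, and manual_mse are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold, cross_val_score

# synthetic stand-in data (an assumption; not the housing dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=5,
                                 noise=10.0, random_state=0)

cv = KFold(n_splits=5)
neg_scores = cross_val_score(ElasticNet(), X_demo, y_demo, cv=cv,
                             scoring='neg_mean_squared_error')

# recompute the plain (positive) MSE for each fold by hand
manual_mse = []
for train_idx, test_idx in cv.split(X_demo):
    fold_model = ElasticNet().fit(X_demo[train_idx], y_demo[train_idx])
    pred = fold_model.predict(X_demo[test_idx])
    manual_mse.append(mean_squared_error(y_demo[test_idx], pred))
manual_mse = np.array(manual_mse)

print("scorer output:", neg_scores)    # negated MSE, one value per fold
print("manual MSE:   ", manual_mse)    # plain MSE, one value per fold
print("match after sign flip:", np.allclose(-neg_scores, manual_mse))
```

Flipping the sign of the scorer output recovers the ordinary MSE, so the two are interchangeable as long as the sign is kept in mind when reporting.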

# import modules
import pandas as pd
from sklearn.linear_model import ElasticNet
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

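# read data file from github
# dataframe: houseDf
# gitFileURL must point to the raw, whitespace-delimited data file
gitFileURL = ''
cols = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
# sep=r'\s+' splits on runs of whitespace (delim_whitespace is deprecated in recent pandas)
houseDf = pd.read_csv(gitFileURL, sep=r'\s+', names=cols)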

# convert into numpy array for scikit-learn
houseArr = houseDf.values

# Let's split columns into the usual feature columns(X) and target column(Y)
# Y represents the target 'MEDV' column
# MEDV: median value of owner-occupied homes in $1000s
X = houseArr[:, 0:13]
Y = houseArr[:, 13]

# set k-fold count
folds = 10

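# set seed for reproducibility
# note: KFold only uses random_state when shuffle=True; recent scikit-learn
# versions raise a ValueError if random_state is set while shuffle=False
seed = 7

# split data using KFold
# folds are taken in order without shuffling, so the split is already deterministic;
# pass shuffle=True, random_state=seed to shuffle reproducibly instead
kfold = KFold(n_splits=folds)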

# instantiate a regression model
model = ElasticNet()

# set scoring parameter to 'neg_mean_squared_error'
scoring = 'neg_mean_squared_error'

# call cross_val_score() to run cross validation
resultArr = cross_val_score(model, X, Y, cv=kfold, scoring=scoring)

# calculate mean of scores for all folds
mse = resultArr.mean()

# display Mean Squared Error
# the scorer returns the negated MSE, so the value prints as negative;
# its magnitude is the actual MSE (smaller magnitude is better)
print("Mean Squared Error: %.3f" % mse)
Mean Squared Error: -31.165
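
The recipe instantiates ElasticNet() with its default settings (alpha=1.0 and l1_ratio=0.5 in scikit-learn). As a rough sketch of how the penalty strength affects the learned weights, the example below fits two models with different alpha values on synthetic data. The data and the names X_demo, y_demo, weak, and strong are assumptions for illustration, not part of the recipe.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet

# synthetic stand-in data (an assumption; not the housing dataset)
X_demo, y_demo = make_regression(n_samples=200, n_features=10, n_informative=3,
                                 noise=5.0, random_state=0)

# alpha scales the whole penalty; l1_ratio splits it between L1 and L2
weak = ElasticNet(alpha=0.01, l1_ratio=0.5).fit(X_demo, y_demo)
strong = ElasticNet(alpha=10.0, l1_ratio=0.5).fit(X_demo, y_demo)

for name, m in [("alpha=0.01", weak), ("alpha=10.0", strong)]:
    print(name,
          "| sum of |coef|: %.3f" % np.abs(m.coef_).sum(),
          "| coefficients equal to zero:", int(np.sum(m.coef_ == 0)))
```

A larger alpha shrinks the coefficients more strongly. In practice both alpha and l1_ratio are usually tuned with cross validation rather than left at their defaults.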
