Hyperparameter optimization: Grid Search

Machine learning models can have many hyperparameters, and finding the best combination of values is a search problem in its own right. Hyperparameter optimization (or tuning) is the task of choosing the set of hyperparameter values that gives the best performance for a learning algorithm.

Grid search is a tuning technique that performs an exhaustive search over a manually specified subset of the hyperparameter space of a learning algorithm.
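To make "exhaustive search" concrete, here is a minimal sketch of the idea using only the standard library. The two-parameter grid and the toy_score function are hypothetical stand-ins (a real search would compute a cross-validated score for each candidate); the point is only that every combination of the listed values is tried.

```python
from itertools import product

# Hypothetical grid: names and values are illustrative, not taken from the recipe.
param_grid = {
    "alpha": [0.01, 0.1, 1.0],
    "fit_intercept": [True, False],
}

def toy_score(params):
    """Stand-in for a cross-validated score; highest at alpha=0.1 with an intercept."""
    penalty = abs(params["alpha"] - 0.1)
    bonus = 0.05 if params["fit_intercept"] else 0.0
    return 0.9 - penalty + bonus

# Build every combination of the listed values (the Cartesian product).
names = list(param_grid)
candidates = [dict(zip(names, values)) for values in product(*param_grid.values())]

# Score each candidate and keep the one with the highest score.
best_params = max(candidates, key=toy_score)

print(len(candidates), "candidates evaluated")
print("best:", best_params, "score:", round(toy_score(best_params), 3))
```

The number of candidates is the product of the list lengths, so the cost of a grid search grows quickly as hyperparameters are added.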

In this example, we use a Ridge regression model, where alpha is the hyperparameter that sets the regularization strength (it must be a non-negative float; alpha = 0 corresponds to ordinary least squares). Regularization improves the conditioning of the problem and reduces the variance of the estimates.
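The effect of alpha can be seen in a small worked case. The sketch below assumes a single feature and no intercept (a simplification; scikit-learn's Ridge fits an intercept by default), where minimising sum((y - w*x)^2) + alpha * w^2 gives w = sum(x*y) / (sum(x*x) + alpha). The data points are made up for illustration.

```python
def ridge_weight_1d(xs, ys, alpha):
    """Closed-form ridge weight for one feature and no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # alpha is added to the denominator, which keeps it away from zero
    # and shrinks the weight toward zero as alpha grows.
    return sxy / (sxx + alpha)

# Made-up toy data lying roughly on y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

alphas = [0, 0.1, 1, 10, 100]
weights = [ridge_weight_1d(xs, ys, a) for a in alphas]

for a, w in zip(alphas, weights):
    print(f"alpha={a:>5}: w={w:.4f}")
```

With alpha = 0 the result is the ordinary least-squares weight; larger values pull the weight toward zero, trading some bias for lower variance.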

Link: scikit-learn: Ridge documentation


This recipe includes the following topics:

  • Load the classification problem dataset (Pima Indians) from GitHub
  • Split columns into the usual feature columns (X) and target column (Y)
  • Create a param_grid dictionary keyed by parameter name
  • Instantiate the estimator: Ridge (a regression model, fitted here to the 0/1 class column)
  • Instantiate the GridSearchCV class with estimator and param_grid
  • Find the mean cross-validated score of the best candidate
  • Find the parameter value (or set of values) that achieved the best score


# import modules
import pandas as pd
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# read data file from GitHub
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)

# convert into numpy array for scikit-learn
pimaArr = pimaDf.values

# Let's split columns into the usual feature columns(X) and target column(Y)
# Y represents the target 'class' column whose value is either '0' or '1'
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]

# create a param_grid dictionary mapping each parameter name to the values to try
alphas = np.array([1,0.1,0.01,0.001,0.0001,0])
param_grid = {'alpha': alphas}

# instantiate the estimator: Ridge() is a regression model, fitted here to the 0/1 class column
model = Ridge()

# set up a grid search to find the best combination of hyperparameters
grid = GridSearchCV(estimator=model, param_grid=param_grid)

# call fit() to train the grid search using X and Y data
grid.fit(X, Y)

# Find the mean cross-validated score of the best candidate (R^2, the default score for Ridge)
bestScore = grid.best_score_

# Find the (set of) parameter that achieved the best score
bestAlpha = grid.best_estimator_.alpha

print("Best Score: %.5f, Best Alpha(Hyperparameter): %f" % (bestScore, bestAlpha))
Best Score: 0.27962, Best Alpha(Hyperparameter): 1.000000
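
Because Ridge is a regressor, the score reported above is the coefficient of determination (R^2), not classification accuracy. The exact number can also vary with the scikit-learn version, since the default number of cross-validation folds changed from 3 to 5 in version 0.22. The following sketch shows one way to make those choices explicit and to inspect every candidate rather than only the best one. It uses synthetic data from make_regression and a hypothetical alpha grid so that it runs without a download; neither is part of the recipe above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data, used here only so the sketch runs offline.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Hypothetical grid; cv and scoring are spelled out instead of relying on defaults.
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}
grid = GridSearchCV(Ridge(), param_grid=param_grid, cv=5, scoring="r2")
grid.fit(X, y)

# One entry per candidate: mean and standard deviation of the score across folds.
results = grid.cv_results_
for params, mean, std in zip(
    results["params"], results["mean_test_score"], results["std_test_score"]
):
    print(f"{params}: mean R^2 = {mean:.4f} (+/- {std:.4f})")

# best_params_ is a dictionary, which scales to grids with several hyperparameters.
print("best params:", grid.best_params_)
print("best score:", grid.best_score_)
```

Reading grid.best_params_ is equivalent to grid.best_estimator_.alpha for a one-parameter grid, and it extends naturally when more hyperparameters are searched.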
