Classification Model Evaluation: Classification Report

scikit-learn offers three different APIs for model evaluation:
1. Estimator score method: every estimator/model object has a score() method that provides a default evaluation criterion
2. Scoring parameter: a predefined scoring name that can be passed to cross-validation helpers such as cross_val_score()
3. Metric functions: functions defined in the sklearn.metrics module
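
As a rough illustration of the three APIs listed above, the sketch below evaluates one model in all three ways. The synthetic dataset from make_classification(), the choice of accuracy as the metric, and max_iter=1000 are assumptions made for this example only; they are not part of the recipe.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# synthetic stand-in data (assumption: not the Pima dataset used in this recipe)
X, y = make_classification(n_samples=200, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=7)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# 1. Estimator score method: mean accuracy for classifiers
default_score = model.score(X_test, y_test)

# 2. Scoring parameter: a predefined scorer name passed to cross_val_score()
cv_scores = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=5, scoring='accuracy')

# 3. Metric function: called directly on true and predicted labels
metric_score = accuracy_score(y_test, model.predict(X_test))

print(default_score, cv_scores.mean(), metric_score)
```

For a classifier, score() and accuracy_score() report the same number on the same test split; the cross-validated mean will usually differ because it averages over five different splits.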

The classification report is an example of the metric function API.
classification_report() builds a text report showing the main classification metrics for each class: precision, recall, F1-score and support.
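
To make the columns of the report concrete, here is a minimal sketch that computes the same per-class metrics on a tiny hand-made label set. The labels y_true and y_pred are invented for illustration, and precision_recall_fscore_support() is used here only because it returns the numbers as arrays rather than text.

```python
from sklearn.metrics import precision_recall_fscore_support

# hypothetical toy labels, chosen so the counts are easy to check by hand
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]

# each returned array has one entry per class, ordered by label (0, then 1)
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)

# class 1: 2 true positives, 1 false positive, 2 false negatives
#   precision = 2 / (2 + 1), recall = 2 / (2 + 2)
print(precision, recall, f1, support)
```

Precision answers "of the samples predicted as this class, how many were right", recall answers "of the samples that really belong to this class, how many were found", and support is simply the number of true samples per class.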


This recipe includes the following topics:

  • Load data/file from GitHub
  • Split columns into the usual feature columns (X) and target column (Y)
  • Set test_size to 33%
  • Set seed to reproduce the same random data each time
  • Split data using train_test_split()
  • Instantiate a classification model (LogisticRegression)
  • Call fit() to train model using X_train and Y_train
  • Call predict() to use the model for making predictions
  • Generate classification report using classification_report()


# import modules
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)

# convert into numpy array for scikit-learn
pimaArr = pimaDf.values

# Let's split columns into the usual feature columns(X) and target column(Y)
# Y represents the target 'class' column whose value is either '0' or '1'
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]

# set test_size to 33%
test_size = 0.33

# set seed to reproduce the same random data each time
seed = 7

# split data using train_test_split() helper method
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# instantiate a classification model
model = LogisticRegression()

# call fit() to train model using X_train and Y_train
model.fit(X_train, Y_train)

# call predict() to use the model for making predictions
predicted = model.predict(X_test)

# generate classification report
report = classification_report(Y_test, predicted)

# print classification report
print(report)
             precision    recall  f1-score   support

        0.0       0.77      0.87      0.82       162
        1.0       0.71      0.55      0.62        92

avg / total       0.75      0.76      0.75       254
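
As a sanity check on how the columns relate, the sketch below recomputes two figures from the rounded values printed above: the F1-score as the harmonic mean of precision and recall, and the "avg / total" precision as a support-weighted average. The helper f1() and the rows dictionary are written for this example only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# (precision, recall, support) copied from the report above
rows = {"0.0": (0.77, 0.87, 162), "1.0": (0.71, 0.55, 92)}

# per-class F1, rounded to two decimals like the report
f1_by_class = {label: round(f1(p, r), 2) for label, (p, r, _) in rows.items()}

# "avg / total" weights each class by its support
total_support = sum(s for _, _, s in rows.values())
weighted_precision = round(
    sum(p * s for p, _, s in rows.values()) / total_support, 2)

print(f1_by_class)                        # {'0.0': 0.82, '1.0': 0.62}
print(total_support, weighted_precision)  # 254 0.75
```

Because the inputs here are already rounded to two decimals, a value recomputed this way can differ from the printed one in the last digit. Newer scikit-learn releases also lay out the summary differently, printing separate accuracy, macro avg and weighted avg rows in place of the single "avg / total" row.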
