Save and Load Model using pickle

The pickle module implements binary protocols for serializing and de-serializing a Python object structure, which makes it useful for saving, distributing and reusing trained machine learning models.
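
As a minimal sketch of that round trip, the snippet below serializes a plain Python dictionary to bytes with pickle.dumps and rebuilds it with pickle.loads. The dictionary and its keys are made up for illustration and stand in for a trained model. The dump and load functions used later in this recipe do the same thing with a file object instead of a bytes string.

```python
import pickle

# hypothetical stand-in for a trained model: any picklable Python object works
model_like = {'name': 'demo', 'coefficients': [0.5, -1.25, 3.0]}

# serialize the object into a bytes string
payload = pickle.dumps(model_like)

# de-serialize the bytes back into a new, equal object
restored = pickle.loads(payload)

print(type(payload).__name__)   # bytes
print(restored == model_like)   # True: same contents
print(restored is model_like)   # False: a separate copy
```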


This recipe includes the following topics:

  • Load classification problem dataset (Pima Indians) from github
  • Split columns into the usual feature columns(X) and target column(Y)
  • Split data into train and test subset using train_test_split
  • Instantiate the classification algorithm: LogisticRegression
  • Call fit() to train the model on the training dataset
  • Save model to disk using pickle: dump
  • Load model from disk using pickle: load
  • Evaluate the model by calling score() on the unseen dataset


# import modules
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from pickle import dump
from pickle import load

# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)

# convert into numpy array for scikit-learn
pimaArr = pimaDf.values

# Let's split columns into the usual feature columns(X) and target column(Y)
# Y represents the target 'class' column whose value is either '0' or '1'
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]

# set test size to 33%
test_size = 0.33

# set seed to create a reproducible set of random data
seed = 7

# split data into train and test subset
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# instantiate the classification algorithm: LogisticRegression
model = LogisticRegression()

# call fit() to train the model
model.fit(X_train, Y_train)

# save model to disk using pickle: dump
# wb: opened for writing in binary mode
# the with block closes the file once the model is written
filename = 'trained_model_1.sav'
with open(filename, 'wb') as model_file:
    dump(model, model_file)


# in a different notebook
# (filename, X_test and Y_test must be defined there as well)

# load the saved model from disk
# rb: opened for reading in binary mode
with open(filename, 'rb') as model_file:
    loaded_model = load(model_file)

# evaluate the model on unseen data
accuracy = loaded_model.score(X_test, Y_test)

# display mean accuracy (score() returns a fraction between 0 and 1)
print("Accuracy: %.3f" % accuracy)
Accuracy: 0.756
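
A pickled model is tied to the environment that produced it: a file saved with one scikit-learn version is not guaranteed to load correctly with another, and unpickling a file from an untrusted source can execute arbitrary code. One way to keep track of the first point is to store some metadata next to the model. The following is only a sketch under assumptions: the save_bundle and load_bundle helpers, the bundle keys and the file name are hypothetical and not part of the recipe above, and a plain dictionary stands in for the fitted model so that the sketch runs without scikit-learn.

```python
import os
import pickle
import sys
import tempfile


def save_bundle(model, path):
    # hypothetical helper: store the model with the Python version that wrote it
    bundle = {'model': model, 'python_version': tuple(sys.version_info[:3])}
    with open(path, 'wb') as f:
        pickle.dump(bundle, f)


def load_bundle(path):
    # hypothetical helper: only unpickle files that come from a trusted source
    with open(path, 'rb') as f:
        bundle = pickle.load(f)
    # compare major and minor version numbers and warn on a mismatch
    if bundle['python_version'][:2] != tuple(sys.version_info[:2]):
        print('warning: model was saved with a different Python version')
    return bundle['model']


# hypothetical file name inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'model_bundle.sav')

# a plain dictionary stands in for a fitted estimator in this sketch
original = {'coefficients': [0.5, -1.25, 3.0], 'intercept': 0.1}

save_bundle(original, path)
restored = load_bundle(path)

# the restored object should be equal to the one that was saved
print(restored == original)
```

With a real estimator, the bundle could also record sklearn.__version__ so that a mismatch is detected before calling score() or predict() on the loaded model.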
