**Naive Bayes** calculates the prior probability of each class and the conditional probability (the probability of an event ‘A’, given that another event ‘B’ has already occurred) of each input value given each class. **Bayes Theorem** then combines these probabilities to make predictions.
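To make the arithmetic concrete, here is a small sketch of Bayes Theorem applied to made-up counts. The class sizes and the "high glucose" counts are hypothetical numbers chosen for illustration only; they do not come from the Pima dataset.

```python
# Hypothetical toy counts (NOT from the Pima data): 100 patients,
# 30 with diabetes (class 1) and 70 without (class 0).
counts = {1: 30, 0: 70}
# Hypothetical number of patients in each class whose glucose is "high"
high_glucose = {1: 24, 0: 14}

total = sum(counts.values())

# Prior probability of each class: P(class)
prior = {c: n / total for c, n in counts.items()}

# Conditional probability of the evidence given each class: P(high | class)
likelihood = {c: high_glucose[c] / counts[c] for c in counts}

# Total probability of the evidence: P(high) = sum of P(high | class) * P(class)
evidence = sum(likelihood[c] * prior[c] for c in counts)

# Bayes Theorem: P(class | high) = P(high | class) * P(class) / P(high)
posterior = {c: likelihood[c] * prior[c] / evidence for c in counts}

print({c: round(p, 3) for c, p in posterior.items()})  # {1: 0.632, 0: 0.368}
```

A classifier would predict the class with the larger posterior, here class 1.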

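For real-valued inputs, a **Gaussian distribution** is assumed for each input variable, so these conditional probabilities can be estimated easily from the mean and standard deviation of each input within each class.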
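As a rough sketch of what "assuming a Gaussian" means for a single input, the snippet below estimates a per-class mean and standard deviation and then evaluates the normal density of a new value under each class. The glucose readings are invented for illustration, and the helper names `gaussian_pdf` and `mean_std` are hypothetical, not part of any library.

```python
import math


def gaussian_pdf(x, mean, std):
    """Probability density of x under a normal distribution N(mean, std**2)."""
    exponent = -((x - mean) ** 2) / (2 * std ** 2)
    return math.exp(exponent) / (math.sqrt(2 * math.pi) * std)


def mean_std(values):
    """Mean and population standard deviation (divide by n) of a list of numbers."""
    mu = sum(values) / len(values)
    var = sum((v - mu) ** 2 for v in values) / len(values)
    return mu, math.sqrt(var)


# Hypothetical plasma glucose readings per class (made-up numbers)
glucose_by_class = {
    0: [85, 89, 100, 110, 95, 105],
    1: [148, 183, 137, 168, 155, 142],
}

# "Training" for one feature: estimate a mean and standard deviation per class
params = {c: mean_std(vals) for c, vals in glucose_by_class.items()}

# Likelihood of a new reading under each class's fitted Gaussian
new_reading = 150
for c, (mu, sd) in params.items():
    print(c, round(mu, 1), round(sd, 1), gaussian_pdf(new_reading, mu, sd))
```

The reading of 150 is far more likely under class 1's Gaussian, which is the per-feature likelihood that Bayes Theorem then combines with the class prior.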

Note: **Naive Bayes** is a **non-linear** machine learning (ML) algorithm. It’s called **naive** because it assumes that each input variable is **independent** of the others, an assumption that rarely holds for real data. Even so, it is often very effective in practice.
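The independence assumption is what keeps the model simple: the joint likelihood of all inputs is just the product of the per-feature likelihoods, which becomes a sum in log space. The NumPy sketch below is a minimal from-scratch Gaussian Naive Bayes under that assumption, not scikit-learn's implementation; `fit` and `predict` are hypothetical helper names, the synthetic two-cluster data is invented, and the fixed `1e-9` variance floor only loosely mirrors the idea behind `GaussianNB`'s `var_smoothing`. Because each class gets its own variances, the resulting decision boundary is generally quadratic rather than a straight line.

```python
import numpy as np


def fit(X, y):
    """Estimate class priors plus per-class, per-feature means and variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # Small constant added so no variance is exactly zero (assumed value)
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances


def predict(model, X):
    """Return the class with the highest log posterior for each row of X."""
    classes, priors, means, variances = model
    # log N(x_j | mean_cj, var_cj) for every sample i, class c, feature j
    # shapes: X[:, None, :] is (n, 1, d), means[None] is (1, k, d) -> (n, k, d)
    log_pdf = -0.5 * (
        np.log(2 * np.pi * variances)[None, :, :]
        + (X[:, None, :] - means[None, :, :]) ** 2 / variances[None, :, :]
    )
    # Independence assumption: joint likelihood = product of per-feature
    # likelihoods, i.e. a sum over the feature axis in log space
    log_joint = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_joint, axis=1)]


# Synthetic, well-separated two-class data (invented for illustration)
rng = np.random.default_rng(0)
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(50, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

model = fit(X, y)
print(predict(model, np.array([[0.0, 0.5], [4.2, 3.9]])))  # [0 1]
```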

Medium Post: Top 10 algorithms for ML newbies

This **recipe** includes the following topics:

- Load a **classification problem** dataset (Pima Indians) from GitHub
- Split columns into the usual feature columns (X) and target column (Y)
- Set k-fold count to 10
- Set **seed** to reproduce the same random data each time
- Split data using the **KFold()** class
- Instantiate the classification algorithm: **GaussianNB**
- Call **cross_val_score()** to run cross validation
- Calculate **mean estimated accuracy** from the scores returned by **cross_val_score()**

```
# import modules
import pandas as pd
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score
# read data file from github
# dataframe: pimaDf
gitFileURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']
pimaDf = pd.read_csv(gitFileURL, names = cols)
# convert into numpy array for scikit-learn
pimaArr = pimaDf.values
# Let's split columns into the usual feature columns(X) and target column(Y)
# Y represents the target 'class' column whose value is either '0' or '1'
X = pimaArr[:, 0:8]
Y = pimaArr[:, 8]
# set k-fold count
folds = 10
# set seed to reproduce the same random data each time
seed = 7
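# split data using KFold
# shuffle=True is needed for random_state to take effect;
# newer scikit-learn versions raise a ValueError if random_state is set without it
kfold = KFold(n_splits=folds, shuffle=True, random_state=seed)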
# instantiate the classification algorithm
model = GaussianNB()
# call cross_val_score() to run cross validation
resultArr = cross_val_score(model, X, Y, cv=kfold)
# calculate mean of scores for all folds
meanAccuracy = resultArr.mean() * 100
# display mean estimated accuracy
print("Mean estimated accuracy: %.3f%%" % meanAccuracy)
```

```
Mean estimated accuracy: 75.518%
```
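For intuition about what `KFold` does with 10 folds and a seed, here is a NumPy-only sketch of seeded k-fold splitting. It is an illustration of the idea, not scikit-learn's code: `kfold_indices` is a hypothetical helper, and because it shuffles with NumPy's `default_rng`, the actual fold membership will differ from scikit-learn's. The sample count 768 matches the number of rows in the Pima Indians dataset.

```python
import numpy as np


def kfold_indices(n_samples, n_splits, seed=None):
    """Yield (train_idx, test_idx) pairs for k-fold cross validation.

    If seed is given, row order is shuffled reproducibly first;
    otherwise the folds are consecutive blocks of rows.
    """
    indices = np.arange(n_samples)
    if seed is not None:
        rng = np.random.default_rng(seed)
        rng.shuffle(indices)
    # np.array_split allows unequal fold sizes when n_samples % n_splits != 0
    for test_idx in np.array_split(indices, n_splits):
        # Every row not in the current test fold is used for training
        train_idx = np.setdiff1d(indices, test_idx)
        yield train_idx, test_idx


folds = list(kfold_indices(768, 10, seed=7))
print([len(test) for _, test in folds])  # [77, 77, 77, 77, 77, 77, 77, 77, 76, 76]
```

Each row lands in exactly one test fold, so every sample is scored once, and reusing the same seed reproduces the same folds. `cross_val_score()` fits a fresh model on each training split, scores it on the matching test split, and returns one score per fold, which the recipe then averages.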