Generate descriptive statistics

Descriptive statistics summarize the central tendency, dispersion, and shape of a dataset’s distribution.
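As a small illustration of those three groups of measures, the sketch below computes each one on a made-up pandas Series (the values are arbitrary and only for illustration). It then checks that describe() reports the same mean, median, and standard deviation. Skewness, a shape measure, is not part of describe()'s output, so it is computed separately with Series.skew().

```python
import pandas as pd

# Arbitrary illustrative values, not taken from the Pima dataset
values = pd.Series([1, 2, 2, 3, 4, 7, 9])

# Central tendency
mean = values.mean()      # arithmetic mean
median = values.median()  # middle value (50th percentile)

# Dispersion
std = values.std()        # sample standard deviation (ddof=1), as used by describe()
iqr = values.quantile(0.75) - values.quantile(0.25)  # interquartile range

# Shape
skew = values.skew()      # positive means a longer right tail

# describe() bundles count, mean, std, min, quartiles and max
desc = values.describe()

print(f"mean={mean:.3f} median={median} std={std:.3f} iqr={iqr} skew={skew:.3f}")
print(desc)
```

Here the 50% row of describe() is the median, and the 25% and 75% rows are the quartiles used for the interquartile range.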

This recipe includes the following topics:

  • Load a CSV file using Pandas
  • Display the shape of the data
  • Display the data type of each attribute
  • Set display options
  • Generate descriptive statistics


# import module
import pandas as pd

fileGitURL = 'https://raw.githubusercontent.com/andrewgurung/data-repository/master/pima-indians-diabetes.data.csv'

# define column names
cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# load the file as a Pandas DataFrame (the file has no header row, so names= supplies the column labels)
pimaDf = pd.read_csv(fileGitURL, names=cols)

# get shape as a (rows, columns) tuple
shape = pimaDf.shape

# get data types for each attribute
types = pimaDf.dtypes

# set display options: show 3 decimal places and allow 100-character-wide output
pd.set_option('display.precision', 3)
pd.set_option('display.width', 100)

# generate descriptive statistics
stats = pimaDf.describe()

# display results
print(shape)
print(types)
print(stats)
(768, 9)

preg       int64
plas       int64
pres       int64
skin       int64
test       int64
mass     float64
pedi     float64
age        int64
class      int64
dtype: object

          preg     plas     pres     skin     test     mass     pedi      age    class
count  768.000  768.000  768.000  768.000  768.000  768.000  768.000  768.000  768.000
mean     3.845  120.895   69.105   20.536   79.799   31.993    0.472   33.241    0.349
std      3.370   31.973   19.356   15.952  115.244    7.884    0.331   11.760    0.477
min      0.000    0.000    0.000    0.000    0.000    0.000    0.078   21.000    0.000
25%      1.000   99.000   62.000    0.000    0.000   27.300    0.244   24.000    0.000
50%      3.000  117.000   72.000   23.000   30.500   32.000    0.372   29.000    0.000
75%      6.000  140.250   80.000   32.000  127.250   36.600    0.626   41.000    1.000
max     17.000  199.000  122.000   99.000  846.000   67.100    2.420   81.000    1.000
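If the GitHub URL is unreachable, the same steps can be tried offline. The sketch below assumes two illustrative rows (made up, not taken from the real dataset) in the same headerless, comma-separated layout, read through io.StringIO. It also uses pd.option_context, which applies the display options only inside the with block, whereas set_option changes them for the rest of the session.

```python
import io
import pandas as pd

# Two illustrative rows in the same headerless layout as the Pima file (values are made up)
csv_text = """2,120,70,20,80,32.0,0.5,33,0
4,100,64,0,0,28.5,0.3,25,1
"""

cols = ['preg', 'plas', 'pres', 'skin', 'test', 'mass', 'pedi', 'age', 'class']

# names= labels the columns because the data has no header row
df = pd.read_csv(io.StringIO(csv_text), names=cols)
stats = df.describe()

# the options below apply only inside this block and are restored afterwards
with pd.option_context('display.precision', 3, 'display.width', 100):
    print(df.shape)
    print(df.dtypes)
    print(stats)
```

Integer-only columns are parsed as int64 and columns containing decimals as float64, which matches the data types listed above.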