Summary statistics are numbers that summarize properties of the data, for example the mean, the spread, and the central tendency. We will look at each one in turn.

Let's take an input dataset:

Input: 45, 67, 23, 12, 9, 43, 12, 17, 91

Sorted: 9, 12, 12, 17, 23, 43, 45, 67, 91

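**Frequency:** The frequency of an attribute value is the number of times the value occurs in the data set. It is often reported as a percentage of the total number of items.

In our dataset, the frequency of 12 is 2, that is 2/9 ≈ 22.2% of the items.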
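As a rough illustration, the frequency of each value in the input dataset can be counted with Python's standard `collections.Counter`. The variable names here are only illustrative, and the percentage form is one possible way to report the same count.

```python
from collections import Counter

data = [45, 67, 23, 12, 9, 43, 12, 17, 91]

# Absolute frequency: how many times each value occurs
counts = Counter(data)

# Relative frequency: the same count as a fraction of the dataset size
n = len(data)
relative = {value: count / n for value, count in counts.items()}

print(counts[12])                     # 2
print(round(relative[12] * 100, 1))   # 22.2 (percent)
```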

**Mode:** The mode of an attribute is the most frequent attribute value.

The mode for our dataset is 12, as it is the most frequent item, occurring 2 times.

*Things to remember:*

*i - There is no mode if all the values occur the same number of times*

*ii - In particular, there is no mode if every value occurs exactly once*

*Usually, Mode and Frequency are used for categorical data*
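A minimal sketch of finding the mode, following the convention in these notes that a dataset has no mode when no value is more frequent than the others. The helper name `modes` is hypothetical. Note that Python's built-in `statistics.multimode` uses a different convention and returns every tied value instead of an empty result.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when all values are equally frequent."""
    counts = Counter(values)
    top = max(counts.values())
    # Convention assumed from the notes: if every value has the same
    # frequency, no value stands out, so report "no mode".
    if all(count == top for count in counts.values()):
        return []
    return [value for value, count in counts.items() if count == top]

data = [45, 67, 23, 12, 9, 43, 12, 17, 91]
print(modes(data))        # [12]
print(modes([1, 2, 3]))   # [] because every value occurs once
```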

**Percentiles:** These are used for ordinal or continuous data.

Given an ordinal or continuous attribute x and a number p between 0 and 100, the pth percentile is a value `x_p` of x such that p% of the observed values of x are less than `x_p`.

*How to calculate the percentile:*

*1. Count the total number of items in the dataset = N*

*2. Multiply N by the percentile p, taken as a percentage = N × p / 100*

*3. This will give you a number which can be a float or an integer*

*4. If it is a float, round it off to the nearest integer, say k*

*i. Sort the data into increasing order*

*ii. The kth value in the sorted dataset is your percentile value*

*5. If it is an integer, say k*

*i. Sort the data into increasing order*

*ii. The average of the kth and (k+1)th values in the sorted dataset is your percentile value*

*So when we say 20%, it means:*

Number of items in the dataset = 9

Number of items which should be less than `x_p` = 9 × 20% = 1.8

Rounded off to the nearest integer = 2

Our dataset is already sorted in increasing order, so check the 2nd value = 12

Likewise, for 25%, 50% and 75%: 9 × 25%, 9 × 50%, 9 × 75% = 2.25, 4.5, 6.75

Rounded off to the nearest integer (4.5 rounds up to 5), these give the 2nd, 5th and 7th values = 12, 23, 45

This is one way to calculate the percentile. If you use a calculator or some other method, the result might be slightly different.
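The rounding procedure described above could be sketched in Python roughly as follows. The function name `percentile` and the clamping at the ends of the dataset are assumptions made for this sketch, not part of the notes. Halves are rounded up with `floor(x + 0.5)` so that 4.5 becomes 5 as in the worked example; Python's built-in `round` would give 4 here because it rounds halves to the nearest even number.

```python
import math

def percentile(values, p):
    """pth percentile (0 <= p <= 100) using the round-to-nearest-position method."""
    ordered = sorted(values)          # sort into increasing order
    n = len(ordered)
    position = n * p / 100            # may be a float or a whole number

    if position != int(position):
        # Float case: round to the nearest integer position (halves go up)
        k = math.floor(position + 0.5)
        k = min(max(k, 1), n)         # assumption: clamp to a valid 1-based position
        return ordered[k - 1]

    # Whole-number case: average the kth and (k+1)th values
    k = int(position)
    if k < 1:
        return ordered[0]             # assumption: p = 0 maps to the minimum
    if k >= n:
        return ordered[-1]            # assumption: p = 100 maps to the maximum
    return (ordered[k - 1] + ordered[k]) / 2

data = [45, 67, 23, 12, 9, 43, 12, 17, 91]
for p in (20, 25, 50, 75):
    print(p, percentile(data, p))     # 20 12, 25 12, 50 23, 75 45
```

Library routines such as `numpy.percentile` interpolate between neighbouring values by default, so their results can differ slightly from this method.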

**Mean or Average:** Sum(all items) / Total number of elements

Mean = (9+12+12+17+23+43+45+67+91)/9 = 319/9 ≈ 35.4

However, the mean is very sensitive to outliers. So to understand the central tendency of the data, we often use the median rather than the mean.
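To make the outlier sensitivity concrete, here is a small sketch using Python's standard `statistics` module. The extra value 1000 is a made-up outlier added purely for illustration; it is not part of the input dataset.

```python
from statistics import mean, median

data = [45, 67, 23, 12, 9, 43, 12, 17, 91]
with_outlier = data + [1000]   # hypothetical extreme value, for illustration only

print(round(mean(data), 2))          # 35.44
print(round(mean(with_outlier), 2))  # 131.9

print(median(data))                  # 23
print(median(with_outlier))          # 33.0
```

A single extreme value moves the mean from about 35 to about 132, while the median only moves from 23 to 33.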

**Median:** The median is the 50th percentile, or middle value.

*How to get Median/Middle value -*

*a. Sort the data into increasing order*

*b. Get total no of elements - N*

*if N is even - median = ((N/2)th element + (N/2 + 1)th element) / 2*

*if N is odd - median = ceil(N/2)th element*

For our case, N = 9, which is odd, so ceil(9/2) = ceil(4.5) = 5th element

Median = 23
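The steps above translate fairly directly into code. This is a minimal sketch with a hypothetical helper name, `median_value`; in practice `statistics.median` from the standard library does the same job.

```python
def median_value(values):
    """Middle value of the sorted data, averaging the two middle values when N is even."""
    if not values:
        raise ValueError("median of an empty dataset is undefined")
    ordered = sorted(values)      # a. sort into increasing order
    n = len(ordered)              # b. total number of elements
    mid = n // 2
    if n % 2 == 1:
        # N odd: the ceil(N/2)th element (1-based) is index N // 2 (0-based)
        return ordered[mid]
    # N even: average of the (N/2)th and (N/2 + 1)th elements (1-based)
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [45, 67, 23, 12, 9, 43, 12, 17, 91]
print(median_value(data))           # 23
print(median_value([1, 2, 3, 4]))   # 2.5
```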

**Range:** The difference between the max and min values is called the range.

Range of the input dataset = 91 - 9 = 82

**Variance:** The variance and the standard deviation are the most common measures of the spread of a set of points.

`variance(x) = \sigma^2 = \frac{1}{n-1}\Sigma_{i=1}^n(x_i-\bar{x})^2`

where `\bar{x}` is the mean of all values of x

n = total number of items in the dataset

`\sigma` is the standard deviation, the square root of the variance
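As a sketch, the formula can be applied to the input dataset step by step. The variable names are illustrative. The result is cross-checked against `statistics.variance` and `statistics.stdev`, which use the same n-1 divisor.

```python
import math
import statistics

data = [45, 67, 23, 12, 9, 43, 12, 17, 91]

n = len(data)
x_bar = sum(data) / n                                      # mean of all values

# Sum of squared deviations from the mean, divided by n - 1
variance = sum((x - x_bar) ** 2 for x in data) / (n - 1)
std_dev = math.sqrt(variance)                              # sigma

print(round(variance, 2))   # 815.53
print(round(std_dev, 2))    # 28.56

# The standard library gives the same values
print(math.isclose(variance, statistics.variance(data)))   # True
print(math.isclose(std_dev, statistics.stdev(data)))       # True
```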
