My e-Notes about DataScience, Machine Learning, Python, Data Analytics, DataStage, DWH and ETL Concepts


Sunday, 19 March 2017

5 Number Summary - Statistics Basics


What is the 5 number summary?

The 5 number summary is a set of five descriptive statistics that gives a quick idea of the centre and spread of the data.

It includes:

1.  Minimum
2.  Q1 (25th percentile)
3.  Median (middle value or 50th percentile)
4.  Q3 (75th percentile)
5.  Maximum



How to calculate these values?

Input data :  45, 67, 23, 12, 9, 43, 12, 17, 91

Step 1: Sort the data

9, 12, 12, 17, 23, 43, 45, 67, 91

Step 2: The minimum and maximum are now simply the first and last values of the sorted data

Min : 9
Max : 91
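
Steps 1 and 2 can be sketched in Python roughly as follows. This is only a minimal illustration; the variable names (data, sorted_data, minimum, maximum) are my own choices and not part of any library.

```python
# Input data from the example above
data = [45, 67, 23, 12, 9, 43, 12, 17, 91]

# Step 1: sort the data in increasing order (sorted() returns a new list)
sorted_data = sorted(data)

# Step 2: after sorting, the minimum is the first value and the maximum is the last
minimum = sorted_data[0]
maximum = sorted_data[-1]

print(sorted_data)       # [9, 12, 12, 17, 23, 43, 45, 67, 91]
print(minimum, maximum)  # 9 91
```

The built-in min(data) and max(data) give the same two values without sorting, but the sorted list is needed anyway for the median and the quartiles.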

Step 3: Find the median, i.e. the middle value of the sorted data. Don't confuse it with the mean (average).

How to get the median (middle value) -
a. Sort the data in increasing order
b. Get the total number of elements - N
     if N is even - median = ( (N/2)th element + (N/2 + 1)th element ) / 2
     if N is odd - median = ceil(N/2)th element

For our case, N = 9, which is odd, so ceil(9/2) = ceil(4.5) = 5, i.e. the median is the 5th element
Median = 23
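
The even/odd rule above could be written as a small Python function. This is a sketch under my own naming (median_of is a hypothetical helper, not a library function). Note that Python lists are 0-indexed, so the "k-th element" of the rule is ordered[k - 1] in code.

```python
import math


def median_of(values):
    """Return the median of a non-empty list using the even/odd rule above."""
    ordered = sorted(values)      # a. sort the data in increasing order
    n = len(ordered)              # b. total number of elements
    if n == 0:
        raise ValueError("median_of() needs at least one value")
    if n % 2 == 0:
        # N even: average of the (N/2)th and (N/2 + 1)th elements (1-based)
        return (ordered[n // 2 - 1] + ordered[n // 2]) / 2
    # N odd: the ceil(N/2)th element (1-based)
    return ordered[math.ceil(n / 2) - 1]


print(median_of([45, 67, 23, 12, 9, 43, 12, 17, 91]))  # 23
```

Python's standard library also provides statistics.median, which follows the same rule, so in practice the helper above is only useful for seeing the steps spelled out.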

Step 4: Finding Q1 and Q3 (the first and third quartiles) is very easy. Split the sorted list into 2 lists around the median value, leaving the median itself out - 

 (9, 12, 12, 17), 23, (43, 45, 67, 91) 

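Now find the median of the 1st list, which is Q1, and the median of the 2nd list, which is Q3. (This is one common convention; some tools interpolate between values instead and can report slightly different quartiles.)

As we can see, list 1 and list 2 both have an even number of elements, so - 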

Median of list 1 (Q1) = ( (N/2)th element + (N/2 + 1)th element ) / 2
                      = ( (4/2)th element + (4/2 + 1)th element ) / 2
                      = ( 2nd element + 3rd element ) / 2
                      = ( 12 + 12 ) / 2
                   Q1 = 12

Median of list 2 (Q3) = ( 2nd element + 3rd element ) / 2
                      = ( 45 + 67 ) / 2
                      = 112 / 2
                   Q3 = 56

So we have Q1 = 12 and Q3 = 56.
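
Step 4 can be sketched in Python as below. The names ordered, lower_half and upper_half are my own, and the split rule is the one used in this post (drop the median when N is odd); it is not necessarily what other tools do by default.

```python
from statistics import median

ordered = sorted([45, 67, 23, 12, 9, 43, 12, 17, 91])
n = len(ordered)
half = n // 2  # for N = 9 this is 4

# Values before the middle position form the lower list
lower_half = ordered[:half]
# When N is odd the median itself (index `half`) is left out of both lists
upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]

q1 = median(lower_half)  # median of (9, 12, 12, 17)
q3 = median(upper_half)  # median of (43, 45, 67, 91)

print(lower_half, upper_half)  # [9, 12, 12, 17] [43, 45, 67, 91]
print(q1, q3)                  # 12.0 56.0
```

The results print as 12.0 and 56.0 because statistics.median averages the two middle values of an even-length list with a normal division.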

Our 5 number summary is -

min   Q1   median   Q3   max
9     12   23       56   91
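
Putting all the steps together, the whole calculation might look like the sketch below. five_number_summary is a hypothetical helper name of my own, and it assumes the same quartile convention as this post (median of each half, with the overall median left out when N is odd).

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values to split into halves")
    half = n // 2
    lower_half = ordered[:half]
    # Leave the median out of the upper list when the count is odd
    upper_half = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return (
        ordered[0],          # minimum
        median(lower_half),  # Q1
        median(ordered),     # median
        median(upper_half),  # Q3
        ordered[-1],         # maximum
    )


print(five_number_summary([45, 67, 23, 12, 9, 43, 12, 17, 91]))
# (9, 12.0, 23, 56.0, 91)
```

This matches the hand calculation above: 9, 12, 23, 56, 91.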





