Showing posts with label ML. Show all posts
Showing posts with label ML. Show all posts

Sunday, 19 March 2017

5 Number Summary - Statistics Basics


What is 5 no summary?

5 no summary is an statistical measure to get the idea about the data tendency.

It includes :

1.  Minimum
2.  Q1 (25 percentile)
3.  Median (middle value or 50 percentile)
4.  Q3 (75 percentile)
5.  Maximum


5 number summary

How to calculate or get these values??

Input data :  45, 67, 23, 12, 9, 43, 12, 17, 91

Step1:  Sort the data

9, 12, 12, 17, 23, 43, 45, 67, 91

Step2:  You can easily get the minimum and maximum no

Min : 9
Max : 91

Step 3: Finding the median - Finding the middle value, dont confuse with Mean or Average. 

How to get Median/Middle value - 
a. Sort the data into increasing order
b. Get total no of elements - N
     if N is even -  median =   ( N/2th element + [N/2 + 1]th element) / 2
     if N is odd - median = ceil(N/2)th element

For our case, N = 9, which is odd, so ceil(9/2) = ceil(4.5) = 5th element 
Median = 23

Step 4: Finding our the Q1 and Q3 (called Quantile) is very easy. Divide the element list into 2 list by Median value - 

 (9, 12, 12, 17), 23, (43, 45, 67, 91) 

Now, Find out the Median for 1st list which is Q1 and Median for 2nd list which is Q3

As we can see, list1 and list2 both are having even no of elements so  - 

Median of list1 (Q1) =  ( N/2th element + [N/2 + 1]th element) / 2
                                  =  ( 4/2th element + [4/2 +1]th element) / 2
                                  =  ( 2nd element  + 3rd element ) /2
                                  =  (12 + 12 ) / 2 
                            Q1 = 12

Median of list2 (Q3) = ( 45 + 67 ) / 2
                                  = 112 / 2
                                  = 56 

We got the Q1 (12) and Q3 (56). 

Our 5 no summary is calculated which is -  

min, Q1, median, Q3, max 
9,     12,  23,         56, 91




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/


Saturday, 14 January 2017

Learning Numpy #2

Thursday, 12 January 2017

Learning Numpy #1


Numpy is a python library used for numerical calculations and this is better performant than pure python. In this notebook, I have shared some basics of Numpy and will share more in next few posts. I hope you find these useful.





Click Here for Next Tutorial ~

Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 11 January 2017

My Learning Path for Machine Learning


I am a Python Lover guy so my way includes lots of Python points. If you dont know the basics of this wonderful language, start it from HERE else you can follow the links which I am going to share.

Learning ML is not only studying ML algorithms, it includes Basic Algebra, Statistics, Algorithms, Programming and lot more. But no need to afraid as such :-) we need to start from somewhere.....

This is my github repo, you can fork it and follow me with these 2 links --

Fork Fork
Follow - Follow @atulsingh0
I am still updating this list and welcome you to update this as well.



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 6 January 2017

10 minutes with pandas library

Thursday, 5 January 2017

Learning Pandas #5 - read & write data from file

Wednesday, 4 January 2017

Learning Pandas #4 - Hierarchical Indexing

Sunday, 1 January 2017

Learning Pandas #3 - Working on Summary & MissingData

Saturday, 31 December 2016

Learning Pandas - DataFrame #2

Friday, 30 December 2016

Learning Pandas - Series #1

Wednesday, 28 December 2016

Learning Graphlab - SFrame #2

In last post Learning Graphlab - SFrame #1, we have learn basics of SFrame, like how to create, add or delete the columns in SFrame. In this post, we will revise it once again and learn some advance features of SFrame. Have a good learnng !!!

You can view the Jupyter Notebook for the same HERE




Sunday, 18 December 2016

R Points #1 - Matrix & Factor Basics

Saturday, 17 December 2016

R Points #0 - Basics n Vector




=======================================================
=======================================================



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 30 November 2016

Learning Graphlab - SFrame #1


Hoping you guys went through the last post (Lnk -> Getting Started with Graphlab), In this post we will do some handson SFrame datatype of Graphlab which is same as dataframe of pandas python library.

i. Reading the CSV file
==
rdCSV

ii. save DataSet 
==

iii. load DataSet
==


iv. Check Total Rows and Columns
==
rowNum

v. Check Columns data type and Name
==
colTypes

vi. Add new column
==
addCol

vii. Delete column
==

viii. Rename column
==
renameCol

ix. Column Swapping (location)
==






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Monday, 12 September 2016

Python Points #15 - Exceptions