Showing posts with label Cleanse. Show all posts
Showing posts with label Cleanse. Show all posts

Saturday, 25 March 2017

Check if Python Pandas DataFrame Column is having NaN or NULL


Before implementing any algorithm on the given data, It is a best practice to explore it first so that you can get an idea about the data. Today, we will learn how to check for missing/Nan/NULL values in data.

1. Reading the data
Reading the csv data into storing it into a pandas dataframe.


2. Exploring data
Checking out the data, how it looks by using head command which fetch me some top rows from dataframe.


3. Checking NULLs
Pandas is proving two methods to check NULLs - isnull() and notnull()
These two returns TRUE and FALSE respectively if the value is NULL. So let's check what it will return for our data

isnull() test

notnull() test

Check 0th row, LoanAmount Column - In isnull() test it is TRUE and in notnull() test it is FALSE. It mean, this row/column is holding null.

But we will not prefer this way for large dataset, as this will return TRUE/FALSE matrix for each data point, instead we would interested to know the counts or a simple check if dataset is holding NULL or not.

Use any()
Python also provide any() method which returns TRUE if there is at least single data point which is true for checked condition.


Use all()
Returns TRUE if all the data points follow the condition.


Now, as we know that there are some nulls/NaN values in our data frame, let's check those out - 

data.isnull().sum() - this will return the count of NULLs/NaN values in each column.


If you want to get total no of NaN values, need to take sum once again -

data.isnull().sum().sum()


If you want to get any particular column's NaN calculations - 




Here, I have attached the complete Jupyter Notebook for you -



If you want to download the data, You can get it from HERE.




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/