Before implementing any algorithm on a dataset, it is best practice to explore the data first so that you get an idea of what it contains. Today, we will learn how to check for missing (NaN/NULL) values in data.
1. Reading the data
Read the CSV data and store it in a pandas DataFrame.
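The original code for this step is not included here, so the following is a minimal sketch. It assumes a small hypothetical sample inlined with `io.StringIO` in place of the real file. Only the `LoanAmount` column comes from the article; `Loan_ID`, `Gender` and `Loan_Status` are assumed for illustration.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the article's CSV file.
# With a real file you would pass its path instead, e.g. pd.read_csv("train.csv")
# ("train.csv" is an assumed file name).
csv_text = """Loan_ID,Gender,LoanAmount,Loan_Status
LP001002,Male,,Y
LP001003,Male,128,N
LP001005,,66,Y
LP001006,Male,120,Y
"""

# read_csv turns empty fields into NaN automatically.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (4, 4)
```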
2. Checking the data
Next, let's see how the data looks using the head() method, which returns the top rows of the DataFrame (the first five by default).
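A short sketch of this step, again assuming the same hypothetical inline sample rather than the article's actual file. `head(n)` returns the first `n` rows; without an argument it returns up to five.

```python
import io

import pandas as pd

# Hypothetical sample data (column names other than LoanAmount are assumed).
csv_text = """Loan_ID,Gender,LoanAmount,Loan_Status
LP001002,Male,,Y
LP001003,Male,128,N
LP001005,,66,Y
LP001006,Male,120,Y
"""
df = pd.read_csv(io.StringIO(csv_text))

# head() with no argument returns at most 5 rows; head(2) returns the first two.
first_rows = df.head(2)
print(first_rows)
```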
3. Checking NULLs
Pandas provides two methods to check for NULLs: isnull() and notnull().
For a value that is NULL, isnull() returns True and notnull() returns False. Let's check what they return for our data.
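A minimal sketch of both methods on the same hypothetical sample, in which row 0 of `LoanAmount` is deliberately left empty. Both methods return a boolean DataFrame with the same shape as the data, and each one is the element-wise opposite of the other. (pandas also offers `isna()` and `notna()` as equivalent aliases.)

```python
import io

import pandas as pd

# Hypothetical sample data; row 0 has no LoanAmount.
csv_text = """Loan_ID,Gender,LoanAmount,Loan_Status
LP001002,Male,,Y
LP001003,Male,128,N
LP001005,,66,Y
LP001006,Male,120,Y
"""
df = pd.read_csv(io.StringIO(csv_text))

null_mask = df.isnull()       # True where a value is missing
not_null_mask = df.notnull()  # True where a value is present

print(null_mask.loc[0, "LoanAmount"])      # True
print(not_null_mask.loc[0, "LoanAmount"])  # False
```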
Look at row 0 of the LoanAmount column: isnull() gives True and notnull() gives False, which means this cell holds a NULL value.
However, this approach is not practical for a large dataset, because it returns a True/False matrix with one entry per data point. Instead, we are usually interested in the number of NULLs, or simply in whether the dataset holds any NULLs at all.
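One common way to get the counts is to sum the boolean mask. Since True counts as 1, `isnull().sum()` gives the number of missing values per column. The sketch below assumes the same hypothetical sample, which has one gap in `Gender` and one in `LoanAmount`.

```python
import io

import pandas as pd

# Hypothetical sample data with two missing cells.
csv_text = """Loan_ID,Gender,LoanAmount,Loan_Status
LP001002,Male,,Y
LP001003,Male,128,N
LP001005,,66,Y
LP001006,Male,120,Y
"""
df = pd.read_csv(io.StringIO(csv_text))

# Summing a boolean DataFrame counts the True values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# Summing the per-column counts gives the total for the whole DataFrame.
total_nulls = int(null_counts.sum())
print(total_nulls)  # 2
```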
Pandas also provides the any() method, which returns True if at least one data point satisfies the checked condition.
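The sketch below shows two ways to use `any()` for this check, assuming the same hypothetical sample. Calling `.values.any()` on the mask checks every cell and returns a single answer. `DataFrame.any()` instead works column by column and returns a boolean Series, which tells you which columns contain NULLs.

```python
import io

import pandas as pd

# Hypothetical sample data.
csv_text = """Loan_ID,Gender,LoanAmount,Loan_Status
LP001002,Male,,Y
LP001003,Male,128,N
LP001005,,66,Y
LP001006,Male,120,Y
"""
df = pd.read_csv(io.StringIO(csv_text))

# .values gives the underlying NumPy array; .any() on it checks every cell at once.
has_any_null = bool(df.isnull().values.any())
print(has_any_null)  # True

# DataFrame.any() reduces each column separately and returns a boolean Series.
cols_with_nulls = df.isnull().any()
print(cols_with_nulls[cols_with_nulls].index.tolist())  # ['Gender', 'LoanAmount']
```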
If you want the NaN count for a particular column:
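A minimal sketch, assuming the same hypothetical sample: select the column first, then apply the same methods to the resulting Series. Taking the mean of the boolean mask gives the fraction of missing values, because True counts as 1.

```python
import io

import pandas as pd

# Hypothetical sample data.
csv_text = """Loan_ID,Gender,LoanAmount,Loan_Status
LP001002,Male,,Y
LP001003,Male,128,N
LP001005,,66,Y
LP001006,Male,120,Y
"""
df = pd.read_csv(io.StringIO(csv_text))

loan_nulls = df["LoanAmount"].isnull()  # boolean Series for one column
print(int(loan_nulls.sum()))   # 1    -> number of missing values
print(bool(loan_nulls.any()))  # True -> at least one value is missing
print(loan_nulls.mean())       # 0.25 -> fraction of rows missing

# The same mask can select the affected rows.
print(df.loc[loan_nulls, "Loan_ID"].tolist())  # ['LP001002']
```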
Here, I have attached the complete Jupyter Notebook for you.
If you want to download the data, you can get it from HERE.