Tuesday, 25 April 2017

Graphical Display of Basic Stats of Data



After a long time, got a chance to share somethings with you guys, so feeling awesome :-), Today we are gonna see the Graphical Display of Data Stats or sometime we call it Exploratory Data Analysis as well, This is the best way to understand your data in very less time and set your analysis path for it. So without doing more chats, let's start -

1. Scatter Plot

Very Basic, Very Easy and Most Used EDA(Exploratory Data Analysis) technique. It is 2-D plot between X and Y variables where X or Y can be numeric data features or columns.
               With this plot we can easily see if there is any relationship, pattern or trends between between these 2 features or any data outlier existing. It is also useful to explore possibility of correlation relationships. Correlation can be positive, negative or neutral.

Now, let's look into a scatter plot -
I am using IRIS dataset and Python matplotlib library for this illustration - -

scatter iris

2. Histogram

Histogram plot is one of the oldest plotting technique to summarize the data distribution of a attribute X. X can be numerical feature and height of bar is frequency. Resulting plot is also called Bar Chart.

histogram bar chart iris

3. Quantile Plot (Bar Charts)

Quantile Plot or Bar Charts also used to display the uni-variate variables data distribution as well as plot the percentile information with outlier detection.

qunatile box plot

Keep looking for this space for further update.

Happy Learning




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 5 April 2017

NULL Handling in Sequential File Stage



DataStage has a mechanism for denoting NULL field values. It is slightly different in server and parallel jobs. In the sequential file stage a character or string may be used to represent NULL column values. Here's how represent NULL with the character "~":

Server Job:
1. Create a Sequential file stage and make sure there is an Output link from it.
2. Open the Sequential file stage and click the "Outputs" tab ans Select "Format"
3. On the right enter the "~" next to "Default NULL string:"

Parallel Job:
1. Create a Sequential file stage and make sure there is an Output link from it.
2. Open the Sequential file stage and click the "Outputs" tab ans Select "Format"
3. Right click on "Field defaults" ==> "Add sub-property" and select "Null field value"
4. Enter the "~" in the newly created field.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Saturday, 1 April 2017

Get Over Running DataStage Job Details


With the script below, we will get a list of jobs which are taking more time to complete than last run time. By some tweaks, we can use this script to monitor any kind of process.






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/