Saturday, 31 December 2016

Learning Pandas - DataFrame #2

Friday, 30 December 2016

Learning Pandas - Series #1

Wednesday, 28 December 2016

Learning Graphlab - SFrame #2

In the last post, Learning Graphlab - SFrame #1, we learned the basics of SFrame, such as how to create an SFrame and add or delete its columns. In this post, we will revise those once again and learn some advanced features of SFrame. Happy learning!

You can view the Jupyter Notebook for the same HERE

Thursday, 22 December 2016

DataStage Scenario #17 - Get Transitive relation between columns

Goal: Get the data from two columns that have a transitive relationship between them.

If
A -> B
B -> C
then
A -> C

Input:
Col1 Col2
a b
b c
s u
u p
1 2
2 3

Output:
Col1 Col2
a c
s p
1 3
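
The logic of this scenario can be sketched outside DataStage. The snippet below is only an illustration in Python, not the DataStage job design: the function name transitive_pairs is hypothetical, and it assumes each Col1 value maps to at most one Col2 value. It starts from every value that never appears in Col2, follows the links until they stop, and keeps only the pairs that needed at least one intermediate hop.

```python
def transitive_pairs(rows):
    """Collapse chains such as a -> b, b -> c into a single pair a -> c.

    Assumption (not from the original post): each Col1 value appears once.
    """
    next_of = dict(rows)                 # Col1 value -> Col2 value
    targets = set(next_of.values())      # every value that appears in Col2
    result = []
    for start in next_of:
        if start in targets:
            continue                     # not the head of a chain
        end = next_of[start]
        seen = {start}
        hopped = False
        # Follow the chain; the 'seen' set guards against cycles.
        while end in next_of and end not in seen:
            seen.add(end)
            end = next_of[end]
            hopped = True
        if hopped:                       # skip rows with no transitive link
            result.append((start, end))
    return result


rows = [("a", "b"), ("b", "c"), ("s", "u"), ("u", "p"), ("1", "2"), ("2", "3")]
print(transitive_pairs(rows))
```

For the sample input above this yields the pairs (a, c), (s, p) and (1, 3). Longer chains (a -> b -> c -> d) collapse to their two end points, which goes slightly beyond the two-row case shown in the scenario.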

Like the below page to get update  

Wednesday, 21 December 2016

R Points #2 - DataFrame & List Basics

Sunday, 18 December 2016

R Points #1 - Matrix & Factor Basics

Saturday, 17 December 2016

R Points #0 - Basics n Vector



Monday, 12 December 2016

vmware player powering on internal error

Last night, I struggled with the "vmware player powering on internal error" for almost an hour while trying to run a VMware guest OS on my machine. I tried many tweaks to resolve it, without success.
After lots of googling, I found one solution that worked for me, so I am sharing it here in case it helps someone stuck like me :)

Thursday, 8 December 2016

Notepad++ tip - Format JSON file

Notepad++ is a very powerful tool with lots of plugins and functionality which can reduce a lot of our work. Today, we will see how to deal with JSON data in Notepad++.

1. First of all, whenever you open a data or code file, always select the respective language style. Here is how -

Open data/code file --> Go to Language Menu --> Select respective language setting (in our case, it's J - JSON)

After doing this, the code/data text becomes much easier to read.

2. Install some plugins. For JSON, install the ones below -
a. JSON Viewer
b. JSToolNpp

Go to Plugins --> Plugin Manager --> Show Plugin Manager --> Available --> Select & Install

3. To format JSON, select all the content, then press Ctrl+Alt+Shift+M or Ctrl+Alt+J
4. Your JSON file has been formatted :-)
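
If Notepad++ is not at hand, the same kind of formatting can be done with a few lines of Python. This is only a side sketch using the standard json module, not something the plugins above do internally; the function name format_json and the sample string are made up for illustration.

```python
import json


def format_json(text, indent=4):
    """Parse a JSON string and return it pretty-printed."""
    # json.loads raises json.JSONDecodeError (a ValueError) on invalid JSON.
    data = json.loads(text)
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(data, indent=indent, ensure_ascii=False)


raw = '{"name":"Notepad++","plugins":["JSON Viewer","JSToolNpp"]}'
print(format_json(raw))
```

A handy side effect is validation: if the input is not valid JSON, the call fails with an error that points to the offending position.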

You can download some useful plugins from here ->
If you don't have access, use this -

Place this plugin folder into your Notepad++ installation directory and restart Notepad++.


Wednesday, 7 December 2016

Import the jobs from DS windows client

We have already discussed a script that can export DataStage jobs from your client system; likewise, we can write another one to import the jobs. Let's see how -

DsImportJobsClient.bat :

This script reads all the *.dsx job names from the specified directory and its sub-directories and imports them into the specified project. It can also build (only BUILD) an existing package created in Information Server Manager and send it to the specified location on the client machine.

To use the build feature, make sure the package has been created with all the needed jobs, then saved and closed. Only updates to the selected jobs are handled automatically. To add or delete a job, you need to do it manually.
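
The directory scan described above (collecting every *.dsx file from a directory and its sub-directories) can be sketched in Python as follows. This is not the batch script itself and it does not call any DataStage import command; the function name find_dsx_files is hypothetical and only illustrates the file-collection step.

```python
from pathlib import Path


def find_dsx_files(root):
    """Return every *.dsx file under root, including sub-directories.

    The result is sorted by path so that repeated runs give the same order.
    Note: the pattern match is case-sensitive on case-sensitive file systems.
    """
    return sorted(p for p in Path(root).rglob("*.dsx") if p.is_file())
```

Calling find_dsx_files with the export directory returns a list of paths that could then be handed, one by one, to whatever import command the client installation provides.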

Modify the ImportJobList.txt file, go to the .bat directory, and then execute importAndBuild.bat:

ImportJobList.txt :

DsImportJobsClient.bat :


Wednesday, 30 November 2016

Learning Graphlab - SFrame #1

Hoping you went through the last post (Link -> Getting Started with Graphlab). In this post we will get some hands-on practice with the SFrame data type of Graphlab, which is similar to the DataFrame of the pandas Python library.

i. Reading the CSV file

ii. Save DataSet

iii. Load DataSet

iv. Check Total Rows and Columns

v. Check Columns data type and Name

vi. Add new column

vii. Delete column

viii. Rename column

ix. Column Swapping (location)


Sunday, 27 November 2016

Getting Started with Graphlab - A Python library for Machine Learning

Before starting with Graphlab, we have to configure our system with some basic tools such as Python and Jupyter Notebook. You can find the 'How-To' at this link -

What is GraphLab?
GraphLab Create is a Python library, backed by a C++ engine, for quickly building large-scale, high-performance data products. Some key features of GraphLab Create are:
  • Analyze terabyte scale data at interactive speeds, on your desktop.
  • A Single platform for tabular data, graphs, text, and images.
  • State of the art machine learning algorithms including deep learning, boosted trees, and factorization machines.
  • Run the same code on your laptop or in a distributed system, using a Hadoop Yarn or EC2 cluster.
  • Focus on tasks or machine learning with the flexible API.
  • Visualize data for exploration and production monitoring.
After installing the Graphlab library, we can use it like any other Python library.

Use Jupyter Notebook for a start. Open a Python notebook in Jupyter Notebook and execute the commands below to see Graphlab working -

a. Importing Graphlab


b. Reading CSV file
This method parses the input file and converts it into an SFrame variable.


c. Getting Started with SFrame 

i. View content of SFrame variable sf


ii. View Head lines (top lines)


iii. View Tail lines (last lines)


Monday, 21 November 2016

Reading DSParam - datastage parameter file

I am sharing a utility that helps you read the DSParams file, which holds all the DataStage environment parameters.

It is a utility to view the contents of the DSParams file, useful when you want to see everything the customer has set at the project level.

$ cat DSParams | ./ | more
$ cat DSParams | ./ > outputfile

1. copy script text below to a file ( on a UNIX system
2. Set execute permissions on this file. chmod 777
3. Usually perl is in /usr/bin/perl, but you might have to adjust this path if necessary (hint: "which perl" should tell you which one to use).
4. cat the DSParams file from the project you are concerned with and pipe the output to this script. You may have to use the fully qualified path for this file.
5. Capture the output to screen or to a file. A file may be useful so that the customer can send the info to you by email.


Monday, 14 November 2016

DataStage Partitioning #3

Best allocation of Partitions in DataStage for storage area

No of Ways
Volume of Data
Best way of Partition
Allocation of Configuration File (Node)
DB2 EEE  extraction in serial
DB2 EEE extraction in parallel
Node number = current node (key)
64 (Depends on how many nodes are allocated)
Partition or Repartition in the Stages of DataStage
Modulus (it should be a single key, and that too an integer)
Hash (any number of keys with different data types)
8 (Depends on how many nodes are allocated for the job)
Writing into DB2
Writing into Dataset
1,2,4,8,16,32,64 etc… (Based on the incoming records it writes into it.)
Writing into Sequential File
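
To make the difference between the Modulus and Hash entries above concrete, here is a small Python sketch. It is not DataStage code and does not reproduce the hash function DataStage uses internally: zlib.crc32 is an assumed stand-in chosen only because it is deterministic, and the function names and sample rows are made up for illustration.

```python
import zlib


def modulus_partition(key, num_partitions):
    """Modulus partitioning: needs one integer key."""
    if isinstance(key, bool) or not isinstance(key, int):
        raise TypeError("modulus partitioning needs a single integer key")
    return key % num_partitions


def hash_partition(keys, num_partitions):
    """Hash partitioning: accepts any number of keys of mixed data types."""
    # Assumption: crc32 over the keys' text form stands in for the real hash.
    encoded = "|".join(repr(k) for k in keys).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


rows = [(101, "IN"), (102, "US"), (103, "IN"), (104, "UK")]
for cust_id, country in rows:
    print(cust_id,
          modulus_partition(cust_id, 4),
          hash_partition((cust_id, country), 4))
```

In both cases rows with the same key value always land in the same partition, which is what key-based stages rely on. Modulus is the cheaper of the two but only works when the key is a single integer column.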


Best allocation of Partitions in DataStage for each stage

S. No
Best way of Partition
Important points
Left and Right link: Hash or Modulus
All the input links should be sorted based on the joining key and partitioned with higher key order.

Main link: Hash or same
Reference link: Entire
Both the links need not be in the sorted order

Master and update link: Hash or Modulus
All the input links should be sorted based on the merging key and partitioned with higher key order. Pre-sort makes merge “lightweight” for memory.

Remove Duplicate, Aggregator
Hash or Modulus
If the input link is in sorted order based on the key it will perform better.

Hash or Modulus
Sorting happens after partitioning

Transformer, Funnel, Copy, Filter
Change Capture
Left and Right link: Hash or Modulus
Both the input links should be in the sorted order based on the key and partitioned with higher key order.
