Thursday, 14 September 2017

Evaluation Sequence in Transformer Stage - A Quick DataStage Recipe



Recipe:

What is evaluation sequence in Transformer Stage Or Order of Stage & Loop Variable and Derivations

Ingredients:

1. Transformer Stage
     a. Stage Variables
     b. Loop Variables
     c. Derivations


How To:

Evaluate each stage variable initial value
For each input row to process:
Evaluate each stage variable derivation value, unless the derivation is empty
For each output link:
Evaluate each column derivation value
Write the output record
Next output link
Next input row


** The stage variables and the columns within a link are evaluated in the order in which they are displayed in the Transformer editor. Similarly, the output links are also evaluated in the order in which they are displayed




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 12 September 2017

A Newer Version of Jupyter Notebook - Jupyter Lab


We all use Jupyter Notebook (previously known as IPython Notebook) a lot when researching on something or doing stuff on stuff :-)
For those, Who dont know what it is, It is a browser based Notebook which holds the Python code as well as executed Output. You can export it to html, pdf or its native format (*.ipynb).

So coming back to topic, there is a next generation of Jupyter Notebook is available, try it once, I am sure you will fell in love with it. So let's quickly check how you can get it -
Installation:
$ pip install jupyterlab

Execution:
$ jupyter lab

Features which I like the most:
1. You will get a file browser in left side of notebook window for easy access on files
2. Provide 5 quick access button at Leftmost panel (Files, Running, Commands, CellTools & Tabs)
3. Each Notebook will open in same browser tab, means there will be one browser tab and inside that tab there will be multiple jupyter notebook tab will open


For more details, visit - https://github.com/jupyterlab/jupyterlab





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Monday, 11 September 2017

Datastage Calling script in Remote Server


Step 1: Setting up UNIX server to automatically login without prompt-ing a password.

1. SSH must be installed in both servers. (primary and remote)
2. User ID for both servers

You can find the Step by Step detail on this Link - Configuring_SSH_on_Linux


Step 2: Creating Datastage job to run script in a remote server

1. Create a new sequencer job
2. Add an Execute Command stage
3. In the Command text value in the ExecCommand tab,

type -

 ssh UserB@ServerB ksh /home/b/test.ksh

This command will execute a script in the remote server.




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 8 September 2017

BI Report Testing Trends


Extending my quite old post (ETL Testing - Trends and Challenge - link) by sharing my thoughts on BI Report Testing Issues and Solutions. Feel free to add your views and comments if any.

Business Intelligence
Business Intelligence (BI) and Data Warehouse (DW) systems allow companies to use their data for data analysis, business scenarios and forecasting, business planning, operation optimization and financial management and compliance. To construct Data Warehouses, Extract, Transform and Load (ETL) technologies help to collect data from various sources, transform the data depending on business rules and needs, and load the data into a destination database.

Consolidating data into a single corporate view enables the gathering of intelligence about your business, pulling from different data sources, such as your Physician, Hospitals, Labs and Claims systems.


  • BI Report Testing Trends
Once the ETL part is tested , the data being showed onto the reports hold utmost importance. QA team should verify the data reported with the source data for consistency and accuracy.
  • Verify Report data from source (DWH tables/views)
QA should verify the report data (field/column level) from source by creating required SQL at their own based on different filter criteria (as available on report filter page).
  • Creating SQLs 
Create SQL queries to fetch and verify the data from Source and Target. Sometimes it’s not possible to do the complex transformations done in ETL. In such a case the data can be transferred to some file and calculations can be performed.
  • GUI & Layout
Verifying Report GUI (selection page) and layout (report output layout).
  • Performance verification
Verifying Report’s performance (report’s response time should be under predefined time limit as specified by business need). Also report’s performance can be tested for multiple users (those # of users are expected to access the report at same time and this limit should be defined by business need)
  • Security verification
Verifying that only authorized users can access the report OR some specific part of report (if that part should not allow to any general user)





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Saturday, 12 August 2017

Delete all lines in Notepad++ except lines containing a pattern


Who dont want to keep the only items which needed and we want to have when working with lots of junk data and want to fetch only which requires :-)

This can be done very easily in NotePad++ .

1. Use Ctrl-F to open Search box and  Select "Mark"

2. Put the Pattern in "Find What" box (in my case, I want to keep "Singh" only)




3. Then Click on Mark All to Mark all the lines which is having "Singh" in it.

4. When Marked, it will look like below -






5. After marking, Close this pop-up and Go To Search --> Bookmark --> Remove Unmarked Lines



As soon as Click on Remove Unmarked Lines, it will delete the all lines other than marked line.



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 26 July 2017

View Hidden File in WinSCP


Recently I have stuck with the problem where I have to download one of configuration file from server which is hidden. Usually we use WinSCP or Filezilla to do these kind of task quickly so I went for WinSCP.  But I was unable to locate the file on WinSCP as this is hidden.

Google helped me once again... This is how we can view the hidden file in WinSCP


Short Way:
  • Use Alt-Clt-H key combination to view the hidden files.

Long Way:
  • Go to Options --> Preferences --> Panels
  • At Right Hand Side, Check the Show Hidden Files Option





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 14 July 2017

Utility to Get Last Run Log of DataStage Job


Love the new feature of IBM DataStage to fetch the last run log of any jobs with the help of "dsjob" command.
In older version of DataStage, it is very tedious to get the last run log but from/after v9.1 IBM added an additional feature in dsjob command. Lets see hows this works -


  • To Fetch Last Run Log:
  • To Fetch Second Last Run Log
  • To Fetch Third Last Run Log
  • To Fetch Nth Last Run Log





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/