
Monday, 11 September 2017

Datastage Calling script in Remote Server


Step 1: Set up the UNIX servers for automatic login without a password prompt.

1. SSH must be installed on both servers (primary and remote).
2. You need a user ID on both servers.

You can find the step-by-step detail at this link - Configuring_SSH_on_Linux
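
For reference, the key-based setup boils down to generating a key pair on the primary server and appending the public key to the remote user's authorized_keys. A minimal sketch (user and host names are placeholders):

 # On the primary server, as the primary user: generate a key pair (default path, empty passphrase)
 ssh-keygen -t rsa
 # Push the public key to the remote account; afterwards ssh needs no password
 ssh-copy-id UserB@ServerB
 # Verify the password-less login
 ssh UserB@ServerB hostname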


Step 2: Creating Datastage job to run script in a remote server

1. Create a new sequence job
2. Add an Execute Command stage
3. In the Command text box on the ExecCommand tab, type -

 ssh UserB@ServerB ksh /home/b/test.ksh

This command will execute a script in the remote server.





Friday, 14 July 2017

Utility to Get Last Run Log of DataStage Job


I love the new feature of IBM DataStage that fetches the last run log of any job with the help of the "dsjob" command.
In older versions of DataStage it was very tedious to get the last run log, but from v9.1 onward IBM added this capability to the dsjob command. Let's see how this works -


  • To fetch the last run log
  • To fetch the second-last run log
  • To fetch the third-last run log
  • To fetch the Nth-last run log (see the sketch below)
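
As a rough ksh sketch of the idea, the Nth-last run's log can be pulled with nothing more than the long-standing -logsum option: find the run-start markers and slice between them. The "Starting Job" marker text is an assumption to verify on your version:

 #!/bin/ksh
 # nth_last_run_log.sh PROJECT JOB N   -- N=1 for the last run, 2 for the second-last, ...
 PROJECT=$1; JOB=$2; N=${3:-1}

 TMP=/tmp/logsum.$$
 dsjob -logsum $PROJECT $JOB > $TMP        # one line per log entry, oldest first

 # Line numbers of the run-start markers ("Starting Job ..." entries)
 starts=$(grep -n "Starting Job" $TMP | cut -d: -f1)

 from=$(echo "$starts" | tail -$N | head -1)             # the Nth-last run begins here
 if [ $N -gt 1 ]; then
     to=$(( $(echo "$starts" | tail -$((N-1)) | head -1) - 1 ))   # and ends before the next run
 else
     to=$(wc -l < $TMP)                                  # the last run ends at the end of the log
 fi

 sed -n "${from},${to}p" $TMP
 rm -f $TMP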






Friday, 7 July 2017

ICONV mystery - the UV function


Iconv (Internal CONVersion) is a function supported by the UniVerse DB (UV db) to convert DATA, not only DATEs, into an internal format. DataStage Server jobs use lots of UV functions to manipulate data.

Today, I will try to unwrap the mystery behind the Iconv function and put the details in simpler words. We will not go into general data conversion, only the date conversion that DataStage uses :-)

Like most other date functions (the parallel ones), Iconv accepts a date (string) and its format.

Suppose, Date =   June 17, 2017

To convert this date into the internal format, we have to use -

Iconv("2017-06-17", "D-YMD")   = 18066
Iconv("2017/06/17", "D/YMD")   = 18066
Iconv("2017:17:06", "D:YDM")   = 18066
Iconv("17-06-17",   "D-Y2MD")  = 18066



D  --> the Date conversion code, followed by the delimiter character ("-", "/", ":")
Y  --> year as YYYY (Y2 for a two-digit year)
M  --> month as MM
D  --> day as DD

As we can see, when we provide the date format along with the date string, Iconv converts the date to an integer. This step is important because DataStage can now understand the given date, and we can use the Oconv function to re-format it as required.
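
As a quick preview, Oconv does the reverse and turns the internal integer back into any display format. A small sketch (the formats shown are standard UV conversion codes):

 Oconv(18066, "D-YMD[4,2,2]")                         = "2017-06-17"
 Oconv(Iconv("17-06-17", "D-Y2MD"), "D/MDY[2,2,4]")   = "06/17/2017"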

I will cover Oconv in the next post; till then, keep learning!!





Wednesday, 5 July 2017

Conditionally Aborting Jobs with Transformer Stage


How do you develop a job that stops processing when FROM_DATE and TO_DATE are equal in the data? Or
aborts the job when the reject row count is more than 50?

Both scenarios can be implemented using the Transformer stage. But how? Let's check this out -

  • The Transformer can be used to conditionally abort a job when incoming data matches a specific rule. 
    • In case 1, the rule is FROM_DATE = TO_DATE. 
    • In case 2, it is some reject condition. 
  • Create a new output link that will handle the rows that match the abort rule. 
  • Within the link constraints dialog box, apply the abort rule to this output link.
  • Set the "Abort After Rows" count to the number of rows allowed before the job should be aborted.
    • In case 1, it should be 1, as we want to abort the job as soon as FROM_DATE equals TO_DATE.
    • In case 2, it should be 50, as we want to abort the job when the reject condition has more than 50 records.

But since the Transformer aborts the entire job flow immediately, valid rows may not yet have been flushed from Sequential File (export) buffers, or committed to database tables.
It is important to set the Sequential File buffer-flush or database commit parameters; otherwise we have to manually remove the data that has already been written to the sequential file or database.
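
For parallel jobs, one knob worth knowing here is the parallel-engine variable that controls how often the Sequential File export buffer is flushed (verify the behaviour on your release):

 APT_EXPORT_FLUSH_COUNT=1    # flush the Sequential File buffer after every row; safe but slower

Setting it to 1 trades throughput for the guarantee that rows written before the abort are on disk.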






Wednesday, 5 April 2017

NULL Handling in Sequential File Stage



DataStage has a mechanism for denoting NULL field values, and it is slightly different in server and parallel jobs. In the Sequential File stage a character or string may be used to represent NULL column values. Here's how to represent NULL with the character "~":

Server Job:
1. Create a Sequential File stage and make sure there is an output link from it.
2. Open the Sequential File stage, click the "Outputs" tab and select "Format".
3. On the right, enter "~" next to "Default NULL string:".

Parallel Job:
1. Create a Sequential File stage and make sure there is an output link from it.
2. Open the Sequential File stage, click the "Outputs" tab and select "Format".
3. Right-click "Field defaults" ==> "Add sub-property" and select "Null field value".
4. Enter "~" in the newly created field.
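
With "~" as the NULL string, a file written by the stage would look something like this (hypothetical rows; the middle column of record 102 is NULL):

 101,Atul,2017-04-01
 102,~,2017-04-05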






Saturday, 1 April 2017

Get Over Running DataStage Job Details #iLoveScripting


With the script below, we will get a list of jobs that are taking more time to complete than their last run time. With a few tweaks, we can use this script to monitor any kind of process.
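
A minimal ksh sketch of the idea (the grep patterns for the report fields are assumptions; check the exact dsjob output on your version):

 #!/bin/ksh
 # overrun_check.sh PROJECT JOB  -- warn when a running job exceeds its previous elapsed time
 PROJECT=$1; JOB=$2

 # Elapsed time of the previous run, from the BASIC status report
 prev_elapsed=$(dsjob -report $PROJECT $JOB BASIC | grep -i "elapsed")

 # Start time of the current run
 cur_start=$(dsjob -jobinfo $PROJECT $JOB | grep -i "job start time")

 echo "Previous run : $prev_elapsed"
 echo "Current run  : $cur_start"
 # ...convert both to seconds, compare (now - start) against the previous
 # elapsed time, and raise an alert when the job is over-running.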







Sunday, 5 March 2017

How to get DataStage Project Settings


DSListProjectSettings.bat is used to report all available settings of existing projects.

USAGE: DSListProjectSettings <SERVER> <USER> <PASSWORD> <PROJECT>

The DSListProjectSettings batch file reports all available project settings and environment variable names and values to standard output. The first section shows available DataStage project properties and their settings. The second section shows environment variable names and values. When run successfully, each section will report a status code of zero.
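
If you prefer to stay on the engine tier, much of the same information can be pulled with the dsadmin client. A hypothetical invocation (all names are placeholders, and flag spellings vary by version, so verify against the dsadmin usage output):

 dsadmin -domain MYHOST:9080 -user dsadm -password secret -server MYENGINE -listenv MYPROJECT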






Thursday, 23 February 2017

Shell script to access DataStage Director ETL joblogs



We have various data warehouses hosted on AIX/Linux/Unix operating systems. DataStage Director is one of the ETL monitoring and scheduling tools used in numerous data warehouses. In case of ETL job failures we need to log in to DataStage Director and check the log for error messages. The ETL job logs can span several pages with various types of informative messages, and usually we need to locate the error message under the 'Fatal' message type. Doing this by searching the DataStage log manually can be very time-consuming and tiring.




This shell script lets the user access the error messages from the ease of the Linux screen, and it can also email the filtered log messages to the user's mailbox. It does the following (a minimal sketch follows the list):


  • Accepts the job name and other parameters at script execution.
  • Establishes the proper environment settings for local DataStage use.
  • Locates the event ids for fatal errors in the ETL job log.
  • Extracts the detail log for the fatal errors.
  • Mails the filtered job log with the exact fatal error message to the user.
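
A minimal ksh sketch of such a script (the temp-file path, the tab-delimited -logsum layout, and the mailx usage are assumptions to adapt):

 #!/bin/ksh
 # fatal_log_mailer.sh PROJECT JOB MAILTO
 PROJECT=$1; JOB=$2; MAILTO=$3

 # Environment settings for local DataStage use
 . $DSHOME/dsenv

 OUT=/tmp/${JOB}_fatal.log

 # Event ids of the newest fatal errors (first tab-separated field of -logsum)
 ids=$(dsjob -logsum -type FATAL -max 5 $PROJECT $JOB | cut -f1)

 # Extract the detail log for each fatal event
 for id in $ids; do
     dsjob -logdetail $PROJECT $JOB $id
 done > $OUT

 # Mail the filtered job log
 mailx -s "Fatal errors in $JOB" $MAILTO < $OUT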





Wednesday, 18 January 2017

5 Tips For Better DataStage Design #17





**  There is an automap button in some stages; it maps fields with the same names.

**  When you add a shared container into your job you need to map the columns of the container to your job link. What you might miss is the extra option you get on the Columns tab "Load" button. In addition to the normal column load you get "Load from Container" which is a quick way to load the container metadata into your job.

**  Don't create a job from an empty canvas. Always copy and use an existing job. Don't create shared containers from a blank canvas, always build and test a full job and then turn part of it into a container.



**  If you want to copy and paste settings between jobs, it is better to open two Designers; then you can have two property windows open at the same time and copy or compare them more easily. Most property windows in DataStage are modal, and you can only have one property window open per Designer session.

**  You can load metadata into a stage by using the "Load" button on the column tab or by dragging and dropping a table definition from the Designer repository window onto a link in your job. For sequential file stages the drag and drop is faster as it loads both the column names and the format values in one go. If you used the load button you would need to load the column names and then the format details separately.

**  You may often find a Modify stage or a stage function working incorrectly, and trial and error is often the only way to work out the syntax of a function. If you do this in a large and complex job, debugging can consume a lot of time. A better way is to keep a couple of test jobs in your project with a Row Generator, a Modify or Transformer stage, and a Peek stage. Have a column of each type in this test job, and use it throughout your project as a quick way to test a function or conversion. By the way, the Transformer stage needs a C++ compiler installed in order to compile and run correctly.





Thursday, 22 December 2016

DataStage Scenario #17 - Get Transitive relation between columns


Goal : To get the data from two columns which have a transitive relationship between them

A -> B
B -> C

then 

A -> C



Input
Col1  Col2
a     b
b     c
s     u
u     p
1     2
2     3

Output
Col1  Col2
a     c
s     p
1     3
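
Outside DataStage, a quick way to sanity-check the expected output (assuming chains are only two links long, as in the sample) is a self-join in awk: load the Col1->Col2 pairs on the first pass, then emit the two endpoints whenever a row's Col2 appears as another row's Col1:

 awk 'NR==FNR { hop[$1] = $2; next }       # pass 1: map Col1 -> Col2
      ($2 in hop) { print $1, hop[$2] }    # pass 2: a->b plus b->c gives a c
 ' input.txt input.txt

Inside a job, the same idea is a Join stage joining the link with a copy of itself (left key Col2, right key Col1).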





Wednesday, 7 December 2016

Import the jobs from DS windows client #iLoveScripting


As we have discussed a script that exports DataStage jobs from your client system (http://bit.ly/2frNPKj), we can likewise write another one to import the jobs. Let's see how -

DsImportJobsClient.bat :

This script reads all the *.dsx job names from the specified directory and its sub-directories and imports them into the specified project. It can also build (only BUILD) an existing package created in Information Server Manager and send it to the specified location on the client machine.

To use the build feature you need to make sure the package has been created with all the needed jobs, then saved and closed. Only updates to the selected jobs are taken care of automatically; to add or delete a job, you need to do it manually.

Modify the Import.properties and ImportJobList.txt files, go to the .bat directory, and then execute importAndBuild.bat.




Import.properties :
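
For illustration only, a hypothetical Import.properties (every name and value below is made up; use your own connection details):

 ISDOMAIN=myhost:9080
 DSSERVER=myengine
 DSUSER=dsadm
 DSPASSWORD=secret
 DSPROJECT=DSTAGE_DEV
 DSXDIR=C:\DSX\ToImport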


ImportJobList.txt :
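
Again hypothetical, one entry per line:

 Job_Load_Customer
 Job_Load_Orders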


DsImportJobsClient.bat :
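
A minimal sketch of the import loop only (the package-build step is omitted). dsimport.exe ships with the DataStage Windows client; the flags shown are the commonly documented ones, so verify them against your client version. It assumes the Import.properties values above have been loaded into matching variables:

 @echo off
 REM Import every .dsx found under %DSXDIR% (and sub-directories) into %DSPROJECT%
 for /R "%DSXDIR%" %%F in (*.dsx) do (
     echo Importing %%F ...
     dsimport.exe /D=%ISDOMAIN% /H=%DSSERVER% /U=%DSUSER% /P=%DSPASSWORD% %DSPROJECT% "%%F"
 )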






Monday, 21 November 2016

Reading DSParams - DataStage parameter file



I am sharing a utility which can help you read the DSParams file, which holds all the DataStage environment parameters.

This utility views the contents of a DSParams file - useful when trying to see everything the customer has set at the project level.



Usage:
$ cat DSParams | ./DSParamReader.pl | more
or
$ cat DSParams | ./DSParamReader.pl > outputfile


Instructions:
1. Copy the script text below to a file (DSParamReader.pl) on a UNIX system.
2. Set execute permissions on this file: chmod 777 DSParamReader.pl
3. Perl is usually in /usr/bin/perl, but you might have to adjust this path if necessary (hint: "which perl" should tell you which one to use).
4. cat the DSParams file from the project you are concerned with and pipe the output to this script. You may have to give the fully qualified path for this file.
5. Capture the output to screen or to a file. A file may be useful if the customer needs to send you the info by email.









Monday, 14 November 2016

DataStage Partitioning #3



Best allocation of Partitions in DataStage for storage area

1. DB2 EEE extraction in serial
   Volume of data: Low | Best way of partition: - | Config file (nodes): 1

2. DB2 EEE extraction in parallel
   Volume of data: High | Best way of partition: Node number = current node (key) | Config file (nodes): 64 (depends on how many nodes are allocated)

3. Partition or repartition in the stages of DataStage
   Volume of data: Any | Best way of partition: Modulus (a single integer key) or Hash (any number of keys with different data types) | Config file (nodes): 8 (depends on how many nodes are allocated for the job)

4. Writing into DB2
   Volume of data: Any | Best way of partition: DB2 | Config file (nodes): -

5. Writing into Dataset
   Volume of data: Any | Best way of partition: Same | Config file (nodes): 1, 2, 4, 8, 16, 32, 64 etc. (based on the incoming records it writes)

6. Writing into Sequential File
   Volume of data: Low | Best way of partition: - | Config file (nodes): 1

 

Best allocation of Partitions in DataStage for each stage

1. Join
   Best way of partition: Left and right links - Hash or Modulus
   Important points: All input links should be sorted on the joining key and partitioned on the higher-order key.

2. Lookup
   Best way of partition: Main link - Hash or Same; Reference link - Entire
   Important points: The links need not be in sorted order.

3. Merge
   Best way of partition: Master and update links - Hash or Modulus
   Important points: All input links should be sorted on the merging key and partitioned on the higher-order key. Pre-sorting makes the merge "lightweight" on memory.

4. Remove Duplicates, Aggregator
   Best way of partition: Hash or Modulus
   Important points: Performs better if the input link is sorted on the key.

5. Sort
   Best way of partition: Hash or Modulus
   Important points: Sorting happens after partitioning.

6. Transformer, Funnel, Copy, Filter
   Best way of partition: Same
   Important points: None.

7. Change Capture
   Best way of partition: Left and right links - Hash or Modulus
   Important points: Both input links should be sorted on the key and partitioned on the higher-order key.




