## Friday, 30 October 2015

### UpCase and DownCase String Functions - DataStage Transformation

DownCase(column):
This function converts all input characters to lower case; the input can be in any case.

input = 'This is me'
input1 = 'this is me'
input2 = 'THIS IS ME'


DownCase(input)    =  'this is me'
DownCase(input1)   =  'this is me'
DownCase(input2)   =  'this is me'

UpCase(column):
This function converts all input characters to upper case; the input can be in any case.

input = 'This is upper case function'
input1 = 'this is me'
input2 = 'THIS IS ME'


UpCase(input)    =  'THIS IS UPPER CASE FUNCTION'
UpCase(input1)   =  'THIS IS ME'
UpCase(input2)   =  'THIS IS ME'
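Outside DataStage, the same behavior is easy to check; here is a minimal Python sketch (an illustration only, not DataStage code) mirroring the two functions:

```python
# Python sketch of DataStage's DownCase/UpCase behavior (illustration only).
def downcase(s):
    """Like DownCase(column): every character to lower case."""
    return s.lower()

def upcase(s):
    """Like UpCase(column): every character to upper case."""
    return s.upper()

print(downcase("THIS IS ME"))  # this is me
print(upcase("This is me"))    # THIS IS ME
```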

## Wednesday, 28 October 2015

### Left and Right String Functions - DataStage Transformation

Today, we will discuss two more DataStage transformer functions - Left & Right. These two functions are used to get (cut) a substring from the input data. Let's have a look -

Left(input, number)
input :- the input column
number :- the number of characters you want to fetch from the left-most part of the string

input = 'This is a test of functions.'

Left(input,9)  =    'This is a'
Left(input, 3) =    'Thi'

Right(input, number)
input :- the input column
number :- the number of characters you want to fetch from the right-most part of the string

input = 'This is a test of functions.'

Right(input,9)  =    'unctions.'
Right(input, 14) =    ' of functions.'

Use of Left/Right function to get SubString -

These functions can be used to get a substring from an input when the start and end character positions are known or can be derived. Let's see an example -

Suppose we need to cut the input from the 2nd to the 6th character:

input = 'This is a test of functions.'

Left(input, 6) = 'This i'            #Left cuts the leftmost 6 characters
Right(Left(input,6),5) = 'his i'     #then Right keeps 6-2+1 = 5 characters
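The same Left/Right composition can be sketched in Python slicing terms (an illustration of the logic, not DataStage code; note that Python indexes from 0 while DataStage counts characters from 1):

```python
# Python sketch of DataStage Left/Right and their composition for substrings.
def left(s, n):
    """Like Left(input, n): the leftmost n characters."""
    return s[:n]

def right(s, n):
    """Like Right(input, n): the rightmost n characters."""
    return s[-n:] if n > 0 else ""

s = "This is a test of functions."
print(left(s, 9))            # This is a
print(right(left(s, 6), 5))  # his i  -> characters 2 to 6
```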

## Tuesday, 27 October 2015

### 5 Tips For Better DataStage Design #3

• Stage variables allow you to hold data from the previous record while the next record is processed, so you can compare the previous and current records. Stage variables also allow you to return multiple errors for a record. By evaluating all the data in a record, instead of stopping at the first exception found, data cleanup becomes more efficient and requires fewer iterations.

• Use Annotations to describe the steps performed at each stage. Use a Description Annotation as the job title; the Description Annotation also appears in Job Properties > Short Job Description and in the Job Report when generated.

• When mapping a decimal field to a char field or vice versa, it is always better to convert the value explicitly using the type-conversion functions DecimalToString or StringToDecimal as applicable.

• Always use Hash partitioning in Join and Aggregator stages. The hash key should be the same as the key used to join/aggregate. If Join/Aggregator stages do not produce the desired results, try running in sequential mode (verify the results; if they are still incorrect, the problem is in the data or logic), then run in parallel using Hash partitioning.

• In Job sequences; always use "Reset if required, then run" option in Job Activity stages. (Note: This is not a default option)

## Friday, 23 October 2015

### authfile in dsjob command

While working with DataStage CLI commands, there are cases - for example, a server with multiple DataStage instances installed, or a user ID without sufficient permission to log in to DataStage directly - where we have to give DataStage server details such as host, username and password on the command line. But this creates a small issue: anyone can see the username and password you used by checking the server's command history or by other means.

IBM DataStage provides a way to secure your username and password with the authfile parameter. Today, we will see what it is and how to use it in a command-line statement.

authfile: a plain-text file that holds the DataStage connection details (server, user name, password). Instead of typing credentials on the command line, you pass this file to dsjob with the -authfile option.

How to use in commands:

dsjob -authfile c:\auth_file.txt -lprojects
dsjob -authfile c:\auth_file.txt -ljobs dstage1
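For reference, the auth file itself is a plain file of key=value lines; the layout below follows the IBM documentation, but every value shown (hosts, user, encrypted password) is a placeholder you must replace with your own:

```
domain=services-host:9445
server=engine-host
user=dsadm
password={iisenc}encrypted-password-goes-here
```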

## Wednesday, 21 October 2015

### DataStage Scenario #8 - Populate date between given two

Scenario :

Design a job with 2 job parameters -

1. Start Date   ( DD-MM-YYYY )
2. End Date  ( DD-MM-YYYY )

Read the parameters and populate all the dates between the two that are not Sundays.
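The DataStage job would do this with generator/transformer logic; as a quick way to check the expected output, here is a Python sketch of the scenario (the function name and the inclusive-range assumption are mine):

```python
# Sketch: list every date between two DD-MM-YYYY parameters, skipping Sundays.
from datetime import datetime, timedelta

def dates_excluding_sunday(start, end):
    d = datetime.strptime(start, "%d-%m-%Y")
    stop = datetime.strptime(end, "%d-%m-%Y")
    out = []
    while d <= stop:                       # inclusive range (assumption)
        if d.weekday() != 6:               # 6 == Sunday
            out.append(d.strftime("%d-%m-%Y"))
        d += timedelta(days=1)
    return out

print(dates_excluding_sunday("30-10-2015", "03-11-2015"))
# ['30-10-2015', '31-10-2015', '02-11-2015', '03-11-2015']  (01-11 was a Sunday)
```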

## Tuesday, 20 October 2015

### NULL behavior in datastage transformer

NULL - it is always a challenge for developers and architects to deal with NULLs in data. Every project needs some base rules for processing NULLs.
Today, we will see how NULL behaves in DataStage; I hope this helps you design better jobs and flows for processing NULLs.

1. Null values can now be included in any expression

2. Null values no longer need to be explicitly handled

3. A null value in an expression will return a null value result. Example:
1 + null = null
"abc":null = null
trim(null) = null

4. Exception: IsNull(null) and IsNotNull(null) will return true or false as expected

5. Any comparison involving null is unknown (i.e. is not true). Example:
1 < null is not true
1 >= null is not true
null == null is not true
null != null is not true

6. Logical operators - null is unknown (i.e. is not true)
True AND null is not true
True OR null is true
False OR null is not true

7. As long as the target column is nullable, records will not be dropped

8. Stage Variables are now always nullable

9. Stage Variable derivations can now include any input fields
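These rules can be modelled outside DataStage to reason about them; below is a small Python sketch (illustration only) in which None plays the role of NULL:

```python
# Model of the NULL rules above: null propagates through expressions (rule 3),
# and any comparison involving null is "not true" (rule 5).
def ds_add(a, b):
    return None if a is None or b is None else a + b

def ds_concat(a, b):
    return None if a is None or b is None else a + b

def ds_eq(a, b):
    return False if a is None or b is None else a == b

def ds_lt(a, b):
    return False if a is None or b is None else a < b

print(ds_add(1, None))        # None   (1 + null = null)
print(ds_concat("abc", None)) # None   ("abc":null = null)
print(ds_eq(None, None))      # False  (null == null is not true)
print(ds_lt(1, None))         # False  (1 < null is not true)
```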

## Friday, 16 October 2015

### Datastage job processes type at server side

When we run DataStage jobs from the Designer, the Director, the command line, or any third-party tool, four types of processes are invoked on the server side. Let's have a look at them -

dsapi_slave -  server side process for DataStage clients like Designer
osh -          Parallel Engine process
DSD.StageRun - Server Engine Stage process ( when server job is running )
DSD.RUN -      DataStage supervisor process used to initialize Parallel Engine and Server Engine jobs. There is one DSD.RUN process for each active job ( when a parallel job is running )

* Number of dsapi_slave processes - how many clients (Administrator, Designer or Director) are connected to the DataStage server.
* Number of DSD.RUN processes - total parallel jobs running
* Number of DSD.StageRun processes - total server jobs running
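As a hypothetical helper (not something shipped with DataStage), the counting can be scripted; on the engine host you would feed it live `ps -ef` output:

```python
# Count how many of each DataStage process type appear in `ps` output
# (naive substring match; the process names are the ones listed above).
PROC_TYPES = ["dsapi_slave", "osh", "DSD.RUN", "DSD.StageRun"]

def count_types(ps_text):
    return {p: sum(p in line for line in ps_text.splitlines())
            for p in PROC_TYPES}

# Live usage on the engine host would be, for example:
#   import subprocess
#   print(count_types(subprocess.run(["ps", "-ef"],
#                                    capture_output=True, text=True).stdout))
sample = "dsadm 101 dsapi_slave\ndsadm 102 osh\ndsadm 103 dsapi_slave\n"
print(count_types(sample))
# {'dsapi_slave': 2, 'osh': 1, 'DSD.RUN': 0, 'DSD.StageRun': 0}
```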

## Thursday, 15 October 2015

### Behavior of Multi-Instance job in Director Client

Multi-Instance Job:
DataStage supports multi-instance jobs, which can run at the same time with different invocation IDs.
Today, we will discuss the behavior of a multi-instance DataStage job in the Director.

Running Jobs:
When we run a multi-instance job, it asks for an invocation ID. While the job is running, the Director displays a new job in the format <Job_Name>.<Invocation_Id>. Nothing changes in the original job; it stays in Compiled status. So, if we invoke the job 3 times with 3 invocation IDs, the Director shows 3 jobs -

Jobname.InvkId1
Jobname.InvkId2
Jobname.InvkId3

Monitoring Jobs:
We can monitor each invoked job, as it is generated and visible in the Director with its invocation ID. But the tool uses the same RT_LOGnn file to write the job log for all invocation IDs; so we see the n instances and their logs in the Director, but in the backend it is a single file. We can monitor, stop and check each job's log individually.

Deleting Jobs:
If we delete a job instance from the Director, it is deleted while the other instances remain. But the job log for that instance is still in the RT_LOGnn file (we can access it from the DataStage command line, but not in the DataStage Director, as the job instance has been deleted).

Purging Job logs:
If we purge the job log in DataStage, it deletes the job instances as well as the job logs from the RT_LOGnn file. So the difference is that the Director's delete action only removes records from RT_STATUS, whereas the purge mechanism removes records from RT_LOG.

## Wednesday, 14 October 2015

### DataStage Scenario #7 - Count data in each column

Count the total rows, and the cells in each column that have data in them.

Input:

| col1 | col2 | col3 |
| --- | --- | --- |
| a | {NULL} | b |
| f | k | {NULL} |
| h | {NULL} | n |
| i | d | {NULL} |
| {NULL} | s | {NULL} |
| g | u | m |
| l | x | o |
| m | {NULL} | {NULL} |
| c | d | z |

Output:

| Count Rows | Count_col1 | Count_col2 | Count_col3 |
| --- | --- | --- | --- |
| 9 | 8 | 6 | 5 |
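To make the expected logic concrete, here is a Python sketch of the counting (None models {NULL}); the DataStage job itself would typically use an Aggregator or transformer counters:

```python
# Count total rows and non-NULL cells per column for the input above.
rows = [
    ("a",  None, "b"),
    ("f",  "k",  None),
    ("h",  None, "n"),
    ("i",  "d",  None),
    (None, "s",  None),
    ("g",  "u",  "m"),
    ("l",  "x",  "o"),
    ("m",  None, None),
    ("c",  "d",  "z"),
]

count_rows = len(rows)
col_counts = [sum(v is not None for v in col) for col in zip(*rows)]
print(count_rows, col_counts)   # 9 [8, 6, 5]
```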

## Tuesday, 13 October 2015

### 5 Tips for better DataStage Design & Administration #2

1. In a DataStage server environment with more than one version of DataStage installed, the dsjob command must specify the server name and port number of the DataStage version that you want the ksh script to run against.

2. While doing column mapping in the Designer, always check whether the source and target columns have the same null properties; if not, handle it explicitly.

3. Nulls are a curse when it comes to using functions/routines or normal equality-type expressions.
E.g. NULL = NULL doesn't work; neither does concatenation when one of the fields is null. Changing the nulls to 0 or "" before performing such operations is recommended to avoid erroneous outcomes.

4. When using string functions on a decimal, always use the Trim function, because string functions interpret the extra space reserved for the sign in a decimal.

5. When you need to get a substring (e.g. the first 2 characters from the left) of a character field:
Use <Field Name>[1,2]
Similarly, for a decimal field:
Use Trim(<Field Name>)[1,2]
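In Python slicing terms (an illustration only; DataStage's `<Field>[start,length]` is 1-based with a length, while Python slices are 0-based), tip 5 looks like this:

```python
# First two characters of a character field, and of a trimmed decimal field.
field = "AB123"
first_two = field[0:2]                       # like <Field Name>[1,2]

decimal_field = " 12.50"                     # leading space holds the sign position
first_two_dec = decimal_field.strip()[0:2]   # like Trim(<Field Name>)[1,2]

print(first_two, first_two_dec)              # AB 12
```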

## Monday, 12 October 2015

### How to Encrypt Password for Datastage ?

As DataStage users, we need lots of scripts to help with our daily jobs; most of these scripts need to authenticate with the DataStage engine, which means we need to pass our DataStage user password in a script or on the command line.
But there is a problem with using a plain-text password here: it can be viewed by checking the content of the script or the command history. To secure the password, DataStage provides a way to ENCRYPT it before use. Let's see how we can do that ---

==============================================================================================
==============================================================================================

Now we can use this encrypted password ( {iisenc}lUXDuyv1LXoCYUHcOtxfCA== ) wherever we want, whether on the command line or in DataStage scripts. It cannot be decrypted back to recover the real password.

## Friday, 9 October 2015

### Shell Script for listing out Running Datastage jobs

This script uses the logic/commands from Get currently running DataStage jobs (link). You can run it in your Linux terminal with an argument and keep getting updates on which jobs are running on the server. Give a small number for the refresh time for quicker updates.

========================================================================

========================================================================

You may need to change some commands according to your Linux flavour (such as the 'sort' command), or you can simply remove them - whatever suits you better.

You can find many more posts like this HERE

## Thursday, 8 October 2015

### dsjob #3 - Managing Datastage Jobs from Command Line

0. Sourcing DSHOME in Windows or Linux
In Windows - Add the DSENGINE/bin path to Windows PATH

In Linux - Add the DSENGINE/bin path to Linux PATH or source the dsenv file

1. Listing Projects:
dsjob –lprojects         ---List of all projects

2. Listing Jobs:
dsjob –ljobs dstage1    ---List of all jobs under a project
dstage1 - project name

3. Listing Stages:
dsjob –lstages [useid] project job|job_id --- List of all stages in a job
xfm7 -- job name

4. Listing Links:
dsjob –links [useid] project job|job_id stage  --- List of the links within a job design
xfm7 - job name
Transformer_1 - stage name

5. Listing Parameters:
dsjob –lparams [useid] project job|job_id  ---List of all parameters used within a Job
This job does not have any parameters, so the output is <none>.

6. Displaying Job Information:
dsjob –jobinfo [useid] project job|job_id

7. Displaying Stage Information:
dsjob –stageinfo [useid] project job|job_id stage

9. Accessing Log Files:
dsjob –log [–info | –warn] [useid] project job|job_id

10. Displaying a Short Log Entry
dsjob –logsum [–type type] [ –max n ] [useid] project job|job_id

11. Generating a Report:
dsjob –report [useid] project job|job_id [report_type]

You can find many more posts like this HERE

## Wednesday, 7 October 2015

### Shell Script Scenario #2 - Solution

You can find other scenarios HERE

==================================================================================
==================================================================================

## Tuesday, 6 October 2015

### dsjob #2 - Managing Datastage Jobs from Command Line

Continued from dsjob - Managing Datastage Jobs from Command Line #1

Listing Projects, Jobs, Stages, Links, and Parameters :

We can list projects, jobs, stages, links, and job parameters using the dsjob command. The different versions of the syntax are described in the following sections.

The following syntax works for listing information about a DataStage job:

•     dsjob –lprojects         ---List of all projects
•     dsjob –ljobs project    ---List of all Jobs under a project
project is the name of the project containing the jobs to list.

•     dsjob –lstages [useid] project job|job_id --- List of all stages in a job
1.    useid - specify this if we intend to use a job alias (job_id) rather than a job name (job) to identify the job.
2.    project - the name of the project containing the job.
3.    job - the name of the job containing the stages to list. To identify a job invocation, use the format job.invocation_id.
4.    job_id - an alias for the job that has been set using the dsjob -jobid command.

•     dsjob –links [useid] project job|job_id stage  --- List of the links within a job design
1.    useid - specify this if we intend to use a job alias (job_id) rather than a job name (job) to identify the job.
2.    project - the name of the project containing the job.
3.    job - the name of the job containing the stage. To identify a job invocation, use the format job.invocation_id.
4.    job_id - an alias for the job that has been set using the dsjob -jobid command.
5.    stage - the name of the stage containing the links to list.

•     dsjob –lparams [useid] project job|job_id  ---List of all parameters used within a Job

Setting an Alias for a Job:

The dsjob command can be used to specify our own ID (alias) for a DataStage job. Other commands can then use that alias to refer to the job.

Syntax: dsjob –jobid [my_ID] project job

Explanations of the available options

1. my_ID is the alias we want to set for the job. If we omit my_ID, the command returns the current alias for the specified job. An alias must be unique within the project; if the alias already exists, an error message is displayed.
2. project is the name of the project containing the job.
3. job is the name of the job. To identify a job invocation, use the format job.invocation_id.

Retrieving Information:

The dsjob command can be used to retrieve and display the available information about specific projects, jobs, stages, or links. The different versions of the syntax are described in the following sections.

Displaying Job Information

Syntax: dsjob –jobinfo [useid] project job|job_id

Explanations of the available options:

1. useid specify this if we intend to use a job alias (jobid) rather than a job name (job) to identify the job.
2. project is the name of the project containing job.
3. job is the name of the job. To identify a job invocation, use the format job.invocation_id.

The following information is displayed:
1. The current status of the job
2. The name of any controlling job for the job
3. The date and time when the job started
4. The wave number of the last or current run (internal DataStage reference number)
5. User status

Displaying Stage Information:

Syntax: dsjob –stageinfo [useid] project job|job_id stage

Explanations of the available options:

1. useid specify this if we intend to use a job alias (jobid) rather than a job name (job) to identify the job.
2. project is the name of the project containing the job.
3. job is the name of the job containing the stage. To identify a job invocation, use the format job.invocation_id.
4. job_id is an alias for the job that has been set using the dsjob -jobid command.
5. stage is the name of the stage.

The following information is displayed:
1. The last error message reported from any link to or from the stage
2. The stage type name, for example, Transformer or Aggregator
3. The primary links input row number

Displaying Link Information:

Syntax: dsjob –linkinfo [useid] project job|job_id stage link

Explanations of the available options:

1. useid specify this if we intend to use a job alias (jobid) rather than a job name (job) to identify the job.
2. project is the name of the project containing the job.
3. job is the name of the job containing the stage. To identify a job invocation, use the format job.invocation_id.
4. job_id is an alias for the job that has been set using the dsjob -jobid command.
5. stage is the name of the stage containing the link.
6. link is the name of the link.

The following information is displayed:
1. The last error message reported by the link
2. The number of rows that have passed down a link

Accessing Log Files:

Syntax: dsjob –log [–info | –warn] [useid] project job|job_id

Explanations of the available options:
1. -info specifies an information message. This is the default if no log entry type is specified.
2. -warn specifies a warning message.
3. useid specify this if we intend to use a job alias (jobid) rather than a job name (job) to identify the job.
4. project is the name of the project containing the job.
5. job is the name of the job that the log entry refers to. To identify a job invocation, use the format job.invocation_id.
6. job_id is an alias for the job that has been set using the dsjob -jobid command.

Displaying a Short Log Entry

Syntax: dsjob –logsum [–type type] [ –max n ] [useid] project job|job_id

type specifies the type of log entry to retrieve. If –type is not specified, all the entries are retrieved.

Importing Job Executables:

The dsjob command can be used to import job executables from a DSX file into a specified project.

Syntax: dsjob –import project DSXfilename [-OVERWRITE] [-JOB[S] jobname …] |[-LIST]

1.  DSXfilename is the DSX file containing the job executables.
2. -OVERWRITE specifies that any existing jobs in the project with the same name will be overwritten.
3. -JOB[S] jobname specifies that one or more named job executables should be imported (otherwise all the executables in the DSX file are imported).
4. -LIST causes DataStage to list the executables in a DSX file rather than import them.

Generating a Report:

The dsjob command can be used to generate an XML-format report containing job, stage, and link information.

Syntax: dsjob –report [useid] project job|job_id [report_type]

report_type is one of the following:

BASIC – Text string containing start/end time, time elapsed and status of the job.

DETAIL – As the basic report, but also contains information about individual stages and links within the job.

LIST – Text string containing the full XML report.

Examples are available HERE

## Monday, 5 October 2015

### Hashing & Sorting Criteria in stages

As we all know, the best partitioning method is Round Robin, but this method distributes the data across all partitions irrespective of any key (Round Robin is a keyless partitioning method), which is usually not what we want; when we do consider a key, it's Hash.

DataStage sorting and hashing improve data-processing speed, which is one of the targets we aim for in projects. So let's list some important stages and see whether they need partitioning or sorting to perform better.

| Stage | Partition (Hash) | Sort |
| --- | --- | --- |
| Sort | Yes | No |
| Aggregator | Yes | Yes |
| Join | Yes | Yes |
| Remove Duplicate | No | No |
| Merge | Yes | Yes |
| Lookup | No | No |


## Sunday, 4 October 2015

### dsjob #1- Managing Datastage Jobs from Command Line

Datastage Command Line Interface:

The DataStage CLI comprises three groups of commands: those concerned with running DataStage jobs, those concerned with administering DataStage projects, and those concerned with searching jobs.

dsjob command:

The dsjob command can be used from the UNIX command line to perform the following tasks. All output from the dsjob command is plain text, without column headings on lists or any other sort of description; this enables the command to be used in shell or batch scripts without extra processing.

These options are described in the following topics:

•      The logon clause
•      Starting a job
•      Stopping a job
•      Listing projects, jobs, stages, links, and parameters
•      Setting an alias for a job
•      Retrieving information
•      Accessing log files
•      Importing job executables
•      Generating a report

Logon clause:

By default, the DataStage CLI connects to the DataStage server engine on the local system, using the user name and password of the user invoking the command. We can specify a different server, user name, or password using the logon clause.

Syntax :

dsjob [ –server servername ][ –user username ][ –password password ] …

servername specifies a different server to log on to.
username specifies a different user name to use when logging on.
password specifies a different password to use when logging on.

We can use the logon clause with any dsjob command.

Starting a Job:

We can start, stop, validate, and reset Datastage jobs using the –run option from the UNIX command line

Syntax :

dsjob –run
[ –mode [ NORMAL | RESET | VALIDATE ] ]
[ –param name=value ]
[ –warn n ]
[ –rows n ]
[ –wait ]
[ –stop ]
[ –jobstatus]
[–userstatus]
[–local]
[–opmetadata [TRUE | FALSE]]
[-disableprjhandler]
[-disablejobhandler]
[useid] project job| job_id

Explanations of the available options

mode specifies the type of job run. NORMAL starts a job run, RESET resets the job and VALIDATE validates the job. If –mode is not specified, a normal job run is started.

param specifies a parameter value to pass to the job. The value is in the format name=value, where name is the parameter name, and value is the value to be set. If we use this to pass a value of an environment variable for a job (as we may do for parallel jobs), we need to quote the environment variable and its value, for example -param '\$APT_CONFIG_FILE=sayan.apt' otherwise the current value of the environment variable will be used.

warn n sets warning limits to the value specified by n

rows n sets row limits to the value specified by n.

wait waits for the job to complete.

stop terminates a running job.

jobstatus waits for the job to complete, then returns an exit code derived from the job status.

userstatus waits for the job to complete, then returns an exit code derived from the user status if that status is defined.
-local use this when running a DataStage job from within a shell script on a UNIX server.

-opmetadata use this to have the job generate operational meta data as it runs.  If we specify FALSE, the job will not generate operational meta data.

-disableprjhandler use this to disable any error message handler that has been set on a project wide basis.

-disablejobhandler use this to disable any error message handler that has been set for this job.

useid specify this if we intend to use a job alias (jobid) rather than a job name (job) to identify the job.

project is the name of the project containing the job.

job is the name of the job. To run a job invocation, use the format job.invocation_id.

job_id is an alias for the job that has been set using the dsjob -jobid command.

Stopping a Job:

We can stop a DataStage job using the –stop option from the UNIX command prompt.

Syntax:
dsjob –stop [useid] project job| job_id

Next part is available HERE