Showing posts with label Job. Show all posts
Showing posts with label Job. Show all posts

Friday, 14 July 2017

Utility to Get Last Run Log of DataStage Job


Love the new feature of IBM DataStage to fetch the last run log of any jobs with the help of "dsjob" command.
In older version of DataStage, it is very tedious to get the last run log but from/after v9.1 IBM added an additional feature in dsjob command. Lets see hows this works -


  • To Fetch Last Run Log:
  • To Fetch Second Last Run Log
  • To Fetch Third Last Run Log
  • To Fetch Nth Last Run Log





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 5 April 2017

NULL Handling in Sequential File Stage



DataStage has a mechanism for denoting NULL field values. It is slightly different in server and parallel jobs. In the sequential file stage a character or string may be used to represent NULL column values. Here's how represent NULL with the character "~":

Server Job:
1. Create a Sequential file stage and make sure there is an Output link from it.
2. Open the Sequential file stage and click the "Outputs" tab ans Select "Format"
3. On the right enter the "~" next to "Default NULL string:"

Parallel Job:
1. Create a Sequential file stage and make sure there is an Output link from it.
2. Open the Sequential file stage and click the "Outputs" tab ans Select "Format"
3. Right click on "Field defaults" ==> "Add sub-property" and select "Null field value"
4. Enter the "~" in the newly created field.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 7 December 2016

Import the jobs from DS windows client #iLoveScripting


As we have discussed a script which can export the datastage jobs from your client system (http://bit.ly/2frNPKj) likewise we can write another one to import the jobs. Let's see how -

DsImportJobsClient.bat :

This Script read all the *.dsx job name from the specified Directory and Sub-Directory and import to the Specified project. It can also build (Only BUILD) the existing package created on Information Server Manager and send it to the specified location on client machine.

To use the build feature you need to make sure the package has been created with all the needed jobs, saved and closed. Only update to the selected job will be taken care automatically. To add/delete a job, you need to do manually.

Modify the Import.properties and ImportJobList.txt file and Go the .bat dir and then execute the importAndBuild.bat.




Import.properties :


ImportJobList.txt :


DsImportJobsClient.bat :





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Sunday, 6 November 2016

Export the jobs from DS windows client



Datastage jobs Export/Import are occasional activity (Deployment time :-)) for a developer But it becomes very tedious if the job list are long or it's daily routine to export or import jobs.

So I have written a batch script (windows script) which we can execute from the Client Machine (where datastage clients are installed) and automate this process.



DSjobExportClient.bat :
Export Script read the job name from the file (ExportJobList.txt) and exports the jobName.dsx from the project to the export base location and maintain the folder structure specified in the ExportJobList.txt file. File "ExportJobList.txt" and “Export.properties” should be updated before running the export script.

Copy the above file on any location of your DS windows client machine and update the "ExportJobList.txt" and “Export.properties” files.
-    Export.properties
-    ExportJobList.txt
-    ExportJobs.bat   



Export.properties :


ExportJobList.txt :


DsExportJobsClient.bat :




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 11 October 2016

5 Tips For Better DataStage Design #16


1. Use 4-node configuration file for unit testing/system testing the job.
2. If there are multiple jobs to be run for the same module. Archive the source files in the after job routine of the last job.
3. Check whether the file exists in the landing directory before moving the sequential file. The ‘mv’ command will move the landing directory if the file is not found.

4. Ensure that the unix files created by any Datastage job is created by the same unix user who has run the job.
5. Make sure that the Short Job Description is filled using ‘Description Annotation’ and it contains the job name as part of the description. Don’t use Annotation for putting the job description.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 23 February 2016

DataStage Scenario #15 - Get First & Last Date of Last Month



Design a job which can generate the First and Last date of Last Month and pass this into an SQL which executes inside a parallel job?









Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Monday, 1 February 2016

5 Tips For Better DataStage Design #8



#1. Templates have to be created to enhance reusability and enforce coding standard. Jobs should be created using templates.
#2. The template should contain the standard job flow along with proper naming conventions of components, proper Job level annotation and short/long description. Change record section should be kept in log description to keep track.



#3. Don't copy the job design only. copy using 'save as' or create copy option at the job level.
#4. The DataStage connection should be logged off after completion of work to avoid locked jobs.
#5. Creation of common lookup jobs
#6. Some extraction jobs can be created to created reference datasets. The datasets can then be used in different conversion modules





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Thursday, 28 January 2016

How to get table list used in DataStage jobs ?


While developing jobs in DataStage, sometimes we face this requirement to get all the table list which was used by our DataStage jobs, but unfortunately, there is no direct way to get that.

DataStage is not having any command which can give us the table list. But, there are some ways by which can get the table list. All are the steps, which we are going to discuss, are needs one-time setup or development.




1) Setting up a universe query -


We can tweak this query to get the table list for all the DataStage jobs.

2) Parsing job export XML -
a) We can parse the tables from job export XML file. We can write a shell script and parse the XML to get the table name
b) Or we can develop a DataStage job which reads this XML and parses all the tables

Make use of these practices while developement  - 

3) While doing the development of DataStage project, make a practice to maintain a table which is having table and job information. This will help a lot afterward. 

4) Before using any table in any job, store that metadata into DataStage Repository folder. This will help you to do the Usage Analysis afterward.



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Sunday, 17 January 2016

Count of Jobs - A Quick DataStage Recipe



What to Cook:
How to count number of jobs in DS project

Ingredients:
Use dsjobs "-ljobs" command



How to Cook:
Go to DS Administrator "projects" tab
Click on command button
Enter following command to execute:
SH -c "dsjob -ljobs <Project Name> | wc -l"

<Project Name> - Enter your project name






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Wednesday, 23 December 2015

5 Tips For Better DataStage Design #6



#1. If you are using a copy or a filter stage either immediately after or immediately before a transformer stage, you are reducing the efficiency by using more stages because a transformer does the job of both copy stage as well as a filter stage

#2. Work done by "COPY Stage"
a) Columns order can be altered.
b) And columns can be dropped.
c) We can change the column names.



#3. When you need to run the same sequence of jobs again and again, better create a sequencer with all the jobs that you need to run. Running this sequencer will run all the jobs. You can provide the sequence as per your requirement.

#4. Sort the data as much as possible in DB and reduced the use of DS-Sort for better performance of jobs. Avoid the work done by DataStage which is possible in DB. But it doesn't mean you have to put all the complexity in SQL only, for that we are using datastage.

#5. Ensure that all the character fields are trimmed before any processing. Normally extra spaces in the data may lead to some errors like lookup mismatch which are hard to detect.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Monday, 16 November 2015

List DataStage jobs which used this Parameter



Open the DataStage Administrator Client

Click on the Projects tab and select the project you would like to generate a list for.

Click the Command Button

In the command entry box type:

LIST DS_JOBS WITH JOBTYPE = 3 AND EVAL "TRANS('DS_JOBOBJECTS','J\':@RECORD<5>:'\ROOT',14,'X')" LIKE ...<VARNAME>...

<VARNAME> should be the name of the parameter or environment variable



Example:

LIST DS_JOBS WITH JOBTYPE = 3 AND EVAL "TRANS('DS_JOBOBJECTS','J\':@RECORD<5>:'\ROOT',14,'X')" LIKE ...TMPDIR...

Click Execute

If the output is on more than one page, click Next to page done and click Close when finished.


In this example, a job type of 3 is a parallel job. Valid job types value are:
0 = Server
1 = Mainframe
2 = Sequence
3 = Parallel





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Tuesday, 10 November 2015

Check Memory Utilization by Datastage processes



While we are running lots of DataStage job on a Linux DataStage server or different environment are sharing the same server which cause the resource crunch at server-side which affect the job performance.

It's always preferable to have an eye on resource utilization while we are running jobs. Mostly, DataStage admin set a cron with a resource monitoring script which will invoke in every five min ( or more) and check the resource statistics on server and notify them accordingly.

The following processes are started on the DataStage Engine server as follows:

dsapi_slave - server side process for DataStage clients like Designer

osh - Parallel Engine process
DSD.StageRun - Server Engine Stage process
DSD.RUN - DataStage supervisor process used to initialize Parallel Engine and Server Engine jobs. There is one DSD.RUN process for each active job

ps auxw | head -1;ps auxw | grep dsapi_slave | sort -rn -k5  | head -10
USER   PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
atul 38846  0.0  0.0 103308   856 pts/0    S+   07:20   0:00 grep dsapi_slave


The example shown lists the top 10 dsapi_slave processes from a memory utilization perspective. We can substitute or add an appropriate argument for grep like osh, DSD.RUN or even the user name that was used to invoke a DataStage task to get a list that matches your criteria.



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Friday, 30 October 2015

UpCase and DownCase String Function : DataStage Transformation


Downcase(column):
                  This function converts all the input character to LOWER case where input can be in any case.

input = 'This is me'
input1 = 'this is me'
input2 = 'THIS IS ME'

DownCase(input)    =  'this is me'
DownCase(input1)   =  'this is me'
DownCase(input2)   =  'this is me'


Upcase(column): 
                  This function converts all the input character to UPPER case where input can be in any case.

input = 'This is upper case function'
input1 = 'this is me'
input2 = 'THIS IS ME'

UpCase(input)    =  'THIS IS UPPER CASE FUNCTION'
UpCase(input1)   =  'THIS IS ME'
UpCase(input2)   =  'THIS IS ME'





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Wednesday, 28 October 2015

Left and Right String Function - DataStage Transformation



Today, we will discuss two more datastage transformer functions - Left & Right.  These two functions use to get or cut the string from the input data, Let's have a look -


Left(input, number)  
input : - input column
number :- the no of characters you want to fetch from Left most of string


input = 'This is a test of fuctions."

Left(input,9)  =    'This is a'
Left(input, 3) =    'Thi'

Right(input, number)
input : - input column
number :- the no of characters you want to fetch from Right most of string


input = 'This is a test of fuctions."

Right(input,9)  =    'fuctions.'
Right(input, 14) =    't of fuctions.'


Use of Left/Right function to get SubString -

These functions used to get substring from an input if the start and end character location known or can be derived. Let's see an example -

we need to cut input from 2nd to 6th character

input = 'This is a test of fuctions."

left(input, 6) = 'This i'           #left function cut the left 6 character
Right(left(input,5),5) = 'his i'    #Use right func, to cut right 6-2+1 = 5 char 






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Friday, 23 October 2015

authfile in dsjob command


While working on Datastage CLI commands, in some cases when you are working on server which is having multiple DataStage server install Or your user id is don't have sufficient permission to login on DataStage, we have to give DataStage server details like DataStage server host, username, password etc in the command line. But, this creates a small issue, anyone can see the username and password details which you have used in the command line by checking the server command history or other ways.

                           IBM Datastage provides a way to secure your username and password with authfile parameter. Today, we will see what it is and how we can use in our command line statement.



authfile:



How to use in commands:

dsjob -authfile c:\auth_file.txt -lprojects
dsjob -authfile c:\auth_file.txt -ljobs dstage1





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Wednesday, 21 October 2015

DataStage Scenario #8 - Populate date between given two



Scenario :

Design a job with 2 job parameters -



1. Start Date   ( DD-MM-YYYY )
2. End Date  ( DD-MM-YYYY )

Read the parameter and populate all the date between these 2 which is not "SUNDAY"





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Friday, 16 October 2015

Datastage job processes type at server side



When we are running datastage jobs in the designer, director, by command line or by any third party tool, it invoked 4 type of process at the server side. Let's have a look on those -

dsapi_slave -  server side process for DataStage clients like Designer
osh -          Parallel Engine process
DSD.StageRun - Server Engine Stage process ( when server job is running )
DSD.RUN -      DataStage supervisor process used to initialize Parallel Engine and Server Engine jobs. There is one DSD.RUN process for each active job ( when parallel job is running )




* No of dsapi_slave process -  how many clients are connected to datastage server. It can be Administrator, Designer or Director.
* No of DSD.RUN process -  Total parallel jobs running
* No of DSD.StageRun - Total server jobs running




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Thursday, 15 October 2015

Behavior of Multi-Instance job in Director Client



Multi-Instance Job:
                DataStage is supporting Multi-Instance jobs which can be run at same time with different invocation id.
      Today, we will discuss the behavior of multi-instance datastage job in Director.


Running Jobs:
                When we run the multi-instance job, it will ask for Invocation Id to be passed, when the job is running, it will display a new job in director in format <JOB_Name>.<Invok_Id>. Nothing change in original job, it is still in compiled status. So, if we invoke job 3 times with 3 invocation id, it will generate 3 jobs in director -

Jobname.InvkId1
Jobname.InvkId3
Jobname.InvkId3


Monitoring Jobs: 
                 We can monitor each invoked job as it is been generated and visible in the Director with invocation id. But the tool is using the same RT_LOGnn file to write the job log for all invocation id, So we can see the n instance in director and its log in the Director but in backend, it is a single file. We can monitor, stop and check individual job log.


Deleting Jobs:
                If we delete a job instance from director, it will be deleted and other instances will remain there But the job log for this instance is still with RT_LOGnn file ( we can access through from Datastage Command Line but not in Datastage Director as job instance has been deleted).


Purging Job logs:
               If we purge the job log in datastage, it will delete the job instances as well as job logs from RT_LOGnn file. So the difference here is that the Director delete action only deletes records from RT_STATUS whereas the purging mechanism deletes records from RT_LOG.
                  





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Friday, 9 October 2015

Shell Script for listing out Running Datastage jobs



This script is using the logic/commands from Get currently running DataStage jobs (link). You can run this script in your linux terminal with the argument and keep getting update which job is running at server. You can give a small no to refresh time for quick refresh.




========================================================================

========================================================================

You may need to change some command according to your linux flavour such as 'sort' command or you can simply remove it whatever suits you better.



You can find many more post like this HERE




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Thursday, 8 October 2015

dsjob #3 - Managing Datastage Jobs from Command Line




0. Sourcing DSHOME in Windows or Linux
In Windows - Add the DSENGINE/bin path to Windows PATH

dsjob-path

In Linux - Add the DSENGINE/bin path to Linux PATH or source the dsenv file



1. Listing Project:
dsjob –lprojects         ---List of all projects
dsjob -authfile filename -lprojects  ---If needed username and password for dsjob

dsjob-listproject

2. Listing Jobs:
dsjob –ljobs project    ---List of all Jobs under a project
dstage1 - project name

dsjob-listjobs

3. Listing Stages:
dsjob –lstages [useid] project job|job_id --- List of all stages in a job
xfm7 -- jobname

dsjob-lstages

4. Listing Links:
dsjob –links [useid] project job|job_id stage  --- List of the links within a job design
xfm7 - jobname
Transformer_1 - stagename

dsjob-llinks

5. Listing Parameters:
dsjob –lparams [useid] project job|job_id  ---List of all parameters used within a Job
This job is not having any parameter so <none> in output.

dsjob-lparams

6. Displaying Job Information:
dsjob –jobinfo [useid] project job|job_id

dsjob-jobinfo

7. Displaying stage information
dsjob –stageinfo [useid] project job|job_id

dsjob-stageinfo

8. Displaying Link Information
dsjob –linkinfo [useid] project job|job_id stage link

dsjob-linkinfo

9. Adding a Log Entry
dsjob –log [–info | –warn] [useid] project job|job_id

dsjob-log

10. Displaying a Short Log Entry
dsjob –logsum [–type type] [ –max n ] [useid] project job|job_id

dsjob-logsum

11. Generating a Report:
dsjob –report [useid] project job|job_id [report_type]

dsjob-report


You can find many more post like this HERE



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx