Showing posts with label Parameter.

Sunday, 19 November 2017

Get Job Parameters for all DataStage Jobs


It feels awesome to be writing a post again after 7 weeks. In this post I am going to share a script which fetches the job parameters for all the DataStage jobs available in a given project.

This week, one of my colleagues asked me to compare all the job parameters from one project (older) against another project (migrated). So I wrote down this script, which worked like a charm for me, and I hope it will work for you as well :-)





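The script embedded in the original post has not survived in this copy. Below is a minimal stand-in sketch (not the original script) that does the same job with the standard dsjob command line. It assumes the DataStage engine home can be found through /.dshome, that dsjob lives under $DSHOME/bin, and that the project name is passed as the first argument.

#!/bin/bash
# list_job_params.sh -- illustrative sketch, not the original script from this post.
# Prints the parameter names of every job in the given DataStage project using dsjob.
# Usage: ./list_job_params.sh <PROJECT_NAME>

DSHOME=`cat /.dshome`                   # engine home directory (assumed location)
. $DSHOME/dsenv                         # source the DataStage engine environment
PROJECT=$1

for JOB in `$DSHOME/bin/dsjob -ljobs $PROJECT`
do
    echo "========== $JOB =========="
    $DSHOME/bin/dsjob -lparams $PROJECT $JOB    # one parameter name per line
done

Running it once against each project and diffing the two output files (for example, ./list_job_params.sh OLD_PROJECT > old.txt) gives the kind of comparison described above.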



Monday, 21 November 2016

Reading DSParams - DataStage parameter file



I am sharing a utility which can help you read the DSParams file, which holds all the project-level DataStage environment parameters.

It is a utility to view the contents of the DSParams file, useful when trying to see everything the customer has set at the project level.



Usage:
$ cat DSParams | ./DSParamReader.pl | more
or
$ cat DSParams | ./DSParamReader.pl > outputfile


Instructions:
1. Copy the script text below to a file (DSParamReader.pl) on a UNIX system.
2. Set execute permissions on this file: chmod 777 DSParamReader.pl
3. Usually perl is in /usr/bin/perl, but you might have to adjust this path if necessary (hint: "which perl" should tell you which one to use).
4. cat the DSParams file from the project you are concerned with and pipe the output to this script. You may have to use the fully qualified path for this file.
5. Capture the output to screen or to a file. A file may be useful if you want the customer to send the information to you by email.
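The Perl script that originally followed these instructions is not reproduced in this copy. As a rough stand-in only (not the original DSParamReader.pl), the shell filter below shows the same idea, under the assumption that the project-level overrides are stored one per line in an [EnvVarValues] section of the DSParams file:

#!/bin/bash
# dsparams_peek.sh -- illustrative stand-in, NOT the original DSParamReader.pl.
# Reads a DSParams file on stdin and prints only the [EnvVarValues] section,
# i.e. the values overridden at project level.

awk '/^\[EnvVarValues\]/ {inside=1; next}   # start printing after this section header
     /^\[/               {inside=0}         # stop at the next section header
     inside && NF        {print}'           # print non-blank entries

It is invoked the same way as the original: cat DSParams | ./dsparams_peek.sh | more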









Thursday, 14 April 2016

ETL Code Review Checklist




Guideline
Design jobs for restartability; if a job is not designed for it, what is the reason?
Do not follow a Sequential File stage with "Same" partitioning.
Check if the APT_CONFIG_FILE parameter is added. This is required to change the number of nodes at runtime (see the dsjob sketch after this list).
Do not hard-code parameters.
Do not hard-code directory paths.
Do not use fork-joins to generate lookup data sets.
Use "Hash" aggregation for limited distinct key values.  Outputs after all rows are read.
Use "Sort" aggregation for large number of distinct key values.  Data must be pre-sorted.  Outputs after each aggregation group.
Use multiple aggregators to reduce collection time when aggregating all rows.  Define a constant key column using row generator.  First aggregator sums in parallel.  Second aggregator sums sequentially.
Make sure sequences are not too long.  Break up into logical units of work.
Is the error handling done properly? It is preferred to propagate errors from lower-level jobs to the highest level (e.g. a sequence).
What is the volume of extracted data (is there a WHERE clause in the SQL)?
Are the correct scripts to clean up datasets after job completion invoked?
Is there a reject process in place?
Can we combine or split jobs to reduce the number of jobs or the complexity, respectively?
It is not recommended to increase the number of nodes if there are too many stages in the job (this increases the number of processes spun off).
Is volume and growth information available for the Lookup/Join tables?
Check if there is a SELECT * in any of the queries. SELECT * is not advised; instead, the required columns should be listed in the statement.
Check the partitioning and sorting at each stage.
When a sequence is used, make sure none of the parameters passed are left blank.
Check if there are separate jobs for at least extract, transform and load.
Check if there is an annotation for each stage and for the job; the job properties should have the author, date, etc. filled out.
Check the naming conventions of the jobs, stages and links.
Avoid Peek stages in production jobs; peeks are generally used for debugging during development.
Make sure the developer has not suppressed warnings that are valid.
Verify that the jobs conform to the Flat File and Dataset naming specification.  This is especially important for cleaning up files and logging errors appropriately.
Verify that all fields are written to the Reject flat files.  This is necessary for debugging and reconciliation.
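For the APT_CONFIG_FILE item above: once the configuration file has been exposed as an environment-variable job parameter, it can be overridden per run instead of being fixed at design time. A minimal sketch, assuming the job already has $APT_CONFIG_FILE as a parameter; the project name, job name and file path are illustrative placeholders:

# Run the job against a 4-node configuration file instead of the project default.
$DSHOME/bin/dsjob -run \
    -param '$APT_CONFIG_FILE=/opt/IBM/InformationServer/Server/Configurations/4node.apt' \
    -jobstatus MYPROJECT my_parallel_job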








Monday, 15 February 2016

5 Tips For Better DataStage Design #9



#1. Always save the metadata (for source, target or lookup definitions) in the repository to ensure re-usability and consistency.

#2. Make sure that pathname/format details are not hard-coded and that job parameters are used for them. These details are generally set as environment variables.




#3. Ensure that all file names from external sources are parameterized. This saves the developer the trouble of changing the job whenever a file name changes (see the dsjob sketch after these tips). File names/datasets created in the job for intermediate purposes can be hard-coded.

#4. Ensure that the environment variable $APT_DISABLE_COMBINATION is set to ‘False’.
Ensure that $APT_STRING_PADCHAR is set to spaces.

#5. Parameters used across jobs should have the same name. This helps to avoid unnecessary confusion.

#6. Be consistent with where the slashes in the path live: either in the design or in the variable. (Thomas McNicol)
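To illustrate tips #2 and #3, the directory and file name can be supplied as job parameters at run time rather than hard-coded in the stage. A minimal sketch, where pSourceDir, pSourceFile, MYPROJECT and load_customers are hypothetical names:

# Pass the external file location through job parameters at run time.
$DSHOME/bin/dsjob -run \
    -param pSourceDir=/data/landing/customers \
    -param pSourceFile=customers_20160215.dat \
    -jobstatus MYPROJECT load_customers

Inside the job, the Sequential File stage would then reference the file as #pSourceDir#/#pSourceFile# instead of a literal path.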





Wednesday, 23 December 2015

5 Tips For Better DataStage Design #6



#1. If you are using a Copy or a Filter stage immediately before or after a Transformer stage, you are reducing efficiency by using more stages than necessary, because a Transformer does the job of both a Copy stage and a Filter stage.

#2. Work done by the Copy stage:
a) Column order can be altered.
b) Columns can be dropped.
c) Column names can be changed.



#3. When you need to run the same set of jobs again and again, it is better to create a sequencer with all the jobs that you need to run. Running this sequencer will run all the jobs, and you can order them as per your requirement.

#4. Sort the data as much as possible in the database and reduce the use of the DataStage Sort stage for better job performance. Push work into the database where possible, but that does not mean all the complexity has to go into SQL alone; that is what we are using DataStage for.

#5. Ensure that all character fields are trimmed before any processing. Extra spaces in the data may lead to errors such as lookup mismatches, which are hard to detect.






Monday, 16 November 2015

List DataStage jobs which use a given Parameter



Open the DataStage Administrator Client

Click on the Projects tab and select the project you would like to generate a list for.

Click the Command Button

In the command entry box type:

LIST DS_JOBS WITH JOBTYPE = 3 AND EVAL "TRANS('DS_JOBOBJECTS','J\':@RECORD<5>:'\ROOT',14,'X')" LIKE ...<VARNAME>...

<VARNAME> should be the name of the parameter or environment variable



Example:

LIST DS_JOBS WITH JOBTYPE = 3 AND EVAL "TRANS('DS_JOBOBJECTS','J\':@RECORD<5>:'\ROOT',14,'X')" LIKE ...TMPDIR...

Click Execute

If the output spans more than one page, click Next to page through it, and click Close when finished.


In this example, a job type of 3 is a parallel job. Valid job type values are:
0 = Server
1 = Mainframe
2 = Sequence
3 = Parallel
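The same query can also be issued from the engine host without the Administrator client. This is a hedged sketch, assuming the engine's dssh shell is available under $DSHOME/bin and accepts a command string as its argument (the exact invocation can vary by installation); the paths and project name are placeholders, and TMPDIR is the parameter being searched for:

# Run the same LIST command from the UNIX shell instead of the Administrator client.
DSHOME=`cat /.dshome`
. $DSHOME/dsenv
cd /opt/IBM/InformationServer/Server/Projects/MYPROJECT     # must run from the project directory
$DSHOME/bin/dssh "LIST DS_JOBS WITH JOBTYPE = 3 AND EVAL \"TRANS('DS_JOBOBJECTS','J\':@RECORD<5>:'\ROOT',14,'X')\" LIKE ...TMPDIR..."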






Monday, 2 November 2015

5 Tips For Better DataStage Design #4



1) While using the AsInteger() function in a DataStage transformer, always trim the input column before passing it to the function, because extra spaces or unwanted characters can generate zeros when actual integer values are expected. We should use the APT_STRING_PADCHAR=0x20 (space) environment variable for fixed-field padding.

2) Len(col) will return an incorrect length if the input column contains non-ASCII or double-byte characters. Check your NLS settings for the job to fix this.



3) To remove embedded spaces from decimal data, use the StripWhiteSpace(input.field) function to remove all spaces.

4) To get the DataStage job number, open the log view of the job in DataStage Director and double-click on any entry of the log. The job number will be listed under the field "Job Number:".

5) Set the two parameters APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION to TRUE to prevent DataStage from inserting its own partitioning or sorting operators at compile time, which can improve job performance. This will also remove the warning "When checking operator: User inserted sort "<name>" does not fulfill the sort requirements of the downstream operator "<name>"".




