Showing posts with label transformation. Show all posts
Showing posts with label transformation. Show all posts

Monday, 22 August 2016

5 Tips For Better DataStage Design #15



1. Stage variable does not accept null value. Hence no null able column should be directly mapped to stage variable without null handling.

2. Use of SetNull() function in stage variables should be avoided because it causes compilation error.

http://www.datagenx.net/2016/08/5-tips-for-better-datastage-design-15.html

3. If input links are not already partitioned on join key then they should be hash partitioned on the join key in join stage. In case of multiple join key it is recommended to partition on one key and sort by the other keys.

4. If there is a need to do the repartition on an input link then we need to clear the preserve partitioning flag in the previous stage. Otherwise it will generate warning in job log.

5. If database table has less volume of data as a reference then it is good to use lookup stage.

6. It is always advisable to avoid Transformation stage. Because the Transformation stage is not written in DataStage native language, instead it is written in c. So every time you compile a job it embeds the c code with the native code in the executable file, which degrades the performance of the job.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Sunday, 10 April 2016

5 Tips For Better DataStage Design #12



1. Minimum number of sort stages should be use in a datastage job. “Don’t sort if previously sorted” in sort Stage, this option should be set this to “true”, which improves the Sort Stage performance. The same Hash key should be used.  In Transformer Stage “Preserve Sort Order” can be used to maintain sort order option.

2. Minimum number of stages should be used in a job; otherwise it affects the performance of the job.
If a job is having more stages then the job should be decomposed into smaller number of small jobs. The use of container is a best way for better visualize and readability. If the existing active stages occupy almost all the CPU resources, the performance can be improved by running multiple parallel copies of the same stage process. This is done by using a share container.





3. Use of minimum of Stage variables in transformer is a good practice. The performance degrades when more stage variables are used.

4. The use of column propagation should be taken care . Columns, which are not needed in the job flow, should not be propagated from one Stage to another and from one job to the next. The best option is to disable the RCP.

5. When there is a need of renaming columns or addition of new columns, use of copy or modify stage is good practice.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 22 March 2016

Transformer Stage alternative - A Quick DataStage Recipe



What to use instead of "Transformer" Stage

Copy Stage

Use "Copy" stage instead of "Transformer" Stage for following:
Renaming columns
Dropping columns
Default type conversions
Job design placeholder between stages


Modify Stage

Use "Modify" stage
Non default type conversions
Null handling
Character string trimming



Filter Stage

Use "Filter" Stage
Using constraints on output data


Will add more.......






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 11 December 2015

DataStage Scenario #11 - Get numeric or alphabets only



Goal - Extract numeric part and alpha part from a string as below

Input:

Source
ATUL1234
SINGH374
I23S
C343LEAR





Output -

Source Part1  Part2
ATUL1234 ATUL 1234
SINGH374 SINGH 374
I23S IS 23
C343LEAR CLEAR 343








Like the below page to get update  



Friday, 30 October 2015

UpCase and DownCase String Function : DataStage Transformation


Downcase(column):
                  This function converts all the input character to LOWER case where input can be in any case.

input = 'This is me'
input1 = 'this is me'
input2 = 'THIS IS ME'

DownCase(input)    =  'this is me'
DownCase(input1)   =  'this is me'
DownCase(input2)   =  'this is me'


Upcase(column): 
                  This function converts all the input character to UPPER case where input can be in any case.

input = 'This is upper case function'
input1 = 'this is me'
input2 = 'THIS IS ME'

UpCase(input)    =  'THIS IS UPPER CASE FUNCTION'
UpCase(input1)   =  'THIS IS ME'
UpCase(input2)   =  'THIS IS ME'





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Wednesday, 28 October 2015

Left and Right String Function - DataStage Transformation



Today, we will discuss two more datastage transformer functions - Left & Right.  These two functions use to get or cut the string from the input data, Let's have a look -


Left(input, number)  
input : - input column
number :- the no of characters you want to fetch from Left most of string


input = 'This is a test of fuctions."

Left(input,9)  =    'This is a'
Left(input, 3) =    'Thi'

Right(input, number)
input : - input column
number :- the no of characters you want to fetch from Right most of string


input = 'This is a test of fuctions."

Right(input,9)  =    'fuctions.'
Right(input, 14) =    't of fuctions.'


Use of Left/Right function to get SubString -

These functions used to get substring from an input if the start and end character location known or can be derived. Let's see an example -

we need to cut input from 2nd to 6th character

input = 'This is a test of fuctions."

left(input, 6) = 'This i'           #left function cut the left 6 character
Right(left(input,5),5) = 'his i'    #Use right func, to cut right 6-2+1 = 5 char 






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Tuesday, 20 October 2015

NULL behavior in datastage transformer


NULL - It is always a challenge for developers or architect while dealing with NULL in data. Every projects need some base rules to process NULLs.
     Today, we will see how the NULL behave in datastage, I hope, this will help you design a better job or flow to process the NULLs.



1. Null values can now be included in any expression

2. Null values no longer need to be explicitly handled

3. A null value in an expression will return a null value result. Example:
           1 + null = null
           "abc":null = null
           trim(null) = null

4. Exception: IsNull(null) and IsNotNull(null) will return true or false as expected

5. Any comparison involving null is unknown (i.e. is not true). Example:
          1 < null is not true
          1 >= null is not true
         null == null is not true
         null != null is not true

6. Logical operators - null is unknown (i.e. is not true)
        True AND null is not true
        True OR null is true
        False OR null is not true

7. As long as the target column is nullable records will not be dropped

8. Stage Variables are now always nullable

9. Stage Variable derivations can now include any input fields




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Friday, 28 August 2015

Datastage Transformer String Functions - Index





Index(String, Substring, Occurrence):   
   
It will returns starting character position of substring. We can use it as a 'grep' command of Linux when we have to check if given character or substring is existed in input string or not. 
        
Ex:-
1. Suppose, we have to check whether substring 'le' is existing in input column 'Incol' or not?

Transformation derivation  -
If Index(Incol, 'le', 1) > 0 then 'Exist' Else 'Not Exists'


2. Let's assume we are getting 'India' as a input string, what will be the output of below derivations?
     
Transformation derivation  -
Index('India', 'a', 1)         =  5
Index('India', 'i', 1)         =   4   # this will not give 1 as matching is case sensitive
Index('India', 'nd', 1)      =   2