Showing posts with label transformer. Show all posts
Showing posts with label transformer. Show all posts

Thursday, 14 September 2017

Evaluation Sequence in Transformer Stage - A Quick DataStage Recipe



Recipe:

What is evaluation sequence in Transformer Stage Or Order of Stage & Loop Variable and Derivations

Ingredients:

1. Transformer Stage
     a. Stage Variables
     b. Loop Variables
     c. Derivations


How To:

Evaluate each stage variable initial value
For each input row to process:
Evaluate each stage variable derivation value, unless the derivation is empty
For each output link:
Evaluate each column derivation value
Write the output record
Next output link
Next input row


** The stage variables and the columns within a link are evaluated in the order in which they are displayed in the Transformer editor. Similarly, the output links are also evaluated in the order in which they are displayed




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Wednesday, 5 July 2017

Conditionally Aborting Jobs with Transformer Stage


How to develop a job which will stop processing when FROM_DATE and TO_DATE is equal in data? Or
I want to abort the job when reject row count is more than 50?

Above scenarios can be implemented using Transformer Stage but How? Let's check this out -

  • The Transformer can be used to conditionally abort a job when incoming data matches a specific rule. 
    • In our case 1, it is FROM_DATE  = TO_DATE 
    • In our case 2, it is some reject condition 
  • Create a new output link that will handle rows that match the abort rule. 
  • Within the link constraints dialog box, apply the abort rule to this output link
  • Set the “Abort After Rows” count to the number of rows allowed before the job should be aborted .
    • In case 1, it should be 1. as we want to abort the job when FROM_DATE is equal to TO_DATE
    • In case 2, it should be 50 as we want to abort the job when reject condition have more than 50 records
xfm

But, since the Transformer will abort the entire job flow immediately, it is possible that valid rows will not have been flushed from Sequential File (export) buffers, or committed to database tables.
It is important to set the Sequential File buffer flush  or database commit parameters otherwise we have to manually remove the data which has been inserted into sequential file or database.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 6 May 2016

DS Fatal Error: Destination "APT_TRinput0Rec0" is already bound


Fatal Error: Destination "APT_TRinput0Rec0" is already bound - Transformer Stage Error


Solutions:
* Check if the output stage is having identical column names
* Check if RCP is enabled in input links

If yes,
Rename the target output name accordingly
or Disable the RCP>





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 3 May 2016

Otherwise Constraint - A Quick DataStage Recipe


Recipe:

How to use "Otherwise" constraint in Transformer Stage


www.datagenx.net

How To:

To use "Otherwise" constraint in Transformer stage, Order of link is important.
Typically link with "Otherwise" constraint should be last in Transformer stage link order




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Sunday, 10 April 2016

5 Tips For Better DataStage Design #12



1. Minimum number of sort stages should be use in a datastage job. “Don’t sort if previously sorted” in sort Stage, this option should be set this to “true”, which improves the Sort Stage performance. The same Hash key should be used.  In Transformer Stage “Preserve Sort Order” can be used to maintain sort order option.

2. Minimum number of stages should be used in a job; otherwise it affects the performance of the job.
If a job is having more stages then the job should be decomposed into smaller number of small jobs. The use of container is a best way for better visualize and readability. If the existing active stages occupy almost all the CPU resources, the performance can be improved by running multiple parallel copies of the same stage process. This is done by using a share container.





3. Use of minimum of Stage variables in transformer is a good practice. The performance degrades when more stage variables are used.

4. The use of column propagation should be taken care . Columns, which are not needed in the job flow, should not be propagated from one Stage to another and from one job to the next. The best option is to disable the RCP.

5. When there is a need of renaming columns or addition of new columns, use of copy or modify stage is good practice.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 22 March 2016

Transformer Stage alternative - A Quick DataStage Recipe



What to use instead of "Transformer" Stage

Copy Stage

Use "Copy" stage instead of "Transformer" Stage for following:
Renaming columns
Dropping columns
Default type conversions
Job design placeholder between stages


Modify Stage

Use "Modify" stage
Non default type conversions
Null handling
Character string trimming



Filter Stage

Use "Filter" Stage
Using constraints on output data


Will add more.......






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Friday, 11 December 2015

DataStage Scenario #11 - Get numeric or alphabets only



Goal - Extract numeric part and alpha part from a string as below

Input:

Source
ATUL1234
SINGH374
I23S
C343LEAR





Output -

Source Part1  Part2
ATUL1234 ATUL 1234
SINGH374 SINGH 374
I23S IS 23
C343LEAR CLEAR 343








Like the below page to get update  



Wednesday, 21 October 2015

DataStage Scenario #8 - Populate date between given two



Scenario :

Design a job with 2 job parameters -



1. Start Date   ( DD-MM-YYYY )
2. End Date  ( DD-MM-YYYY )

Read the parameter and populate all the date between these 2 which is not "SUNDAY"





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Tuesday, 20 October 2015

NULL behavior in datastage transformer


NULL - It is always a challenge for developers or architect while dealing with NULL in data. Every projects need some base rules to process NULLs.
     Today, we will see how the NULL behave in datastage, I hope, this will help you design a better job or flow to process the NULLs.



1. Null values can now be included in any expression

2. Null values no longer need to be explicitly handled

3. A null value in an expression will return a null value result. Example:
           1 + null = null
           "abc":null = null
           trim(null) = null

4. Exception: IsNull(null) and IsNotNull(null) will return true or false as expected

5. Any comparison involving null is unknown (i.e. is not true). Example:
          1 < null is not true
          1 >= null is not true
         null == null is not true
         null != null is not true

6. Logical operators - null is unknown (i.e. is not true)
        True AND null is not true
        True OR null is true
        False OR null is not true

7. As long as the target column is nullable records will not be dropped

8. Stage Variables are now always nullable

9. Stage Variable derivations can now include any input fields




Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Tuesday, 29 September 2015

DataStage Scenario #6 - Populate 'Not Found'



Scenario:
       Design a job which read a flat file and populate the output as NOT FOUND which is not available in the other file.

Example Input:

Input1 Input2
Atul Amruta
Neeraj Atul
Anita Divya
Amruta Santosh
Divya Pramod
Swapnil
Pramod
Vivek
Ashish
Amit
Santosh



Output:

Output
Neeraj
Anita
Swapnil
Vivek
Ashish
Amit



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Monday, 14 September 2015

DataStage Scenario #2 - Find Palindrome Number





Develop a datastage job to identify whether input data is a palindrome number or not, If it is, need to print the output as below - 






Input:
77 
161 
168 
252 
259 
343 
434 
525 
595 
616 
686 
700 
707 
770 
777 
861



Output:
77
161
252
343
434
525
595
616
686
707
777


For more Scenario -  CLICK HERE




Friday, 28 August 2015

Datastage Transformer String Functions - Index





Index(String, Substring, Occurrence):   
   
It will returns starting character position of substring. We can use it as a 'grep' command of Linux when we have to check if given character or substring is existed in input string or not. 
        
Ex:-
1. Suppose, we have to check whether substring 'le' is existing in input column 'Incol' or not?

Transformation derivation  -
If Index(Incol, 'le', 1) > 0 then 'Exist' Else 'Not Exists'


2. Let's assume we are getting 'India' as a input string, what will be the output of below derivations?
     
Transformation derivation  -
Index('India', 'a', 1)         =  5
Index('India', 'i', 1)         =   4   # this will not give 1 as matching is case sensitive
Index('India', 'nd', 1)      =   2