5 Tips For Better DataStage Design #17

by Atul Singh on December 01, 2020 in Column, Datastage, design, designer, File, sequential, Stage, tips

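1. For data passed between jobs, such as parallel jobs in a sequence, use Datasets rather than sequential files. Datasets keep the data in the parallel engine's native, partitioned format, so the next job does not have to re-import and repartition it.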

2. Utility jobs that dump Datasets to sequential files can be used for debugging.

3. When processing fixed-length Sequential Files that are larger than 500 MB or have more than 50 columns:
     - Define just one field with the total record length in the Sequential File stage, then add a COLUMN IMPORT stage right after it to describe each column, data type, scale, etc.
     - Use the COLUMN EXPORT stage to create sequential files in the same manner (it does the opposite of the COLUMN IMPORT stage).
     - Add the 'Read from multiple nodes' option to the Sequential File stage and set it to the number of partitions the job will run with.

4. Always use full path names when referencing files or scripts.

5. Always include a file extension in the file name.
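
To make tip 3 a little more concrete, here is a rough Python sketch of the idea behind the COLUMN IMPORT / COLUMN EXPORT pair: read each record as one fixed-length string, then slice it into columns in a second step (and pad the columns back together to write it out). This is only an illustration of the pattern, not DataStage code. The layout, the column names and the `split_record` / `join_record` helpers are all made up for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed-width layout: (column name, width in characters).
# In a real job this metadata would live in the COLUMN IMPORT stage.
LAYOUT: List[Tuple[str, int]] = [
    ("cust_id", 6),
    ("name", 10),
    ("balance", 8),
]


def split_record(record: str, layout: List[Tuple[str, int]] = LAYOUT) -> Dict[str, str]:
    """Slice one fixed-length record into named columns (import direction)."""
    expected = sum(width for _, width in layout)
    if len(record) != expected:
        raise ValueError(f"expected {expected} characters, got {len(record)}")
    fields: Dict[str, str] = {}
    offset = 0
    for name, width in layout:
        # Take the next `width` characters and drop the padding spaces.
        fields[name] = record[offset:offset + width].strip()
        offset += width
    return fields


def join_record(fields: Dict[str, str], layout: List[Tuple[str, int]] = LAYOUT) -> str:
    """Pad each column to its width and concatenate (export direction)."""
    return "".join(str(fields[name]).ljust(width)[:width] for name, width in layout)


if __name__ == "__main__":
    raw = "000042Alice     00123.50"  # one 24-character record
    row = split_record(raw)
    print(row)
    print(repr(join_record(row)))
```

The point of the two-step design is that the file read only has to deal with one wide field, while the per-column work is done afterwards, where it can run in parallel across partitions.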
