DataStage Partitioning #3

by Atul Singh on November 13, 2016 in Concept, Datastage, Hash, Modulus, Partitioning, Same, Stage, Standards, storage, technique

Best allocation of Partitions in DataStage for storage area

S. No	Scenario	Volume of Data	Best Partitioning Method	Nodes in Configuration File
1	DB2 EEE extraction in serial	Low	-	1
2	DB2 EEE extraction in parallel	High	Node number = current node (key)	64 (depends on how many nodes are allocated)
3	Partitioning or repartitioning within DataStage stages	Any	Modulus (a single key only, and it must be an integer) or Hash (any number of keys, of any data type)	8 (depends on how many nodes are allocated to the job)
4	Writing into DB2	Any	DB2	-
5	Writing into a Dataset	Any	Same	1, 2, 4, 8, 16, 32, 64, etc. (the Dataset is written according to the partitioning of the incoming records)
6	Writing into a Sequential File	Low	-	1
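
The difference between the Modulus, Hash and Same methods in the table above can be sketched in a few lines of Python. This is only an illustration of the idea, not DataStage's internal implementation: the function names are hypothetical, and `zlib.crc32` is an assumed stand-in for whatever hash function the engine really uses.

```python
import zlib


def modulus_partition(key, num_partitions):
    """Modulus: a single integer key, partition = key mod number of partitions."""
    if isinstance(key, bool) or not isinstance(key, int):
        raise TypeError("modulus partitioning needs a single integer key")
    return key % num_partitions


def hash_partition(keys, num_partitions):
    """Hash: any number of keys of any type, combined into one hash value.

    crc32 is an assumed stand-in for the engine's real hash function.
    """
    encoded = "\x1f".join(repr(k) for k in keys).encode("utf-8")
    return zlib.crc32(encoded) % num_partitions


def same_partition(current_partition):
    """Same: the record stays in the partition it arrived in."""
    return current_partition


if __name__ == "__main__":
    nodes = 8  # assumed node count, as in row 3 of the table
    for customer_id in (10, 17, 64):
        print(customer_id, "->", modulus_partition(customer_id, nodes))
    print(("CA", "2016-11-13"), "->", hash_partition(("CA", "2016-11-13"), nodes))
```

Both functions send equal key values to the same partition, which is the property that key-based stages rely on. Modulus is cheaper to compute but only works on an integer key, while Hash accepts composite keys of mixed types.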

Best allocation of Partitions in DataStage for each stage

S. No	Stage	Best Partitioning Method	Important Points
1	Join	Left and right links: Hash or Modulus	All input links should be sorted on the join key and partitioned on the higher-order key columns.
2	Lookup	Main link: Hash or Same; reference link: Entire	Neither link needs to be sorted.
3	Merge	Master and update links: Hash or Modulus	All input links should be sorted on the merge key and partitioned on the higher-order key columns. Pre-sorting keeps the Merge "lightweight" on memory.
4	Remove Duplicates, Aggregator	Hash or Modulus	Performs better if the input link is already sorted on the key.
5	Sort	Hash or Modulus	Sorting happens after partitioning.
6	Transformer, Funnel, Copy, Filter	Same	None
7	Change Capture	Left and right links: Hash or Modulus	Both input links should be sorted on the key and partitioned on the higher-order key columns.
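
To see why Join wants both links partitioned on the same key while Lookup can use Entire on the reference link, here is a small Python sketch. It is a simulation under stated assumptions, not DataStage code: all function names are hypothetical, `zlib.crc32` stands in for the engine's hash function, and a dictionary match replaces the sorted merge that the real Join stage performs.

```python
import zlib


def partition_of(key, num_partitions):
    # crc32 is an assumed stand-in for the engine's hash function
    return zlib.crc32(repr(key).encode("utf-8")) % num_partitions


def hash_partition_rows(rows, key, num_partitions):
    """Distribute rows so that equal key values land in the same partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partition_of(row[key], num_partitions)].append(row)
    return parts


def partitioned_join(left, right, key, num_partitions):
    """Join-style: both links hash partitioned on the key, joined per partition."""
    left_parts = hash_partition_rows(left, key, num_partitions)
    right_parts = hash_partition_rows(right, key, num_partitions)
    out = []
    for left_part, right_part in zip(left_parts, right_parts):
        index = {}
        for r in right_part:
            index.setdefault(r[key], []).append(r)
        for l in left_part:
            for r in index.get(l[key], []):
                out.append({**l, **r})
    return out


def lookup_with_entire(main_parts, reference, key):
    """Lookup-style: every partition gets a full copy of the reference data,
    so the main link can keep whatever partitioning it already has."""
    out = []
    for part in main_parts:
        ref_copy = {r[key]: r for r in reference}  # one full copy per partition
        for row in part:
            match = ref_copy.get(row[key])
            if match is not None:
                out.append({**row, **match})
    return out


if __name__ == "__main__":
    left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 3, "a": "z"}]
    right = [{"id": 2, "b": "p"}, {"id": 3, "b": "q"}, {"id": 4, "b": "r"}]
    print(sorted(partitioned_join(left, right, "id", 4), key=lambda r: r["id"]))
    print(lookup_with_entire([[left[0], left[2]], [left[1]]], right, "id"))
```

In the join, a row can only find its match if the matching row sits in the same partition, which is why both links need the same key-based partitioning. In the lookup, the reference data is duplicated to every partition, which costs memory but removes any constraint on how the main link is partitioned.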


