
Monday, 14 November 2016

DataStage Partitioning #3



Best allocation of Partitions in DataStage for storage area

| Sr. No | No. of Ways | Volume of Data | Best Way of Partition | Configuration File Allocation (Nodes) |
|--------|-------------|----------------|-----------------------|----------------------------------------|
| 1 | DB2 EEE extraction in serial | Low | - | 1 |
| 2 | DB2 EEE extraction in parallel | High | Node number = current node (key) | 64 (depends on how many nodes are allocated) |
| 3 | Partition or repartition in the stages of DataStage | Any | Modulus (single integer key only) or Hash (any number of keys, any data type); see the sketch below the table | 8 (depends on how many nodes are allocated for the job) |
| 4 | Writing into DB2 | Any | DB2 | - |
| 5 | Writing into a Dataset | Any | Same | 1, 2, 4, 8, 16, 32, 64, etc. (based on the incoming records it writes) |
| 6 | Writing into a Sequential File | Low | - | 1 |
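As a rough illustration of row 3 above, here is a minimal Python sketch of how a Modulus partitioner (single integer key) and a Hash partitioner (multiple keys of mixed types) might map rows to nodes. This is not DataStage code; the 8-node count and the column names are invented for the example, and Python's built-in hash stands in for the engine's hash function.

```python
# Illustrative sketch only, not DataStage internals.
# Assumes an 8-node configuration and made-up column names.

NODE_COUNT = 8

def modulus_partition(key: int) -> int:
    """Modulus partitioning: a single integer key, node = key mod node count."""
    return key % NODE_COUNT

def hash_partition(*keys) -> int:
    """Hash partitioning: any number of keys of any (hashable) data type."""
    return hash(keys) % NODE_COUNT

rows = [
    {"cust_id": 101, "region": "EU", "amount": 25.0},
    {"cust_id": 102, "region": "US", "amount": 40.0},
    {"cust_id": 109, "region": "EU", "amount": 12.5},
]

for row in rows:
    print(row["cust_id"],
          "modulus ->", modulus_partition(row["cust_id"]),
          "hash ->", hash_partition(row["region"], row["cust_id"]))
```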

 

Best allocation of Partitions in DataStage for each stage

| S. No | Stage | Best Way of Partition | Important Points |
|-------|-------|-----------------------|-------------------|
| 1 | Join | Left and right links: Hash or Modulus | All input links should be sorted on the joining key and partitioned with higher key order (see the sketch after the table). |
| 2 | Lookup | Main link: Hash or Same; Reference link: Entire | Neither link needs to be in sorted order. |
| 3 | Merge | Master and update links: Hash or Modulus | All input links should be sorted on the merging key and partitioned with higher key order. Pre-sorting makes the merge "lightweight" on memory. |
| 4 | Remove Duplicates, Aggregator | Hash or Modulus | Performs better if the input link is already sorted on the key. |
| 5 | Sort | Hash or Modulus | Sorting happens after partitioning. |
| 6 | Transformer, Funnel, Copy, Filter | Same | None |
| 7 | Change Capture | Left and right links: Hash or Modulus | Both input links should be sorted on the key and partitioned with higher key order. |
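To see why the Join row above asks for Hash or Modulus on the joining key, here is a hedged Python sketch (the data, the key name id, and the 4-node layout are invented for the example) showing that hashing both input links on the same key sends matching rows to the same partition, which is what lets each node join its slice locally.

```python
# Sketch: hash-partitioning both join inputs on the same key co-locates matching rows.
# Illustrative only; data, key names, and node count are invented.

NODE_COUNT = 4

def partition_of(key) -> int:
    return hash(key) % NODE_COUNT

left  = [{"id": k, "name": f"cust-{k}"} for k in (1, 2, 3, 4)]
right = [{"id": k, "order": f"ord-{k}"} for k in (2, 3, 4, 5)]

for row in left:
    print("left ", row["id"], "-> node", partition_of(row["id"]))
for row in right:
    print("right", row["id"], "-> node", partition_of(row["id"]))

# Rows with the same id land on the same node on both links,
# so each node can perform its part of the join without crossing partitions.
```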






Friday, 11 November 2016

DataStage Partitioning #2


Keyless partition
    Round Robin
    Entire
    Same
    Random

Round Robin
The first record goes to the first processing node, the second to the second processing node, and so on. When DataStage reaches the last processing node in the system, it starts over. This method is useful for resizing partitions of an input data set that are not equal in size. The round robin method always creates approximately equal-sized partitions. This method is the one normally used when DataStage initially partitions data.
Example: Assume 8 nodes are allocated. Record 1 goes to node 1, record 2 to node 2, and so on up to record 8 on node 8; record 9 then wraps around to node 1, and the cycle repeats.
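A minimal Python sketch of the round-robin idea, assuming the same 8-node layout as the example above (illustrative only, not DataStage internals):

```python
# Round-robin partitioning sketch: record i goes to partition i mod N.
NODE_COUNT = 8

def round_robin_partition(record_index: int) -> int:
    return record_index % NODE_COUNT

for i in range(1, 11):  # records 1..10
    # Record 1 -> node 1, ..., record 8 -> node 8, record 9 wraps back to node 1.
    print(f"record {i} -> node {round_robin_partition(i - 1) + 1}")
```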

Entire
Send all rows down all partitions.
Example: Assume 8 nodes are allocated. Every record is sent to all 8 nodes, so each node ends up with a complete copy of the data.

Same
Preserve the same partitioning.
Example: Consider a job with two stages, a Sort stage followed by a Dataset stage. If the Sort stage uses Hash partitioning and the Dataset stage uses Same, the data written to the dataset keeps the hash partitioning produced by the Sort stage.

Random
DataStage uses a random algorithm to choose where each row goes, so you cannot predict which partition a given row will end up in.
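For contrast with Round Robin, a small Python sketch of how Entire and Random might distribute the same records, again assuming an 8-node layout; the seeded random generator is only there to keep the example repeatable:

```python
import random

NODE_COUNT = 8
records = [f"rec-{i}" for i in range(1, 6)]

# Entire: every record is copied to every partition.
entire = {node: list(records) for node in range(1, NODE_COUNT + 1)}

# Random: each record goes to an arbitrary partition; which one cannot be predicted.
rng = random.Random(42)  # seeded only so the example output is repeatable
random_assignment = {rec: rng.randrange(1, NODE_COUNT + 1) for rec in records}

print("Entire -> node 1 holds:", entire[1])
print("Random ->", random_assignment)
```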


Application Execution: Parallel jobs can be executed in two ways
> Sequential
> Parallel
The first panel of the figure below shows sequential execution. A parallel job can run on two kinds of processing systems, SMP and MPP, shown in the second and third panels of the figure.




Ref - www.ibm.com

