
Friday, 11 November 2016

DataStage Partitioning #2


Keyless partitioning methods
    Round Robin
    Entire
    Same
    Random

Round Robin
The first record goes to the first processing node, the second to the second processing node, and so on. When DataStage reaches the last processing node in the system, it starts over. This method is useful for resizing partitions of an input data set that are not equal in size. The round robin method always creates approximately equal-sized partitions. This method is the one normally used when DataStage initially partitions data.
Example: Assume 8 nodes are allocated to store the records. The 1st record goes to the first node, the 2nd record to the second node, and so on until the 8th record goes to the eighth node. The 9th record then goes back to the first node, and the cycle repeats.
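
The 8-node example above can be sketched in a few lines of Python. This is only an illustration of the idea, not how DataStage implements it; the function name round_robin_partition and the use of plain integers as records are assumptions made for the sketch.

```python
def round_robin_partition(records, num_nodes):
    """Deal records out to nodes in turn, wrapping around after the last node."""
    partitions = [[] for _ in range(num_nodes)]
    for index, record in enumerate(records):
        # index 0 -> node 0, index 1 -> node 1, ..., index num_nodes -> node 0 again
        partitions[index % num_nodes].append(record)
    return partitions


# Hypothetical input: 17 records numbered 1..17, spread over 8 nodes.
rr_partitions = round_robin_partition(list(range(1, 18)), 8)
for node, rows in enumerate(rr_partitions, start=1):
    print(f"node {node}: {rows}")
```

With 17 records, the first node receives records 1, 9 and 17 and every other node receives two records, which is why the partitions stay approximately equal in size.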

Entire
Sends every row to every partition, so each processing node receives the complete data set.
Example: Assume 8 nodes are allocated. Every record is then passed to all 8 nodes.
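
As a rough sketch of what Entire means, the Python below gives every node its own full copy of the input. The function name entire_partition and the sample records are assumptions for illustration only.

```python
def entire_partition(records, num_nodes):
    """Give every node its own complete copy of the input records."""
    return [list(records) for _ in range(num_nodes)]


# Hypothetical input: three lookup rows copied to 8 nodes.
lookup_rows = ["IN", "US", "UK"]
entire_partitions = entire_partition(lookup_rows, 8)
print(len(entire_partitions))   # number of nodes
print(entire_partitions[0])     # every node holds the same rows
```

Note that the total number of rows held across the nodes is the input size multiplied by the number of nodes, so this method is best kept for small inputs.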

Same
Preserves the partitioning that the data already has from the previous stage; no rows are moved between nodes.
Example: A job has two stages, Sort and Dataset. The Sort stage uses “Hash” partitioning and the Dataset stage uses “Same”. The data arrives at the Dataset stage still partitioned by hash, exactly as the Sort stage left it.
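
The Sort and Dataset example can be sketched as follows. This is a simplified illustration: zlib.crc32 is used only as a deterministic stand-in for a hash function (it is not DataStage's hash), and the names hash_partition, same_partition and the cust field are assumptions made for the sketch.

```python
import zlib


def hash_partition(records, key, num_nodes):
    """Assign each record to a node based on a hash of its key field."""
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        # crc32 is a stand-in chosen because it gives the same result on every run.
        node = zlib.crc32(str(record[key]).encode("utf-8")) % num_nodes
        partitions[node].append(record)
    return partitions


def same_partition(partitions):
    """Pass each partition through unchanged: no rows move between nodes."""
    return [list(partition) for partition in partitions]


# Hypothetical rows keyed on a customer code.
rows = [{"cust": c, "amount": a}
        for c, a in [("A", 10), ("B", 20), ("A", 30), ("C", 40), ("B", 50)]]

after_sort = hash_partition(rows, "cust", 4)     # "Hash" in the Sort stage
after_dataset = same_partition(after_sort)       # "Same" in the Dataset stage
print(after_dataset == after_sort)
```

Because Same does no repartitioning, rows with the same key are still together on one node after the second stage.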

Random
DataStage uses a random algorithm to choose which partition each row goes to. As a result, you cannot predict where a given row will end up.
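
A minimal Python sketch of the idea is shown below. It uses Python's own random module, not DataStage's algorithm, and the function name random_partition and its optional seed parameter are assumptions added so the sketch can be repeated.

```python
import random


def random_partition(records, num_nodes, seed=None):
    """Send each record to a node picked at random."""
    rng = random.Random(seed)  # seed is optional; it only makes a run repeatable
    partitions = [[] for _ in range(num_nodes)]
    for record in records:
        partitions[rng.randrange(num_nodes)].append(record)
    return partitions


# Hypothetical input: 20 records numbered 1..20, spread over 8 nodes.
random_partitions = random_partition(list(range(1, 21)), 8)
for node, node_rows in enumerate(random_partitions, start=1):
    print(f"node {node}: {node_rows}")
```

Every record lands on exactly one node, but running the sketch twice without a seed will usually place the records differently.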


Application Execution: Parallel jobs can be executed in two ways:
> Sequential
> Parallel
The first slot of the figure below shows sequential execution. A parallel job can run on two kinds of parallel processing system, SMP (symmetric multiprocessing) and MPP (massively parallel processing); these are shown in the second and third slots of the figure.




Ref - www.ibm.com

