Monday, 5 October 2015

Hashing & Sorting Criteria in Stages

As we all know, Round Robin is the best partitioning method for spreading data evenly, but it distributes rows across all the partitions irrespective of key (Round Robin is a keyless partitioning method), which is usually not what we want. When the key has to be considered, the method to use is Hash.
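The difference between the two methods can be sketched in a few lines of plain Python. This is only an illustration of the idea, not how DataStage implements partitioning; the function names, the `cust_id` column and the sample rows are all made up for the example.

```python
from zlib import crc32


def round_robin_partition(rows, num_partitions):
    """Keyless: row i goes to partition i % num_partitions."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def hash_partition(rows, num_partitions, key):
    """Keyed: rows with the same key value always land in the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is used because it is deterministic between runs
        idx = crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def key_is_colocated(partitions, key):
    """True if no key value appears in more than one partition."""
    seen = {}
    for idx, part in enumerate(partitions):
        for row in part:
            if seen.setdefault(row[key], idx) != idx:
                return False
    return True


# Hypothetical sample data
rows = [
    {"cust_id": c, "amount": a}
    for c, a in [("A", 10), ("B", 20), ("A", 30), ("C", 40), ("B", 50), ("A", 60)]
]

rr = round_robin_partition(rows, 2)
hp = hash_partition(rows, 2, "cust_id")

print("round robin sizes:", [len(p) for p in rr])
print("round robin keeps keys together:", key_is_colocated(rr, "cust_id"))
print("hash keeps keys together:", key_is_colocated(hp, "cust_id"))
```

Round Robin gives evenly sized partitions but scatters the rows of one customer, while Hash keeps each customer in a single partition at the cost of possibly uneven partition sizes.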

Sorting and hashing in DataStage improve data processing speed, which is one of the targets we try to achieve in projects. So, let's list some important stages and see whether they need partitioning or sorting to perform better.

Stage              Partition (Hash)   Sort
Sort               Yes                No
Aggregator         Yes                Yes
Join               Yes                Yes
Remove Duplicate   Yes                Yes
Merge              Yes                Yes
Lookup             No                 No
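Why a stage like Aggregator wants its input sorted on the key can be shown with a small Python sketch. It assumes the rows of one partition are already together (for example after hash partitioning) and uses made-up names and data; it is not DataStage code.

```python
from itertools import groupby
from operator import itemgetter


def aggregate_sorted_partition(rows, key, value):
    """Sum `value` per `key` in a single pass.

    Correct only when `rows` is sorted on `key`, because groupby
    only merges rows that are next to each other.
    """
    return {
        k: sum(r[value] for r in grp)
        for k, grp in groupby(rows, key=itemgetter(key))
    }


# Hypothetical rows of one partition, not sorted on the key
partition = [
    {"cust_id": "A", "amount": 10},
    {"cust_id": "B", "amount": 20},
    {"cust_id": "A", "amount": 30},
]

# Without sorting, key "A" is seen as two separate groups
unsorted_groups = [k for k, _ in groupby(partition, key=itemgetter("cust_id"))]

# After sorting on the key, one streaming pass gives the right totals
sorted_partition = sorted(partition, key=itemgetter("cust_id"))
totals = aggregate_sorted_partition(sorted_partition, "cust_id", "amount")

print(unsorted_groups)
print(totals)
```

With sorted input the stage only has to hold one group in memory at a time, which is where the speed-up comes from.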

