My e-Notes about DataScience, Machine Learning, Python, Data Analytics, DataStage, DWH and ETL Concepts

Breaking

Sunday, 4 October 2015

Hashing & Sorting Criteria in stages


As we all aware about the best partitioning method is Round Robin but this method distribute the whole data to all the partition irrespective of Key ( Round Robin is Keyless partitioning method) which is usually we do not want and when we consider the key, It's Hash.

              DataStage sorting and hashing improves the data processing speed which is one of our targets to achieve in projects. So, let's create a list of some important stages and see whether they need the partitioning or sorting to perform better.



Stages Partition(Hash) Sort
Sort Yes No
Aggregator Yes Yes
Join Yes Yes
Remove Duplicate No No
Merge Yes Yes
Lookup No No








Like the below page to get update  

No comments:

Post a comment

Disclaimer

The postings on this site are my own and don't necessarily represent IBM's or other companies positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.
The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of his information.