Showing posts with label Standards. Show all posts
Showing posts with label Standards. Show all posts

Monday, 14 November 2016

DataStage Partitioning #3



Best allocation of Partitions in DataStage for storage area

Srno
No of Ways
Volume of Data
Best way of Partition
Allocation of Configuration File (Node)
1
DB2 EEE  extraction in serial
Low
-
1
2
DB2 EEE extraction in parallel
High
Node number = current node (key)
64 (Depends on how many nodes are allocated)
3
Partition or Repartition in the Stages of DataStage
Any
Modulus (It should be single key that to integer)
Hash (Any number of keys with different data type)
8 (Depends on how many nodes are allocated for the job)
4
Writing into DB2
Any
DB2
-
5
Writing into Dataset
Any
Same
1,2,4,8,16,32,64 etc… (Based on the incoming records it writes into it.)
6
Writing into Sequential File
Low
-
1

 

Best allocation of Partitions in DataStage for each stage

S. No
Stage
Best way of Partition
Important points
1
Join
Left and Right link: Hash or Modulus
All the input links should be sorted based on the joining key and partitioned with higher key order.

  1.  
Lookup
Main link: Hash or same
Reference link: Entire
Both the links need not be in the sorted order

  1.  
Merge
Master and update link: Hash or Modulus
All the input links should be sorted based on the merging key and partitioned with higher key order. Pre-sort makes merge “lightweight” for memory.

  1.  
Remove Duplicate, Aggregator
Hash or Modulus
If the input link is in sorted order based on the key it will perform better.

  1.  
Sort
Hash or Modulus
Sorting happens after partitioning


Transformer, Funnel, Copy, Filter
Same
None
7
Change Capture
Left and Right link: Hash or Modulus
Both the input links should be in the sorted order based on the key and partitioned with higher key order.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Tuesday, 1 March 2016

ETL Development Standards



These development standards provide consistency in the artifacts the ETL developers create.  This consistency improves testing, operational support, maintenance, and performance.  

Code Development Guidelines


1 Commenting in Code and Objects

As a primary guideline where it is possible and does not interfere with the operation of the applications, all code must contain a developer comment/note.  
All ETL jobs must have a proper annotation (short description of the functionality of the job).
The target output files (.csv files) should not contain any leading or trailing spaces.
While deciding record level delimiter, “Delimiter Collision” issue needs to be considered. No such delimiter should be used as a field defaults that is present as a part of data.

2 ETL Naming Standards

The standardized naming conventions ease the burden on developers switching from one project to another.  Knowing the names and where things are located are very useful to understand before the occurrence of the design and development phases.

The following table identifies DataStage elements and their standard naming convention.


2.1 Job and Properties Naming Conventions

GUI Component Entity Convention
Designer Parallel Job <<Application>>_<<job_Name>>_JOB
Designer Sequence  <<Application>>_<<job_Name>>_SEQ
Designer Server Job  <<Application>>_<<job_Name>>_SVR
Designer Parameter  <<Application>>_<<job_Name>>_PARM

2.2 Job Processing Stage Naming Conventions

GUI Component Entity Convention
Designer Aggregator  AGG_<<PrimaryFunction>>
Designer Copy  CP_<<PrimaryFunction>>
Designer Filter  FLT_<<PrimaryFunction>>
Designer Funnel  FNL_<<PrimaryFunction>>
Designer Join (Inner)  JN_<<PrimaryFunction>>
Designer FTP Enterprise FTP_<<PrimaryFunction>>
Designer Lookup  LKP_<< Value Name or table Name>>
Designer Merge  MRG_<<PrimaryFunction>>
Designer Modify  MOD_<<PrimaryFunction>>
Designer Sort  SRT_<<PrimaryFunction>>

2.3 Links Naming Conventions

GUI Component Entity Convention
Designer Reference (Lookup)  Lnk_Ref_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer Reject (Lookup, File, DB)  Lnk_Rej_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer Input  Lnk_In_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer Output  Lnk_Out_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer Delete  Lnk_Del_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer Insert  Lnk_Ins_<<Number or Additional descriptor, if needed to form a unique object name>>
Designer Update  Lnk_Upd_<<Number or Additional descriptor, if needed to form a unique object name>>

2.4 Data Store Naming Conventions:

In the case of a data store, the class word refers to the type of data store (e.g. Dataset, Sequential File, Table, View, and so forth).

GUI Component Entity Convention
Designer Database  DB_<<DatabasName>>
Designer Table  TBL_<<TableName>>
Designer View  VIEW_<<ViewName>>
Designer Dimension  DM_<<TableName>>
Designer Fact  TRAN_<<TableName>>
Designer Source SRC_<<Table or Object Name>>
Designer  Target  TRGT_<<Table or objectName>>

2.5 File Stage Naming Conventions:

GUI Component Entity Convention
Designer Sequential File  SEQ_
Designer Complex Flat File  CFF_
Designer Parallel dataset  DS_







Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/