Wednesday, 30 September 2015

Shell Script Scenario #5 - Letters order

Design a script to get the below output -
Requirement - Check whether all the letters in a word are in alphabetical order or not.


almost is in ORDER
cereal is not in ORDER 
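
One way to sketch this in shell (the helper name `in_order` is illustrative): split the word into characters, sort them, and compare the result against the original word.

```shell
# Check whether a word's letters appear in alphabetical order:
# split into one character per line, sort, rejoin, and compare.
in_order() {
    word=$1
    sorted=$(printf '%s' "$word" | grep -o . | sort | tr -d '\n')
    if [ "$word" = "$sorted" ]; then
        echo "$word is in ORDER"
    else
        echo "$word is not in ORDER"
    fi
}

in_order almost   # almost is in ORDER
in_order cereal   # cereal is not in ORDER
```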

For more scenario -  CLICK HERE

Tuesday, 29 September 2015

DataStage Scenario #6 - Populate 'Not Found'

       Design a job which reads two flat files and populates the output as NOT FOUND for each record of one file that is not available in the other file.

Example Input:

Input1   Input2
Atul     Amruta
Neeraj   Atul
Anita    Divya
Amruta   Santosh
Divya    Pramod
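
In DataStage this is typically done with a Lookup stage, routing unmatched rows to a reject link or handling them with the lookup-failure action. The equivalent logic can be sketched in shell, assuming we flag Input1 values that have no match in Input2:

```shell
# Sample files matching the example input above.
printf 'Atul\nNeeraj\nAnita\nAmruta\nDivya\n' > input1.txt
printf 'Amruta\nAtul\nDivya\nSantosh\nPramod\n' > input2.txt

# For each Input1 value, emit NOT FOUND when it is absent from Input2.
while read -r name; do
    if grep -qx "$name" input2.txt; then
        echo "$name"
    else
        echo "$name NOT FOUND"
    fi
done < input1.txt
```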



Like the below page to get updates! forum/datagenx

Monday, 28 September 2015

Tips - DataStage Annotation Stage is disabled or grayed out

Sometimes we face the problem that the Annotation and Description Annotation stages are disabled or grayed out, and we are unable to use them in a new design.

We can fix this with the below steps -

a. Open a blank parallel design workspace.
b. Now go to Menu -> Diagram -> Click on 'Show Annotations' (pencil icon)


Friday, 25 September 2015

5 Tips for better DataStage Design #1

01.  Always complete the requirement first. Don't waste your time developing an optimized job which is practically useless until the requirement is met.

02.  To re-arrange an existing job design, or to insert new stage types into an existing job flow, first disconnect the links from the stage to be changed; the links will then retain any metadata associated with them.

03.  A Lookup stage can have only one input stream, one output stream, and, optionally, one reject stream. Depending on the type of lookup, it can have several reference links. To change the use of particular Lookup links in an existing job flow, disconnect the links from the Lookup stage and then right-click to change the link type, for example, Stream to Reference.

04.  The Copy stage is a good placeholder between stages if you anticipate that new stages or logic will be needed in the future, without damaging existing properties and derivations. When inserting a new stage, simply drag the input and output links from the Copy placeholder to the new stage. Unless the Force property is set in the Copy stage, InfoSphere DataStage optimizes the actual copy out at runtime.

05.  DataStage takes longer to start up a job because it validates the entire environment, the nodes, and database connectivity before starting the ETL job. This adds overhead upfront, but it ensures robust and reliable data loading and integration. Parallel jobs are not recommended for small data volumes or serial-fashion ETL designs, as there is an overhead in starting the PX processes.


Thursday, 24 September 2015

Informatica Server Variables Default Values

Domain --> select Integration Service (IS) --> Processes --> General Properties

and then edit the relevant properties to point to the correct paths.
You may need to restart the IS service for these changes to take effect.


Wednesday, 23 September 2015

Shell Script Scenario #6 - Shuffle the input

- Design a shell script which takes characters, numbers, or strings as input, separated by spaces, and generates a shuffled output from the input. The output should be random.

1 2 3 4 5 6 7

Output1: #when we run script 1st time (it can differ from your output)
7 1 4 2 3 6 5

Output2: #when we run script next time (it can differ from your output)
5 1 7 3 2 6 4


a p s t u b

Output: #when we run script 1st time (it can differ from your output)
t b p s a u
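
A minimal sketch, assuming GNU `shuf` is available (on systems without it, `awk` with `rand()` is an alternative):

```shell
# Shuffle whitespace-separated arguments: print one per line,
# randomize the order with shuf, and rejoin with spaces.
shuffle() {
    printf '%s\n' "$@" | shuf | tr '\n' ' '
    echo
}

shuffle 1 2 3 4 5 6 7    # order differs on every run
shuffle a p s t u b
```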


Tuesday, 22 September 2015

Linux Shell Script Scenario #4

- Create a script which counts the number of users currently logged in.
- Create a script which backs up all the dotfiles for the current user.
- Create a script which converts all file names to lowercase and removes whitespace.
- Create a script which checks whether a .bashrc or .kshrc file has been set up for the user.
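
The first task can be sketched with `who`, counting each login name once even if the user has multiple sessions (output naturally varies by machine):

```shell
# Count the users currently logged in: take the login-name column
# from who, de-duplicate, and count the remaining lines.
count_users() {
    who | awk '{ print $1 }' | sort -u | wc -l
}

echo "Users logged in: $(count_users)"
```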


Monday, 21 September 2015

DataStage Scenario #5 - Oracle Rank & Rownum


a. Design a job where we read the data from a sequential file.
b. Analyze the data in the job and generate these 3 columns (equivalent to Oracle's ROWNUM, RANK, and DENSE_RANK).

Example Input:


Output Expected:

Alpha      1    1    1
Alpha      2    1    1
Alpha      3    1    1
Beta       4    4    2
Charlie    5    5    3
Charlie    6    5    3
Delta      7    7    4
Gama       8    8    5
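
The logic behind the three generated columns can be sketched in awk over input sorted by name (a sketch of the ranking logic only, not the DataStage job itself):

```shell
# col 2 = row number (ROWNUM), col 3 = row number at the start of
# each group (RANK), col 4 = running group counter (DENSE_RANK).
printf 'Alpha\nAlpha\nAlpha\nBeta\nCharlie\nCharlie\nDelta\nGama\n' |
awk '{ rn++
       if ($1 != prev) { rank = rn; dense++; prev = $1 }
       print $1, rn, rank, dense }'
```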


Monday, 14 September 2015

DataStage Scenario #2 - Find Palindrome Number

Develop a DataStage job to identify whether the input data is a palindrome number or not. If it is, print the output as below - 
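
The exact output format is not shown above, but the core palindrome test can be sketched in shell (using `rev` from util-linux; the message wording here is an assumption):

```shell
# A number is a palindrome when it equals its own reversal.
is_palindrome() {
    if [ "$1" = "$(printf '%s' "$1" | rev)" ]; then
        echo "$1 is a palindrome number"
    else
        echo "$1 is not a palindrome number"
    fi
}

is_palindrome 121   # 121 is a palindrome number
is_palindrome 123   # 123 is not a palindrome number
```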




Friday, 11 September 2015

Linux Shell Script Scenario #3

Design a shell script which has to work for the below scenario :-

We have two files. File1 has 20,000 lines of data and File2 has a few lines. The requirement: read File2 and remove from File1 the lines containing those words.

Suppose -  

File1 ---
TCS, Tata Consultancy Services
IBM, International Business Machine
TD, Teradata
IM, Information Management
WM, WareHouse Management
WC, Work Center


Output :
TD, Teradata
IM, Information Management
WC, Work Center
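
`grep -v -f` handles this directly: File2 is treated as a list of patterns and every File1 line matching any of them is dropped. A sketch (File2's contents are not shown above, so the values here are assumed from the expected output):

```shell
# Recreate File1 from the example above.
printf 'TCS, Tata Consultancy Services\nIBM, International Business Machine\nTD, Teradata\nIM, Information Management\nWM, WareHouse Management\nWC, Work Center\n' > File1
# Assumed File2 contents, inferred from the expected output.
printf 'TCS\nIBM\nWM\n' > File2

# Remove from File1 every line containing a word listed in File2.
grep -v -f File2 File1
```

Since `grep -f` does substring matching, short words in File2 could over-match; `grep -vw -f File2 File1` restricts the patterns to whole words.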


Column auto mapping in Informatica

Informatica PowerCenter Designer can automatically link ports (columns) between transformations, both by position and by name.

Go to the Menu bar --> click Layout --> Autolink by Name (or Autolink by Position),
then select the mapping object that you want to link from (e.g. the source) and drag the selected ports to another mapping object (e.g. the target).


Thursday, 10 September 2015

DataStage Scenario #4 - Insert the 3rd Highest Salary

Scenario :
          Design a job which will read a sequential file having employee data (name, salary, department) and load the 3rd highest salary in each department to the target.

Example Input :

 name  salary  department
Atul 45000 ETL
Neeraj 40000 ETL
Swapna 62012 Oracle
Divya 84699 Reporting
Raj 17386 SAP
Venki 30073 Reporting
Partha 52760 ETL
Akhil 75447 SAP
Nitin 29834 Reporting
Kumar 84221 Oracle
Amish 38608 SAP
Kawshik 51023 Reporting
Surya 63438 ETL
Murali 75853 Reporting
Rajesh 88268 SAP

Output Expected

 name  salary  department
Partha 52760 ETL
Kawshik 51023 Reporting
Akhil 75447 SAP
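
In DataStage this is typically a Sort stage (department ascending, salary descending) followed by logic that keeps only the third record of each group. The same idea can be sketched in shell/awk; note this computes the strict third-largest salary per department and skips departments with fewer than three employees:

```shell
# Sample input: name salary department (from the example above).
cat > emp.txt <<'EOF'
Atul 45000 ETL
Neeraj 40000 ETL
Swapna 62012 Oracle
Divya 84699 Reporting
Raj 17386 SAP
Venki 30073 Reporting
Partha 52760 ETL
Akhil 75447 SAP
Nitin 29834 Reporting
Kumar 84221 Oracle
Amish 38608 SAP
Kawshik 51023 Reporting
Surya 63438 ETL
Murali 75853 Reporting
Rajesh 88268 SAP
EOF

# Sort by department, then salary descending; print the 3rd row
# of each department group.
sort -k3,3 -k2,2nr emp.txt | awk '
  { if ($3 != prev) { n = 0; prev = $3 }
    if (++n == 3) print }'
```

If "3rd highest" should mean the third distinct salary value, the counter would need to increment only when the salary changes within a group.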



Wednesday, 9 September 2015

DataStage Terminology

 DataStage Term                     Description
DataStage Administrator    Client program used to manage DataStage projects.
DataStage server    The server engine component to which the DataStage client programs connect.
DataStage Director    Client program used to run and monitor DataStage jobs.
DataStage Designer    Graphical design tool that developers use to design and develop DataStage jobs.
DataStage Manager    Program used to view and manage the contents of the DataStage repository. Refer to the "DataStage Manager Guide."
Stage    Component that represents a data source, a processing step, or a target in a DataStage job.
Source    The origin of the data being extracted, for example a file or a table.
Target    The destination of the data, for example the file loaded (output) by DataStage.
Category    Name used to categorize jobs in DataStage.
Container    A reusable group of stages and links used to call a common process.
Job    Program that defines how to extract, transform, integrate, and load data into a target database.
Job template    A model job that serves as the starting point for jobs performing similar processing.
Job parameters    Variables included in the job design, for example a file name or a password.
Job sequence    A controlling job that starts and runs other jobs.
Scratch disk    Disk space used to store temporary data, such as the virtual records of a data set.
Table definition    Definition that describes the data, including information about the associated tables and columns. Also referred to as metadata.
Partitioning    The DataStage mechanism that splits data so large volumes can be processed at high speed.
Parallel engine    Engine that controls DataStage jobs running across multiple nodes.
Parallel jobs    The DataStage job type that can take advantage of parallel processing.
Project    A collection of jobs and the components required to develop and run them; the entry point to DataStage from the client. A DataStage project must be licensed.
Metadata    Data about data, for example a table definition describing the columns that make up the data.
Link    Joins the stages of a job to form a data flow or a reference lookup.
Routine    Function holding logic that is used in common across jobs.
Column definition    Defines the columns of a data table, including the name and data type of each column.
Environment parameters    Variables included in the job design, for example a file name or a password.
DB2 stage    Stage that reads from and writes to a DB2 database.
FileSet stage    Stage that stores data in a collection of files (a file set).
Lookup stage    Stage that performs reference lookups against a text file, table, or other source usable in DataStage.
LookupFileSet stage    Stage that stores a lookup table.
Sequential stage    Stage used to read and write text files.
Custom stage    Stage implemented in the C language for processing that cannot be done with the standard DataStage stages.
Copy stage    Stage that copies a data set.
Generator stage    Stage that generates a dummy data set.
DataSet stage    Stage that reads and writes data files used by the parallel engine.
Funnel stage    Stage that copies multiple input data sets into a single output data set.
Filter stage    Stage that extracts records meeting given criteria from the input data set.
Merge stage    Stage that joins more than one input record.
LinkCollector stage    Stage that collects data that was previously divided (partitioned).
RemoveDuplicate stage    Stage that removes duplicate entries from a data set.
ChangeCapture stage    Stage that compares two data sets and records the differences between them.
RowImport stage    Stage that imports columns from a string or binary column.
RowExport stage    Stage that exports columns of other types to a string or binary column.
Transformer stage    Stage that edits items and performs type conversions.
Modify stage    Stage that converts data to a specified type, or converts NULL values to a specified value.
XML input stage    Stage that reads an XML file and extracts the required elements from the XML data.
Sort stage    Stage that sorts data in ascending or descending order.
Join stage    Stage that joins more than one input record.
RowGenerator stage    Stage that generates rows of mock data as a new data set.
ColumnGenerator stage    Stage that adds columns to an existing data set.
Aggregator stage    Stage that aggregates data.
Pivot stage    Stage that converts multiple columns into multiple rows of records.
Peek stage    Stage that prints record column values to the job log.
Stream link    Link that represents the flow of data.
Reference link    Input link that represents reference data.
Reject link    Link that outputs the data that does not meet the criteria you specify.
Integer type    Data type that represents integer values.
Decimal type    Data type that represents numbers containing a decimal point.
NULL value    Special value indicating an unknown value; not the same as 0, a blank, or an empty string.
DataSet    A collection of data.
SQLStatement    Statement used to manipulate the data in a table.
TWS    Stands for Tivoli Workload Scheduler, a product used to create job nets.
Hash    A data-dividing method of the DataStage partitioning function; partitioning is performed using a hash value.
Modulus    A data-dividing method of the DataStage partitioning function; partitioning is performed by taking the key value modulo the number of partitions.
Same    A data-dividing method of the DataStage partitioning function; partitions are passed through exactly as output by the previous stage, without repartitioning.
Job Properties    Property sheet used to configure the settings for a job.
