
Monday, 11 January 2016

Cartesian Join - A Quick DataStage Recipe


What to Cook:
How to do a Cartesian join -> join every row of one table to every row of another table





Ingredients:
Use the "Column Generator" stage

How to Cook:
Add both tables to the DS job.
Add a dummy column to each table using the "Column Generator" stage. Make sure the dummy column values are the same in both tables.
Join both tables using these dummy columns.
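Outside DataStage, the same dummy-key trick can be sketched with the standard UNIX `join` command; the file names and the constant key "K" below are illustrative, not part of the recipe:

```shell
# Two small "tables", one row per line.
printf 'a1\na2\n' > /tmp/t1.txt
printf 'b1\nb2\nb3\n' > /tmp/t2.txt

# Prefix every row with the same constant key "K"
# (this is the "Column Generator" step of the recipe).
sed 's/^/K /' /tmp/t1.txt | sort > /tmp/t1k.txt
sed 's/^/K /' /tmp/t2.txt | sort > /tmp/t2k.txt

# join pairs every matching line of file1 with every matching line of
# file2; since all keys are "K", this yields the 2 x 3 = 6 row cross product.
join /tmp/t1k.txt /tmp/t2k.txt
```

Because every row shares the key, `join` degenerates into exactly the Cartesian product, which is the same effect the dummy column achieves in the Join stage.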






Like the below page to get updates:
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Sunday, 10 January 2016

A Quick DataStage Recipe


Under this series, I am trying to cook up quick solutions for DataStage problems, issues, and technical implementations of reusable logic that we face in day-to-day tasks.

Hope you will find them useful and helpful. Keep watching this space.

A Quick DataStage Recipe -> http://www.datagenx.net/search/label/aQDsR?max-results=12






Friday, 8 January 2016

5 Tips For Better DataStage Design #7



#1. If the partition type needs to change for the next immediate stage, set 'Propagate partition' to 'Clear' in the current stage.



#2. Make sure that appropriate partitioning and sorting are used in the stages wherever possible; this enhances performance. Make sure that you understand the partitioning being used; otherwise, leave it as Auto.

#3. For fixed width files, final delimiter should be set to 'none' in the file format property.

#4. If any processing stage requires a key (like Remove Duplicates, Merge, Join, etc.), the keys, sorting keys, and partitioning keys should be the same and in the same order.

#5. To improve Funnel performance, all the input links must be hash partitioned on the sort keys.






Monday, 28 December 2015

Python Points #5 - Lists

Tuesday, 22 December 2015

Notepad++ tip - Find out the non-ascii characters


Working on some code, and when you try to compile or run it you get a non-ASCII character error?
Here is how to resolve this if you are using Notepad++ as your text editor.



1. Ctrl-F (Search -> Find)
2. Put [^\x00-\x7F]+ in the search box
3. Select 'Regular expression' as the search mode
4. Voila!

This will help you track down or replace all non-ASCII characters in the text file.
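If you are on a UNIX box rather than in Notepad++, the same character class works from the shell; this is a small sketch assuming GNU grep with PCRE support (`-P`), and `/tmp/sample.txt` is just an illustrative file name:

```shell
# A two-line sample file; line 2 contains a UTF-8 'e-acute' (non-ASCII).
printf 'ok line\nbad line\xc3\xa9 here\n' > /tmp/sample.txt

# -n prints the line number of every line containing a non-ASCII character,
# using the same [^\x00-\x7F] pattern as the Notepad++ tip.
grep -nP '[^\x00-\x7F]' /tmp/sample.txt
```

On systems without `grep -P` (e.g. BSD grep), a rough fallback is `LC_ALL=C grep -n '[^[:print:][:space:]]'`, though the character class is not identical.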





Friday, 20 November 2015

5 Tips For Better DataStage Design #5



#1. Use the Data Set Management utility, which is available in the Tools menu of the DataStage Designer or the DataStage Manager, to examine the schema, look at row counts, and delete a Parallel Data Set. You can also view the data itself.

#2. Use Sort stages instead of Remove Duplicates stages. The Sort stage has more grouping options and sort indicator options.

#3. For a quick check of whether a DS job is running on the server, from UNIX:
ps -ef | grep 'DSD.RUN'



#4. Make use of an ORDER BY clause when a DB stage is used in a join. The intention is to use the database's power for sorting instead of DataStage resources. Keep the join partitioning as Auto. When using the ORDER BY clause, indicate the don't-sort (previously sorted) option in a Sort stage placed between the DB stage and the Join stage.

#5. There are two types of environment variables - string and encrypted. If you create an encrypted environment variable, it will appear as the string "*******" in the Administrator tool.






Monday, 2 November 2015

5 Tips For Better DataStage Design #4



1) While using the AsInteger() function in a DataStage transformer, always trim the input column before passing it to the function, because extra spaces or unwanted characters generate zeros when actual integer values are expected. We should use the APT_STRING_PADCHAR=0x20 (space) environment variable for fixed-field padding.

2) Len(col) will return an incorrect length if the input column contains non-ASCII or double-byte characters. Check your NLS settings for the job to fix this.



3) To remove embedded spaces from decimal data, use the StripWhiteSpace(input.field) function.

4) To get the DataStage job number, open the log view of the job in DataStage Director and double-click on any entry of the log. The job number will be listed under the field "Job Number:".

5) Set the two parameters APT_NO_PART_INSERTION and APT_NO_SORT_INSERTION to TRUE to stop DataStage from inserting partitioning or sorting operators at compile time, which can improve job performance. This will also remove the warning "When checking operator: User inserted sort "<name>" does not fulfill the sort requirements of the downstream operator "<name>"".






Tuesday, 27 October 2015

5 Tips For Better DataStage Design #3



• Stage variables allow you to hold data from a previous record while processing the next record, allowing you to compare previous and current records. Stage variables also allow you to return multiple errors for a record. By being able to evaluate all the data in a record, and not just error on the first exception found, the cleanup of data is more efficient and requires fewer iterations.

• Use Annotations to describe the steps done at each stage. Use a Description Annotation as the job title; the Description Annotation also appears in Job Properties > Short Job Description and in the Job Report when generated.



• When mapping a decimal field to a char field or vice versa, it is always better to convert the value using the type conversion functions DecimalToString or StringToDecimal, as applicable, while mapping.

• Always use hash partitioning in Join and Aggregator stages. The hash key should be the same as the key used to join/aggregate. If Join/Aggregator stages do not produce the desired results, try running in sequential mode (verify the results; if they are still incorrect, the problem is with the data or logic), and then run in parallel using hash partitioning.

• In Job sequences; always use "Reset if required, then run" option in Job Activity stages. (Note: This is not a default option)






Monday, 26 October 2015

Linux One Liners #3

Tuesday, 20 October 2015

NULL behavior in datastage transformer


NULL - it is always a challenge for developers and architects to deal with NULLs in data. Every project needs some base rules for processing NULLs.
     Today we will see how NULLs behave in DataStage; I hope this will help you design better jobs and flows to process NULLs.



1. Null values can now be included in any expression

2. Null values no longer need to be explicitly handled

3. A null value in an expression will return a null value result. Example:
           1 + null = null
           "abc":null = null
           trim(null) = null

4. Exception: IsNull(null) and IsNotNull(null) will return true or false as expected

5. Any comparison involving null is unknown (i.e. is not true). Example:
          1 < null is not true
          1 >= null is not true
         null == null is not true
         null != null is not true

6. Logical operators - null is unknown (i.e. is not true)
        True AND null is not true
        True OR null is true
        False OR null is not true

7. As long as the target column is nullable, records will not be dropped

8. Stage Variables are now always nullable

9. Stage Variable derivations can now include any input fields





Tuesday, 13 October 2015

5 Tips for better DataStage Design & Administration #2


1. In a DataStage server environment with more than one version of DataStage installed, the dsjob command must specify the server name and port number of the DataStage version that you want the ksh script to run on.
dsjob -domain NONE -user username -password password -server servername:portno -run script1



2.  While doing column mapping in designer, always checks whether source/target column have same Null properties, if not, it should be handled explicitly.

3. Nulls are a curse when it comes to using functions/routines or normal equality-type expressions.
E.g. NULL = NULL doesn't work; neither does concatenation when one of the fields is null. Changing the nulls to 0 or "" before performing operations is recommended to avoid erroneous outcomes.

4. When using string functions on a decimal, always use the Trim function, as string functions interpret the extra space used for the sign in a decimal.

5. When you need to get a substring (e.g. the first 2 characters from the left) of a character field:
Use <Field Name>[1,2]
Similarly, for a decimal field:
Use Trim(<Field Name>)[1,2]






Thursday, 1 October 2015

Get currently running DataStage jobs



To get a list of DataStage jobs that are running, use the command:

ps -ef | grep DSD.RUN


For example:

ps -ef | grep -v grep | grep DSD.RUN

In the sample output, "Inv_Rep3_word" and "stand1" are the running job names.

We can use this command in a shell script to get the list of running jobs whenever we want, or schedule it in cron to get an hourly report.

To get the job list only, use the command below:

ps -ef | grep DSD.RUN | grep -v grep | awk '{print $10}'

This will give you the running job list; the awk field number may need to be modified as per your server installation, since the ps output format differs between hosts.
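Since the field position ($10 above) varies between installations, a slightly more robust sketch is to let awk find whichever field follows DSD.RUN; the sample ps line below is made up for illustration:

```shell
# Illustrative ps -ef line; real column layout depends on your platform.
sample='dsadm 1234 1 0 10:00 ? 00:00:01 phantom DSD.RUN Inv_Rep3_word 0/50/1/0/0'

# Scan the fields for DSD.RUN and print the one after it (the job name),
# so the extraction no longer depends on a fixed column number.
echo "$sample" | awk '{for (i = 1; i < NF; i++) if ($i == "DSD.RUN") print $(i + 1)}'
# -> Inv_Rep3_word
```

In a real script you would feed `ps -ef | grep DSD.RUN | grep -v grep` into the same awk program instead of the sample line.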


Script:
Shell Script for listing out Running Datastage jobs






Monday, 28 September 2015

Tips - DataStage Annotation Stage is disabled or grayed out



Sometimes we face the problem that the Annotation and Description Annotation stages are disabled or grayed out, and we are unable to use them in a new design.

We can fix this by following the steps below:

a. Open a blank parallel design workspace.
b. Go to Menu -> Diagram -> click 'Show Annotations' (the pencil icon).






Friday, 25 September 2015

5 Tips for better DataStage Design #1



01.  Always try to complete the requirement first. Don't waste your time developing an optimized job which is practically useless until the requirement is completed.

02.  To re-arrange an existing job design, or to insert new stage types into an existing job flow, first disconnect the links from the stage to be changed; the links will then retain any metadata associated with them.



03.  A Lookup stage can only have one input stream, one output stream, and, optionally, one reject stream. Depending on the type of lookup, it can have several reference links. To change the use of particular Lookup links in an existing job flow, disconnect the links from the Lookup stage and then right-click to change the link type, for example, Stream to Reference.

04.  The Copy stage is a good placeholder between stages if you anticipate that new stages or logic will be needed in the future, without damaging existing properties and derivations. When inserting a new stage, simply drag the input and output links from the Copy placeholder to the new stage. Unless the Force property is set in the Copy stage, InfoSphere DataStage optimizes the actual copy out at runtime.

05.  DataStage takes longer to start up a job because it validates the entire environment, nodes, and database connectivity before starting the ETL job. This adds overhead upfront, but it ensures robust and reliable data loading and integration. Parallel jobs are not recommended for small volumes of data or serial-fashion ETL designs, as there is an overhead in starting PX processes.







Thursday, 24 September 2015

Informatica Server Variables Default Values



Domain --> select Integration Service (IS) --> Processes --> General Properties

Then edit the relevant properties to point to the correct path.
You may need to restart the IS service for these changes to take effect.





Friday, 11 September 2015

Column auto mapping in Informatica



Informatica PowerCenter Designer can automatically link ports (columns) between transformations, both by position and by name.

Go to the menu bar --> click Layout > Autolink by Name (or Autolink by Position),
then select the mapping object you want to link from (e.g. source) and drag the selected ports to another mapping object (e.g. target).






Monday, 31 August 2015

Linux Shell Scripting Best Practices





What is a script? 
A script is a set of commands, either in a file, or typed in at the command line, that perform multiple operations on a file or files.


How do they work? 
To run a file as a script, the file must have the execution bit set (e.g. chmod +x script.sh).



Best Practices

- Write clean code: use indentation and comments to explain parts of your code.
- Define variables properly and avoid using unnecessary ones.
- Use functions properly. This helps you edit or replace parts of a script easily without disturbing the proper functioning of the base script.
- Use logging functions as and when needed.
- Avoid putting usernames/passwords in scripts.
- Avoid hard-coding hostnames and file/directory paths in a script.
- Maintain the same style throughout the code. Inconsistent coding styles make debugging a lot harder, regardless of who is doing it.
- Proper documentation is very much needed.
- Test properly in a non-production environment before implementing the scripts in production.
- Generic scripts should have a common and understandable name.
- Understand the requirement and scope of where shell scripts can be implemented and where not.
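A minimal sketch pulling a few of these practices together: a log function, an overridable path instead of a hard-coded one, and one task per function. All names and file paths here are illustrative, not from any real project:

```shell
#!/bin/sh
# Demo script: shows logging, no hard-coded paths, and small functions.

LOG_TAG="demo"                    # consistent prefix for every log line
WORK_DIR="${WORK_DIR:-/tmp}"      # overridable via environment, not hard-coded

log() {
    # Timestamped logging instead of bare echo, per the logging practice.
    echo "$(date '+%Y-%m-%d %H:%M:%S') [$LOG_TAG] $*"
}

count_lines() {
    # One task per function, so pieces can be edited or replaced
    # without disturbing the rest of the script.
    wc -l < "$1" | tr -d ' '
}

printf 'a\nb\nc\n' > "$WORK_DIR/demo_input.txt"
log "counting lines in $WORK_DIR/demo_input.txt"
count_lines "$WORK_DIR/demo_input.txt"
```

Running it prints one log line followed by the line count (3); setting WORK_DIR before invoking it redirects all file activity without editing the script.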

Tuesday, 18 August 2015

Putty - Command Line Magic


We all use the SSH client PuTTY in day-to-day tasks, and it is very irritating to log in to different servers again and again. Today I come up with the PuTTY command line, by which we can make this very easy. Want to log in? Just a click and voila!



So let's start --

a. First of all, make a shortcut of your putty.exe file by right-clicking --> Send to --> Desktop.
b. This will display like below; just rename it with your server name or address so you know which server will be connected when you click on it.

c. Here, I have used my Linux server 192.168.37.129.
d. Now right-click on the PuTTY shortcut --> Properties. This will display like below:


e. We have to edit the Target command:

For me:
Server Address - 192.168.37.129
User Name - atul
Password - atul

Edited command:    -ssh user@server -pw password
In my case:    -ssh atul@192.168.37.129 -pw atul

Add this command to the Target value, after whatever already exists there. So the Target's new value (in my case) is:   C:\_mine\putty.exe -ssh atul@192.168.37.129 -pw atul



f. Click on Apply and OK.
g. To access 192.168.37.129 without entering a username and password, simply click on the edited shortcut.


** Caution ** : Do not create such a shortcut on a public computer, as your username and password are stored in plain text and can be misused by anyone.