Showing posts with label Pointer. Show all posts
Showing posts with label Pointer. Show all posts

Sunday, 18 December 2016

R Points #1 - Matrix & Factor Basics

Saturday, 17 December 2016

R Points #0 - Basics n Vector




=======================================================
=======================================================



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Thursday, 19 May 2016

5 Tips For Better DataStage Design #13



1. The query used in the database should be in such a way that required number of rows are fetched. Do not extract the columns which are not required.

2. For parallel jobs, sequential File should not be read using same partitioning.


http://www.datagenx.net/2016/05/5-tips-for-better-datastage-design-13.html


3. For huge amount of data, use of sequential file stage is not a good practice. This stage also should not be used for intermediate storage between jobs. It degrades the performance of the job.

4. The number of lookups in a job design should be minimum. Join stage is a good alternative to lookup stage.

5. For parallel jobs, proper portioning method is to be used for better job performance and accurate flow of data.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

Sunday, 7 February 2016

Python Points #10b - Reading Files

Saturday, 6 February 2016

Python Points #10a - Reading Files

Wednesday, 27 January 2016

Python Points #8 - Dictionary

Tuesday, 12 January 2016

Python Points #6 - Strings

Monday, 28 December 2015

Python Points #5 - Lists

Wednesday, 23 December 2015

5 Tips For Better DataStage Design #6



#1. If you are using a copy or a filter stage either immediately after or immediately before a transformer stage, you are reducing the efficiency by using more stages because a transformer does the job of both copy stage as well as a filter stage

#2. Work done by "COPY Stage"
a) Columns order can be altered.
b) And columns can be dropped.
c) We can change the column names.



#3. When you need to run the same sequence of jobs again and again, better create a sequencer with all the jobs that you need to run. Running this sequencer will run all the jobs. You can provide the sequence as per your requirement.

#4. Sort the data as much as possible in DB and reduced the use of DS-Sort for better performance of jobs. Avoid the work done by DataStage which is possible in DB. But it doesn't mean you have to put all the complexity in SQL only, for that we are using datastage.

#5. Ensure that all the character fields are trimmed before any processing. Normally extra spaces in the data may lead to some errors like lookup mismatch which are hard to detect.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Monday, 23 November 2015

Python points #3 - Comparison

Friday, 20 November 2015

5 Tips For Better DataStage Design #5



#1. Use the Data Set Management utility, which is available in the Tools menu of the DataStage Designer or the DataStage Manager, to examine the schema, look at row counts, and delete a Parallel Data Set. You can also view the data itself.

#2. Use Sort stages instead of Remove duplicate stages. Sort stage has got more grouping options and sort indicator options.

#3. for quick checking if DS job is running on Server or not, from UNIX
ps -ef | grep 'DSD.RUN'



#4. Make use of Order By clause when a DB stage is being used in join. The intention is to make use of Database power for sorting instead of Data Stage resources. Keep the join partitioning as Auto. Indicate don’t sort option between DB stage and join stage using sort stage when using order by clause.

#5. There are two types of variables - string and encrypted. If you create an encrypted environment variable it will appears as the string "*******" in the Administrator tool.





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Thursday, 19 November 2015

Python points #2 - Data Type & String Manipulations

Monday, 2 November 2015

5 Tips For Better DataStage Design #4



1) While using AsInteger() function in datastage transformer always trim the imput column before passing it to function because if there are extra spaces or unwanted characters which generates zeros when actual integer values are expected. We should use APT_STRING_PADCHAR=0x20 (space) env var for fixed field padding.

2) The Len(col) will return the incorrect length if the input column is having some non-ASCII or double byte characters in it. So check your NLS settings for the job to fix this.



3) To remove embedded spaces from decimal data, use StripWhiteSpace(input.field) function to remove all spaces.

4) To get the datastage job no, Open the log view of the job in datastage director and double click on any entry of the log. The job number will be listed under the field "Job Number:"

5) Set these 2 parameters APT_NO_PART_INSERTION, APT_NO_SORT_INSERTION to TRUE to avoid datastage to insert partitioning or sorting method to improve the job performance at compile job. This will remove the warning also "When checking operator: User inserted sort "<name>" does not fulfill the sort requirements of the downstream operator "<name>""





Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx

Friday, 25 September 2015

5 Tips for better DataStage Design #1



01.  Always try to complete the requirement first, Don't spoil your time to develop an optimized job which is practically useless until requirement is completed.

02.  To re-arrange an existing job design, or insert new stage types into an existing job flow, first disconnect the links from the stage to be changed, then the links will retain any meta-data associated with them.



03.  A Lookup stage can only have one input stream, one output stream, and, optionally, one reject stream. Depending on the type of lookup, it can have several reference links. To change the use of particular Lookup   links in an existing job flow, disconnect the links from the Lookup stage and then right-click to change the link type, for example, Stream to Reference.

04.  The Copy stage is a good place-holder between stages if you anticipate that new stages or logic will be needed in the future without damaging existing properties and derivations. When inserting a new stage, simply drag the input and output links from the Copy place-holder to the new stage. Unless the Force property is set in the Copy stage, InfoSphere DataStage optimizes the actual copy out at runtime.

05.  DataStage takes longer to start-up a job for reasons such as it validates the entire environment, nodes and database connectivity before starting the ETL job. By doing so you have overheads upfront by it ensures robust and reliable data loading and integration. Parallel jobs are not recommended for small volume of data and serial fashion of ETL design, as there is an overhead is starting PX processes.






Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://groups.google.com/forum/#!forum/datagenx