Showing posts with label Jobs. Show all posts
Showing posts with label Jobs. Show all posts

Friday, 7 July 2017

ICONV mystery - the UV function

Iconv (Internal CONVersion) is a function supported by UniVerse DB (UV db) to convert the DATA, not only DATE, into internal format. DataStage Server Jobs are using lots of UV functions to manipulate the data.

Today, I will try to unwrap the mystery behind the Iconv function and try to put the details in simpler words. We will not go into data conversion details but date conversion which is used by DataStage :-)

Like most of other date functions (Parallel one), Iconv also accept the date(string) and its format.

Suppose, Date =   June 17, 2017

To Convert this date into internal format, we have to use -

Iconv("2017-06-17", D-YMD)  = 18066
Iconv("2017/06/17", D/YMD)   = 18066
Iconv("2017:17:06", D:YDM)  = 18066
Iconv("17-06-17", D-Y2MD)    = 18066

D-  --> D for Delimiter followed by delimiter char
Y --> year in YYYY
M --> month in MM
D --> date in DD

As we can see, if we provide the date format with date string, Iconv convert the date to an integer no and it is very important to do because now datastage can understand the given date and we can use Oconv function to re-format the date as required.

I will cover Oconv in next post, till then Keep Learning !!

Like the below page to get update

Wednesday, 7 December 2016

Import the jobs from DS windows client #iLoveScripting

As we have discussed a script which can export the datastage jobs from your client system ( likewise we can write another one to import the jobs. Let's see how -

DsImportJobsClient.bat :

This Script read all the *.dsx job name from the specified Directory and Sub-Directory and import to the Specified project. It can also build (Only BUILD) the existing package created on Information Server Manager and send it to the specified location on client machine.

To use the build feature you need to make sure the package has been created with all the needed jobs, saved and closed. Only update to the selected job will be taken care automatically. To add/delete a job, you need to do manually.

Modify the and ImportJobList.txt file and Go the .bat dir and then execute the importAndBuild.bat. :

ImportJobList.txt :

DsImportJobsClient.bat :

Like the below page to get update

Sunday, 6 November 2016

Export the jobs from DS windows client

Datastage jobs Export/Import are occasional activity (Deployment time :-)) for a developer But it becomes very tedious if the job list are long or it's daily routine to export or import jobs.

So I have written a batch script (windows script) which we can execute from the Client Machine (where datastage clients are installed) and automate this process.

DSjobExportClient.bat :
Export Script read the job name from the file (ExportJobList.txt) and exports the jobName.dsx from the project to the export base location and maintain the folder structure specified in the ExportJobList.txt file. File "ExportJobList.txt" and “” should be updated before running the export script.

Copy the above file on any location of your DS windows client machine and update the "ExportJobList.txt" and “” files.
-    ExportJobList.txt
-    ExportJobs.bat :

ExportJobList.txt :

DsExportJobsClient.bat :

Like the below page to get update

Monday, 18 April 2016

10 Scenario based Interview Questions #2 - DataStage

11. Design a job which insert the data if not exists and update if exists in target
12. Design a job which includes a header and footer in output file
13. Design a job which checks whether Currency data is having 'Rs. 9999.99' format or not.
14. Design a job which checks the date passed validation. Date input in YYYYMMDD format.

For more ->  DataStage Scenario

15. Design a job which check the date difference in Hrs and Mins ( HH:MM)
16. Design a job which delete the data if exists in target first and then load.
17. Design a job which check whether each column is having a value or not
18. Design a job which populate the Date dimention table.
19. Design a job which transpose the columns into row with the help of transformer stage.
20. Design a job which remove duplicate without using Remove Duplicate Stage.

For more ->  DataStage Scenario

Like the below page to get update

Tuesday, 12 April 2016

Export DataStage job log from Director Client

These steps are not for exporting log for more than 3-5 jobs as these are manual steps and need to repeat for each job (Yes, each job individually) and NO CUSTOMIZATION available in export.

** If want to customized export, use dsjob command in your script.
1. Open the job log view for one of the jobs in Director Client.
2. Choose Project > Print All entries > Full details > Print to file, and then enter a file name in which you want to save the log.

Like the below page to get update

Sunday, 10 April 2016

5 Tips For Better DataStage Design #12

1. Minimum number of sort stages should be use in a datastage job. “Don’t sort if previously sorted” in sort Stage, this option should be set this to “true”, which improves the Sort Stage performance. The same Hash key should be used.  In Transformer Stage “Preserve Sort Order” can be used to maintain sort order option.

2. Minimum number of stages should be used in a job; otherwise it affects the performance of the job.
If a job is having more stages then the job should be decomposed into smaller number of small jobs. The use of container is a best way for better visualize and readability. If the existing active stages occupy almost all the CPU resources, the performance can be improved by running multiple parallel copies of the same stage process. This is done by using a share container.

3. Use of minimum of Stage variables in transformer is a good practice. The performance degrades when more stage variables are used.

4. The use of column propagation should be taken care . Columns, which are not needed in the job flow, should not be propagated from one Stage to another and from one job to the next. The best option is to disable the RCP.

5. When there is a need of renaming columns or addition of new columns, use of copy or modify stage is good practice.

Like the below page to get update

Wednesday, 23 March 2016

5 Tips For Better DataStage Design #11

  • When writing intermediate results that will only be shared between parallel jobs, always write to persistent data sets (using Data Set stages). You should ensure that the data is partitioned, and that the partitions, and sort order, are retained at every stage. Avoid format conversion or serial I/O.
  • Data Set stages should be used to create restart points in the event that a job or sequence needs to be rerun. But, because data sets are platform and configuration specific, they should not be used for long-term backup and recovery of source data.
  • Depending on available system resources, it might be possible to optimize overall processing time at run time by allowing smaller jobs to run concurrently. However, care must be taken to plan for scenarios when source files arrive later than expected, or need to be reprocessed in the event of a failure.
  • Parallel configuration files allow the degree of parallelism and resources used by parallel jobs to be set dynamically at run time. Multiple configuration files should be used to optimize overall throughput and to match job characteristics to available hardware resources in development, test, and production modes.
  • The proper configuration of scratch and resource disks and the underlying file system and physical hardware architecture can significantly affect overall job performance.

Like the below page to get update

Tuesday, 15 March 2016

yet another Shell script to access DataStage job logs


We have various data warehouses hosted on AIX/Linux/Unix operating systems. DataStage Director is one of the ETL monitoring & scheduling tool used in numerous data warehouses. In case of ETL job failures, we need to login to DataStage Director and check the log for error messages. The ETL job logs can tend to span into several pages with various types of informative messages. Usually we need to locate the error message under ‘Fatal’ message type. Doing this task by searching the DataStage log manually can be very time-consuming and tiring.

This shell script facilitates the user to access the error messages from the ease of the  AIX/Linux screen. The script also has a facility to email the filtered log messages to the user’s mailbox.

Tool Logic 

Accept jobname and other parameters while script execution.
Establish proper environment settings for local DataStage use
Locate event ids for fatal errors in the ETL joblog.
Extract detail log for fatal errors
Mail the filtered job log with the exact fatal error message to the user.



Like the below page to get update

Friday, 29 January 2016

Where to get DataStage Scheduled jobs ?

Did you guys ever try to find out how the scheduler works when we schedule jobs by DataStage scheduler? Interesting na !!

So, let's have a discussion how this is happening.

DataStage itself doesn't contain any scheduling system or application but it is using OS-level scheduling techniques which is OS-level job scheduler.

If the DataStage server is on Linux/Unix/*nix OS. That's "at" and "cron"

'at -lv' or 'at -l' will show any jobs that are scheduled to run once, 'crontab -l' will show jobs that are scheduled to repeat

On Windows, the jobs become entries in the Scheduled Tasks control panel. In GUI, you can't understand much. But we can extract and dump this information to see it better.

'schtasks /query /v /fo list' command give you a proper formatted information about the scheduled jobs.

Windows has nice output format options, and (with the verbose switch) lots of info, including the next run time, last run time and last result, so you can tell if a job actually ran when it was supposed to.  You may need to use a batch script to filter out the specific entry.

Like the below page to get update!forum/datagenx