Showing posts with label Copy. Show all posts

Wednesday, 18 January 2017

5 Tips For Better DataStage Design #17

**  Some stages have an automap button, which maps fields that have the same name.

**  When you add a shared container into your job you need to map the columns of the container to your job link. What you might miss is the extra option you get on the Columns tab "Load" button. In addition to the normal column load you get "Load from Container" which is a quick way to load the container metadata into your job.

**  Don't create a job from an empty canvas. Always copy and use an existing job. Likewise, don't create shared containers from a blank canvas; always build and test a full job and then turn part of it into a container.

**  If you want to copy and paste settings between jobs, open two Designer sessions so that you can have two property windows open at the same time and copy or compare them more easily. Most property windows in DataStage are modal, so you can only have one property window open per Designer session.

**  You can load metadata into a stage by using the "Load" button on the column tab or by dragging and dropping a table definition from the Designer repository window onto a link in your job. For sequential file stages the drag and drop is faster as it loads both the column names and the format values in one go. If you used the load button you would need to load the column names and then the format details separately.

**  You will often meet a Modify stage or a stage function that does not work as expected, and trial and error is often the only way to work out the syntax of a function. If you do this in a large and complex job, debugging can take a lot of time. A better way is to keep a couple of test jobs in your project with a Row Generator, a Modify or Transformer stage and a Peek stage. Have a column of each type in this test job. Use it throughout your project as a quick way to test a function or conversion. By the way, a C++ compiler must be installed for the Transformer stage to compile and run correctly.


Wednesday, 3 August 2016

#3 How to Copy DataSet from One Server to Another Server

This post is the third and last in the How to Copy DataSet from One Server to Another Server series.

We have generated and populated a dataset and identified the files which we need to move from serverA to another server, serverB.


4. Reading the dataset on another server

This is the most crucial step. By now, all 4 files have been moved to serverB or to a common location which is accessible from serverB.

In my case, the common dir is my home - /home/users/atul

A. Change the default.apt file
We need to change the fastname in the default.apt (config file) which we copied from serverA [NOT the default.apt of serverB], so that it points to serverB.

Open the file in any text editor or vi and change it as in the below screenshot -

Temporarily create the "resource disk" and "resource scratchdisk" locations defined in the above config file if they do not already exist.
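
The two actions in step A can also be scripted. Below is a minimal sketch, assuming the copied config uses the usual fastname "host" and resource disk "path" lines; the helper names retarget_config and ensure_resource_dirs are made up for this illustration and are not DataStage commands.

```shell
# retarget_config CONFIG_FILE NEW_HOST  (hypothetical helper)
# Prints the config with every fastname value replaced by NEW_HOST.
# Assumes NEW_HOST is a plain host name (no "/" or "&" characters).
retarget_config() {
    sed 's/\(fastname[[:space:]]\{1,\}\)"[^"]*"/\1"'"$2"'"/' "$1"
}

# ensure_resource_dirs CONFIG_FILE  (hypothetical helper)
# Creates every "resource disk" and "resource scratchdisk" directory
# named in the config file if it does not exist yet.
ensure_resource_dirs() {
    sed -n 's/^[[:space:]]*resource[[:space:]]\{1,\}[a-z]*disk[[:space:]]\{1,\}"\([^"]*\)".*/\1/p' "$1" |
    while IFS= read -r dir; do
        mkdir -p "$dir"
    done
}
```

For example, retarget_config ~/default.apt serverB > ~/default.apt.new writes the edited copy to a new file, which you can review before replacing the original; ensure_resource_dirs ~/default.apt then creates the directories. Writing to a new file is a deliberate choice here so that nothing is changed in place by mistake.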

B. Copy the dataset data files 

Copy the dataset data files from the common directory to the "resource disk" location defined in the config file.

cp ~/dummy.ds.* /opt/IBM/InformationServer/Server/DataSets/

Now, the file locations are -

Config file and Dataset descriptor file - my home dir or common dir
Dataset data files - /opt/IBM/InformationServer/Server/DataSets/

Design a job which will read these dataset files and populate the data into a sequential file or any other output.

Job Parameters -
APT_CONFIG_FILE = /home/users/atul/default.apt

DataSet Properties
DataSet File - /home/users/atul/dummy.ds

That is all. You can now read the copied dataset on serverB and populate this data to some other output, such as a seq file or a table, so that you no longer need the copied default.apt config file, which is not meant for serverB.

Try it out, and let me know if you have any questions.


Monday, 1 August 2016

#2 How to Copy DataSet from One Server to Another Server

This post is the second part of How to Copy DataSet from One Server to Another Server.


After generating the dummy dataset, the next step is to identify the files which we need to copy.

2. Files which we need to move

a. APT_CONFIG_FILE - the configuration file which was used to create the dataset
b. DataSet Descriptor file - *.ds file, in our case it is dummy.ds
c. DataSet Data files - the actual data files, which are stored in the RESOURCE DISK location

So let's get all the paths which we need to access -

APT_CONFIG_FILE = /opt/IBM/InformationServer/Server/Configurations/default.apt
RESOURCE DISK = /opt/IBM/InformationServer/Server/DataSets
DATASET LOC = /home/users/atul/dummy.ds

Use commands or any FTP tool to copy these files to a shared location which is accessible from the other server (serverB).

In my case, I have stored all of them in my Linux home directory, which is common to both servers.

So I executed these commands to copy all the required files into my home directory.

cp  /opt/IBM/InformationServer/Server/Configurations/default.apt ~
cp  /opt/IBM/InformationServer/Server/DataSets/dummy.ds.* ~
cp  /home/users/atul/dummy.ds ~

Now, my home directory has these files -

You can copy these 4 files to serverB, where you want to move your dataset. I am not doing that, as my home directory is common to both servers.
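
If you do transfer the files between machines (FTP, scp) rather than rely on a shared home directory, one optional sanity check is to compare checksums on both sides. This is only a sketch using the POSIX cksum utility; write_manifest is a hypothetical helper name, and it assumes the 4 files sit in a directory of their own.

```shell
# write_manifest DIR  (hypothetical helper)
# Prints "checksum size name" for every regular file in DIR, sorted by
# file name, so the output from two servers can be compared with diff.
write_manifest() {
    (
        cd "$1" || exit 1
        for f in *; do
            if [ -f "$f" ]; then
                cksum "$f"
            fi
        done | sort -k 3
    )
}
```

Run it against the directory holding the files on serverA and again on serverB, save each output to a file and diff the two. Any differing line points at a file that was changed or truncated during the transfer.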

3. Why we need these files only

The config file was used by DataStage to create the dataset (descriptor file, data files, data file location).
So, we need - the config file, the dataset descriptor file and the dataset data files.


Saturday, 30 July 2016

#1 How to Copy DataSet from One Server to Another Server

Hi Guys...
I've been asked so many times how we can move/copy a dataset from one server to another, so here is the way which I follow.

As a very first step, analyze whether you can avoid this by some other way, like creating a sequential file and sending it over FTP, or loading the data into a temporary table which is accessible from the other server, or, if you are using DataStage packs, via MQ, XML or JSON formats etc. I am suggesting these solutions because they are easy to design and guarantee the data quality at the other end.

If above solutions are not possible, please follow the below steps -

Points I am going to cover here -
1. Generating a dummy dataset
2. Files which we need to move
3. Why we need these files only
4. Reading the dataset on another server


1. Generating a dummy dataset

I have created a dummy job which generates a dataset with the default APT_CONFIG_FILE, which has 2 nodes.

Here, I am generating 10 dummy rows with the help of the Row Generator stage and storing them in a dataset.

a. Config File - I am using the default config file (replaced the server name in "fastname" with serverA)

APT_CONFIG_FILE = /opt/IBM/InformationServer/Server/Configurations/default.apt

Check the "resource disk" location in the config file; we need it for further processing.

RESOURCE DISK = /opt/IBM/InformationServer/Server/DataSets
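
Instead of reading the config file by eye, the "resource disk" locations can be pulled out with sed. A minimal sketch, assuming the config uses the usual resource disk "path" {pools ""} lines; list_resource_disks is a hypothetical helper name, not a DataStage command.

```shell
# list_resource_disks CONFIG_FILE  (hypothetical helper)
# Prints each distinct "resource disk" path found in a parallel
# configuration file. "resource scratchdisk" lines are not matched.
list_resource_disks() {
    sed -n 's/^[[:space:]]*resource[[:space:]]\{1,\}disk[[:space:]]\{1,\}"\([^"]*\)".*/\1/p' "$1" |
    sort -u
}
```

Calling list_resource_disks "$APT_CONFIG_FILE" prints one line per distinct path, so a 2-node config whose nodes share the same disk location gives a single line.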

b. Dataset location - I have created this dataset, named dummy.ds, in my home dir

DATASET LOC = /home/users/atul/dummy.ds

Keep looking for next post........


Sunday, 10 April 2016

5 Tips For Better DataStage Design #12

1. Use the minimum number of Sort stages in a DataStage job. Where the data is already sorted, set the “Don’t sort if previously sorted” option in the Sort stage to “true”, which improves the Sort stage performance. The same hash key should be used. In the Transformer stage, the “Preserve Sort Order” option can be used to maintain the sort order.

2. Use the minimum number of stages in a job; otherwise the performance of the job is affected.
If a job has too many stages, it should be decomposed into a number of smaller jobs. Using containers is a good way to improve visualization and readability. If the existing active stages occupy almost all the CPU resources, performance can be improved by running multiple parallel copies of the same stage process. This is done by using a shared container.

3. Using the minimum number of stage variables in a Transformer is a good practice. Performance degrades as more stage variables are used.

4. Take care with column propagation. Columns which are not needed in the job flow should not be propagated from one stage to another or from one job to the next. The best option is to disable RCP (runtime column propagation).

5. When columns need to be renamed or new columns added, using a Copy or Modify stage is good practice.


Monday, 1 February 2016

5 Tips For Better DataStage Design #8

#1. Templates should be created to enhance reusability and enforce coding standards. Jobs should be created using templates.
#2. The template should contain the standard job flow along with proper naming conventions for components, a proper job-level annotation and a short/long description. A change record section should be kept in the long description to keep track of changes.

#3. Don't copy only the job design. Copy the job using the 'Save As' or 'Create Copy' option at the job level.
#4. The DataStage connection should be logged off after completion of work to avoid locked jobs.
#5. Create common lookup jobs.
#6. Some extraction jobs can be created to create reference datasets. The datasets can then be used in different conversion modules.


Wednesday, 23 December 2015

5 Tips For Better DataStage Design #6

#1. If you are using a Copy or a Filter stage either immediately after or immediately before a Transformer stage, you are reducing efficiency by using more stages, because a Transformer does the job of both a Copy stage and a Filter stage.

#2. Work done by "COPY Stage"
a) Columns order can be altered.
b) Columns can be dropped.
c) Column names can be changed.

#3. When you need to run the same sequence of jobs again and again, it is better to create a sequencer with all the jobs that you need to run. Running this sequencer will run all the jobs. You can arrange the sequence as per your requirement.

#4. Sort the data in the DB as much as possible and reduce the use of DS-Sort for better job performance. Avoid doing work in DataStage that can be done in the DB. But that doesn't mean you have to put all the complexity in SQL only; that is what we are using DataStage for.

#5. Ensure that all character fields are trimmed before any processing. Extra spaces in the data can lead to errors, such as lookup mismatches, which are hard to detect.
