Wednesday, 28 September 2016

DS_PXDEBUG - DataStage Parallel Debugging Variable

* Controlled with an environment variable, not exposed in the GUI. Set DS_PXDEBUG to activate the feature (e.g. DS_PXDEBUG=1).
* A warning is logged when a job is run with this debug feature on.
* Debug output is collected under a new project-level directory "Debugging" on the server, with subdirectories on a per-job basis, named after the job (created as required). For multi-instance jobs run with a non-empty invocation ID, the directory will be "<jobname>.<invocationID>".
* Internally turns on Osh environment variable APT_MSG_FILELINE so that warnings/errors issued by Osh have source filename & linenumber attached.
* Internally turns on Osh environment variable APT_ENGLISH_MESSAGES so that unlocalised copies of PX-originated error / warning messages are issued in addition to the localised copy (where available).

* Internally turns on the Osh environment variables APT_PM_PLAYER_TIMING, APT_PM_PLAYER_MEMORY and APT_RECORD_COUNTS for more reporting from players.
* Places the content of the job's RT_SC<jobnum> directory in the debug location (includes the job parameter file, Osh script, parent shell script, and any osh and compile scripts associated with transformers). These will be in the same character set as the original files.
* Places the content of the job's RT_BP<jobnum>.O directory in the debug location. Includes library file binaries for PX transformers (plus possibly binaries associated with any Server portions of the job).
* A dump of environment variable values at startup (same as in the log) is placed in a named file in the debug location.
* A dump of osh command options is placed in a named file in the debug location. Note that this is as issued from the Server wrapper code. Particularly in the case of Windows, it may not represent exactly what is received by the Osh command line, due to the action of the OshWrapper program and the interpretation of quotes and backslash-escapes.
* A copy of the received raw osh output messages is placed in a named file in the debug location. These will typically be in the host character set, even though on an NLS system Orchestrate will be originating them in UTF8.
* A copy of the PX configuration file is placed in the debug location. This will be in the same character set as the original file.
* This new feature collects together and enhances a number of debug features already exposed with other environment variables. In order to minimise code impact risk, the original features will not be removed at this stage.
* The exception is the "dump of raw osh output messages"; it was previously placed in the &COMO& directory. If the old and the new debug options are both enabled, the new one will take precedence and there will not be a copy in &COMO&. Again this decision has been taken to minimise code change.
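The directory naming rules above can be sketched in Python; the function name is illustrative, not part of DataStage:

```python
import os

def pxdebug_dir(project_dir, jobname, invocation_id=""):
    """Return the per-job debug directory DS_PXDEBUG would use,
    per the notes above: a project-level "Debugging" directory with
    one subdirectory per job, named "<jobname>.<invocationID>" when
    the job is run with a non-empty invocation ID."""
    sub = f"{jobname}.{invocation_id}" if invocation_id else jobname
    return os.path.join(project_dir, "Debugging", sub)

# Osh environment variables the feature turns on internally (per the notes):
PXDEBUG_IMPLIED = [
    "APT_MSG_FILELINE",
    "APT_ENGLISH_MESSAGES",
    "APT_PM_PLAYER_TIMING",
    "APT_PM_PLAYER_MEMORY",
    "APT_RECORD_COUNTS",
]
```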

Contributed by Chris Thornton 2/2/2007


Monday, 19 September 2016

#1 DataStage Solutions to Common Warnings/Errors - Datatype

Warnings/Errors Related to Datatype

The warnings/errors described in this section are related to data types, such as data type mismatches and column length variations.
A few common warnings/errors we get based on data type are:

1.1    Conversion error calling conversion routine decimal_from_string; data may have been lost
Cause:    This can happen when the input is in an incorrect format for the target data type, or contains a null value, so the conversion function is not able to convert it.
Resolution:    Check for the correct date or decimal format, and check for null values in the date or decimal fields, before passing them to the StringToDecimal function.
    A similar issue can occur with the DataStage StringToDate, DateToString and DecimalToString conversion functions as well.
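The null-then-convert guard described above can be sketched in Python, with `Decimal` standing in for DataStage's StringToDecimal (the function name is illustrative):

```python
from decimal import Decimal, InvalidOperation

def safe_string_to_decimal(value, default=None):
    """Convert a string to Decimal, guarding against the two causes
    above: a null input and a badly formatted input."""
    if value is None or value.strip() == "":
        return default            # null handling before conversion
    try:
        return Decimal(value.strip())
    except InvalidOperation:
        return default            # bad format: fail to a default, not silently

# The equivalent guard in a Transformer derivation would be along the
# lines of: If IsNull(lnk.col) Then 0 Else StringToDecimal(lnk.col)
```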

1.2    Possible truncation of input string when converting from a higher length string to lower length string
Cause:    This can happen when the input has a length greater than the length given in the output of the same stage.
Resolution:    Change the length of the specified column in the specified stage by giving the same length in the output as in the input.
This can happen in stages like Merge, Sort, Join, Lookup etc.
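One way to size the output columns correctly is to scan the actual data first; a minimal sketch (names are illustrative):

```python
def required_lengths(rows):
    """For each column, the minimum output length that avoids the
    'possible truncation' warning, computed from the actual data.
    rows is a list of dict-shaped records."""
    widths = {}
    for row in rows:
        for col, val in row.items():
            widths[col] = max(widths.get(col, 0), len(str(val)))
    return widths
```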

1.3    APT_CombinedOperatorController,0: Numeric string expected for input column 'XXXXX'. Use default value.
Cause:    This can happen when the input data type and output data type are different and the type conversion is not handled in the Transformer.
Resolution:    A type conversion function should be applied based on the target data type.
Ex:    Input data type = Char, Output data type = BigInt
In this case, a direct mapping without any type conversion will give this message; a type conversion function needs to be provided.

i. The log normally doesn't show this message as a Warning/Error; it is mentioned as "Info".
ii. When this happens, the records will not be inserted into the table/file.
iii. The stage name will not be mentioned in the log. To get the stage name where this issue is happening, include the environment variable $APT_DISABLE_COMBINATION in the job properties and set it to "True".
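The explicit Char-to-BigInt conversion the resolution calls for can be sketched in Python (the helper name is illustrative; in DataStage you would use the appropriate conversion function in the Transformer derivation):

```python
def char_to_bigint(value, default=0):
    """Explicit conversion of a numeric string column to an integer.
    Rows that are not numeric strings get the default, mirroring the
    'Numeric string expected ... Use default value' behaviour."""
    try:
        return int(value.strip())
    except (AttributeError, ValueError):
        return default            # null or non-numeric input
```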
1.4    Reading the WVARCHAR database column XXXXX into a VARCHAR column can cause data loss or corruption due to character set conversions
Cause:    This can happen when the data type is supposed to be Unicode but it is not mentioned in the stage.
Resolution:    Change the data type for the column to Varchar with "Unicode" instead of Varchar alone, i.e. select Unicode from the drop-down provided in the Extended column.
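The data loss the warning is protecting against is easy to demonstrate in Python: a Unicode value survives a UTF-8 round trip but is corrupted when forced through a narrower character set:

```python
# A non-ASCII value (note the umlaut) round-trips through UTF-8
# unchanged, but is corrupted when forced into plain ASCII - which
# is the WVARCHAR-into-VARCHAR situation the warning describes.
value = "Müller"
assert value.encode("utf-8").decode("utf-8") == value

lossy = value.encode("ascii", errors="replace").decode("ascii")
# lossy is now "M?ller": the umlaut could not be represented
```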

1.5    Schema reconciliation detected a type mismatch for field SSSSS. When moving data from field type CHAR(min=100,max=100) into INT64
Cause:    This can happen when the data type is supposed to be Char but it is mentioned as BigInt in the stage.
Resolution:    Change the data type for the column to Char with length 100 instead of BigInt in the corresponding stage.


Monday, 12 September 2016

Python Points #15 - Exceptions

Monday, 29 August 2016

Modify Stage - What's been promised

The Modify stage is one of the most under-used stages in DataStage, but it is very useful for performance tuning. Developers are advised not to use a Transformer stage just for trimming or NULL handling; however, they should use the Modify stage only if they are aware of and comfortable with the syntax and derivations it supports, as there are no drop-down or right-click options to help with functions/syntax.

The definition of Modify Stage as IBM documented -

"The Modify stage alters the record schema of its input data set. The modified data set is then output. You
can drop or keep columns from the schema, or change the type of a column.
The Modify stage is a processing stage. It can have a single input link and a single output link."
The operations offered by the Modify stage are:
1. Dropping of Columns
2. Keeping Columns
3. Create Duplicate Columns
4. Rename Columns
5. Change Null-ability of Columns
6. Change Data Type of Columns

The stage supports only one input link and one output link.
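The six operations can be sketched on a dict-shaped record (illustrative Python, not Modify syntax; check the stage documentation for the real specification strings):

```python
def modify(record, drop=(), keep=None, rename=None, convert=None):
    """Toy version of the Modify stage's column operations on one
    dict-shaped record: keep, drop, rename and type-convert columns."""
    out = dict(record)
    if keep is not None:                      # keep only listed columns
        out = {k: v for k, v in out.items() if k in keep}
    for col in drop:                          # drop columns
        out.pop(col, None)
    for old, new in (rename or {}).items():   # rename columns
        if old in out:
            out[new] = out.pop(old)
    for col, fn in (convert or {}).items():   # change data type
        if col in out:
            out[col] = fn(out[col])
    return out
```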

All these operations are easily done in other stages such as Copy, Transformer etc. So why is the Modify stage required, or why should we use it?

The answer to this DataStage question is simple - performance tuning of jobs.

** Why not use a Transformer -
Because whenever we call Transformer functions, data is processed to and through C++ code (the Transformer implementation), which causes a performance latency (delay). This delay is negligible for a small number of records but not for a higher number. So, prefer the Modify stage when the number of records to process is high.

Keep looking for this place as we are going to learn lot of tips on Modify stage.



Monday, 22 August 2016

5 Tips For Better DataStage Design #15

1. Stage variables do not accept null values. Hence, no nullable column should be directly mapped to a stage variable without null handling.

2. Use of the SetNull() function in stage variables should be avoided because it causes a compilation error.

3. If the input links are not already partitioned on the join key, they should be hash partitioned on the join key in the Join stage. In the case of multiple join keys, it is recommended to partition on one key and sort by the other keys.

4. If there is a need to repartition an input link, we need to clear the preserve-partitioning flag in the previous stage; otherwise it will generate a warning in the job log.

5. If a database table used as a reference has a low volume of data, it is good to use a Lookup stage.

6. It is always advisable to minimise use of the Transformer stage, because the Transformer stage is not written in DataStage's native language; it is written in C++. So every time you compile a job, it embeds the C++ code with the native code in the executable file, which can degrade the performance of the job.


Sunday, 14 August 2016

NULL behaviour in Oracle

NULL, as known in SQL, is a special marker and keyword indicating that something has no value. A NULL value is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.

To understand it, I am using a practical approach where we play around with the NULL value (sorry, no value) using an Oracle database.

** NULL values don't count in aggregate functions.
NULL values are not counted by aggregate functions such as MAX, MIN, SUM and COUNT in SQL.
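This is easy to demonstrate with Python's built-in sqlite3 module, which follows the same SQL rule as Oracle here (aggregates skip NULLs rather than treating them as zero):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (n INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(10,), (None,), (30,)])

count_all, count_n, total = con.execute(
    "SELECT COUNT(*), COUNT(n), SUM(n) FROM t").fetchone()
# count_all is 3, but count_n is 2 and total is 40: the NULL row
# is counted by COUNT(*) yet ignored by COUNT(n) and SUM(n)
```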


** An inserted empty string is converted to NULL.


** Where can Null be compared?

Oracle SQL Reference says "In a DECODE function, Oracle considers two nulls to be equivalent. If expr is null, then Oracle returns the result of the first search that is also null."
But this theory varies from DB to DB with the internal architecture and the way each handles NULL. Another place where NULL can be compared is in a range partition definition, where MAXVALUE is greater than NULL.

**  Unique constraints

You are able to insert another NULL without getting ORA-00001 (unique constraint violated).

So if all columns are null, the unique constraint will not be violated. If one or more columns have non-null values, the constraint takes effect.
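SQLite shows the same single-column behaviour, so the rule can be demonstrated with Python's sqlite3 module (Oracle's handling of partially-null composite keys differs, as noted above):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE u (k INTEGER UNIQUE)")

# Two NULLs in a UNIQUE column are not treated as duplicates ...
con.execute("INSERT INTO u VALUES (NULL)")
con.execute("INSERT INTO u VALUES (NULL)")   # no constraint violation

# ... but a repeated non-NULL value does violate the constraint.
con.execute("INSERT INTO u VALUES (1)")
try:
    con.execute("INSERT INTO u VALUES (1)")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
```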

** Unknown OR True returns True, Unknown AND False returns False
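SQL's three-valued logic can be checked directly with sqlite3 (NULL plays the role of unknown; 1 is true and 0 is false):

```python
import sqlite3

con = sqlite3.connect(":memory:")
# unknown OR true yields true; unknown AND false yields false
or_result,  = con.execute("SELECT NULL OR 1").fetchone()
and_result, = con.execute("SELECT NULL AND 0").fetchone()
# by contrast, NULL OR 0 and NULL AND 1 both stay NULL (unknown)
or_unknown,  = con.execute("SELECT NULL OR 0").fetchone()
and_unknown, = con.execute("SELECT NULL AND 1").fetchone()
```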


Monday, 8 August 2016

ETL Strategy #2

Continued......ETL Strategy #1

Delta from Sources extracted by Timestamp
This project will use the timestamp to capture the deltas for most of the operational database sources where a date/time value can be used. The ETL process will extract data from the operational data stores based on a date/time column such as Update_dt while processing the delta records, and then populate it into the Initial Staging area. The flow chart below shows the step-by-step flow.

As shown in the flow chart above, the process has two parts: one for the initial load and the other for delta processing.

Ref #              Step Description
1    Insert a record into the control tables, manually or using scripts, for each ETL process. This is done only once, when a table gets loaded for the first time in the data warehouse.
2    Set the extract date on the control table to the desired initial load date. This is the timestamp which the ETL process will use to go against the source system.
3    Run the ETL batch process, which will read the control tables for the extract timestamp.
4    Extract all data from the source system greater than the extract timestamp set on the control table.
5    Check if the load completed successfully or failed with errors.
6    If the load failed with errors, then the error handling service is called.
7    If the load completed successfully, then the load flag is set to successful.
8    The max timestamp of the ETL load is obtained.
9    A new record is inserted into the control structure with the timestamp obtained in the above step.
10    The process continues to pull the delta records with the subsequent runs.
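Steps 3 to 9 above can be sketched with Python's sqlite3 module; the table and column names are illustrative, not from the project:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE source  (id INTEGER, update_dt TEXT);
    CREATE TABLE control (table_name TEXT, extract_dt TEXT);
    INSERT INTO source VALUES (1, '2016-01-01'), (2, '2016-02-01'),
                              (3, '2016-03-01');
    -- steps 1/2: seed the control table with the initial-load date
    INSERT INTO control VALUES ('source', '2016-01-15');
""")

def run_delta(con):
    """Steps 3-9: read the extract timestamp, pull rows newer than it,
    then record the max timestamp loaded for the next run."""
    (extract_dt,) = con.execute(
        "SELECT MAX(extract_dt) FROM control WHERE table_name='source'"
    ).fetchone()
    rows = con.execute(
        "SELECT id, update_dt FROM source WHERE update_dt > ?",
        (extract_dt,)).fetchall()
    if rows:
        max_dt = max(dt for _, dt in rows)
        con.execute("INSERT INTO control VALUES ('source', ?)", (max_dt,))
    return rows

first  = run_delta(con)   # picks up the rows updated after 2016-01-15
second = run_delta(con)   # nothing newer since: empty delta
```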

Delta from Sources extracted by comparison
Where a transaction Date or Timestamp is not available, a process will compare the new and current versions of a source to generate its delta. This strategy is mostly used for files as the source of data. This is manageable for the small to medium size files used in this project and should be avoided with larger source files. A transaction code (I=Insert; U=Update; D=Delete) will have to be generated so that the rest of the ETL stream can recognise the type of transaction and process it.
Files are pushed onto the ETL server, or they are pulled from the FTP servers to the ETL server. If the files contain delta records, then the files are uploaded directly to the Data warehouse. If the file is a full extract file, then the file comparison delta process will be used to identify the changed records before uploading to the Data warehouse.

E10 Validate Source Data transferred via FTP
Input:    Source Data File and Source Control File.
Output:    NONE.
Dependency: Availability of other systems files.
•    Validate that the number of records in the Source File is the same as the number contained in the Source Control File. This guarantees that the right number of records has been transferred from Source to Target.
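The E10 record-count check can be sketched as follows; the function name is illustrative, and a real job would parse the count out of the control file rather than receive it as a parameter:

```python
import os
import tempfile

def validate_transfer(data_path, control_count):
    """E10 sketch: compare the record count of the transferred data
    file with the count declared in the source control file."""
    with open(data_path, encoding="utf-8") as fh:
        actual = sum(1 for _ in fh)
    return actual == control_count

# demo: a 3-record file validates against a control count of 3
with tempfile.NamedTemporaryFile("w", delete=False, suffix=".dat") as f:
    f.write("rec1\nrec2\nrec3\n")
ok  = validate_transfer(f.name, 3)
bad = validate_transfer(f.name, 4)
os.unlink(f.name)
```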
