DS_PXDEBUG - DataStage Parallel Debugging Variable





* Controlled with an environment variable, not exposed in the GUI. Set DS_PXDEBUG to activate the feature (e.g. DS_PXDEBUG=1; see the sketch after this list).
* A warning is logged when a job is run with this debug feature on.
* Debug output is collected under a new project-level directory "Debugging" on the server, with subdirectories on a per-job basis, named after the job (created as required). For multi-instance jobs run with a non-empty invocation ID, the directory will be "<jobname>.<invocationID>".
* Internally turns on the Osh environment variable APT_MSG_FILELINE so that warnings/errors issued by Osh have the source filename and line number attached.
* Internally turns on the Osh environment variable APT_ENGLISH_MESSAGES so that unlocalised copies of PX-originated error/warning messages are issued in addition to the localised copy (where available).
* Internally turns on the Osh environment variables APT_PM_PLAYER_TIMING, APT_PM_PLAYER_MEMORY and APT_RECORD_COUNTS for more reporting from players.
* Places the content of the job's RT_SC<jobnum> directory in the debug location (this includes the job parameter file, Osh script, parent shell script, and any osh and compile scripts associated with transformers). These will be in the same character set as the original files.
* Places the content of the job's RT_BP<jobnum>.O directory in the debug location. This includes library file binaries for PX transformers (plus possibly binaries associated with any Server portions of the job).
* A dump of environment variable values at startup (the same as in the log) is placed in a named file in the debug location.
* A dump of osh command options is placed in a named file in the debug location. Note that this is as issued from the Server wrapper code. Particularly on Windows, it may not represent exactly what is received by the Osh command line, due to the action of the OshWrapper program and its interpretation of quotes and backslash escapes.
* A copy of the received raw osh output messages is placed in a named file in the debug location. These will typically be in the host character set, even though on an NLS system Orchestrate will be originating them in UTF-8.
* A copy of the PX configuration file is placed in the debug location. This will be in the same character set as the original file.
* This new feature collects together and enhances a number of debug features already exposed through other environment variables. To minimise code impact risk, the original features will not be removed at this stage.
* The exception is the "dump of raw osh output messages", which was previously placed in the &COMO& directory. If the old and the new debug options are both enabled, the new one takes precedence and there will not be a copy in &COMO&. Again, this decision has been taken to minimise code change.
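
As a hedged sketch, one way to enable the feature for a single run from the shell, assuming the job picks up DS_PXDEBUG from its environment (the project and job names here are made up):

    # Enable parallel debugging for this session, then run the job.
    export DS_PXDEBUG=1
    $DSHOME/bin/dsjob -run -jobstatus MyProject MyJob
    # Debug output should then appear under the project's Debugging/MyJob directory.

The variable can also be defined at project level in the Administrator client, like any other environment variable.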

Contributed by Chris Thornton, 2/2/2007






DataStage Solutions to Common Warnings/Errors #1


Warnings/Errors Related to Data Types

The warnings/errors described in this section are related to data types, such as data type mismatches and column length variations.
A few common warnings/errors we get based on data type are:
 
1.1    Conversion error calling conversion routine decimal_from_string, data may have been lost
Cause:    This can happen when the input is in an incorrect format for conversion to the target data type, or contains a null value, so that the conversion function is not able to convert it to the target data type.
Resolution: Check for the correct date or decimal format, and check for null values in the date or decimal fields, before passing them to the StringToDecimal function.
    A similar issue can occur with the DataStage StringToDate, DateToString and DecimalToString conversion functions as well.
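
As a hedged sketch, a transformer output derivation that guards the conversion might look like this (the link name lnk_in and column AMOUNT_STR are made up, and the output column is assumed to be nullable so that SetNull() is valid there):

    If IsNull(lnk_in.AMOUNT_STR) Or Trim(lnk_in.AMOUNT_STR) = "" Then SetNull() Else StringToDecimal(lnk_in.AMOUNT_STR)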




1.2    Possible truncation of input string when converting from a higher length string to a lower length string
Cause:    This can happen when the input has a length greater than the length given for the same column in the stage's output.
Resolution: Change the length of the specified column in the specified stage, giving the same length in the output as in the input.
This can happen in stages like Merge, Sort, Join, Lookup etc.

1.3    APT_CombinedOperatorController,0: Numeric string expected for input column 'XXXXX'. Use default value.
Cause:    This can happen when the input data type and output data type are different and the type conversion is not handled in the transformer.
Resolution: A type conversion function should be applied based on the target data type (a hedged sketch follows the note below).
Ex:    Input data type = Char, Output data type = BigInt
In this case, a direct mapping without any type conversion will give this message, so a type conversion function needs to be provided.

Note:
i. The log normally doesn't show this message as a Warning/Error; it will be marked as "Info".
ii. When this happens, the records will not be inserted into the table/file.
iii. The stage name will not be mentioned in the log. To get the stage name where this issue is happening, include the environment variable $APT_DISABLE_COMBINATION in the job properties and set it to "True".
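
As a hedged illustration for 1.3, the string could be validated and converted explicitly rather than mapped directly (lnk_in and ID_STR are made up; converting via StringToDecimal and letting the decimal value flow into the BigInt column is one common approach, stated here as an assumption rather than the only correct one):

    If IsValid("int64", lnk_in.ID_STR) Then StringToDecimal(lnk_in.ID_STR) Else SetNull()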
   
1.4    Reading the WVARCHAR database column XXXXX into a VARCHAR column can cause data loss or corruption due to character set conversions
Cause:    This can happen when the data type is supposed to be Unicode but that is not specified in the stage.
Resolution: Change the data type for the column to Varchar with "Unicode" set, instead of Varchar alone, i.e. select Unicode from the drop-down provided in the Extended column.

1.5    Schema reconciliation detected a type mismatch for field SSSSS. When moving data from field type CHAR(min=100,max=100) into INT64
Cause:    This can happen when the data type is supposed to be Char but it is defined as BigInt in the stage.
Resolution: Change the data type for the column to Char with length 100 instead of BigInt in the corresponding stage.







Modify Stage - What's been promised


The Modify stage is one of the most under-used stages in DataStage, but it is very useful for performance tuning. Developers are advised not to use a Transformer stage just for trimming or NULL handling; however, the Modify stage should only be used when they are aware of and comfortable with the syntax and derivations it supports, as there is no drop-down or right-click option to help us with its functions/syntax.

The definition of the Modify stage, as documented by IBM -

"The Modify stage alters the record schema of its input data set. The modified data set is then output. You
can drop or keep columns from the schema, or change the type of a column.
The Modify stage is a processing stage. It can have a single input link and a single output link."


http://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/c_deeref_Modify_Stage.html
The operations offered by the Modify stage are:
1. Dropping columns
2. Keeping columns
3. Creating duplicate columns
4. Renaming columns
5. Changing the null-ability of columns
6. Changing the data type of columns
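
As a hedged sketch, specifications of each kind might be entered in the stage's Specification property roughly as follows (the column names and the int32 type are made up, each line is a separate specification, and KEEP and DROP would not normally be combined on the same stage):

    DROP unused_col
    KEEP cust_id, cust_name
    cust_name_copy = cust_name
    customer_name = cust_name
    order_amt = handle_null(order_amt, 0)
    cust_id_num:int32 = cust_id

handle_null() replaces NULLs with the supplied value (changing null-ability), and the colon form requests a data type change, relying on a default conversion where one exists.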


The stage supports only one input link and one output link.

All these operations can easily be done in other stages such as Copy or Transformer, so why is the Modify stage needed, or rather, why should we use it?

The answer to this DataStage question is simple - performance tuning of jobs.

** Why not use a Transformer -
Because whenever we call transformer functions, data is processed to and through C++ code (the transformer implementation), which causes performance latency (delay). This delay is negligible for a small number of records but not for a high number, so prefer the Modify stage when the number of records to process is high.

Keep watching this space, as we are going to cover a lot of tips on the Modify stage.





5 Tips For Better DataStage Design #15



1. Stage variables do not accept null values, so no nullable column should be mapped directly to a stage variable without null handling (see the sketch after this list).

2. Use of the SetNull() function in stage variables should be avoided because it causes a compilation error.


3. If the input links are not already partitioned on the join key, they should be hash partitioned on the join key in the Join stage. In the case of multiple join keys, it is recommended to partition on one key and sort by the other keys.

4. If there is a need to repartition on an input link, we need to clear the preserve partitioning flag in the previous stage; otherwise it will generate a warning in the job log.

5. If the database table used as a reference has a low volume of data, it is good to use a Lookup stage.

6. It is always advisable to avoid the Transformer stage where a simpler stage will do, because the Transformer stage is not written in DataStage's native language; instead it is generated as C++ code. Every time you compile a job, the C++ code is embedded alongside the native code in the executable file, which can degrade the performance of the job.
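
As a minimal sketch of tips 1 and 2 together, a nullable input column can be defaulted in the stage variable derivation itself, without using SetNull() (the link and column names here are made up):

    svOrderAmt:  If IsNull(lnk_in.ORDER_AMT) Then 0 Else lnk_in.ORDER_AMT

The stage variable then always holds a concrete value, so downstream derivations can use it safely.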






NULL behaviour in Oracle


NULL, as known in SQL, is a special marker and keyword indicating that something has no value. A value of NULL is different from an empty or zero value. No two null values are equal: comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown. (ref - https://technet.microsoft.com/en-us/library/ms191504(v=sql.105).aspx)

To understand it, I am taking a practical approach where we play around with the NULL value - sorry, the no value - in an Oracle database.



** NULL values do not count in aggregate functions.
NULL values are not counted by aggregate functions such as MAX, MIN, SUM and COUNT in SQL.

Example:
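
A hedged sketch of what such an example might look like (the table and data are made up):

    -- One of the three salaries is NULL.
    CREATE TABLE emp_demo (name VARCHAR2(10), salary NUMBER);
    INSERT INTO emp_demo VALUES ('A', 100);
    INSERT INTO emp_demo VALUES ('B', 200);
    INSERT INTO emp_demo VALUES ('C', NULL);

    SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary) FROM emp_demo;
    -- COUNT(*) = 3 but COUNT(salary) = 2; SUM = 300 and AVG = 150,
    -- because the NULL salary is ignored by the aggregates.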


** An inserted null (empty) string is converted to NULL.

Example:
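
A hedged sketch, reusing the made-up table above:

    -- In Oracle, a zero-length string is treated as NULL on insert.
    INSERT INTO emp_demo (name, salary) VALUES ('', 50);
    SELECT * FROM emp_demo WHERE name IS NULL;
    -- Returns the row just inserted: the empty string was stored as NULL.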


** Where can Null be compared?
Example:
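
A hedged sketch using DECODE, per the reference quoted below:

    SELECT DECODE(NULL, NULL, 'nulls match', 'no match') AS result FROM dual;
    -- Returns 'nulls match': DECODE treats two NULLs as equivalent,
    -- whereas NULL = NULL in a WHERE clause evaluates to unknown.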


Oracle SQL Reference says "In a DECODE function, Oracle considers two nulls to be equivalent. If expr is null, then Oracle returns the result of the first search that is also null."
But this behaviour varies from DB to DB, depending on the internal architecture and the way NULL is handled. Another place where NULL can be compared is in a range partition definition, where MAXVALUE is greater than NULL.

**  Unique constraints
Example:
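
A hedged sketch with a made-up single-column unique constraint:

    CREATE TABLE uniq_demo (val NUMBER UNIQUE);
    INSERT INTO uniq_demo VALUES (NULL);
    INSERT INTO uniq_demo VALUES (NULL);
    -- Both inserts succeed: a single-column unique constraint ignores NULLs.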


You are able to insert another NULL without getting ORA-00001 (unique constraint violated).
--
Example:
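
A hedged sketch with a made-up composite unique constraint:

    CREATE TABLE uniq_demo2 (a NUMBER, b NUMBER, CONSTRAINT uq_ab UNIQUE (a, b));
    INSERT INTO uniq_demo2 VALUES (NULL, NULL);
    INSERT INTO uniq_demo2 VALUES (NULL, NULL);  -- succeeds: all columns are NULL
    INSERT INTO uniq_demo2 VALUES (1, NULL);
    INSERT INTO uniq_demo2 VALUES (1, NULL);     -- fails with ORA-00001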


So if all columns are null, the unique constraint will not be violated. If one or more columns have non-null values, the constraint takes effect.


** Unknown OR True returns True, Unknown AND False returns False
Example:
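
A hedged sketch of three-valued logic in a WHERE clause:

    SELECT 'kept' FROM dual WHERE (NULL = 1) OR (1 = 1);
    -- One row: unknown OR true is true.
    SELECT 'kept' FROM dual WHERE NOT ((NULL = 1) AND (1 = 0));
    -- One row: unknown AND false is false, and NOT false is true.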



 