
Wednesday, 5 July 2017

Conditionally Aborting Jobs with Transformer Stage


How do I develop a job which will stop processing when FROM_DATE and TO_DATE are equal in the data? Or
how do I abort the job when the reject row count is more than 50?

The above scenarios can be implemented using the Transformer stage, but how? Let's check this out -

  • The Transformer can be used to conditionally abort a job when incoming data matches a specific rule. 
    • In case 1, the rule is FROM_DATE = TO_DATE 
    • In case 2, it is some reject condition 
  • Create a new output link that will handle rows that match the abort rule. 
  • Within the link constraints dialog box, apply the abort rule to this output link.
  • Set the "Abort After Rows" count to the number of rows allowed before the job should be aborted (the sketch after this list illustrates the idea).
    • In case 1, it should be 1, as we want to abort the job as soon as FROM_DATE equals TO_DATE.
    • In case 2, it should be 50, as we want to abort the job only when the reject condition matches more than 50 records.
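Outside DataStage, the abort rule simply amounts to counting the rows that match the condition and failing once the count reaches the "Abort After Rows" threshold. Here is a minimal Python sketch of that logic (the field names and thresholds come from the two cases above; everything else is made up for illustration):

# Minimal sketch of the "abort after N matching rows" idea - not DataStage code.
# Case 1: threshold = 1, rule is FROM_DATE == TO_DATE.
# Case 2: threshold = 50, rule is "row was rejected".
def process(rows, abort_rule, abort_after_rows):
    matched = 0
    for row in rows:
        if abort_rule(row):
            matched += 1
            if matched >= abort_after_rows:
                raise RuntimeError("Aborting job: rule matched %d row(s)" % matched)
        yield row

rows = [{"FROM_DATE": "20170101", "TO_DATE": "20170102"},
        {"FROM_DATE": "20170201", "TO_DATE": "20170201"}]
for r in process(rows, lambda r: r["FROM_DATE"] == r["TO_DATE"], 1):
    print(r)      # the first row is emitted, then the "job" aborts on the second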

But, since the Transformer will abort the entire job flow immediately, valid rows may not yet have been flushed from Sequential File (export) buffers or committed to database tables.
It is important to set the Sequential File buffer flush or database commit parameters accordingly; otherwise we have to manually remove the data which has already been written to the sequential file or database.






Wednesday, 5 April 2017

NULL Handling in Sequential File Stage



DataStage has a mechanism for denoting NULL field values, and it is slightly different in server and parallel jobs. In the Sequential File stage a character or string may be used to represent NULL column values. Here's how to represent NULL with the character "~" (a small sketch of reading such a file back follows the steps):

Server Job:
1. Create a Sequential file stage and make sure there is an Output link from it.
2. Open the Sequential File stage, click the "Outputs" tab and select "Format".
3. On the right, enter "~" next to "Default NULL string".

Parallel Job:
1. Create a Sequential file stage and make sure there is an Output link from it.
2. Open the Sequential File stage, click the "Outputs" tab and select "Format".
3. Right-click on "Field defaults" ==> "Add sub-property" and select "Null field value".
4. Enter "~" in the newly created field.
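Once the file is written this way, anything that reads it has to translate "~" back into a real null. A minimal Python sketch of that round trip (the file name and column layout are hypothetical; only the "~" convention comes from the steps above):

import csv

NULL_STRING = "~"   # the same character configured in the Sequential File stage

# "employees.txt" and its columns are made up for illustration.
with open("employees.txt", newline="") as f:
    for record in csv.reader(f, delimiter=","):
        cleaned = [None if field == NULL_STRING else field for field in record]
        print(cleaned)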






Wednesday, 18 January 2017

5 Tips For Better DataStage Design #17





**  There is an automap button in some stages; it can map fields with the same names.

**  When you add a shared container into your job you need to map the columns of the container to your job link. What you might miss is the extra option you get on the Columns tab "Load" button. In addition to the normal column load you get "Load from Container" which is a quick way to load the container metadata into your job.

**  Don't create a job from an empty canvas. Always copy and use an existing job. Don't create shared containers from a blank canvas, always build and test a full job and then turn part of it into a container.



**  If you want to copy and paste settings between jobs, it is better to open two Designer clients; then you can have two property windows open at the same time and copy or compare them more easily. Most property windows in DataStage are modal, and you can only have one property window open per Designer session.

**  You can load metadata into a stage by using the "Load" button on the column tab or by dragging and dropping a table definition from the Designer repository window onto a link in your job. For sequential file stages the drag and drop is faster as it loads both the column names and the format values in one go. If you used the load button you would need to load the column names and then the format details separately.

**  A Modify stage or stage function often behaves unexpectedly, and trial and error is frequently the only way to work out the syntax of a function. If you do this in a large and complex job, debugging can consume a lot of time. A better way is to keep a couple of test jobs in your project with a Row Generator, a Modify or Transformer stage and a Peek stage, with a column of each type. Use these throughout your project as a quick way to test a function or conversion. By the way, the Transformer stage needs the C++ compiler installed in order to compile correctly.





Saturday, 8 October 2016

#2 DataStage Solutions to Common Warnings/Error - Null Handling


Warnings/Errors Related to Null Handling -



1.1       When checking operator: When binding output interface field “XXXXX” to field “XXXXX”: Converting a nullable source to a non-nullable result

Cause: This can happen when reading from an Oracle database, or in any processing stage where the input column is defined as nullable but the metadata in DataStage is defined as non-nullable.

Resolution: Convert the nullable field to non-nullable by applying one of the available null-handling functions, either in DataStage or in the query.


1.2       APT_CombinedOperatorController(1),0: Field 'XXXXX' from input dataset '0' is NULL. Record dropped.

Cause: This can happen when there is no null handling on a column and that same column is used in constraints/stage variables.

Resolution: Apply a null-handling function to the column used in the constraint/stage variable.


http://www.datagenx.net/2016/09/datastage-solutions-to-common.html


1.3       Fatal Error: Attempt to setIsNull() on the accessor interfacing to non-nullable field "XXXX".

Cause: This can happen when the column in the source is nullable but in the DB2 stage it is defined as non-nullable.

Resolution: Change the Nullable property for the column to "Yes" instead of "No".


1.4       Exporting nullable field without null handling properties

Cause: This can happen when columns are defined as nullable in the Sequential File stage but no representation for null values was specified.

Resolution: Specify a "Null field value" in the Format tab of the Sequential File stage.







Monday, 29 August 2016

Modify Stage - What's been promised


The Modify stage is one of the most under-used stages in DataStage, but it is very useful for performance tuning. Developers are advised not to use the Transformer stage just for trimming or NULL handling, and to use the Modify stage instead, but only if they are aware of and comfortable with the syntax and derivations it supports, as there are no drop-down or right-click options to help with functions/syntax.

The definition of Modify Stage as IBM documented -

"The Modify stage alters the record schema of its input data set. The modified data set is then output. You
can drop or keep columns from the schema, or change the type of a column.
The Modify stage is a processing stage. It can have a single input link and a single output link."


http://www.ibm.com/support/knowledgecenter/SSZJPZ_11.3.0/com.ibm.swg.im.iis.ds.parjob.dev.doc/topics/c_deeref_Modify_Stage.html
The operations offered by the Modify stage are -
1. Drop columns
2. Keep columns
3. Create duplicate columns
4. Rename columns
5. Change the null-ability of columns
6. Change the data type of columns


The stage supports only one input link and one output link.

All these operations can easily be done in other stages such as Copy, Transformer etc. So why is the Modify stage required, or rather, why should we use it?

The answer to this DataStage question is simple - performance tuning of jobs.

** Why not use the Transformer -
Because whenever we call Transformer functions, data is processed through generated C++ code (the Transformer implementation), which causes some performance latency (delay). This delay is negligible for a small number of records but grows with higher volumes. So prefer the Modify stage when the number of records to process is high.

Keep watching this space, as we are going to cover a lot of tips on the Modify stage.


Get this Article as PDF - http://bit.ly/2fdroHR and  http://bit.ly/2fdrDCr




Monday, 22 August 2016

5 Tips For Better DataStage Design #15



1. Stage variables do not accept null values, hence no nullable column should be mapped directly to a stage variable without null handling.

2. Use of the SetNull() function in stage variables should be avoided because it causes a compilation error.


3. If the input links are not already partitioned on the join key, they should be hash partitioned on the join key in the Join stage. In the case of multiple join keys, it is recommended to partition on one key and sort on the other keys (a toy sketch of the idea follows this list).

4. If there is a need to repartition an input link, we need to clear the preserve-partitioning flag in the previous stage; otherwise it will generate a warning in the job log.

5. If the reference database table has a small volume of data, it is good to use the Lookup stage.

6. It is advisable to avoid the Transformer stage where possible, because the Transformer is not implemented in DataStage's native operators; it is compiled into C++ code. Every time you compile a job, this generated code is embedded with the native code in the executable file, which can degrade job performance.
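Regarding tip 3, hash partitioning simply means that rows with the same join key always land in the same partition, so matching rows from both inputs meet on the same node. A toy Python sketch of the idea (the key column, row layout and partition count are hypothetical):

import zlib

NUM_PARTITIONS = 4   # analogous to the degree of parallelism

def partition_of(join_key, num_partitions=NUM_PARTITIONS):
    # A stable hash of the key decides the partition; equal keys -> equal partition.
    return zlib.crc32(join_key.encode()) % num_partitions

rows = [{"cust_id": "C1", "amt": 10},
        {"cust_id": "C2", "amt": 20},
        {"cust_id": "C1", "amt": 30}]

partitions = {}
for row in rows:
    partitions.setdefault(partition_of(row["cust_id"]), []).append(row)
print(partitions)    # both "C1" rows land in the same partition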






Monday, 8 August 2016

ETL Strategy #2


Continued......ETL Strategy #1
 

Delta from Sources extracted by Timestamp
This project will use a timestamp to capture the deltas for most of the operational database sources where a date/time value can be used. The ETL process will extract data from the operational data stores based on a date/time column such as Update_dt when processing the delta records, and then populate it into the Initial Staging area. The flow is summarized step by step below, in two parts: one for the initial load and the other for delta processing, followed by a small code sketch of the idea.

Ref #              Step Description
1    Insert a record into the control tables, manually or using scripts, for each ETL process. This is done only once, when a table is loaded into the data warehouse for the first time.
2    Set the extract date on the control table to the desired initial load date. This is the timestamp which the ETL process will use against the source system.
3    Run the ETL batch process, which reads the control tables for the extract timestamp.
4    Extract all data from the source system with a date/time value greater than the extract timestamp set on the control table.
5    Check whether the load completed successfully or failed with errors.
6    If the load failed with errors, the error handling service is called.
7    If the load completed successfully, the load flag is set to successful.
8    The max timestamp of the ETL load is obtained.
9    A new record is inserted into the control structure with the timestamp obtained in the previous step.
10    The process continues to pull the delta records in the subsequent runs.
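As a rough illustration of steps 2-9, here is a minimal Python sketch of a timestamp-driven delta extract. The control-table structure, the table name and every column except Update_dt (mentioned above) are assumptions made purely for the example:

import datetime

def get_extract_timestamp(control_rows, table_name):
    # Step 3: read the extract timestamp recorded for this table in the control "table".
    return max(r["extract_ts"] for r in control_rows if r["table"] == table_name)

def run_delta_load(source_rows, control_rows, table_name):
    since = get_extract_timestamp(control_rows, table_name)
    # Step 4: pull only the records updated after the control timestamp.
    delta = [r for r in source_rows if r["Update_dt"] > since]
    if delta:
        # Steps 7-9: on success, insert a new control record with the max timestamp.
        control_rows.append({"table": table_name,
                             "extract_ts": max(r["Update_dt"] for r in delta),
                             "status": "SUCCESS"})
    return delta

control = [{"table": "ORDERS", "extract_ts": datetime.date(2016, 1, 1), "status": "SUCCESS"}]
source = [{"id": 1, "Update_dt": datetime.date(2016, 1, 5)},
          {"id": 2, "Update_dt": datetime.date(2015, 12, 31)}]
print(run_delta_load(source, control, "ORDERS"))   # only the row updated after 2016-01-01

Each subsequent run (step 10) repeats the same cycle, so only the records changed since the last successful extract are pulled.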
   


Delta from Sources extracted by comparison
Where a transaction date or timestamp is not available, a process will compare the new and current versions of a source to generate its delta. This strategy is mostly used when files are the source of data. It is manageable for the small to medium size files used in this project and should be avoided with larger source files. A transaction code (I=Insert; U=Update; D=Delete) will have to be generated so that the rest of the ETL stream can recognise the type of transaction and process it, as sketched below.
Files are pushed to the ETL server, or pulled from the FTP servers to the ETL server. If the files contain delta records, then the files are loaded directly into the Data Warehouse. If a file is a full extract file, then the file-comparison delta process will be used to identify the changed records before loading into the Data Warehouse.
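A minimal Python sketch of the file-comparison delta, assuming each record can be reduced to a key and a value (the keys and values below are made up for illustration):

# Compare the new full extract with the previous one and tag each record
# I (insert), U (update) or D (delete).
def file_compare_delta(previous, current):
    deltas = []
    for key, rec in current.items():
        if key not in previous:
            deltas.append(("I", key, rec))
        elif rec != previous[key]:
            deltas.append(("U", key, rec))
    for key, rec in previous.items():
        if key not in current:
            deltas.append(("D", key, rec))
    return deltas

previous = {"1001": "Smith,NY", "1002": "Jones,TX"}
current = {"1001": "Smith,CA", "1003": "Brown,WA"}
print(file_compare_delta(previous, current))
# [('U', '1001', 'Smith,CA'), ('I', '1003', 'Brown,WA'), ('D', '1002', 'Jones,TX')]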
         

E10 Validate Source Data transferred via FTP
Input:    Source Data File and Source Control File.
Output:    NONE.
Dependency: Availability of other systems files.
Functions:
•    Validate that the number of records in the Source Data File matches the count contained in the Source Control File. This guarantees that the right number of records has been transferred from Source to Target (a small sketch of this check follows).
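A minimal Python sketch of this validation, assuming the control file simply carries the expected record count on its first line (the file names and control-file layout are hypothetical):

def validate_transfer(data_path, control_path):
    # Count the records actually received.
    with open(data_path) as f:
        actual = sum(1 for _ in f)
    # Read the expected count from the control file.
    with open(control_path) as f:
        expected = int(f.readline().strip())
    if actual != expected:
        raise ValueError("Record count mismatch: expected %d, got %d" % (expected, actual))
    return actual

# Example usage (hypothetical file names):
# validate_transfer("orders_20160808.dat", "orders_20160808.ctl")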








Tuesday, 19 July 2016

ETL Strategy #1



The ETL load will be designed to extract data from all the required data sources. The data to feed the Next Gen BI database will have to be brought from the sources with specific Extraction, Transformation & Load (ETL) processes. It is essential to define a good ETL strategy to ensure that the execution of these tasks will support the appropriate data volumes, and that the design is scalable and maintenance-free.


Initial Load Strategy

The Initial Load process is there to support the requirement to include historical data that can’t be included through the regular Delta Refresh.
For the Next Gen BI project, it is expected to have a full extract available for the required sources to prepare an Initial Load prior to the regular Delta Refresh. This Initial extraction of the data will then flow through the regular transformation and load process.
As discussed in the control process, control tables will be used to initiate the first iteration of the initial ETL process: a list of source table names with extraction dates will be loaded into the control tables. The ETL process can be kicked off through the scheduler, and it will read the control tables and process the full extract.
The rest of the initial load process is the same as the delta refresh, as shown under the section "Delta from Sources extracted by Timestamp"; the only difference is loading the control tables to start the process the first time a table is loaded into the Data Warehouse.


Delta Refresh or CDC Strategy

The delta refresh process will apply only the appropriate transactions to the Data Warehouse. The result is a greatly reduced volume of information to be processed and applied. Also, the delta transactions for the warehouse can be reused as input for the different data marts, since they will be part of the Staging area and already processed. As discussed in the Control Process Strategy, control tables will be used to control the delta refresh process.


Continued......ETL Strategy #2




Thursday, 16 June 2016

5 Tips For Better DataStage Design #14



1. The use of the Lookup stage depends upon the volume of data. The sparse lookup type should be used when the primary input data volume is small. If the reference data volume is large, the Lookup stage should be avoided.

2. Using an ORDER BY clause in the database is preferable to using a Sort stage.



3. In DataStage Administrator, tune the project tunables for better performance.

4. The Funnel stage reduces the performance of a job; when it is needed, it should be run in continuous mode.

5. If a hash file is used only for lookup, then enable "Preload to memory"; this will improve performance.







Tuesday, 17 May 2016

Lookup Stage behaviour



Today I am gonna ask you a question: what value will I get from a lookup when my datatype is integer (Not Null) and there is no match between the source and reference data?

Generally we say NULL, as there is no match between source and reference. But that's not true.
So let's see how the DataStage and Lookup behave :-)

When the source and reference columns are NULLable -
-       If there is no match between source and reference, we will get NULL in the output.

When the source and reference columns are Not-NULLable -
-       If there is no match between source and reference, we will get the DataStage default for that datatype,
        such as 0 for integer and an empty string ('') for varchar, when the data goes out of the Lookup stage.

So be careful when you plan to filter the data outside the Lookup stage based on a referenced column's value: the field in the output is not null, so the Transformer stage doesn't receive a NULL (it receives the default value, e.g. 0) and can't handle it as you expect.
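A toy Python analogy (not DataStage itself) of the two behaviours: a nullable output column can carry a real NULL (None), while a non-nullable integer column falls back to the default 0 when there is no reference match. The keys and amounts are made up for illustration:

reference = {"C100": 250, "C200": 375}     # reference data: key -> amount
source_keys = ["C100", "C999"]             # "C999" has no match in the reference

# Nullable output column: an unmatched key yields NULL (None).
nullable_out = [reference.get(k) for k in source_keys]
print(nullable_out)        # [250, None]

# Non-nullable integer output column: an unmatched key yields the default 0.
not_nullable_out = [reference.get(k, 0) for k in source_keys]
print(not_nullable_out)    # [250, 0]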

Hoping this adds one more pointer to your learning. Let me know your thoughts in the comments section.





Tuesday, 3 May 2016

Otherwise Constraint - A Quick DataStage Recipe


Recipe:

How to use "Otherwise" constraint in Transformer Stage



How To:

To use "Otherwise" constraint in Transformer stage, Order of link is important.
Typically link with "Otherwise" constraint should be last in Transformer stage link order





Monday, 25 April 2016

Suppress Warnings - A Quick DataStage Recipe



Recipe:

How to suppress job warnings in DS job log

HowTo:

Add "rule" in Message Handler


Method:

From DataStage Director, open the Message Handler and add a rule,
selecting the warning message as an example of the message text.
Or
Open the job log, select the message you want to suppress,
then right-click and add it to a Message Handler.






Monday, 18 April 2016

10 Scenario based Interview Questions #2 - DataStage



11. Design a job which inserts the data into the target if it does not exist and updates it if it does.
12. Design a job which includes a header and footer in the output file.
13. Design a job which checks whether the currency data is in the 'Rs. 9999.99' format or not.
14. Design a job which validates the date passed to it. The date input is in YYYYMMDD format.

For more ->  DataStage Scenario

15. Design a job which calculates the date difference in hours and minutes (HH:MM).
16. Design a job which deletes the data from the target if it exists and then loads it.
17. Design a job which checks whether each column has a value or not.
18. Design a job which populates the date dimension table.
19. Design a job which transposes columns into rows with the help of the Transformer stage.
20. Design a job which removes duplicates without using the Remove Duplicates stage.


For more ->  DataStage Scenario




Sunday, 10 April 2016

5 Tips For Better DataStage Design #12



1. A minimum number of Sort stages should be used in a DataStage job. The "Don't sort if previously sorted" option in the Sort stage should be set to "true" where applicable, which improves Sort stage performance; the same hash key should be used. In the Transformer stage, "Preserve Sort Order" can be used to maintain the sort order.

2. A minimum number of stages should be used in a job; otherwise it affects the performance of the job.
If a job has too many stages, it should be decomposed into a smaller number of small jobs. Containers are the best way to improve visualization and readability. If the existing active stages occupy almost all the CPU resources, performance can be improved by running multiple parallel copies of the same stage process; this is done by using a shared container.





3. Using a minimum number of stage variables in the Transformer is good practice; performance degrades as more stage variables are used.

4. Column propagation should be handled with care. Columns which are not needed in the job flow should not be propagated from one stage to another or from one job to the next. The best option is to disable RCP.

5. When there is a need to rename columns or add new columns, using a Copy or Modify stage is good practice.






Monday, 4 April 2016

DataStage Scenario #16 - Cross duplicate Removal


We need to remove duplicates where the source and destination may be switched, i.e. city1-city2 and city2-city1 count as the same route. A quick sketch of one approach follows the expected output below.


Input:


source   destination   distance
city1 city2 500
city2 city1 500
city3 city4 500
city4 city3 500 
city5 city7 700
city7 city5 700



Output:

source   destination   distance
city1 city2 500
city3 city4 500
city5 city7 700
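A quick Python sketch of the cross-duplicate removal (outside DataStage), keeping the first occurrence of each unordered city pair:

# Treat (source, destination) and (destination, source) as the same route
# and keep only the first occurrence of each pair.
rows = [("city1", "city2", 500), ("city2", "city1", 500),
        ("city3", "city4", 500), ("city4", "city3", 500),
        ("city5", "city7", 700), ("city7", "city5", 700)]

seen = set()
output = []
for src, dst, dist in rows:
    pair = frozenset((src, dst))      # unordered key, so city1-city2 == city2-city1
    if pair not in seen:
        seen.add(pair)
        output.append((src, dst, dist))

for row in output:
    print(row)    # city1-city2, city3-city4 and city5-city7 survive

In a DataStage job the same idea can be applied by building a derived key from the alphabetically smaller and larger of the two city values (for example in a Transformer) and then removing duplicates on that key.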






Monday, 28 March 2016

10 Scenario based Interview Questions #1 - DataStage


1. Design a job which converts a single source row into three target rows.
2. Design a job which identifies duplicate rows in the input.
3. Design a job which fetches the input file header and footer.
4. Design a job which segregates unique and duplicate records into different files.
5. Design a job which removes the header from the input file.

For more ->  DataStage Scenario
6. Design a job which removes the footer from the input file.
7. Design a job which sends a mail if the footer is not present in the input file.
8. Design a job which extracts the alternate records from the input file.
9. Design a job which extracts the Nth row from the input file.
10. Design a job which extracts data from two input files and loads them alternately into the target.


For more ->  DataStage Scenario

Wednesday, 23 March 2016

5 Tips For Better DataStage Design #11




  • When writing intermediate results that will only be shared between parallel jobs, always write to persistent data sets (using Data Set stages). You should ensure that the data is partitioned, and that the partitions, and sort order, are retained at every stage. Avoid format conversion or serial I/O.
  • Data Set stages should be used to create restart points in the event that a job or sequence needs to be rerun. But, because data sets are platform and configuration specific, they should not be used for long-term backup and recovery of source data.
  • Depending on available system resources, it might be possible to optimize overall processing time at run time by allowing smaller jobs to run concurrently. However, care must be taken to plan for scenarios when source files arrive later than expected, or need to be reprocessed in the event of a failure.
  • Parallel configuration files allow the degree of parallelism and resources used by parallel jobs to be set dynamically at run time. Multiple configuration files should be used to optimize overall throughput and to match job characteristics to available hardware resources in development, test, and production modes.
  • The proper configuration of scratch and resource disks and the underlying file system and physical hardware architecture can significantly affect overall job performance.







Tuesday, 22 March 2016

Transformer Stage alternative - A Quick DataStage Recipe



What to use instead of "Transformer" Stage

Copy Stage

Use "Copy" stage instead of "Transformer" Stage for following:
Renaming columns
Dropping columns
Default type conversions
Job design placeholder between stages


Modify Stage

Use "Modify" stage
Non default type conversions
Null handling
Character string trimming



Filter Stage

Use "Filter" Stage
Using constraints on output data


Will add more.......







Tuesday, 8 March 2016

Python - Get line no with print statement


Sometimes when we are working on code with a lot of print statements, it's difficult to tell which print statement produced which line of output.

Printing the line number helps a lot while debugging, and it helped me create better Python learning scripts that tag their output with the code line number, which is easy to relate back to the source.




Code:
from inspect import currentframe

def lno():
    # Return the caller's line number (taken from the calling frame) as a string prefix.
    cf = currentframe()
    return str(cf.f_back.f_lineno) + ". "

print("this is Me", lno())
print(lno(), "Hi! there")

Output:

this is Me 8.
9.  Hi! there


** This adds a little overhead, as the function is called with every print statement. So use it wisely :-)






Monday, 7 March 2016

Data Warehouse Approaches #2



Top-down approach (Inmon)

The top-down approach views the data warehouse from the top of the entire analytic environment.

The data warehouse holds atomic or transaction data that is extracted from one or more source systems and integrated within a normalized, enterprise data model. From there, the data is summarized, dimensionalized, and distributed to one or more “dependent” data marts. These data marts are “dependent” because they derive all their data from a centralized data warehouse.

Sometimes, organizations supplement the data warehouse with a staging area to collect and store source system data before it can be moved and integrated within the data warehouse. A separate staging area is particularly useful if there are numerous source systems, large volumes of data, or small batch windows with which to extract data from source systems.


Pros/Cons 

The major benefit of a “top-down” approach is that it provides an integrated, flexible architecture to support downstream analytic data structures.
First, this means the data warehouse provides a departure point for all data marts, enforcing consistency and standardization so that organizations can achieve a single version of the truth. Second, the atomic data in the warehouse lets organizations re-purpose that data in any number of ways to meet new and unexpected business needs.

For example, a data warehouse can be used to create rich data sets for statisticians, deliver operational reports, or support operational data stores (ODS) and analytic applications. Moreover, users can query the data warehouse if they need cross-functional or enterprise views of the data.

On the downside, a top-down approach may take longer and cost more to deploy than other approaches, especially in the initial increments. This is because organizations must create a reasonably detailed enterprise data model as well as the physical infrastructure to house the staging area, data warehouse, and the data marts before deploying their applications or reports. (Of course, depending on the size of an implementation, organizations can deploy all three “tiers” within a single database.) This initial delay may cause some groups with their own IT budgets to build their own analytic applications. Also, it may not be intuitive or seamless for end users to drill through from a data mart to a data warehouse to find the details behind the summary data in their reports.


Bottom-up approach (Kimball)

In a bottom-up approach, the goal is to deliver business value by deploying dimensional data marts as quickly as possible. Unlike the top-down approach, these data marts contain all the data — both atomic and summary — that users may want or need, now or in the future. Data is modeled in a star schema design to optimize usability and query performance. Each data mart builds on the next, reusing dimensions and facts so users can query across data marts, if desired, to obtain a single version of the truth as well as both summary and atomic data.

The “bottom-up” approach consciously tries to minimize back-office operations, preferring to focus an organization’s effort on developing dimensional designs that meet end-user requirements. The “bottom-up” staging area is non-persistent, and may simply stream flat files from source systems to data marts using the file transfer protocol. In most cases, dimensional data marts are logically stored within a single database. This approach minimizes data redundancy and makes it easier to extend existing dimensional models to accommodate new subject areas.


Pros/Cons 

The major benefit of a bottom-up approach is that it focuses on creating user-friendly, flexible data structures using dimensional, star schema models. It also delivers value rapidly because it doesn’t lay down a heavy infrastructure up front.
Without an integration infrastructure, the bottom-up approach relies on a “dimensional bus” to ensure that data marts are logically integrated and stovepipe applications are avoided. To integrate data marts logically, organizations use “conformed” dimensions and facts when building new data marts. Thus, each new data mart is integrated with others within a logical enterprise dimensional model.
Another advantage of the bottom-up approach is that since the data marts contain both summary and atomic data, users do not have to “drill through” from a data mart to another structure to obtain detailed or transaction data. The use of a staging area also eliminates redundant extracts and overhead required to move source data into the dimensional data marts.

One problem with a bottom-up approach is that it requires organizations to enforce the use of standard dimensions and facts to ensure integration and deliver a single version of the truth. When data marts are logically arrayed within a single physical database, this integration is easily done. But in a distributed, decentralized organization, it may be too much to ask departments and business units to adhere and reuse references and rules for calculating facts. There can be a tendency for organizations to create “independent” or non-integrated data marts.

In addition, dimensional marts are designed to optimize queries, not support batch or transaction processing. Thus, organizations that use a bottom-up approach need to create additional data structures outside of the bottom-up architecture to accommodate data mining, ODSs, and operational reporting requirements. However, this may be achieved simply by pulling a subset of data from a data mart at night when users are not active on the system.






