My scrapbook about almost anything I stumble upon in my tech world. If you find anything useful don't forget to give thumbs-up :)

Breaking

Monday, July 18, 2016

ETL Strategy #1



The ETL Load will be designed to extract data from all the required Data Sources. The data to feed the Next Gen BI Database will have to be brought from the sources with specific Extraction, Transformation & Load (ETL) processes. It is essential to define a good ETL strategy to ensure that the execution of these tasks will support appropriate data volumes and Design Should be Scalable, and Maintenance free.


Initial Load Strategy

The Initial Load process is there to support the requirement to include historical data that can’t be included through the regular Delta Refresh.
For the Next Gen BI project, it is expected to have a full extract available for the required sources to prepare an Initial Load prior to the regular Delta Refresh. This Initial extraction of the data will then flow through the regular transformation and load process.
As discussed in the control process, Control tables will be used to initiate the first iteration of the Initial ETL process, a list of source table name with extraction dates will be loaded in the control tables. ETL process can be kicked off through the scheduler and the ETL process will read the control tables and process the full extract.
The rest of the process for an Initial load is same as the delta refresh. As shown in the flow chart under the section (Delta from Sources extracted by Timestamp), The only difference is the loading of the control tables to start the process for the first time when a table gets loaded in the Data Warehouse.


Delta Refresh or CDC Strategy

The Delta refresh process will apply only the appropriate transactions to the Data Warehouse. The result is a greatly reduced volume of information to be processed and applied. Also the Delta transactions for the Warehouse can be reused as input for the different Data mart, since they will be part of the Staging area and already processed. As discussed in the Control Process Strategy, control tables will be used to control delta refresh process.


Continued......ETL Strategy #2



Like the below page to get update  
https://www.facebook.com/datastage4you
https://twitter.com/datagenx
https://plus.google.com/+AtulSingh0/posts
https://datagenx.slack.com/messages/datascience/

No comments:

Post a Comment

Disclaimer

The postings on this site are my own and don't necessarily represent IBM's or other companies positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.
The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of his information.