My e-Notes about DataScience, Machine Learning, Python, Data Analytics, DataStage, DWH and ETL Concepts


Monday, 21 March 2016

Data Warehouse Glossary #2

A variable, perspective or general category of information that is used to organize and analyze information in a multi-dimensional data cube.

Drill Down:
The ability of a data-mining tool to move down into increasing levels of detail in a data mart, data warehouse or multi-dimensional data cube.

Drill Up:
The ability of a data-mining tool to move back up into higher levels of data in a data mart, data warehouse or multi-dimensional data cube.

Executive Information Management System (EIS):
A type of decision support system designed for executive management that reports summary level information as opposed to greater detail derived in a decision support system.

Extraction, Transformation and Loading (ETL) Tool:
Software that is used to extract data from a data source like a operational system or data warehouse, modify the data and then load it into a data mart, data warehouse or multi-dimensional data cube.

The level of detail in a data store or report.

The organization of data, e.g. a dimension, into a outline or logical tree structure.  The strata of a hierarchy are referred to as levels.  The individual elements within a level are referred to as categories.  The next lower level in a hierarchy is the child; the next higher level containing the children is their parent.

Legacy System:
Older systems developed on platforms that tend to be one or more generations behind the current state-of-the-art applications.  Data marts and warehouses were developed in large part due to the difficulty in extracting data from these system and the inconsistencies and incompatibilities among them.

A tier or strata in a dimensional hierarchy. Each lower level represents an increasing degree of detail.  Levels in a location dimension might include country, region, state, county, city, zip code, etc.

A quantifiable variable or value stored in a multi-dimensional OLAP cube.  It is a value in the cell at the intersection of two or more dimensions.

One of the data points for a level of a dimension.

Meta Data:
Information in a data mart or warehouse that describes the tables, fields, data types, attributes and other objects in the data warehouse and how they map to their data sources.  Meta data is contained in database catalogs and data dictionaries.

Multi-Dimensional Online Processing (MOLAP):
Software that creates and analyzes multi-dimensional cubes to store its information.

Non-Volatile Data:
Data that is static or that does not change.  In transaction processing systems the data is updated on a continual regular basis.  In a data warehouse the database is added to or appended, but the existing data seldom changes.

The process of eliminating duplicate information in a database by creating a separate table that stores the redundant information.  For example, it would be highly inefficient to re-enter the address of an insurance company with every claim.  Instead, the database uses a key field to link the claims table to the address table.  Operational or transaction processing systems are typically “normalized”.  On the other hand, some data warehouses find it advantageous to de-normalize the data allowing for some degree of redundancy.

Online Analytical Processing (OLAP):
The process employed by multi-dimensional analysis software to analyze the data resident in data cubes.  There are different types of OLAP systems named for the type of database employed to create them and the data structures produced.

Open Database Connectivity (ODBC):
A database standard developed by Microsoft and the SQL Access Group Consortium that defines the “rules” for accessing or retrieving data from a database.

Relational Database Management System:
Database management systems that have the ability to link tables of data through a common or key field.  Most databases today use relational technologies and support a standard programming language called Structured Query Language (SQL).

Relational Online Analytical Processing (ROLAP):
OLAP software that employs a relational strategy to organize and store the data in its database.

The process of copying data from one database table to another.

The attribute or capability of a database to significantly expand the number of records that it can manage.  It also refers to hardware systems and their ability to be expanded or upgraded to increase their processing speed and handle larger volumes of data.

Structured Query Language (SQL):
A standard programming language used by contemporary relational database management systems.

The process by which the data in two or more separate database are synchronized so that the records contain the same information.  If the fields and records are updated in one database the same fields and records are updated in the other.

Dimensional Model: 
A type of data modeling suited for data warehousing. In a dimensional model, there are two types of tables: dimensional tables and fact tables. Dimensional table records information on each dimension, and fact table records all the "fact", or measures.

Dimensional Table: 
Dimension tables store records related to this particular dimension. No facts are stored in a dimensional table.

Drill Across:
Data analysis across dimensions.

No comments:

Post a comment


The postings on this site are my own and don't necessarily represent IBM's or other companies positions, strategies or opinions. All content provided on this blog is for informational purposes and knowledge sharing only.
The owner of this blog makes no representations as to the accuracy or completeness of any information on this site or found by following any link on this site. The owner will not be liable for any errors or omissions in this information nor for the availability of this information. The owner will not be liable for any losses, injuries, or damages from the display or use of his information.