
Sunday, 27 March 2016

Data Warehouse Glossary #3


Drill Through:
Data analysis that goes from an OLAP cube into the relational database.

Data Warehousing:
The process of designing, building, and maintaining a data warehouse system.

Conformed Dimension:
A dimension that has exactly the same meaning and content when referenced from different fact tables.
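
A minimal sketch of what this means in practice, using hypothetical table and column names: both fact tables join to the same dim_date table, so "year" means the same thing for sales and for returns, and the two results can be compared side by side.

    -- Sales by year, through the shared date dimension
    SELECT d.calendar_year, SUM(s.sales_amount) AS total_sales
    FROM   fact_sales s
    JOIN   dim_date d ON d.date_key = s.date_key
    GROUP  BY d.calendar_year;

    -- Returns by year, through the SAME dimension with the same meaning
    SELECT d.calendar_year, SUM(r.return_amount) AS total_returns
    FROM   fact_returns r
    JOIN   dim_date d ON d.date_key = r.date_key
    GROUP  BY d.calendar_year;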

Central Warehouse
A database created from operational extracts that adheres to a single, consistent, enterprise data model to ensure consistency of decision-support data across the corporation. A style of computing where all the information systems are located and managed from a single physical location.

Change Data Capture
The process of capturing changes made to a production data source. Change data capture is typically performed by reading the source DBMS log. It consolidates units of work, ensures data is synchronized with the original source, and reduces data volume in a data warehousing environment.
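
Log readers are DBMS-specific, so as a hedged illustration only: when the log is not available, a common fallback is to pull rows by a last-updated timestamp. The table and column names here are hypothetical, and :last_extract_time is a bind variable supplied by the ETL run.

    -- Extract only the rows changed since the previous run
    SELECT customer_id, name, address, last_updated
    FROM   customer
    WHERE  last_updated > :last_extract_time;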

Classic Data Warehouse Development
The process of building an enterprise business model, creating a system data model, defining and designing a data warehouse architecture, constructing the physical database, and lastly populating the warehouse database.

Data Access Tools
End-user-oriented tools that allow users to build SQL queries by pointing and clicking on a list of tables and fields in the data warehouse.

Data Analysis and Presentation Tools
Software that provides a logical view of data in a warehouse. Some tools create simple aliases for table and column names; others maintain metadata that identifies the contents and location of data in the warehouse.

Data Dictionary
A database about data and database structures. A catalog of all data elements, containing their names, structures, and information about their usage. A central location for metadata. Normally, data dictionaries are designed to store a limited set of available metadata, concentrating on the information relating to the data elements, databases, files and programs of implemented systems.

Data Warehouse Architecture
An integrated set of products that enable the extraction and transformation of operational data to be loaded into a database for end-user analysis and reporting.

Data Warehouse Architecture Development
A SOFTWARE AG service program that provides an architecture for a data warehouse that is aligned with the needs of the business. This program identifies and designs a warehouse implementation increment and ensures the required infrastructure, skill sets, and other data warehouse foundational aspects are in place for a Data Warehouse Incremental Delivery.

Data Warehouse Engines
Relational databases (RDBMS) and Multi-dimensional databases (MDBMS). Data warehouse engines require strong query capabilities, fast load mechanisms, and large storage capacity.

Data Warehouse Incremental Delivery
A SOFTWARE AG program that delivers one data warehouse increment from design review through implementation.

Data Warehouse Infrastructure
A combination of technologies and the interaction of technologies that support a data warehousing environment.

Data Warehouse Management Tools
Software that extracts and transforms data from operational systems and loads it into the data warehouse.

Data Warehouse Network
An industry organization for know-how exchange. SOFTWARE AG was the first vendor member of the Data Warehouse Network.

Functional Data Warehouse
A warehouse that draws data from nearby operational systems. Each functional warehouse serves a distinct and separate group (such as a division), functional area (such as manufacturing), geographic unit, or product marketing group.

OLTP
On-Line Transaction Processing. OLTP describes the requirements for a system that is used in an operational environment.

Scalability
The ability to scale to support larger or smaller volumes of data and more or fewer users. The ability to increase or decrease size or capability in cost-effective increments with minimal impact on the unit cost of business and the procurement of additional services.

Schema
The logical and physical definition of data elements, physical characteristics and inter-relationships.

Slice and Dice
A term used to describe a complex data analysis function provided by MDBMS tools.

Warehouse Business Directory
Provides business professionals with access to the data warehouse through a browsable catalog of its contents.

Warehouse Technical Directory
Defines and manages the information life cycle: warehouse construction, change management, impact analysis, and the distribution and operation of a warehouse.

Transformers
Rules applied to change (transform) data.



Monday, 21 March 2016

Data Warehouse Glossary #2


Dimension:
A variable, perspective or general category of information that is used to organize and analyze information in a multi-dimensional data cube.

Drill Down:
The ability of a data-mining tool to move down into increasing levels of detail in a data mart, data warehouse or multi-dimensional data cube.

Drill Up:
The ability of a data-mining tool to move back up into higher levels of data in a data mart, data warehouse or multi-dimensional data cube.
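
In SQL terms (hypothetical tables), drilling down simply regroups the same measure at a finer level of the dimension hierarchy; drilling up is the reverse.

    -- Summary level: sales by year
    SELECT d.calendar_year, SUM(f.sales_amount) AS sales
    FROM   fact_sales f
    JOIN   dim_date d ON d.date_key = f.date_key
    GROUP  BY d.calendar_year;

    -- Drill down: the same measure by year and month
    SELECT d.calendar_year, d.calendar_month, SUM(f.sales_amount) AS sales
    FROM   fact_sales f
    JOIN   dim_date d ON d.date_key = f.date_key
    GROUP  BY d.calendar_year, d.calendar_month;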

Executive Information System (EIS):
A type of decision support system designed for executive management that reports summary-level information, as opposed to the greater detail provided by a decision support system.

Extraction, Transformation and Loading (ETL) Tool:
Software that is used to extract data from a data source such as an operational system or data warehouse, modify the data, and then load it into a data mart, data warehouse or multi-dimensional data cube.

Granularity:
The level of detail in a data store or report.

Hierarchy:
The organization of data, e.g. a dimension, into an outline or logical tree structure.  The strata of a hierarchy are referred to as levels.  The individual elements within a level are referred to as categories.  The next lower level in a hierarchy is the child; the next higher level containing the children is their parent.

Legacy System:
Older systems developed on platforms that tend to be one or more generations behind the current state-of-the-art applications.  Data marts and warehouses were developed in large part due to the difficulty in extracting data from these systems and the inconsistencies and incompatibilities among them.

Level:
A tier or strata in a dimensional hierarchy. Each lower level represents an increasing degree of detail.  Levels in a location dimension might include country, region, state, county, city, zip code, etc.

Measure:
A quantifiable variable or value stored in a multi-dimensional OLAP cube.  It is a value in the cell at the intersection of two or more dimensions.

Member:
One of the data points for a level of a dimension.

Meta Data:
Information in a data mart or warehouse that describes the tables, fields, data types, attributes and other objects in the data warehouse and how they map to their data sources.  Meta data is contained in database catalogs and data dictionaries.
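
Most relational databases expose this catalog through queryable views; for example, the ANSI INFORMATION_SCHEMA (widely, though not universally, supported; the table name below is hypothetical).

    -- List the columns and data types of a warehouse table
    SELECT column_name, data_type, is_nullable
    FROM   information_schema.columns
    WHERE  table_name = 'fact_sales'
    ORDER  BY ordinal_position;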

Multi-Dimensional Online Analytical Processing (MOLAP):
OLAP software that stores its information in multi-dimensional cubes, which it creates and analyzes.



Non-Volatile Data:
Data that is static or that does not change.  In transaction processing systems the data is updated on a continual basis.  In a data warehouse data is added or appended, but the existing data seldom changes.

Normalization:
The process of eliminating duplicate information in a database by creating a separate table that stores the redundant information.  For example, it would be highly inefficient to re-enter the address of an insurance company with every claim.  Instead, the database uses a key field to link the claims table to the address table.  Operational or transaction processing systems are typically “normalized”.  On the other hand, some data warehouses find it advantageous to de-normalize the data allowing for some degree of redundancy.
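
The insurance example above might look like this (hypothetical tables): the address is stored once, and a key field links each claim back to it.

    -- Normalized: the company address is stored once and referenced by key
    CREATE TABLE insurance_company (
        company_id   INTEGER       PRIMARY KEY,
        company_name VARCHAR(100)  NOT NULL,
        address      VARCHAR(200)  NOT NULL
    );

    CREATE TABLE claim (
        claim_id     INTEGER PRIMARY KEY,
        company_id   INTEGER NOT NULL REFERENCES insurance_company (company_id),
        claim_amount DECIMAL(12,2)
    );

    -- The address is recovered through the key field, never re-entered
    SELECT c.claim_id, c.claim_amount, i.address
    FROM   claim c
    JOIN   insurance_company i ON i.company_id = c.company_id;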

Online Analytical Processing (OLAP):
The process employed by multi-dimensional analysis software to analyze the data resident in data cubes.  There are different types of OLAP systems named for the type of database employed to create them and the data structures produced.

Open Database Connectivity (ODBC):
A database standard developed by Microsoft and the SQL Access Group Consortium that defines the “rules” for accessing or retrieving data from a database.

Relational Database Management System:
Database management systems that have the ability to link tables of data through a common or key field.  Most databases today use relational technologies and support a standard programming language called Structured Query Language (SQL).

Relational Online Analytical Processing (ROLAP):
OLAP software that employs a relational strategy to organize and store the data in its database.

Replication:
The process of copying data from one database table to another.

Scalable:
The attribute or capability of a database to significantly expand the number of records that it can manage.  It also refers to hardware systems and their ability to be expanded or upgraded to increase their processing speed and handle larger volumes of data.

Structured Query Language (SQL):
A standard programming language used by contemporary relational database management systems.

Synchronization:
The process by which the data in two or more separate databases is synchronized so that the records contain the same information.  If the fields and records are updated in one database the same fields and records are updated in the other.

Dimensional Model: 
A type of data modeling suited for data warehousing.  In a dimensional model there are two types of tables: dimension tables and fact tables.  Dimension tables record information on each dimension, and fact tables record the "facts," or measures.

Dimensional Table: 
Dimension tables store records related to a particular dimension.  No facts are stored in a dimension table.
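
A minimal star-schema sketch with hypothetical names: the dimension table holds descriptive records, and the fact table holds the measures plus keys pointing at the dimensions.

    -- Dimension table: descriptive records only, no facts
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name VARCHAR(100),
        category     VARCHAR(50)
    );

    -- Fact table: measures, plus foreign keys to the dimensions
    CREATE TABLE fact_sales (
        date_key     INTEGER NOT NULL,
        product_key  INTEGER NOT NULL REFERENCES dim_product (product_key),
        sales_amount DECIMAL(12,2),   -- measure
        units_sold   INTEGER          -- measure
    );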

Drill Across:
Data analysis that spans more than one fact table, linked through conformed (shared) dimensions.

Friday, 18 March 2016

Data Warehouse Glossary #1



Ad Hoc Query:

A database search that is designed to extract specific information from a database.  It is ad hoc if it is designed at the point of execution as opposed to being a “canned” report.  Most ad hoc query software uses the structured query language (SQL).
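
For instance, an analyst might compose a question like this at the point of execution rather than run a canned report (all table names hypothetical):

    -- Which products sold more than 1,000 units in the West region?
    SELECT p.product_name, SUM(f.units_sold) AS units
    FROM   fact_sales f
    JOIN   dim_product p ON p.product_key = f.product_key
    JOIN   dim_store   s ON s.store_key   = f.store_key
    WHERE  s.region = 'West'
    GROUP  BY p.product_name
    HAVING SUM(f.units_sold) > 1000;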

Aggregation:

The process of summarizing or combining data.
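
In a warehouse this is often materialized as a summary table built from the detail rows, roughly as in this hedged sketch (names hypothetical; CREATE TABLE AS is common, though some products use a SELECT INTO form instead).

    -- Build a monthly summary table from daily detail rows
    CREATE TABLE fact_sales_monthly AS
    SELECT d.calendar_year,
           d.calendar_month,
           f.product_key,
           SUM(f.sales_amount) AS sales_amount
    FROM   fact_sales f
    JOIN   dim_date d ON d.date_key = f.date_key
    GROUP  BY d.calendar_year, d.calendar_month, f.product_key;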

Catalog:

A component of a data dictionary that describes and organizes the various aspects of a database such as its folders, dimensions, measures, prompts, functions, queries and other database objects.  It is used to create queries, reports, analyses and cubes.

Cross Tab:

A type of multi-dimensional report that displays values or measures in cells created by the intersection of two or more dimensions in a table format.
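
One common way to produce a cross tab in plain SQL is conditional aggregation (hypothetical tables; many databases also offer a dedicated PIVOT syntax).

    -- Regions as rows, years as columns, sales in each cell
    SELECT s.region,
           SUM(CASE WHEN d.calendar_year = 2014 THEN f.sales_amount ELSE 0 END) AS sales_2014,
           SUM(CASE WHEN d.calendar_year = 2015 THEN f.sales_amount ELSE 0 END) AS sales_2015
    FROM   fact_sales f
    JOIN   dim_date  d ON d.date_key  = f.date_key
    JOIN   dim_store s ON s.store_key = f.store_key
    GROUP  BY s.region;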

Dashboard:

A data visualization method and workflow management tool that brings together useful information on a series of screens and/or web pages.  Some of the information that may be contained on a dashboard includes reports, web links, calendars, news, tasks, e-mail, etc.  When incorporated into a DSS or EIS, key performance indicators may be represented as graphics that are linked to various hyperlinks, graphs, tables and other reports.  The dashboard draws its information from multiple sources: applications, office products, databases, the Internet, etc.

Cube:

A multi-dimensional matrix of data that has multiple dimensions (independent variables) and measures (dependent variables) and is created by an Online Analytical Processing (OLAP) system.  Each dimension may be organized into a hierarchy with multiple levels.  The intersection of two or more dimensional categories is referred to as a cell.


Data-based Knowledge:

Factual information used in the decision-making process that is derived from data marts or warehouses using business intelligence tools.  Data warehousing organizes information into a format so that it represents an organization's knowledge with respect to a particular subject area, e.g. finance or clinical outcomes.

Data Cleansing:

The process of cleaning or removing errors, redundancies and inconsistencies in the data that is being imported into a data mart or data warehouse.  It is part of the quality assurance process.
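
Two typical cleansing steps, sketched with hypothetical staging-table names: standardizing an inconsistent code, then discarding exact duplicates.

    -- Standardize inconsistent codes in a staging table
    UPDATE stg_patient
    SET    gender = 'F'
    WHERE  gender IN ('f', 'female', 'FEMALE');

    -- Keep only distinct rows (one generic de-duplication approach)
    CREATE TABLE stg_patient_clean AS
    SELECT DISTINCT *
    FROM   stg_patient;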

Data Mart:

A database that is similar in structure to a data warehouse, but is typically smaller and is focused on a more limited area.  Multiple, integrated data marts are sometimes referred to as an Integrated Data Warehouse.  Data marts may be used in place of a larger data warehouse or in conjunction with it.  They are typically less expensive to develop and faster to deploy and are therefore becoming more popular with smaller organizations.

Data Migration:

The transfer of data from one platform to another.  This may include conversion from one language, file structure and/or operating environment to another.

Data Mining:

The process of researching data marts and data warehouses to detect specific patterns in the data sets.  Data mining may be performed on databases and multi-dimensional data cubes with ad hoc query tools and OLAP software.  The queries and reports are typically designed to answer specific questions to uncover trends or hidden relationships in the data.

Data Scrubbing:

See Data Cleansing


Data Transformation:

The modification of transaction data extracted from one or more data sources before it is loaded into the data mart or warehouse.  The modifications may include data cleansing, translation of data into a common format so that it can be aggregated and compared, summarizing the data, etc.
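
For example, translating source-specific codes into one common format during extraction, so the data can later be aggregated and compared (names hypothetical):

    -- Translate two source systems' state codes into a common format
    SELECT customer_id,
           CASE state_code
                WHEN 'CALIF' THEN 'CA'   -- source A's convention
                WHEN 'CAL'   THEN 'CA'   -- source B's convention
                ELSE state_code
           END AS state_code
    FROM   stg_customers;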

Data Warehouse:

An integrated, non-volatile database of historical information that is designed around specific content areas and is used to answer questions regarding an organization's operations and environment.

Database Management System:

The software that is used to create data warehouses and data marts.  For the purposes of data warehousing, they typically include relational database management systems and multi-dimensional database management systems.  Both types of database management systems create the database structures, store and retrieve the data and include various administrative functions.

Decision Support System (DSS):

A set of queries, reports, rule-based analyses, tables and charts that are designed to aid management with their decision-making responsibilities.  These functions are typically “wrapped around” a data mart or data warehouse.  The DSS tends to employ more detailed level data than an EIS.








Wednesday, 9 September 2015

DataStage Terminology


DataStage Administrator: Client program used to manage DataStage projects.
DataStage server: Server engine component to which the DataStage client programs connect.
DataStage Director: Client program used to run and monitor DataStage jobs.
DataStage Designer: Graphical design tool that developers use to design and develop DataStage jobs.
DataStage Manager: Program used to view and manage the contents of the DataStage repository. Refer to the "DataStage Manager Guide."
Stage: Component that represents a data source or a processing step in a DataStage job.
Source: The origin from which data is extracted, for example an operational system.
Target: The destination of the data, for example the file to be loaded to AML (output by DataStage).
Category: Category names used to classify the jobs in DataStage.
Container: A reusable grouping of stages and links that packages a common process so it can be called from jobs.
Job: Program that defines how to extract, transform, integrate, and load data into a target database.
Job templates: Model jobs that serve as a starting point for jobs performing similar processing.
Job parameters: Variables included in the job design (for example, a file name or a password).
Job sequence: A controlling job that starts and runs other jobs.
Scratch disk: Disk space used to store intermediate data, such as virtual record data sets.
Table definition: Definition describing the required data, including information about the associated data tables and columns. Also referred to as metadata.
Partitioning: The DataStage mechanism that splits data so that large volumes can be processed at high speed.
Parallel engine: Engine that controls DataStage jobs running on multiple nodes.
Parallel jobs: The DataStage job type that can take advantage of parallel processing.
Project: A collection of jobs and the components required to develop and run them; the entry point into DataStage from the client. A DataStage project must be licensed.
Metadata: Data about data. For example, a table definition describing the columns from which data is built.
Link: Joins the stages of a job to form a data flow or a reference lookup.
Routine: Functions that are shared and used in common across jobs.
Column definition: Defines the columns to be included in a data table, containing the name and data type of each column.
Environmental parameters: Variables included in the job design, for example a file name or a password.
DB2 stage: Stage that reads from and writes to a DB2 database.
FileSet stage: Stage for a collection of files used to store data.
Lookup Stage: Stage that performs reference lookups against a text file or table used in DataStage.
LookupFileSet Stage: Stage that stores a lookup table.
Sequential Stage: Stage for manipulating text files.
Custom Stage: Stage for implementing, in the C language, processing that cannot be implemented with the stages provided as standard in DataStage.
Copy Stage: Stage that copies a data set.
Generator Stage: Stage that generates a dummy data set.
DataSet stage: Stage for data files used by the parallel engine.
Funnel Stage: Stage that copies multiple input data sets into a single output data set.
Filter Stage: Stage that extracts records from the input data set.
Merge Stage: Stage that joins more than one input record.
LinkCollector Stage: Stage that collects data that was previously split.
RemoveDuplicate Stage: Stage that removes duplicate entries from a data set.
ChangeCapture Stage: Stage that compares two data sets and records the differences between them.
RowImport Stage: Stage that imports columns from a string or binary column.
RowExport Stage: Stage that exports columns of another type to a string or binary column.
Transformer Stage: Stage that edits fields and performs type conversions.
Modify Stage: Stage that converts data to a specified type or converts NULL values to a specified value.
XML input stage: Stage that reads XML files and extracts the required elements from the XML data.
Sort Stage: Stage that sorts data in ascending or descending order.
Join Stage: Stage that joins more than one input record.
RowGenerator Stage: Stage that adds rows to an existing data set.
ColumnGenerator Stage: Stage that adds columns to an existing data set.
Aggregator stage: Stage that aggregates data.
Pivot Stage: Stage that converts multiple columns into multiple rows of records.
Peek Stage: Stage that lets you examine records, for example by printing column values to the job log.
Stream link: Link that represents the flow of data.
Reference Links: Input links that represent reference data.
Reject link: Link that outputs the data that does not meet the criteria you specify.
Integer type: Data type that represents integer values.
Decimal type: Data type that represents numbers containing a decimal point.
NULL value: Special value indicating an unknown value; not the same as 0, a blank, or an empty string.
DataSet: A collection of data.
SQLStatement: Statement used to manipulate the data in a table.
TWS: Stands for Tivoli Workload Scheduler, a product used to create job nets.
Hash: One of the data-partitioning methods in DataStage; partitioning is performed using a hash value of the key.
Modulus: One of the data-partitioning methods in DataStage; partitioning is performed by taking the key value modulo the number of partitions.
Same: One of the data-partitioning methods in DataStage; the input keeps the partitioning output by the previous stage, without re-partitioning.
Job Properties: Property sheet used to configure settings for a job.
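
As a rough illustration of the Modulus method above (plain SQL, not DataStage syntax; the orders table and its numeric key are hypothetical), each row's partition is simply the key value modulo the number of partitions.

    -- How rows map to 4 partitions under modulus partitioning
    SELECT order_id,
           MOD(order_id, 4) AS partition_no
    FROM   orders;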






Saturday, 22 August 2015

MongoDB & RDBMS Terminology





Term mapping between MongoDB (NoSQL) and an RDBMS:

RDBMS Term                  MongoDB Term
Database                    Database
Table                       Collection
Tuple or Row                Document
Column                      Field or Key
Table Join                  Embedded Document
Primary Key                 Primary Key (default _id field)
oracle/db2/mysqld           mongod (DB process)
sqlplus/db2 client/mysql    mongo
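
As a quick illustration of the mapping (a hypothetical employees table/collection), here is the same query in both worlds, with the mongo shell equivalent shown as a comment.

    -- RDBMS: rows in a table, filtered and projected with SQL
    SELECT name, salary
    FROM   employees
    WHERE  dept = 'IT';

    -- MongoDB equivalent (mongo shell):
    --   db.employees.find({ dept: 'IT' }, { name: 1, salary: 1 })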