Thursday, December 17, 2009

Topic 4: Data Warehouse and OLAP

This week, i have learn about the differences between operational database and data warehouse.

There are several approach when creating a data warehouse namely:
1. Star Schema (http://en.wikipedia.org/wiki/Star_schema)
2. Snowflake Schema (http://en.wikipedia.org/wiki/Snowflake_schema)
3. Constellation Schema (http://datawarehouse4u.info/Data-warehouse-schema-architecture-fact-constellation-schema.html)

The star schema is the simplest style of data warehouse schema. it consists of a few fact tables referencing any number of dimension tables. it is also considered an important special case of the snowflake schema.

structure of star schema:


speaking of snowflake schema,it is a logical arrangement of tables in a multidimensional database such that the entity relationship diagram resembles a snowflake in shape. Closely related to the star schema, the snowflake schema is represented by centralized fact tables which are connected to multiple dimensions.

stucture of snowflake schema:


In a fact constellation schema, different fact tables are explicitly assigned to the dimensions, which are for given facts relevant. This may be useful in cases when some facts are associated with a given dimension level and other facts with a deeper dimension level.

structure of constellation schema:




Parent child dimension
A parent-child dimension is based on two dimension table columns that together define the lineage relationships among the members of the dimension. One column, called the member key column, identifies each member; the other column, called the parent key column, identifies the parent of each member. This information is used to create parent-child links, which are then combined into a single member hierarchy that represents a single meta data level.

For example, in the following Employee table, the column that identifies each member is Employee_Number. The column that identifies the parent of each member is Manager_Employee_Number. (This column stores the employee number of each employee's manager.)


there is also 3 types of OLAP namely:
1.Relational OLAP (ROLAP)
2.Multidimensional OLAP (MOLAP)
3.Hybrid OLAP (HOLAP)

1.Relational online analytical processing (ROLAP) is a form of online analytical processing (OLAP) that performs dynamic multidimensional analysis of data stored in a relational database rather than in a multidimensional database

2.MOLAP (multidimensional online analytical processing) is online analytical processing (OLAP) that indexes directly into a multidimensional database.MOLAP processes data that is already stored in a multidimensonal array in which all possible combinations of data are reflected, each in a cell that can be accessed directly. For this reason, MOLAP is, for most uses, faster and more user-responsive than relational online analytical processing (ROLAP)

3.Hybrid online analytical processing (HOLAP) is a combination of relational OLAP (ROLAP) and multidimensional OLAP (usually referred to simply as OLAP). HOLAP was developed to combine the greater data capacity of ROLAP with the superior processing capability of OLAP. HOLAP can use varying combinations of ROLAP and OLAP technology. Typically it stores data in a both a relational database (RDB) and a multidimensional database (MDDB) and uses whichever one is best suited to the type of processing desired.

you can take a look @http://businessintelligence.ittoolbox.com/documents/advantagesdisadvantages-of-molap-rolap-and-holap-15897 it describes the advantages and disadvantages of using different types of OLAP.

Furthermore, there are eight basic steps while Designing a data warehouse fact table:
1) Choosing the data mart for the small group of end users we
deal with.
2) Fact table granularity (the smallest defined level of data in the
table is determined.
3) Fact table dimensions are selected.
4) Determine the facts for the table. In most cases, the granularity
is at the transaction level, so the fact is the amount.
5) Storing pre-calculations in the fact table
6) Rounding out the dimension tables
7) Choosing the duration of the database
8) The need to track slowly changing dimensions

No comments: