R E Q N D O C

Data warehousing | Document-oriented database | Three types of schemas: Star, Snowflake, Galaxy

A data warehouse schema is a visualization of a domain-specific database that is designed for preparing reports and business analysis to support decision-making in an organization.

100$

Equipment

  • Pages per document: 1 model     
  • Font size: 12    
  • Document format: docx, pdf    
  • Development time: 1-2 days    
  • Prepayment: 100% of the total amount    
  • Signing a contract: yes    
  • Free consultations: yes (0.5 hour)

A document-oriented database is a database designed for storing, retrieving, and managing document-oriented, or semi-structured, data. In a data warehouse, a schema is a logical description of the warehouse tables. Schemas are built from one or more fact tables and their dimension tables.

There are three schemas for data warehouses:

  • Star
  • Snowflake
  • Galaxy

Star schema

In the star schema, the fact table sits at the center, and every dimension table is linked to it. Information about each dimension is therefore stored in its own table, which makes the dimensions easy to browse and keeps the diagram logically transparent and understandable to the user. This type of schema is used in data warehouses built on MySQL, SAS, and SAP.

 Main characteristics:

  • The star schema has only one fact table and several dimension tables.
  • Each dimension is represented by a single dimension table.
  • Dimension tables are not normalized in the star schema.
  • Each dimension table is joined to the fact table through a key.
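
The characteristics above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the retail tables (dim_product, dim_store, fact_sales), their columns, and the sample rows are hypothetical and do not come from any particular warehouse product.

```python
import sqlite3

# Hypothetical retail example: all table names, columns, and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized dimension: the group name is repeated for every product.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        group_name   TEXT
    );
    CREATE TABLE dim_store (
        store_id INTEGER PRIMARY KEY,
        city     TEXT
    );
    -- The single fact table holds one key per dimension plus the measure.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        store_id   INTEGER REFERENCES dim_store(store_id),
        amount     REAL
    );
    INSERT INTO dim_product VALUES
        (1, 'Espresso', 'Coffee'), (2, 'Latte', 'Coffee'), (3, 'Green tea', 'Tea');
    INSERT INTO dim_store VALUES (1, 'Berlin'), (2, 'Paris');
    INSERT INTO fact_sales VALUES
        (1, 1, 10.0), (2, 1, 15.0), (3, 2, 7.0), (1, 2, 12.0);
""")

# Every dimension is exactly one join away from the fact table.
star_totals = conn.execute("""
    SELECT p.group_name, s.city, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_store   AS s ON s.store_id   = f.store_id
    GROUP BY p.group_name, s.city
    ORDER BY p.group_name, s.city
""").fetchall()
print(star_totals)
# [('Coffee', 'Berlin', 25.0), ('Coffee', 'Paris', 12.0), ('Tea', 'Paris', 7.0)]
```

Note that the group name 'Coffee' is stored twice in dim_product, which is the lack of normalization mentioned in the list.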

 

Snowflake schema

However, placing all the information about a dimension in one table is not always justified. For example, if the goods sold are grouped (that is, there is a hierarchy), each product row has to state which group the product belongs to, so the group names are repeated many times. This not only increases redundancy but also raises the likelihood of contradictions (if, for example, the same product is mistakenly assigned to different groups).
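
The kind of contradiction described here can be shown with a few lines of Python. The rows below are hypothetical and assume a flat product table with no key constraint that would prevent a product from appearing twice.

```python
from collections import Counter, defaultdict

# Hypothetical flat dimension rows: (product_name, group_name).
rows = [
    ("Espresso", "Coffee"),
    ("Latte", "Coffee"),
    ("Green tea", "Tea"),
    ("Latte", "Tea"),  # input error: Latte is assigned to a second group
]

# Redundancy: how many times each group name is stored.
group_repeats = Counter(group for _, group in rows)

# Contradictions: products that ended up in more than one group.
groups_by_product = defaultdict(set)
for product, group in rows:
    groups_by_product[product].add(group)
conflicts = {p: sorted(g) for p, g in groups_by_product.items() if len(g) > 1}

print(group_repeats["Coffee"])  # 2
print(conflicts)                # {'Latte': ['Coffee', 'Tea']}
```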

To work with hierarchical dimensions more efficiently, a modification of the star schema was developed, called the snowflake schema. Its main feature is that information about one dimension can be stored in several related tables. That is, if at least one dimension table has one or more other dimension tables associated with it, the schema is a snowflake.
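
As a rough sketch of this idea, the product dimension can be split into a product table and a separate group table. The example uses Python's built-in sqlite3 module, and all table names, columns, and rows are hypothetical.

```python
import sqlite3

# Hypothetical example: the product dimension is stored in two related tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each group name is stored exactly once.
    CREATE TABLE dim_product_group (
        group_id   INTEGER PRIMARY KEY,
        group_name TEXT UNIQUE
    );
    -- Products point to their group by key instead of repeating its name.
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT,
        group_id     INTEGER REFERENCES dim_product_group(group_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        amount     REAL
    );
    INSERT INTO dim_product_group VALUES (1, 'Coffee'), (2, 'Tea');
    INSERT INTO dim_product VALUES
        (1, 'Espresso', 1), (2, 'Latte', 1), (3, 'Green tea', 2);
    INSERT INTO fact_sales VALUES (1, 10.0), (2, 15.0), (3, 7.0), (1, 12.0);
""")

# Reaching the group level now takes two joins instead of one.
snowflake_totals = conn.execute("""
    SELECT g.group_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product       AS p ON p.product_id = f.product_id
    JOIN dim_product_group AS g ON g.group_id   = p.group_id
    GROUP BY g.group_name
    ORDER BY g.group_name
""").fetchall()
print(snowflake_totals)  # [('Coffee', 37.0), ('Tea', 7.0)]
```

The extra join is the price of storing each group name once: a product cannot be assigned to a misspelled or duplicate group name, only to an existing group_id.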

Fact Constellation Schema (Galaxy Schema)

A fact constellation has several fact tables. That is, in a galaxy (or constellation) schema, two or more related fact tables are surrounded by their corresponding dimension tables.
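
A minimal sketch of such a layout, assuming two hypothetical fact tables (fact_sales and fact_shipments) that are related through one shared product dimension. It uses Python's built-in sqlite3 module, and the names and rows are illustrative only.

```python
import sqlite3

# Hypothetical example: two fact tables related through one shared dimension.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name TEXT
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER
    );
    CREATE TABLE fact_shipments (
        product_id    INTEGER REFERENCES dim_product(product_id),
        units_shipped INTEGER
    );
    INSERT INTO dim_product VALUES (1, 'Espresso'), (2, 'Latte');
    INSERT INTO fact_sales VALUES (1, 5), (1, 3), (2, 4);
    INSERT INTO fact_shipments VALUES (1, 10), (2, 6), (2, 2);
""")

# Aggregate each fact table on its own first, then combine the results
# through the shared dimension. Joining the raw fact tables directly
# would multiply rows and inflate the sums.
galaxy_rows = conn.execute("""
    SELECT p.product_name, s.sold, h.shipped
    FROM dim_product AS p
    JOIN (SELECT product_id, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_id) AS s
      ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_shipped) AS shipped
          FROM fact_shipments GROUP BY product_id) AS h
      ON h.product_id = p.product_id
    ORDER BY p.product_name
""").fetchall()
print(galaxy_rows)  # [('Espresso', 8, 10), ('Latte', 4, 8)]
```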

The difference between star and snowflake

The main functional difference between the snowflake schema and the star schema is the ability to work with hierarchical levels that determine the level of detail of the data. In the example above, the snowflake schema makes it possible to work with data at the maximum level of detail, for example, for each product separately, or to use a generalized view by product group with the corresponding aggregation of facts.

The choice of schema for building a DWH depends on the mechanisms used for collecting and processing data. Each schema has its advantages and disadvantages, which can show up to a greater or lesser extent depending on how the data warehouse as a whole operates.
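
The two levels of detail can be illustrated in plain Python. The product-to-group mapping and the sales figures below are hypothetical and stand in for a product dimension with a separate group level.

```python
from collections import defaultdict

# Hypothetical hierarchy: each product belongs to exactly one group.
product_group = {"Espresso": "Coffee", "Latte": "Coffee", "Green tea": "Tea"}

# Hypothetical facts: (product_name, sale_amount).
sales = [("Espresso", 10.0), ("Latte", 15.0), ("Green tea", 7.0), ("Espresso", 12.0)]

by_product = defaultdict(float)  # maximum level of detail
by_group = defaultdict(float)    # generalized view, facts aggregated per group
for product, amount in sales:
    by_product[product] += amount
    by_group[product_group[product]] += amount

print(dict(by_product))  # {'Espresso': 22.0, 'Latte': 15.0, 'Green tea': 7.0}
print(dict(by_group))    # {'Coffee': 37.0, 'Tea': 7.0}
```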

Advantages and disadvantages of data warehousing schemas

The advantages of the star schema include:

  • simplicity and logical transparency of the model;
  • a simpler procedure for adding dimension values, since you have to work with only one table.

The disadvantages of the star schema are:

  • slow processing of dimensions, since the same dimension values can occur several times in the same table;
  • a high probability of data inconsistencies (in particular, contradictions), for example, due to input errors.

The advantages of the snowflake schema are as follows:

  • it is closer to how data is represented in a multidimensional model;
  • loading from the DWH into multidimensional structures is simpler and more efficient, since the data is loaded from separate tables;
  • a much lower probability of errors and data inconsistencies;
  • a more compact data representation than the star schema, since each dimension value is stored only once.

Disadvantages of the snowflake schema:

  • a data structure that is fairly complex to implement and understand;
  • a more complicated procedure for adding dimension values.
