Dimensions and Facts in Terms of Data Warehousing

In data warehousing, facts and dimensions are common concepts. They tell us things like the number of resources used for a specific task. They both record the exact amount of resources as well as information about the resource and the task.

Platform principles: Dimensional modeling and the star schema:

The data warehouse is a collection of databases and operations that populate them automatically. At a high level, the ultimate goal is to create the data mart, from which users will run reports and conduct analytics to help them manage their business. We maintain this data in the warehouse longer than in the CMDB because it is more relevant for trending and analysis than it is for normal transactional processing.

A data warehouse is designed to handle large amounts of data at once and to process it in a variety of unanticipated ways. In contrast, transactional processing systems are designed to maximize write access on a small number of records inside a single transaction, resulting in a more consistent flow of operations.

We apply the Kimball approach to dimensional modeling in order to optimize the data warehouse for both usability and performance. This means that tables in the DWDataMart database are logically divided into subject matter sections that, when shown in a diagram, resemble stars. For this reason, these groupings are frequently referred to as “star schemas.”

Benefits of Data Warehouse:

Better business analytics: Data warehouse plays an important role in every business to store and analysis of all the past data and records of the company. which can further increase the understanding or analysis of data for the company.
Faster Queries: The data warehouse is designed to handle large queries that’s why it runs queries faster than the database.
Improved data Quality: In the data warehouse the data you gathered from different sources is being stored and analyzed it does not interfere with or add data by itself so your quality of data is maintained and if you get any issue regarding data quality then the data warehouse team will solve this.
Historical Insight: The warehouse stores all your historical data which contains details about the business so that one can analyze it at any time and extract insights from it.

star schema:

In data warehousing, what are facts?

In data warehousing terminology, a fact refers to quantitative transactional data such as measurements, metrics, or values that are prepared for analysis. Header numbers, order numbers, ticket numbers, transaction numbers, transaction currencies, and so forth are examples of these.

One type of fact measures or key performance indicator (KPI) is the quantity sold. The lowest degree of granularity in fact tables can be used to hold this data. Facts and various dimension tables establish foreign key linkages. In this case, every attribute of a primary key is present in a composite primary key. This is regarded as the dimension table’s foreign key.

What Is a Fact Table?

A fact table is a primary table in a dimensional model. It includes statistics, metrics, and information related to a business process. In dimensional tables, it serves as a foreign key as well. A fact table often contains numerical data. A fact table is located in the middle of a star or snowflake schema. Fact tables can be defined by their atomic level, store report labels, and lack a hierarchy.

Examples of facts include the following:

Additive
Semi-additive
Non-additive

Additive: describes the measures we must add to all dimensions.

Semi-additive: describes what measures can be added to some dimensions but not with others.

Non-additive: describes the storage of basic units of measurement of business processes—for example, phone calls, orders, and sales.

What Do Data Warehousing Dimensions Mean?

Dimensions are characteristics of facts such as the date of sale that go hand in hand with facts. Customer dimension attributes, for instance, often consist of first and last name, gender, birthdate, occupation, and so forth. The URL characteristics and website name make up a website dimension. Because of one-to-many relationships, they denormalize and characterize distinct things.

What Is a Dimensional Table?

A foreign key connects a dimensional table, which has data that gives the dimensions of a fact, to a fact table. Dimension tables comprise columns containing dimension properties.

Dimension tables are located in the periphery of a star or snowflake schema. They include comprehensive, descriptive, and highly wordy (and quality certified) detailed data.

Dimension tables benefit from the properties of the facts and frequently take the form of their descriptive characteristics. Occasionally, dimensions have one or more hierarchical connections. Additionally, there is no upper limit on how many dimensions can be attributed to these interactions.

Examples of dimension include the following:

Conformed dimensions
Degenerate dimensions
Junk dimensions
Role-playing dimensions
Shrunken rollup dimensions

Conformed dimensions: The relevant facts are the conformed dimensions. This dimension is not limited to one-star schemas or Datamarts. One great illustration of a conformed dimension is the date dimension. The month, week, day, and year are examples of attributes that convey the same information across a variety of facts. This method facilitates consistency by allowing us to keep things the same across fact tables. To generate distinct reports, many tables will utilize the same table across the fact table.

Degenerate dimensions: There are no comparable dimensions for degenerate dimensions. It isn’t dimensional because it is taken from a fact table. Degenerate dimensions are used to gather fact table snapshots. Product IDs, for instance, originate from a product dimension table. Nonetheless, the invoice number might be essential to monitoring product quantities because it is an independent property without any linked attributes.

Junk dimensions: A collection of arbitrary transactional codes, text characteristics, or flags is known as junk dimensions. They frequently defy logic and don’t fit into any one dimension. The values of these properties typically fit into the category of basic indicators, such as true or false or yes/no. A fact table may have one or more junk dimensions, depending on how complex it is. In order to control foreign keys formed by rapidly changing dimensions, junk dimensions are also created.

Role-playing dimensions: A logically separate role for every dimension is referenced in role-playing dimensions. In a fact table, each and every physical dimension is helpful to refer to repeatedly. A fact table, for instance, often includes foreign keys for the delivery and shipment dates. You can join these tables to the foreign keys since they are all associated with the same characteristics.

Shrunken rollup dimensions: In essence, shrunken rollup dimensions are divisions of a base dimension’s columns and rows. When creating aggregated fact tables, these dimensions work really well. When business procedures inherently gather data at a higher degree of detail, it’s useful. As an illustration, you can forecast using a brand and a month rather than a specific date and product linked to sales data.

#1Visual representation of Star Schema:

#2Visual representation of Star Schema:

Advantages of the Star Schema

It is extremely simple to understand and build.
No need for complex joins when querying data.
Accessing data is faster (because the engine doesn’t have to join various tables to generate results).
Simpler to derive business insights.
Works well with certain tools for analytics, in particular, with OLAP systems that can create OLAP cubes from data stored using star schema.

Disadvantages of the Star Schema

Denormalized data can cause integrity issues. This means some data can turn out to be inconsistent at times.
Maintenance may appear simple at the beginning, but the larger data warehouse you need to maintain, the harder it becomes (due to data redundancy).
It requires a lot more disk space than snowflake schema to store the same amount of data.
Many-to-many relationships are not supported.
Limited possibilities for complex queries development.

What is a snowflake schema?

A more complex format of data can be stored in a snowflake schema than in a star schema, a kind of database schema. The “fact table,” which is the main table in the snowflake schema, is joined by many other tables that are referred to as “dimension tables.” Similar to other schemas, the dimension tables hold data about the dimensions of the events or facts, while the fact tables hold data about the events or facts themselves.

#1Visual representation of Snowflake Schema:

#2Visual Representation of Snowflake schema

Advantage of Snowflake Schema

The primary advantage of the snowflake schema is the development in query performance due to minimized disk storage requirements and joining smaller lookup tables.
It provides greater scalability in the interrelationship between dimension levels and components.
No redundancy, so it is easier to maintain.

Disadvantage of Snowflake Schema

The primary disadvantage of the snowflake schema is the additional maintenance efforts required due to the increasing number of lookup tables. It is also known as a multi fact star schema.
There are more complex queries and hence, difficult to understand.
More tables more join so more query execution time.

Conclusion: In conclusion, the primary distinction is found in the kind of data they store and the function they fulfill in a data warehouse:
Fact tables are the hub of a data warehouse, storing quantifiable, numerical data that captures crucial performance metrics.
Dimension tables contextualize the data in the fact tables and hold descriptive information. They make it possible to analyze and classify the quantitative data.
Fact and dimension tables are connected by a network of foreign key relationships in a typical data warehouse schema to create a star or snowflake schema that makes data analysis and querying more effective.

Dimensions and facts help in organizing and analyzing the data in a meaningful way, providing valuable insights into the business operations. By understanding these concepts, we can make informed decisions and improve business operations.

Happy reading! Don’t forget to share your comments and feedback.

Dimensions and Facts in Terms of Data Warehousing

Benefits of Data Warehouse:

Advantages of the Star Schema

Disadvantages of the Star Schema

Leave a ReplyCancel Reply

Write for Us at DataAnalytic Group

Become a Subject Matter Expert

Hire Industry-Ready Talent from us

Corporate Training Programs