Since its existence, the terms “facts” and “dimensions” have been used in Data Warehousing. These tables fall under the database modeling style known as a “dimensional model”. This style was first introduced by Ralph Kimball, the data warehouse guru. This database style modification has also become known as an OLAP (Online Analytical Processing) database. The purpose of building a dimensional model is to speed up the database performance when querying a database.
Dimension Tables
The two types of tables used in dimensional modeling are facts and dimensions. A dimension table contains descriptive attribute fields whereas a fact table contains only measures and key relationships. This is why requirements gathering is very important when building a data warehouse in its early stages. It’s crucial to capture all of the required fields (or columns). If not caught in these early stages of project development, there will likely be more development needed later on.
Some examples of dimension tables include:
- DimCustomer – attributes of a customer
- DimEmployee – attributes of an employee
- DimInvoice – invoice attributes
- DimOrganization – attributes of an organization
- DimDate – attributes of a date (ex. fiscal year, month, day)
Dimensions contain attributes of the objects they’re trying to describe. As an example, in the DimCustomer or DimEmployee tables we may see attributes such as first name, last name or date of birth. We might also see attributes such as invoice date, invoice amount, etc.. in the DimInvoice table. These descriptive fields will become useful when it comes time for reporting and analysis.
You might be thinking – “Invoice amount, shouldn’t that be on the fact table?” or “Invoice date, shouldn’t that be a key to the date dimension on the fact table?” and the truth is that it can be both. Dates and amounts can still be attribute fields in a dimension as well as a key field or amount on a fact table. It all depends on how these attributes are used for reporting.
Date Dimensions
Date dimension tables and are seen in almost every data warehouse and should be especially noted. These tables are usually static meaning they do not change. This is largely because we already know which fields are most used for reporting (ex. year, month, and day). The data is also minimal to load – for example, 30 years worth of dates is only ~11k records. These date tables are typically populated one time unless there are changes in requirements.
Conformed Dimensions
One last note on dimensions – they’re re-usable meaning we can use them across multiple fact tables. Dimension tables that get re-used across multiple fact tables are known as conformed dimensions. The date dimension is a prime example of a conformed dimension candidate. Most fact tables have date keys on them. Date keys are commonly aliased across several fact tables. We can see in the following diagram there are multiple fact tables that connect to the same DimCustomer and DimDate tables. These are examples of conformed dimensions.
Fact Tables
Fact tables are an entirely different beast. There are usually no attribute fields on a fact table, but that’s not an iron-clad rule. If an attribute field is included into a fact table it’s referred to as a degenerate dimension field. These degenerate columns are usually only added as one-offs when it doesn’t make sense to maintain them in their own dimension table.
We introduce fact tables when we’re trying to aggregate specific measures and provide a certain level of detail. A fact table will tie together several dimension tables to form what is known as a “star schema”. We can see in the pictured data model diagram that they tend to form a star-like structure.
Fact Table vs Dimension Table
Fact tables usually contain several measures related to the tied attribute dimensions. We may see certain measures on a fact table such as revenue, quantity, amount, etc… A fact table without any measures is known as a “fact-less fact”. The relationship between a dimension and fact table is always 1 to many or 1 to 1. It’s never many to 1 as the fact table always contains the lowest level of detail.
Fact tables can also be harder to name since they represent an accumulation of data. In our above example, we have a fact table that’s based on sales transactions so we just labeled it FactTransactions. Naming a fact table can be tough depending on the company, business terminology, and level of detail. We might rename FactTransactions to something like FactSalesTransactions, FactSales, or FactRevenue instead.
Aggregate, Summary, and Roll-Up Tables
The data granularity (or level of detail) of a fact table is also very important. This is the lowest level of detail that a fact table can provide. For performance reasons a fact table can be rolled-up to a summary table. Sometimes we even see both detail and summary facts created depending on the situation. Using FactTransactions as an example, we may have a FactTransactionDetail and FactTransactionSummary instead. Summary facts will contain less dimension attributes at a higher aggregated level for performance reasons. As computer performance gets increasingly better, there’s less need to create these fact table roll-ups.
Final Thoughts – Fact Table vs Dimension Table
Now we know a dimension represents an object and its attributes (or descriptive fields). On the other hand, a fact table contains measures and represents the lowest level of detail in a dataset. Understanding these differences is an important step to becoming a data warehouse expert.
We could even go a level further. There are several types of slowly changing dimensions (aka SCD’s) and three types of fact tables: transaction, periodic snapshots and accumulating snapshots.