- Contents at a glance
- Contents
- Introduction
  - Who this book is for
  - Assumptions about you
  - Organization of this book
  - Conventions
  - About the companion content
  - Acknowledgments
  - Errata and book support
  - We want to hear from you
  - Stay in touch
- Chapter 1. Introduction to data modeling
  - Working with a single table
  - Introducing the data model
  - Introducing star schemas
  - Understanding the importance of naming objects
  - Conclusions
- Chapter 2. Using header/detail tables
  - Introducing header/detail
  - Aggregating values from the header
  - Flattening header/detail
  - Conclusions
- Chapter 3. Using multiple fact tables
  - Using denormalized fact tables
  - Filtering across dimensions
  - Understanding model ambiguity
  - Using orders and invoices
    - Calculating the total invoiced for the customer
    - Calculating the number of invoices that include the given order of the given customer
    - Calculating the amount of the order, if invoiced
  - Conclusions
- Chapter 4. Working with date and time
  - Creating a date dimension
  - Understanding automatic time dimensions
    - Automatic time grouping in Excel
    - Automatic time grouping in Power BI Desktop
  - Using multiple date dimensions
  - Handling date and time
  - Time-intelligence calculations
  - Handling fiscal calendars
  - Computing with working days
    - Working days in a single country or region
    - Working with multiple countries or regions
  - Handling special periods of the year
    - Using non-overlapping periods
    - Periods relative to today
    - Using overlapping periods
  - Working with weekly calendars
  - Conclusions
- Chapter 5. Tracking historical attributes
  - Introducing slowly changing dimensions
  - Using slowly changing dimensions
  - Loading slowly changing dimensions
    - Fixing granularity in the dimension
    - Fixing granularity in the fact table
  - Rapidly changing dimensions
  - Choosing the right modeling technique
  - Conclusions
- Chapter 6. Using snapshots
  - Using data that you cannot aggregate over time
  - Aggregating snapshots
  - Understanding derived snapshots
  - Understanding the transition matrix
  - Conclusions
- Chapter 7. Analyzing date and time intervals
  - Introduction to temporal data
  - Aggregating with simple intervals
  - Intervals crossing dates
  - Modeling working shifts and time shifting
  - Analyzing active events
  - Mixing different durations
  - Conclusions
- Chapter 8. Many-to-many relationships
  - Introducing many-to-many relationships
  - Understanding the bidirectional pattern
  - Understanding non-additivity
  - Cascading many-to-many
  - Temporal many-to-many
  - Reallocating factors and percentages
  - Materializing many-to-many
  - Using the fact tables as a bridge
  - Performance considerations
  - Conclusions
- Chapter 9. Working with different granularity
  - Introduction to granularity
  - Relationships at different granularity
  - Analyzing budget data
  - Using DAX code to move filters
  - Filtering through relationships
  - Hiding values at the wrong granularity
  - Allocating values at a higher granularity
  - Conclusions
- Chapter 10. Segmentation data models
  - Computing multiple-column relationships
  - Computing static segmentation
  - Using dynamic segmentation
  - Understanding the power of calculated columns: ABC analysis
  - Conclusions
- Chapter 11. Working with multiple currencies
  - Understanding different scenarios
  - Multiple source currencies, single reporting currency
  - Single source currency, multiple reporting currencies
  - Multiple source currencies, multiple reporting currencies
  - Conclusions
- Appendix A. Data modeling 101
  - Tables
  - Data types
  - Relationships
  - Filtering and cross-filtering
  - Different types of models
    - Star schema
    - Snowflake schema
    - Models with bridge tables
  - Measures and additivity
    - Additive measures
    - Non-additive measures
    - Semi-additive measures
- Index
- Code Snippets
Chapter 3. Using multiple fact tables
In the previous chapter, you learned how to handle a scenario with two fact tables related to each other: header and detail tables. You saw that the best option for obtaining a simple data model is to reshape it into something closer to a star schema, which makes calculations much easier to perform.
This chapter moves one step further to cover a scenario where you have multiple fact tables that are not related to each other. This is a very common occurrence. Think, for example, of sales and purchases. Both sales and purchases are facts involving some common assets (products, for instance) and some unrelated ones (such as customers for sales and suppliers for purchases).
In general, multiple fact tables pose no problem if the model is correctly designed. The scenario becomes more challenging when the fact tables are not properly related through intermediate dimensions, as shown in the first examples, or when you need to create cross-filters between the fact tables, a technique you will learn in this chapter.
Using denormalized fact tables
The first example we look at is that of having two fact tables that, because of an excessive level of denormalization, are impossible to relate to each other. As you will learn, the solution is very simple: You re-create a star schema out of the unrelated tables to restore proper functionality of the model.
For this demo, we start with a very simple data model that contains only two tables: Sales and Purchases. They have nearly the same structure, and they are completely denormalized, meaning that all the information is stored in the two tables themselves; they have no relationships with dimensions. This model is shown in Figure 3-1.
FIGURE 3-1 The Sales and Purchases tables, when completely denormalized, do not have any relationship.
This is a common scenario when you want to merge two queries that you have already used for different purposes. Each of these two tables, by itself, is a perfectly good candidate for analysis with an Excel PivotTable. The problem arises when you want to merge the two tables into a single model and perform an analysis that gathers numbers from both tables together.
Let us consider an example. Suppose you define two measures, Purchase Amount and Sales Amount, with the following DAX code:
Purchase Amount :=
SUMX ( Purchases, Purchases[Quantity] * Purchases[Unit cost] )

Sales Amount :=
SUMX ( Sales, Sales[Quantity] * Sales[Unit Price] )
You are interested in looking at the sales and purchase amount in a single report and you want to perform a computation on these two measures. Unfortunately, this is not as simple as it might seem. For example, if you put the manufacturer from Purchases on the rows of a PivotTable and both measures in the values, you obtain the result shown in Figure 3-2. There, the value of Sales Amount is clearly wrong, repeating itself for all the rows.
FIGURE 3-2 The Sales Amount and Purchase Amount measures in the same PivotTable produce the wrong results.
What is happening is that the filter created by the Manufacturer column in the Purchases table is active on that table only. It cannot reach the Sales table because there are no relationships between the two tables. Moreover, you cannot create a relationship between the two tables because there is no column suitable for building such a relationship. As you might remember, to create a relationship, the column used needs to be the primary key in the target table. In this case, the product name is not a key in either of the two tables because it has many repetitions in both. To be a key, the column must be unique.
You can easily check this by trying to create the relationship. When you do, you will receive an error message stating that the relationship cannot be created.
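To make the constraint concrete, here is a minimal Python sketch (with hypothetical data, not taken from the book's companion content) of the uniqueness test that a relationship target column must pass:

```python
# Illustrative data: product names repeat in both fact tables, so neither
# table can act as the "one" side of a relationship on that column.
sales_products = ["Contoso A", "Contoso A", "Litware B", "Litware B"]
purchase_products = ["Contoso A", "Litware B", "Litware B", "Adventure C"]

def is_valid_key(values):
    # A column can be a relationship target only if it has no duplicates.
    return len(values) == len(set(values))

print(is_valid_key(sales_products))     # False
print(is_valid_key(purchase_products))  # False
```

This is exactly the check the engine performs when it refuses to create the relationship.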
As is often the case, you can solve this problem by writing some complex DAX code. If you decide to use columns from Purchases to perform the filter, you can rewrite Sales Amount in such a way that it detects the filters coming from Purchases. The following code is for a version of Sales Amount that senses the filter on the manufacturer:
Sales Amount Filtered :=
CALCULATE (
    [Sales Amount],
    INTERSECT ( VALUES ( Sales[BrandName] ), VALUES ( Purchases[BrandName] ) )
)
Using the INTERSECT function, this measure computes the set of Sales[BrandName] values that exist in the current selection of Purchases[BrandName]. As a result, any filter on Purchases[BrandName] is moved onto Sales[BrandName], effectively filtering the Sales table. Figure 3-3 shows the measure in action.
FIGURE 3-3 Sales Amount Filtered uses the filter from the Purchases table to filter the Sales table.
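The set logic behind this INTERSECT pattern can be sketched in plain Python (illustrative data and names, not DAX): intersect the brands visible in Sales with the current selection on Purchases, then aggregate only the surviving rows.

```python
# Tiny stand-ins for the Sales table and the current filter on Purchases.
sales = [
    {"BrandName": "Contoso", "Amount": 100},
    {"BrandName": "Litware", "Amount": 250},
    {"BrandName": "Fabrikam", "Amount": 80},
]
# Current selection of Purchases[BrandName], e.g. from the PivotTable rows.
purchases_selection = {"Contoso", "Litware"}

# INTERSECT ( VALUES ( Sales[BrandName] ), VALUES ( Purchases[BrandName] ) )
sales_brands = {row["BrandName"] for row in sales}
visible_brands = sales_brands & purchases_selection

# CALCULATE then aggregates Sales under the transferred filter.
sales_amount_filtered = sum(
    row["Amount"] for row in sales if row["BrandName"] in visible_brands
)
print(sales_amount_filtered)  # 350
```

The DAX engine does this per filter context, but the idea is the same: the filter travels between the tables as a set intersection, not through a relationship.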
Even though it works, this measure is not an optimal solution for the following reasons:
- The current version works with a filter on the brand name. If you need to filter by other columns, you must add each of them as a separate INTERSECT filter inside CALCULATE, which makes the formula complex.
- Performance is not optimal, because DAX generally works better with relationships than with filters injected through CALCULATE.
- If you have many measures that aggregate values from Sales, all of them need to follow the same complex pattern. This negatively affects the maintainability of the solution.
Just to give you an idea of how complex the formula would become if you added all the product columns to the filter, consider the following code, in which we expanded the previous pattern to all the relevant columns:
Sales Amount Filtered :=
CALCULATE (
    [Sales Amount],
    INTERSECT ( VALUES ( Sales[BrandName] ), VALUES ( Purchases[BrandName] ) ),
    INTERSECT ( VALUES ( Sales[ColorName] ), VALUES ( Purchases[ColorName] ) ),
    INTERSECT ( VALUES ( Sales[Manufacturer] ), VALUES ( Purchases[Manufacturer] ) ),
    INTERSECT (
        VALUES ( Sales[ProductCategoryName] ),
        VALUES ( Purchases[ProductCategoryName] )
    ),
    INTERSECT (
        VALUES ( Sales[ProductSubcategoryName] ),
        VALUES ( Purchases[ProductSubcategoryName] )
    )
)
This code is error-prone and requires significant effort to maintain over time. If, for example, you increase the granularity of the tables by adding a column, you will need to go through all the measures and fix them, adding a new INTERSECT for the newly introduced column. A better option involves updating the data model.
To write simpler code, you must modify the data model and transform it into a star schema. Everything would be much easier with a data structure like the one in Figure 3-4, where we added a Product dimension that is capable of filtering both the Sales and Purchases tables. Even if it does not look like it, this model is a perfect star schema, with two fact tables and a single dimension.
FIGURE 3-4 With a Product dimension, the data model becomes much easier to use.
Note
In Sales and Purchases, we hid the columns that have been normalized into the Product table. This prevents users from using one of those columns in a report, where it would not filter both tables.
To build such a data model, you typically face the following two problems:
- You need a source for the Product table, and often you do not have access to the original tables.
- The Product table needs a key to make it the target of a relationship.
The first problem is very simple to solve. If you have access to the original Product table, you can create the dimension from there by loading the data into the model. If, on the other hand, you cannot load the original Product table from the database, then you can create a technical one by using a Power Query transformation that loads both Sales and Purchases, performs a union of the two tables, and finally removes duplicates. The following is the M code that does the trick:
let
    SimplifiedPurchases = Table.RemoveColumns (
        Purchases,
        {"Quantity", "Unit cost", "Date"}
    ),
    SimplifiedSales = Table.RemoveColumns (
        Sales,
        {"Quantity", "Unit Price", "Date"}
    ),
    ProductColumns = Table.Combine ( { SimplifiedPurchases, SimplifiedSales } ),
    Result = Table.Distinct ( ProductColumns )
in
    Result
As you can see, the M code first prepares two local tables, SimplifiedPurchases and SimplifiedSales, which contain only the product-related columns, removing the unwanted ones. It then combines the two tables by appending the rows of SimplifiedSales to SimplifiedPurchases. Finally, it retains only the distinct rows, resulting in a table with unique products.
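The same remove-combine-distinct pipeline can be sketched in Python (the rows and column names below are illustrative, not the book's dataset):

```python
# Miniature fact tables; only Product and Brand are dimension attributes.
purchases = [
    {"Product": "A", "Brand": "Contoso", "Quantity": 2, "Unit cost": 5.0},
    {"Product": "B", "Brand": "Litware", "Quantity": 1, "Unit cost": 9.0},
]
sales = [
    {"Product": "A", "Brand": "Contoso", "Quantity": 3, "Unit Price": 8.0},
    {"Product": "C", "Brand": "Litware", "Quantity": 4, "Unit Price": 6.0},
]

measure_cols = {"Quantity", "Unit cost", "Unit Price", "Date"}

def product_columns(rows):
    # Table.RemoveColumns: keep only the product attributes.
    return [{k: v for k, v in r.items() if k not in measure_cols} for r in rows]

# Table.Combine followed by Table.Distinct.
combined = product_columns(purchases) + product_columns(sales)
products = [dict(t) for t in {tuple(sorted(r.items())) for r in combined}]
print(sorted(p["Product"] for p in products))  # ['A', 'B', 'C']
```

Note that product "C" comes only from Sales and "B" only from Purchases, which is why both tables must feed the dimension.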
Note
You can obtain the very same result with the query editor in either Excel or Power BI Desktop. You create two queries that remove the quantity and unit price from the original sources, and then you merge them into a single query by using the union operator. The details of how to perform this operation are beyond the scope of this book, however; we are focused on data modeling more than on user-interface details.
To create the technical dimension, you must combine both queries, Sales and Purchases. It is possible that a given product exists in only one of the two tables. If you retrieve the distinct values from only one of them, the result will be a partial dimension, which might produce incorrect results when used in the model.
After you load the Product dimension in the model, you still must create relationships. In this case, you have the option of using the product name as the column to create the relationship because the product name is unique. In different scenarios, such as when you don’t have a suitable primary key for the intermediate dimension, you might be in trouble. If your original tables do not contain product names, then you cannot create the relationship with the product. For example, if you have the product category and product subcategory, but no product name, then you must create dimensions at the degree of granularity that is available. You would need a dimension for the product category and another dimension for the product subcategory, which you obtain by replicating the same technique shown before—only this time it is for different tables.
As is often the case, these kinds of transformations are better done before the data is loaded into the model. If you are loading data from a SQL Server database, for example, you can easily build SQL queries that perform all these operations for you, obtaining a simpler analytical model.
Before leaving this topic, it is worth noting that the same result can be obtained in Power BI by using calculated tables. Calculated tables are not available in Excel at the time of this writing, but they are available in Power BI and in SQL Server Analysis Services 2016. The following code creates a calculated table that contains the product dimension, and it is even simpler than the M code:
Products =
DISTINCT (
UNION (
ALL (