Добавил:
Опубликованный материал нарушает ваши авторские права? Сообщите нам.
Вуз: Предмет: Файл:
Analyzing Data with Power BI and Power Pivot for Excel (Alberto Ferrari, Marco Russo) (z-lib.org).pdf
Скачиваний:
11
Добавлен:
14.08.2022
Размер:
18.87 Mб
Скачать

Chapter 3. Using multiple fact tables

In the previous chapter, you learned how to handle a scenario where there are two fact tables related to each other: header and detail tables. You saw how the best option to obtain a simple data model is modifying it to make it more like a star schema, which makes calculations much easier to perform.

This chapter moves one step further to cover a scenario where you have multiple fact tables that are not related to each other. This is a very common occurrence. Think, for example, of sales and purchases. Both sales and purchases are facts involving some common assets (products, for instance) and some unrelated ones (such as customers for sales and suppliers for purchases).

In general, using multiple fact tables is not an issue if the model is correctly designed and everything works fine. The scenario becomes more challenging when the fact tables are not correctly related to intermediate dimensions, as shown in the first examples, or when you need to create cross-filters between the fact tables. The latter is a technique you will learn in this chapter.

Using denormalized fact tables

The first example we look at is that of having two fact tables that, because of an excessive level of denormalization, are impossible to relate to each other. As you will learn, the solution is very simple: You re-create a star schema out of the unrelated tables to restore proper functionality of the model.

For this demo, we start with a very simple data model that contains only two tables: Sales and Purchases. They have nearly the same structure, and they are completely denormalized, meaning that all information is stored in the tables.

They have no relationships with dimensions. This model is shown in Figure 3-1.

FIGURE 3-1 The Sales and Purchases tables, when completely denormalized, do not have any relationship.

This is a common scenario when you want to merge two queries that you have already used for different purposes. Each of these two tables, by themselves, are perfectly good candidates to perform analysis with an Excel PivotTable. The problem arises when you want to merge the two tables into a single model and perform an analysis by using numbers gathered from the two tables together.

Let us consider an example. Suppose you define two measures, Purchase Amount and Sales Amount, with the following DAX code:

Click here to view code image

Purchase Amount

:=

SUMX

(

Purchases, Purchases[Quant

Sales Amount

:=

SUMX

(

Sales, Sales[Quantity] * S

You are interested in looking at the sales and purchase amount in a single report and you want to perform a computation on these two measures. Unfortunately, this is not as simple as it might seem. For example, if you put the manufacturer from Purchases on the rows of a PivotTable and both measures in the values, you obtain the result shown in Figure 3-2. There, the value of Sales Amount is clearly wrong, repeating itself for all the rows.

FIGURE 3-2 The Sales Amount and Purchase Amount measures in the same PivotTable produce the wrong results.

What is happening is that the filter created by the Manufacturer column in the Purchases table is active on that table only. It cannot reach the Sales table because there are no relationships between the two tables. Moreover, you cannot create a relationship between the two tables because there is no column suitable for building such a relationship. As you might remember, to create a relationship, the column used needs to be the primary key in the target table. In this case, the product name is not a key in either of the two tables because it has many repetitions in both. To be a key, the column must be unique.

You can easily check this by trying to create the relationship. When you do, you will receive an error message stating that the relationship cannot be created.

As is often the case, you can solve this problem by writing some complex DAX code. If you decide to use columns from Purchases to perform the filter, you can rewrite Sales Amount in such a way that it detects the filters coming from Purchases. The following code is for a version of Sales Amount that senses the filter on the manufacturer:

Click here to view code image

Sales Amount Filtered := CALCULATE (

[Sales Amount],

INTERSECT ( VALUES ( Sales[BrandName] ), VALUES

)

Using the INTERSECT function, this measure computes the set of

Sales[BrandName] that exists in the current selection of Purchases[BrandName]. As a result, any filter on Purchases[BrandName] will be moved to a filter on Sales[BrandName] to effectively filter the Sales table. Figure 3-3 shows the measure in action.

FIGURE 3-3 Sales Amount Filtered uses the filter from the Purchases table to filter the Sales table.

Even though it works, this measure is not an optimal solution for the following reasons:

The current version works with a filter on the brand name, but if you must use different columns, you need to add them all as separate INTERSECT statements inside the CALCULATE statement. This makes the formula complex.

The performance is not optimal because DAX generally works better with relationships than with filters created with CALCULATE.

If you have many measures that aggregate values from Sales, all of them need to follow the same complex pattern. This negatively affects the maintainability of the solution.

Just to give you an idea of how complex the formula would become if you added all the product columns to the filter, consider the following code, in which we expanded the previous pattern to all the relevant columns:

Click here to view code image

Sales Amount Filtered :=

CALCULATE (

[Sales Amount],

INTERSECT ( VALUES ( Sales[BrandName] ), VALUES INTERSECT ( VALUES ( Sales[ColorName] ), VALUES INTERSECT ( VALUES ( Sales[Manufacturer] ), VALU INTERSECT (

VALUES ( Sales[ProductCategoryName] ), VALUES ( Purchases[ProductCategoryName] )

), INTERSECT (

VALUES ( Sales[ProductSubcategoryName] ), VALUES ( Purchases[ProductSubcategoryName] )

)

)

This code is error-prone and needs a lot of effort to maintain it over time. If, for example, you increase the granularity of the tables by adding a column, then you will need to iterate over all the measures and fix them, adding the new INTERSECT for the newly introduced column. A better option involves updating the data model.

To write simpler code, you must modify the data model and transform it into a star schema. Everything would be much easier with a data structure like the one in Figure 3-4, where we added a Product dimension that is capable of filtering both the Sales and Purchases tables. Even if it does not look like it, this model is a perfect star schema, with two fact tables and a single dimension.

FIGURE 3-4 With a Product dimension, the data model becomes much easier to use.

Note

We hid the columns that have been normalized in the Product table.

This makes it impossible for users to use one of those columns in a report, as they would not be able to filter both tables.

To build such a data model, you typically face the following two problems:

You need a source for the Product table, and often you do not have access to the original tables.

The Product table needs a key to make it the target of a relationship.

The first problem is very simple to solve. If you have access to the original Product table, you can create the dimension from there by loading the data into the model. If, on the other hand, you cannot load the original Product table from the database, then you can create a technical one by using a Power Query transformation that loads both Sales and Purchases, performs a union of the two tables, and finally removes duplicates. The following is the M code that does the trick:

Click here to view code image

let

SimplifiedPurchases = Table.RemoveColumns( Purchases,

{"Quantity", "Unit cost", "Date"}

),

SimplifiedSales = Table.RemoveColumns( Sales,

{"Quantity", "Unit Price", "Date"}

),

ProductColumns = Table.Combine ( { SimplifiedPur Result = Table.Distinct (ProductColumns )

in

Result

As you can see, the M code first prepares two local tables, SimplifiedPurchases and SimplifiedSales, which contain only the relevant columns from Product and remove the unwanted ones. Then it combines the two tables by adding the rows of SimplifiedSales to SimplifiedPurchases. Finally, it retrieves only the distinct values, resulting in a table with unique products.

Note

You can obtain the very same result with the query editor in either

Excel or Power BI Desktop. You create two queries that remove the quantity and unit price from the original sources, and then you merge them into a single query by using the union operator. The details of how to perform this operation are beyond the scope of this book, however. We are focused on data modeling more than on userinterface details.

To create the technical dimension, you must combine the two queries with Sales and Purchases. It is possible that a given product exists in only one of these two tables. If you then retrieve the distinct values from only one of the two queries, the result will be a partial dimension, which might produce incorrect results when used in the model.

After you load the Product dimension in the model, you still must create relationships. In this case, you have the option of using the product name as the column to create the relationship because the product name is unique. In different scenarios, such as when you don’t have a suitable primary key for the intermediate dimension, you might be in trouble. If your original tables do not contain product names, then you cannot create the relationship with the product. For example, if you have the product category and product subcategory, but no product name, then you must create dimensions at the degree of granularity that is available. You would need a dimension for the product category and another dimension for the product subcategory, which you obtain by replicating the same technique shown before—only this time it is for different tables.

As is often the case, these kinds of transformations are better done before the data is loaded into the model. If you are loading data from a SQL Server database, for example, you can easily build SQL queries that perform all these operations for you, obtaining a simpler analytical model.

Before leaving this topic, it is worth noting that the same result can be obtained in Power BI by using calculated tables. Calculated tables are not available in Excel at the time of this writing, but they are available in Power BI and in SQL Server Analysis Services 2016. The following code creates a calculated table that contains the product dimension, and it is even simpler than the M code:

Click here to view code image

Products =

DISTINCT (

UNION (

ALL (