- •Contents at a glance
- •Contents
- •Introduction
- •Who this book is for
- •Assumptions about you
- •Organization of this book
- •Conventions
- •About the companion content
- •Acknowledgments
- •Errata and book support
- •We want to hear from you
- •Stay in touch
- •Chapter 1. Introduction to data modeling
- •Working with a single table
- •Introducing the data model
- •Introducing star schemas
- •Understanding the importance of naming objects
- •Conclusions
- •Chapter 2. Using header/detail tables
- •Introducing header/detail
- •Aggregating values from the header
- •Flattening header/detail
- •Conclusions
- •Chapter 3. Using multiple fact tables
- •Using denormalized fact tables
- •Filtering across dimensions
- •Understanding model ambiguity
- •Using orders and invoices
- •Calculating the total invoiced for the customer
- •Calculating the number of invoices that include the given order of the given customer
- •Calculating the amount of the order, if invoiced
- •Conclusions
- •Chapter 4. Working with date and time
- •Creating a date dimension
- •Understanding automatic time dimensions
- •Automatic time grouping in Excel
- •Automatic time grouping in Power BI Desktop
- •Using multiple date dimensions
- •Handling date and time
- •Time-intelligence calculations
- •Handling fiscal calendars
- •Computing with working days
- •Working days in a single country or region
- •Working with multiple countries or regions
- •Handling special periods of the year
- •Using non-overlapping periods
- •Periods relative to today
- •Using overlapping periods
- •Working with weekly calendars
- •Conclusions
- •Chapter 5. Tracking historical attributes
- •Introducing slowly changing dimensions
- •Using slowly changing dimensions
- •Loading slowly changing dimensions
- •Fixing granularity in the dimension
- •Fixing granularity in the fact table
- •Rapidly changing dimensions
- •Choosing the right modeling technique
- •Conclusions
- •Chapter 6. Using snapshots
- •Using data that you cannot aggregate over time
- •Aggregating snapshots
- •Understanding derived snapshots
- •Understanding the transition matrix
- •Conclusions
- •Chapter 7. Analyzing date and time intervals
- •Introduction to temporal data
- •Aggregating with simple intervals
- •Intervals crossing dates
- •Modeling working shifts and time shifting
- •Analyzing active events
- •Mixing different durations
- •Conclusions
- •Chapter 8. Many-to-many relationships
- •Introducing many-to-many relationships
- •Understanding the bidirectional pattern
- •Understanding non-additivity
- •Cascading many-to-many
- •Temporal many-to-many
- •Reallocating factors and percentages
- •Materializing many-to-many
- •Using the fact tables as a bridge
- •Performance considerations
- •Conclusions
- •Chapter 9. Working with different granularity
- •Introduction to granularity
- •Relationships at different granularity
- •Analyzing budget data
- •Using DAX code to move filters
- •Filtering through relationships
- •Hiding values at the wrong granularity
- •Allocating values at a higher granularity
- •Conclusions
- •Chapter 10. Segmentation data models
- •Computing multiple-column relationships
- •Computing static segmentation
- •Using dynamic segmentation
- •Understanding the power of calculated columns: ABC analysis
- •Conclusions
- •Chapter 11. Working with multiple currencies
- •Understanding different scenarios
- •Multiple source currencies, single reporting currency
- •Single source currency, multiple reporting currencies
- •Multiple source currencies, multiple reporting currencies
- •Conclusions
- •Appendix A. Data modeling 101
- •Tables
- •Data types
- •Relationships
- •Filtering and cross-filtering
- •Different types of models
- •Star schema
- •Snowflake schema
- •Models with bridge tables
- •Measures and additivity
- •Additive measures
- •Non-additive measures
- •Semi-additive measures
- •Index
- •Code Snippets
Product is on the many side of the relationship between Product and Product Subcategory. Thus, there are many products with the same category. If you add the Product Subcategory key to the fact table, you are not going to change its size, in terms of rows.
Whenever you build a new model, always take these considerations into account. After you define the dimensions, try to reduce the size of the fact table to its natural granularity, by performing grouping and pre-aggregation during the extraction of the data. The result is a smaller model, or more precisely, a model of optimal size: Neither too small nor too large, simply perfect.
Notice that the fact table discussed in this section includes no detailed information about the order number. If you put the order number in the table, then many rows will become different, even if they share the same set of dimensions. For example, two identical orders for the same customer could be grouped if you do not take into account the order number. However, as soon as you want the order number, they can no longer be grouped together. Thus, the presence of detailed information in the fact table changes the granularity of the table itself. You might have good reasons to store this detailed information in the fact table. It is only important to understand that its presence has a high cost in terms of size and memory use. Store these fields only if they are really needed for your reports.
Relationships at different granularity
Now that we have set the common terminology about granularity, let us see examples where granularity is different between fact tables. A great example is a budgeting scenario.
Analyzing budget data
When you need to analyze a budget, you are likely to check the difference between the actual sales (either in the past or in the current year) and the forecasted, budgeted figures. This leads to interesting key performance indicators (KPIs) and reports. To do this, however, you must face the problem of granularity. In fact, it is very unlikely that you have forecasted sales in the budget for each product and day. However, you have sales at the product and day level. Let us explore this with an example. Figure 9-3 shows a data model with a standard star schema for Sales and a Budget table, which contain the figures for the next year.
FIGURE 9-3 This data model contains sales and budget data in the same structure.
The budget information is present at the country/region and brand level. Obviously, it does not make sense to provide a day-by-day budget. When you forecast numbers, you do so at a higher level. The same applies at the product level. You cannot forecast the sales of individual products (unless you have very few sales). In the example shown in Figure 9-3 the budgeting manager focused on only two attributes: the country/region and the brand.
If you try to build a report that shows Sales and Budget in the same structure, you will find yourself in trouble because of the missing relationships. As you can see in Figure 9-4, you can slice sales in 2009 by brand using the product brand (that is, the Brand column in the Product table), but you cannot slice budget information using the same column, because there are no relationships between Product and Budget.
FIGURE 9-4 The product brand does not slice the Budget table because there are no relationships between Product and Budget.
You might remember that we dealt with a similar scenario in Chapter 1, “Introduction to data modeling.” At that time, you did not have the knowledge you have now, so we can now discuss in more detail the different options available to solve it.
It is important to note that the granularity issue is not a mistake in the model. The budget exists at its own granularity, whereas sales are present at a different one. Both tables are modeled the right way. However, it is not so easy to slice by both of them.
The first option we want to analyze is the easiest way of making the budget model a working one: reducing the granularity of both tables to match by removing details in Sales that are not present in Budget. You can easily do this by modifying the queries that load Sales and removing tables referenced by Sales that are at a level of detail not supported by the budget. The resulting model is shown in Figure 9-5.
FIGURE 9-5 By simplifying both tables, you can obtain a simple star schema.
To obtain the simplified model, we reduced the granularity of Sales by removing all the details. We had to remove the date references, the product key (replaced by the brand), and the StoreKey (replaced by the CountryRegion). We precomputed the amount of sales while grouping. All the dimensions are gone, replaced by two simple dimensions containing the brands and the country/region. The resulting model is straightforward and, as shown in Figure 9-6, it works just fine. You can slice Sales and Budget by the brand, obtaining meaningful figures.