Sdílet prostřednictvím


Optimizing Data Warehouse Query Performance Through Bitmap Filtering

Most data warehouse queries are designed to follow a star schema and can process hundreds of millions of rows in a single query. By default, the query optimizer detects queries against star schemas and creates efficient query plans for them. One method the optimizer can use to generate an efficient plan is to use bitmap filtering. A bitmap filter uses a compact representation of a set of values from a table in one part of the operator tree to filter rows from a second table in another part of the tree. Essentially, the filter performs a semi-join reduction; that is, only the rows in the second table that qualify for the join to the first table are processed.

In SQL Server 2008, bitmap filtering can be introduced in the query plan after optimization, as in SQL Server 2005, or introduced dynamically by the query optimizer during query plan generation. When the filter is introduced dynamically, it is referred to as an optimized bitmap filter. Optimized bitmap filtering can significantly improve the performance of data warehouse queries that use star schemas by removing non-qualifying rows from the fact table early in the query plan. Without optimized bitmap filtering, all rows in the fact table are processed through some part of the operator tree before the join operation with the dimension tables removes the non-qualifying rows. When optimized bitmap filtering is applied, the non-qualifying rows in the fact table are eliminated immediately.

Optimized bitmap filtering is available only on the Enterprise, Developer, and Evaluation editions of SQL Server.

Understanding Bitmap Filtering

The bitmap filter compares favorably to the bitmap index. A bitmap index is an alternate form for representing row ID (RID) lists in a value-list index using one or more bit vectors indicating which row in a table contains a certain column value. Both can be very effective in removing unnecessary rows from result processing; however, there are important differences between a bitmap filter and a bitmap index. First, bitmap filters are in-memory structures, thus eliminating any index maintenance overhead due to data manipulation language (DML) operations made to the underlying table. In addition, bitmap filters are very small and, unlike existing on-disk indexes that typically depend on the size of the table on which they are built, bitmap filters can be created dynamically with minimal impact on query processing time.

Comparing Bitmap Filtering with Optimized Bitmap Filtering

Bitmap filtering and optimized bitmap filtering are implemented in the query plan by using the bitmap showplan operator. Bitmap filtering is applied only in parallel query plans in which hash or merge joins are used. Optimized bitmap filtering is applicable only to parallel query plans in which hash joins are used. In both cases, the bitmap filter is created on the build input (the dimension table) side of a hash join; however, the actual filtering is typically done within the Parallelism operator, which is on the probe input (the fact table) side of the hash join. When the join is based on an integer column, the filter can be applied directly to the initial table or index scan operation rather than the Parallelism operator. This technique is called in-row optimization.

When bitmap filtering is introduced in the query plan after optimization, query compilation time is reduced; however, the query plans that the optimizer can consider are limited, and cardinality and cost estimates are not taken into account.

Optimized bitmap filters have the following advantages:

  • Filtering from several dimension tables is supported.

  • Multiple filters can be applied to a single operator.

  • Optimized bitmap filters can be applied to more operator types. These include exchange operators such as the Distribute Streams and Repartition Streams operators, table or index scan operators, and filter operators.

  • Filtering is applicable to SELECT statements and the read-only operators used in INSERT, UPDATE, DELETE, and MERGE statements.

  • Filtering is applicable to the creation of indexed views in the operators used to populate the index.

  • The optimizer uses cardinality and cost estimates to determine if optimized bitmap filtering is appropriate.

  • The optimizer can consider more plans.

How Optimized Bitmap Filtering Is Implemented

A bitmap filter is useful only if it is selective. The query optimizer determines when a optimized bitmap filter is selective enough to be useful and to which operators the filter is applied. The optimizer places the optimized bitmap filters on all branches of a star join and uses costing rules to determine whether the plan provides the smallest estimated execution cost. When the optimized bitmap filter is nonselective, the cost estimate is usually too high and the plan is discarded. When considering where to place optimized bitmap filters in the plan, the optimizer looks for hash join variants such as a right-deep stack of hash joins. Joins with dimension tables are implemented to execute the likely most selective join first.

The operator in which the optimized bitmap filter is applied contains a bitmap predicate in the form of PROBE([Opt_Bitmap1001], {[column_name]} [, 'IN ROW']). The bitmap predicate reports on the following information:

  • The bitmap name that corresponds to the name introduced in the Bitmap operator. The prefix 'Opt_' indicates an optimized bitmap filter is used.

  • The column probed against. This is the point from which the filtered data flows through the tree.

  • Whether the bitmap probe uses in-row optimization. When it is, the bitmap probe is invoked with the IN ROW parameter. Otherwise, this parameter is missing.

Example

The following example represents a query against a simple star schema. The two dimension tables DimProduct and DimCustomer join to the fact table FactInternetSales using a primary-key-to-foreign-key join on a single integer column.

USE AdventureWorksDW2008R2;
GO
SELECT * 
FROM dbo.FactInternetSales AS F
INNER JOIN dbo.DimProduct AS D1 ON F.ProductKey = D1.ProductKey
INNER JOIN dbo.DimCustomer AS D2 ON F.CustomerKey = D2.CustomerKey
WHERE D1.StandardCost <= 30 AND D2.YearlyIncome <= 50000;

The following illustration shows the execution plan for this query as it might appear in SQL Server 2005. At the points marked 1A, the dimension tables have been scanned and the information needed to filter out non-qualifying rows from the fact table (1B) is known. However, the properties of the Table Scan operator show that no predicate is used to limit the rows returned from the fact table.

SQL Server query plan without bitmap filters.

In contrast, the following illustration shows the execution plan of the same query as it might appear in SQL Server 2008. Optimized bitmap operators are used in the subtrees of both dimension tables. The properties of the table scan operator show that the filters (bitmap probes) from these subtrees are applied directly to the fact table tree to limit the rows returned from the fact table prior to the first join operation.

SQL Server query plan with bitmap filters.

Optimized Bitmap Filtering Requirements

Optimized bitmap filtering has the following requirements:

  • Fact tables are expected to have at least 100 pages. The optimizer considers smaller tables to be dimension tables.

  • Only inner joins between a fact table and a dimension table are considered.

  • The join predicate between the fact table and dimension table must be a single column join, but does not need to be a primary-key-to-foreign-key relationship. An integer-based column is preferred.

  • Joins with dimensions are only considered when the dimension input cardinalities are smaller than the input cardinality from the fact table.