How Update Statistics really works in SAP BW
- This blog has been re-published at https://techcommunity.microsoft.com/t5/Running-SAP-Applications-on-the/How-Update-Statistics-really-works-in-SAP-BW/ba-p/367454
- This blog post might be outdated by now. The latest documentation regarding SQL Server Columnstore on SAP is available in https://www.sap.com/documents/2019/04/023e5928-487d-0010-87a3-c30de2ffd8ff.html
In the following we explain, how database statistics should be updated for SAP BW running on Microsoft Server. In particular, we describe the new feature “Lightweight Update Statistics”, which speeds up by factors SAP BW process chains containing Update Statistics.
The implementation of Update Statistics in Microsoft SQL Server as well as the implementation of Update Statistics within SAP BW (running on SQL Server) has been improved a few times within the last 10 years. We will discuss here only SAP BW 7.00 and newer and SQL Server 2005 and newer. All older releases of SQL Server or SAP BW should not even be considered for running a productive system. Some of the features we describe below require the newest SAP Support Packages as listed at the end of this BLOG.
The BW user interface is independent from the underlying RDBMS. As a result, it looks the same for ORACLE, DB2, Sybase and SQL Server. The BW user interface does not take any specifics of the RDBMS into account. This causes some confusion for SAP BW customers.
The improvements we implemented resulted in a changed system behavior in the past. This still causes some confusion for SAP BW customers, in particular when running age-old SAP BW support packages. All improvements were driven by two goals:
- Increasing BW query performance by avoiding outdated statistics, in particular on small dimension tables
- Decreasing overhead of running Update Statistics and leveraging SQL Server Automatic Update Statistics
Update Statistics within Microsoft SQL server
The cost based query optimizer of SQL Server relies on actual index and column statistics. SAP strongly recommends turning on the database options AUTO_UPDATE_STATISTICS and AUTO_CREATE_STATISTICS. Under no circumstances you should ever turn off one of these database options. As a result, statistics are updated automatically by Microsoft SQL Server (and additional column statistics are created when needed).
Parameterized SQL statements benefit from the database option AUTO_UPDATE_STATISTICS_ASYNC. Therefore SAP strongly recommends, to turn it on for all SAP ERP systems.
SAP BW queries do not use parameterized SQL statements. They always use literals. However, besides BW queries there are also normal Open SQL queries running on an SAP BW system. These queries are typically running against administration tables. When running your system with actual SAP support packages, you will hardly run into a situation, where the setting of AUTO_UPDATE_STATISTICS_ASYNC matters.
Therefore you should turn on AUTO_UPDATE_STATISTICS_ASYNC also for SAP BW systems. However, there may be special scenarios, were SAP support recommends turning it off for a specific customer system. You can find a discussion about this in https://blogs.msdn.com/b/saponsqlserver/archive/2012/01/13/auto-update-statistics-async-in-sap-netweaver.aspx.
Automatic Update Statistics runs fine for almost all ERP tables. It is described in detail at https://msdn.microsoft.com/en-us/library/dd535534(SQL.100).aspx. In the past we fully relied on Automatic Update Statistics in SAP BW, too. However, for small tables Automatic Update Statistics is not sufficient in SAP BW:
Issue with small tables
The Automatic Update Statistics of Microsoft SQL Server is not running for tables, which contain less than 500 rows. For a typical ERP system this is not an issue, because for small tables the execution plan does not matter anyway. However, in SAP BW we typically join small tables (which potentially have less than 500 rows) with very large fact tables (having hundreds of million rows). We have seen really bad execution plans when executing these JOINs, if the small table did not have any database statistics (or a histogram containing only a handful of rows). We did not observe any difference between using the default sample rate and FULL Update Statistics (except for SQL Server 2000 – that’s the reason why we still use the WITH FULLSCAN option for small tables in SAP BW).
The first example are temporary tables (for example /BI0/06* tables) used in SAP BW queries. In newer SAP BW support packages, the statistics of these tables are automatically updated by SAP BW (means: SAP BW is explicitly executing a manual UPDATE STATISTICS command on SQL Server when running the BW query).
The second example are small dimension tables. They are filled when loading data into an InfoCube. We strongly recommend using BW process chains including the process type "Construct Database Statistics". In this case you will never see an issue with the statistics of a dimension table. When not doing so, you may run into the following issue:
When a new cube is created, the dimension tables are filled with one row having dimension ID 0 (column DIMID = 0). The first SELECT on this table causes an Automatic Update Statistics, which creates a histogram on DIMID (which is the primary key for dimension tables). Further Automatic Update Statistics are not triggered as long as the table has less than 500 rows.
When loading a few rows of master data (but less than 499 rows) into the dimension table, the histogram still wrongly contains only one value for the DIMID column (DIMID = 0). This is the worst case when joining a dimension table with a fact table, because fact tables never contain rows with dimension ID 0.
Once you run into this issue (because you did not run update statistics in a BW process chain), you can fix it by running a manual update statistics once. If you load further data into the cube, the number of rows in the dimension table can increase. However, in many cases an additional update statistics is not necessary any more, since the histogram is good enough (it already contains several valid values for DIMID besides 0). Once a dimension table reaches the 500 rows limit, the automatic update statistics runs anyway.
In the following we describe how Update Statistics is implemented in SAP BW for Microsoft SQL Server:
Update Statistics within SAP BW
InfoCube property for Update Statistics (not used anymore)
You can set a property for an InfoCube to automatically update the database statistics after each data load when using 3.x DataSources:
This property still causes a lot of confusion, because it is not used for Data Transfer Processes (DTP) or within Process Chains. You typically do not need to take care about this property, when considering an Update Statistics Strategy for SAP BW 7.0 or newer. That’s all, you need to know about it.
Why is it still available in SAP BW 7.30? It is still a relic from SAP BW 3.5 and older releases: For 3.x InfoSources you can create transfer and update rules, which allow direct staging from a 3.x DataSource into an InfoCube. When executing an InfoPackage (outside a Process Chain), which loads data from a 3.x DataSource into an InfoCube, the database statistics of the InfoCube can be automatically updated.
However, this is a very rare scenario. As of SAP BW 7, all data is supposed to be loaded using DTPs within Process Chains. There is explicitly a process type "Construct Database Statistics", which can be added to a Process Chain. The InfoCube properties "Refresh DB statistics after each data load" and "Also refresh statistics after delta upload" do not have any impact when using DTPs!
Manual Update Statistics in RSA1
You can manually update the database statistics of all tables belonging to an InfoCube in transaction RSA1 => (select) InfoProvider => (right click) Manage => (choose tab) Performance.
There are two buttons: "Refresh Statistics" and "Create Statistics (Btch)". Both do exactly the same: They start an SAP batch job, which calls function module RSDU_ANALYZE_TABLE_MSS for each table belonging to the InfoCube. When pressing "Refresh Statistics", the batch job starts immediately. When pressing "Create Statistics (Btch)", you can define the name of the batch job and schedule it (periodically):
Here you can also change the InfoCube properties for 3.x DataSources. As already mentioned, they have no impact when using DTPs or Process Chains. "Create Statistics (Btch)" does not create any additional column-statistics on Microsoft SQL Server. Column-statistics are automatically created as required by Microsoft SQL Server during query execution.
The parameter "Percentage of IC Data Used to Create Statistics” is not used for Microsoft SQL Server. A configured sample rate in table DBSTATC is also not taken into account (any more). Instead the sample rate for UPDATE STATISTICS is calculated as follows:
- For tables having 500 rows or less, the WITH FULLSCAN clause is added.
- For all other tables, the default sample rate is used (calculated by Microsoft SQL Server).
In the newest* Support Packages, SAP BW never executes the SQL command UPDATE STATISTICS for a table, if the automatic update statistics was turned off (by using sp_autostats) for any INDEX STATISTICS or COLUMN STATISTICS of this table. We implemented this behavior, because a manual update statistics would change the sp_autostats setting implicitly. To clarify: we do not recommend changing the autostats options for SAP BW tables.
Manual Update Statistics in DBACOCKPIT
In SAP Transaction DBACOCKPIT you can manually update the database statistics for any INDEX or COLUMN STATISTICS of any table (not only for SAP BW tables). Here you can choose any sample rate you want.
However, we do not recommend using DBACOCKPIT to update the statistics for SAP BW tables. RSA1 is much more sophisticated, since you can update the statistics of all database tables belonging to an InfoCube at the same point in time. Furthermore, you can schedule update statistics as a batch job in RSA1.
Scheduled Update Statistics
There are two ways to schedule Update Statistics for all tables of an InfoCube:
- Use "Create Statistics (Btch)" in transaction RSA1 as described above.
- Create a Process Chain, containing only the process type "Construct Database Statistics" and schedule the Process Chain.
However we do not recommend to schedule Update Statistics at all. You should rather update the database statistics at the end of the Process Chains, which contain the DTPs. In addition, you can leverage the Update Statistics included in the cube compression, as described below.
Update Statistics within BW process chains
You can use the BW process type "Construct Database Statistics" within a BW Process Chain. The given sample rate in the process variant is not taken into account. It is calculated the same way as described above for Manual Update Statistics in RSA1.
You can drop and re-create the secondary database indexes of the f-fact table during data load by using the following BW process types within a Process Chain (“Delete Index” has no impact on the e-fact table):
- “Delete Index” (labeled as “Indexes” in the picture above)
- “Data Transfer Process”
- “Generate Index” (labeled as “Create Index” in the picture above)
- "Construct Database Statistics" (labeled as “DB Statistics” in the picture above)
The DTP is faster, if there are no secondary indexes on the f-fact table. On the other hand, you have the additional overhead of re-creating these indexes. The benefit depends on the ratio between transferred data and data already contained in the f-fact table. Therefore you may not want include the “Delete Index” process type in all Process Chains which perform DTPs.
However, including “Generate Index” in a Process Chain is always a good idea. Indeed, this process type repairs all indexes of the fact-tables. The runtime is negligible, if all indexes already exist.
We recommend to add the BW process type "Construct Database Statistics" at the end of all process chains, which use DTPs to InfoCubes. Hereby you make sure that the database statistics for small dimension tables are always up-to-date.
There is one tricky thing you should know about "Construct Database Statistics": In the variant of the processs type you can either define the InfoCubes directly or those InfoCubes, which are the target of a Data Transfer Process. We recommend using the latter:
By doing so, only the statistics of the dimension tables and the f-fact table are updated. The statistics of the e-fact table is not updated, because the e-fact table is not modified by the DTP (when specifying the InfoCube instead, an UPDATE STATISTICS is running for all tables belonging to the InfoCube. This also includes the e-fact table and the globally used master data tables of all InfoObjects contained in the InfoCube).
You can further reduce the overhead of the BW process type "Construct Database Statistics" by leveraging Lightweight Update Statistics, a new feature of SAP BW running on Microsoft SQL Server. As a result, UPDATE STATISTICS is actually only performed on two types of tables:
- small tables (having 500 rows or less)
- tables, which have at least one empty INDEX STATISTICS or COLUMN STATISTICS (means, the creation date of the statistics is empty).
Lightweight Update Statistics can only be used within Process Chains and only when specifying the InfoCubes indirectly as a target of a DTP. Since there are typically no missing statistics (except for empty tables), Lightweight Update Statistics is very fast: We update only the database statistics of those tables, which may cause performance issues in BW queries: small (dimension) tables.
You can configure Lightweight Update Statistics by setting the RSADMIN parameter MSS_LEIGHTWEIGHT_UPDSTATS using SAP report SAP_RSADMIN_MAINTAIN. You have the following options:
- MSS_LEIGHTWEIGHT_UPDSTATS is not set (default setting): Lightweight Update Statistics is used in BW Process Chains.
However, it is never used when specifying the cube directly (instead of the DTP) or for a Manual Update Statistics in RSA1 - MSS_LEIGHTWEIGHT_UPDSTATS = 'FALSE':
Lightweight Update Statistics is never used.
BW triggered Update Statistics
There are a few scenarios where SAP BW automatically executes an UPDATE STATISTICS on particular tables. The most important scenario is the usage of temporary tables within SAP BW queries. These tables (for example /BI0/06* tables) are created and filled with data. Before using them for JOINs in BW queries, SAP BW executes an UPDATE STATISTICS on these temporary tables.
Another scenario is the BW cube compression (aka Condense or Collapse): In the newest* BW support packages, Update Statistics is running on all tables of the InfoCube as last step of the cube compression. You can modify this behavior by setting the RSADMIN parameter MSS_COND_UPDSTATS using SAP report SAP_RSADMIN_MAINTAIN. You have the following options:
- MSS_COND_UPDSTATS is not set (default setting):
Update statistics is running on all tables of the InfoCube at the end of the BW cube compression - MSS_COND_UPDSTATS = 'FALSE':
No update statistics is running during BW cube compression - MSS_COND_UPDSTATS = 'E' , ‘F’, ‘D’, ‘EF’, ‘DEF’ …
Update statistics is running only on particular tables at the end of the BW cube compression, dependent on the characters contained in the parameter value:- Included ‘E’: update statistics of e-fact table
- Included ‘F’: update statistics of f-fact table
- Included ‘D’: update statistics of dimension tables
Update Statistics Strategy for SAP BW
*) When we talk about the newest SAP support package stacks in this BLOG, we mean the following SPS + applying SAP note 1755488 (Lightweight UPDATE STATISTICS in SAP BW) – or any of the subsequent support packages.
- 7.00 SPS 27
- 7.01 SPS 12
- 7.02 SPS 12
- 7.11 SPS 10
- 7.3 SPS 8
- 7.31 SPS 5
In order to use Lightweight Update Statistics or Update Statistics during BW cube compression you have to apply the newest* BW support packages. Additional benefit of these support packages is described in https://blogs.msdn.com/b/saponsqlserver/archive/2012/08/17/sap-bw-improvements-in-new-support-package.aspx.
Scenario I: Not having the newest* BW support packages
Even when not applying the newest* support packages, we do not expect any issues with database statistics on small tables any more, if you
- implement SAP note 1610269 (BW query performance improvements in rsdu_analyze_table_mss). The correction instruction attached to the note works for SAP BW 7.00 SP18 and all newer SAP BW releases or support packages. This is a mandatory fix. Without this fix, BW triggered update statistics on temporary tables does not work.
- For cubes having only a few dozen rows in the dimension tables:
Use BW process chains including "Construct Database Statistics" for each DTP to make sure, that small dimension tables have up-to-date statistics. To be on the save side you may add the Update Statistics to all DTP process chains. However, this will result in an increased runtime of the process chain.
Scenario II: Having the newest* BW support packages
The following strategy keeps database statistics up-to-date, while reducing the overhead of running UPDATE STATISTICS to a minimum. It is automatically used, when applying the newest* BW support packages, without the need to set any RSADMIN parameter. You only have to include the process type "Construct Database Statistics" (using the object type “Data Transfer Process” in its variant) at the end of each DTP Process Chain. The Update Statistics is then spread over 3 places:
- Lightweight Update Statistics in each DTP Process Chain
DTP Process Chains typically run several times a day, potentially even during normal working hours. Here we want to reduce the overhead of an Update Statistics to an absolute minimum. A single Process Chain may contain may DTPs from different data sources to the same data target. The last step of the Process Chain should be "Construct Database Statistics". Per default, Lightweight Update Statistics is used for it. Therefore the database statistics of small dimension tables are always up-to-date immediately after the content of these tables has changed. The database statistics of the fact tables are still good enough. - BW triggered Update Statistics during BW cube compression
The BW cube compression runs less frequent than DTP Process Chains, typically once a day or once a week. Several BW requests are compressed within a single BW cube compression job. Normally the BW cube compression runs at night and its runtime can be significant. Therefore, it is a good occasion to update the statistics of the whole InfoCube at the end of the BW cube compression. This is done per default in the newest* BW support packages. Therefore also the database statistics of the fact-tables are regularly updated. - Additional Automatic Update Statistics
When running regularly BW cube compression, you will not see any Automatic Update Statistics running on BW cube tables. However, you may not want to perform BW cube compression on particular InfoCubes. In this case, the Automatic Update Statistics of SQL Server ensures up-to-date statistics also for the fact tables of these InfoCubes. The overhead is relatively small. We always use an automatically calculated sample rate. Automatic Update Statistics runs on a separate thread (when AUTO_UPDATE_STATISTICS_ASYNC is set) and does not block the triggering BW query.
In the newest* BW support packages we further decreased the overhead of Automatic Update Statistics: When creating additional indexes (for example the single-column indexes on the dimension tables), the already existing column statistics may become useless. However, they have to be maintained during an Automatic Update Statistics. In the process type "Generate Index" of BW process chains we drop automatically created column statistics, which became obsolete in the meanwhile. As a result, any following Automatic Update Statistics becomes faster.
Performance improvements
The runtime of Update Statistics depends on many parameters: Hardware (CPU, RAM, I/O), cube size (number of rows in fact tables), cube design (number of dimensions, number of characteristics, number of rows in dimension table). We have performed some measurements on a test system (running inside a virtual machine). The absolute runtime may not be typical, however it gives you an idea about the performance impact of Lightweight Update Statistics. The cube had 38 characteristics in 11 dimensions, 8 key figures, 1 million rows in the f-fact table and 10 million rows in the e-fact table. Since most of the dimension tables already contained more than 500 rows, the Lightweight Update Statistics was very fast (229 times faster compared with Normal Update Statistics):