Bulk Import Optimizations (Minimal Logging)

In the previous blog I listed all the bulk import optimizations. Now I will describe each of these optimizations in detail in separate blog entries, starting with minimal logging. By the way, minimal logging is also called bulk logging, but the correct term is 'minimal logging': bulk-logged refers to the database recovery model that enables minimal logging. Note that minimal logging can be enabled under the SIMPLE recovery model as well. This blog describes the behavior of minimal logging as it exists in SQL Server 2005.

Under minimal logging, SQL Server does not log individual rows; it logs only the page and extent allocations. So if each page can fit 100 rows, we are trading 100 log records for 1 log record, a significant improvement. It is useful here to step back and examine how SQL Server ensures ACID properties even when individual rows are not logged, and how log backups work.

When the bulk import transaction commits (i.e. after the number of rows specified by the batch size), SQL Server flushes all the data pages to disk (a subset of these pages is likely to have been flushed already as part of a checkpoint) to guarantee the atomicity of the transaction. This is not any more expensive than flushing the log records, because the log records would contain the full image of all the rows. And if for some reason the bulk import transaction needs to be rolled back, all SQL Server needs to do is traverse the log records and mark the pages/extents as free again. When a transaction log backup is taken, SQL Server also backs up the data pages, including index pages when applicable, as part of the transaction log. This enables a database restore by restoring the database backup followed by zero or more transaction log backups. The only drawback of minimal logging is that it does not support point-in-time recovery. This is because the state of the pages involved in the bulk import may have changed between the time the bulk import was done and the time the transaction log backup is initiated. One can argue that we could possibly do point-in-time recovery to just before the bulk import started, but it is not supported. If this is a concern, you can take a transaction log backup before/after bulk operations. Please refer to BOL for details of point-in-time recovery and associated restrictions.
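If losing point-in-time recovery around the bulk operation is a concern, the before/after log backups mentioned above can be sketched as follows. The database name and backup file paths here are hypothetical placeholders, not names from the original example:

```sql
-- Bracket the bulk operation with log backups so that only the
-- interval containing the minimally logged work loses point-in-time
-- recovery. Database name and paths below are placeholders.
backup log MyDatabase to disk = 'C:\backup\MyDatabase_pre_bulk.trn'

bulk insert t_heap
from 'C:\t_heap-c-small.dat' with (TABLOCK)

backup log MyDatabase to disk = 'C:\backup\MyDatabase_post_bulk.trn'
```

With this sequence, you can still restore to any point in time covered by log backups taken before the first backup or after the second one; only the interval in between is restricted.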

So far we have discussed minimal logging at a conceptual level. Now let us consider a simple example: we will insert 5 rows into a table both with and without minimal logging and examine the difference.

The table t_heap below has a row length of more than 1000 bytes, so the log record representing the insert of a data row will also be more than 1000 bytes, making it easy to spot when we examine the log records.

create table t_heap (c1 int, c2 int, c3 decimal (38,2), c4 char(1000))

Let us execute the following command to insert 5 rows through optimized Bulk Import

bulk insert t_heap
from 'C:\t_heap-c-small.dat' with (TABLOCK)

Now we can look at the top 5 log records by executing the following query

select top 5 operation, context, [log record length]
from fn_dblog(null, null)
where allocunitname = 'dbo.t_heap'
order by [log record length] desc

Here are the top 5 log records returned. You will notice that all of them are < 100 bytes; in other words, the individual row inserts were not logged. The log records below refer to PFS, an allocation metadata page that tracks free space in the allocated pages.

operation       context  [log record length]
--------------  -------  -------------------
LOP_MODIFY_ROW  LCX_PFS  92
LOP_MODIFY_ROW  LCX_PFS  92
LOP_MODIFY_ROW  LCX_PFS  92
LOP_MODIFY_ROW  LCX_PFS  92
LOP_MODIFY_ROW  LCX_PFS  92

Now let us take a look at the log records when we insert 5 rows through the normal insert code path, as follows:

declare @i int
select @i = 0
while (@i < 5)
begin
     insert into t_heap values (@i, @i + 5000, 10.2, replicate ('a', 60))
     set @i = @i + 1
end

                                                                    

And here are the top 5 log records; you can see that in this case each inserted row is fully logged.

operation        context   [log record length]
---------------  --------  -------------------
LOP_INSERT_ROWS  LCX_HEAP  1116
LOP_INSERT_ROWS  LCX_HEAP  1116
LOP_INSERT_ROWS  LCX_HEAP  1116
LOP_INSERT_ROWS  LCX_HEAP  1116
LOP_INSERT_ROWS  LCX_HEAP  1116

You can use fn_dblog() as a handy way to see whether you are getting minimal logging or not.
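For example, rather than eyeballing the top records, you can aggregate the fn_dblog() output to compare the total log volume between the two runs. Note that fn_dblog() is undocumented, so the column names here are an assumption based on the output shown above:

```sql
-- Count the log records for the target table and sum their sizes.
-- Under minimal logging the totals stay small; under full logging
-- they grow roughly with (row size x number of rows inserted).
select count(*)                 as [log records],
       sum([log record length]) as [total log bytes]
from fn_dblog(null, null)
where allocunitname = 'dbo.t_heap'
```

Run this once after the bulk insert and once after the row-by-row inserts (in separate tests) and compare the totals.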

Till now, we have discussed what minimal logging means and how you can check whether you are indeed getting it, but we have not looked at the conditions that need to be met for minimal logging. The conditions for minimal logging are:

· The database recovery model must be set to BULK_LOGGED or SIMPLE. Under FULL recovery model, as the name implies, all operations are fully logged.

· The target table should not have been marked for replication

· If a database backup is being run concurrently, minimal logging is temporarily disabled, but the bulk import is not stopped. So you may see that part of the bulk import is fully logged.

· Minimal logging depends on the indexes defined on the table and on whether the table/index is empty. In brief: with the TABLOCK hint (which is required), a heap with no indexes is minimally logged whether or not it already contains data; a table with a clustered index is minimally logged only when it is empty, and a bulk import into a non-empty clustered index is fully logged.
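Putting these conditions together for the heap example above, a minimal-logging run might be set up like this (the database name is a placeholder; use the fn_dblog() query shown earlier to verify the result):

```sql
-- Switch to a recovery model that permits minimal logging.
alter database MyDatabase set recovery bulk_logged

-- TABLOCK is required; without it the import is fully logged.
bulk insert t_heap
from 'C:\t_heap-c-small.dat' with (TABLOCK)
```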

Comments

  • Anonymous
    February 05, 2008
    The comment has been removed

  • Anonymous
    February 05, 2008
You don't get minimal logging with a regular TSQL INSERT statement, independent of the recovery model. However, we are looking into whether we can remove this restriction for regular inserts in future releases, including SQL2008

  • Anonymous
    February 05, 2008
The big news of the week is Windows Server 2008 RTM'ing. Read more here. The storage engine team has begun a series on bulk insert. It is definitely recommended reading. Part 1, Part 2, Part 3. The S ...

  • Anonymous
    February 07, 2008
Thanks for the answer. I was looking for an article talking about the differences among the FULL, SIMPLE and BULK_LOGGED recovery models (what each of these settings really writes in the log file), but I haven't found it. Could you talk about this or blog about it?

  • Anonymous
    February 07, 2008
BOL has a good description of the different recovery models. If you think something is not well covered, please send me a mail and we will update the BOL. In a nutshell, under the SIMPLE recovery model, SQL Server truncates the inactive portion of the log, but otherwise the logging is similar to what we have under the bulk-logged recovery model.

  • Anonymous
    February 21, 2008
If you have a table with a clustered index and with existing data, say 30 million rows, and you bulk import 10 million more, each row would be fully logged, correct? What's the best way to achieve minimal logging in this scenario: drop the clustered index first, then add the clustered index back after the bulk import? For a large table, creating a clustered index might take a while. But is it still a better option than having a clustered index in place when importing?

  • Anonymous
    February 21, 2008
I recommend reading the following white paper, which deals with exactly these kinds of questions and is still applicable to SQL2005. I will follow up this thread with some changes we have made for minimal logging in SQL Server 2008. http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspx

  • Anonymous
    December 09, 2009
Hello Vishal, it is a very good article. Can you elaborate on the following lines I have picked from your article: "This is because during the transaction log backup, the state of the pages involved in the bulk import may have changed between the time bulk import was done and the transaction log backup is initiated."

  • Anonymous
    December 14, 2009
The pages containing the minimally logged operations are backed up as part of the log backup. When these log records are restored as part of a restore, we don't have the UNDO information in case the changes in these pages need to be undone.

  • Anonymous
    June 03, 2010
Nice article. I have tried the above scripts, but in the following order: after creating the table, I did the bulk insert with (TABLOCK). I can see more LOP_MODIFY_ROW LCX_PFS records (I am not using the top 5 option). Can you tell me why there are more LOP_MODIFY_ROW LCX_PFS records?

  • Anonymous
    June 04, 2010
Aljo, PFS pages need to be changed to track allocation/deallocation, and similarly GAM pages. The key thing about minimal logging is that we don't want to log the 'data row'. Thanks, Sunil

  • Anonymous
    June 06, 2010
    Sunil, Thanks for clarifying this. Aljo

  • Anonymous
    October 29, 2011
Is it possible that logging can be turned off temporarily to support bulk imports?

  • Anonymous
    April 14, 2015
    The below statement does not appear to hold true any more: " If database backup command is being run concurrently, the minimal logging is temporarily disabled but it does not stop bulk import. So you may see that part of bulk import is fully logged." Has this behavior changed somewhere between SQL 2005 and SQL 2014?