Bulk Import Optimizations (Minimal Logging)
In the previous blog I had listed all bulk import optimizations. Now, I will describe each of these optimizations in detail in separate blog entries. Let me first start with minimal logging. By the way minimal logging is also called bulk-logging but the correct term is 'minimal logging'. Bulk logging refers more to the database recovery model that enables minimal logging. Note, that the minimal logging can also be enabled by SIMPLE recovery model as well. This BLOG decribes the behavior for minimal logging as it exists in SQL Server 2005.
Under minimal logging, the SQL Server does not log individual rows and logs only the page and extent allocations. So if each page can fit 100 rows, we are trading 100 log records for 1 log record, a significant improvement. It is useful here to step back and examine how SQL Server does ensures ACID properties even when individual rows are not logged or how does the log backup work.
When the bulk import transaction commits (i.e. the number of rows as specified in the batchsize), the SQL Server flushes all the data pages (a subset of these pages are likely to have been flushed as part of checkpoint) to disk to guarantee the atomocity of the transaction. This is not any more expensive than flushing the log records as the log records would contain the full image of all the rows. And if for some reason, the bulk import transaction needs to be rolled back, all SQL Server needs to do is to traverse the log records and mark the pages/extents to be free again. When the transaction log backup is taken, the SQL Server also backs up the data pages, including index pages when applicable, as part of the transaction log. This enables a database restore by restoring the database backup followed by zero or more transaction log backups. The only drawback of minimal logging is that it does not support point-in-time recovery. This is because during the transaction log backup, the state of the pages involved in the bulk import may have changed between the time bulk import was done and the transaction log back is initiated. One can argue that we could possibly do point-in-time recovery to just before the bulk import had started but it is not supported. If this is of concern, you can do transaction log backup before/after bulk operations. Please refer to BOL for details of point-in-time recovery and associated restrictions.
So far we discussed minimal logging at a conceptual level. Now let us consider a simple example where we will insert 5 rows into a table both with minimal logging and without it and examine the difference.
The table t_heap below has a row length of > 1000 bytes so the log record representing the an insert of the data row will also be > 1000 bytes so it will be easy to spot when we examie the log records
create table t_heap (c1 int, c2 int, c3 decimal (38,2), c4 char(1000))
Let us execute the following command to insert 5 rows through optimized Bulk Import
bulk insert t_heap
from 'C:\t_heap-c-small.dat' with (TABLOCK)
Now we can look at the top 5 log records by executing the following query
select 5 operation,context,[log record length]
from fn_dblog(null, null)
where allocunitname='dbo.t_heap'
order by [Log Record Length] Desc
Here are the top 5 log records returned. You will notice that all records are < 100 bytes. In other words, the individual row inserts were not logged. The log records below refer to PFS, an allocation meta-information related page that tracks free space in the allocated page.
Operation context [log record length]
--------- ------- -------------------
LOP_MODIFY_ROW LCX_PFS 92
LOP_MODIFY_ROW LCX_PFS 92
LOP_MODIFY_ROW LCX_PFS 92
LOP_MODIFY_ROW LCX_PFS 92
LOP_MODIFY_ROW LCX_PFS 92
Now let us take a look at the log records when we insert 5 rows through the normal insert code path as follows
declare @i int
select @i = 0
while (@i < 5)
begin
insert into t_heap values
(@i, @i + 5000, 10.2, replicate ('a', 60))
set @i = @i + 1
end
And here are the top 5 log records and you can see that in this case, each inserted row is fully logged.
operation context [log record length]
--------- ------- -------------------
LOP_INSERT_ROWS LCX_HEAP 1116
LOP_INSERT_ROWS LCX_HEAP 1116
LOP_INSERT_ROWS LCX_HEAP 1116
LOP_INSERT_ROWS LCX_HEAP 1116
LOP_INSERT_ROWS LCX_HEAP 1116
You can use fn_dblog() as a handy way to see if you are getting minimal logging or not.
Till now, we have discussed what does the minimal logging mean and how you can checks if you indeed are getting the minimal logging but we have not looked at the conditions that need to be met for minimal logging. The conditions for minimal logging are
· The database recovery model must be set to BULK_LOGGED or SIMPLE. Under FULL recovery model, as the name implies, all operations are fully logged.
· The target table should not have been marked for replication
· If database backup command is being run concurrently, the minimal logging is temporarily disabled but it does not stop bulk import. So you may see that part of bulk import is fully logged.
· Minimal logging depends on the indexes defined on the table and whether the table/index is empty or not as described in the table below. You will notice that TABLOCK is required to enable minimal logging.
Comments
Anonymous
February 05, 2008
The comment has been removedAnonymous
February 05, 2008
You don't get minimal logging with regular TSQL INSERT statement independent of the recovery model. However, we are looking into if we can remove this restriction for regular inserts in future releases including SQL2008Anonymous
February 05, 2008
The big news of the week is Windows Server 2008 RTM'ing. Read more here. The storage engine team has begin a series on bulk insert. It is definitely recommended reading. Part 1, Part 2, Part 3. The S ...Anonymous
February 07, 2008
Thanks for the answer. I was looking for an article talking about the differences among FULL, SIMPLE and RECOVERY model (which each one of this setting really write in the log file), but I haven't find it. Could you talk about this or blog about it?Anonymous
February 07, 2008
BOL has a good description on different recovery model. If you think something is not well covered, please send me a mail and we will update the BOL. In a nutshell, under SIMPLE recovery model, SQL Server truncates the inactive portion of the log but otherwise the logging is similar to what we have under bilk-logging recovery model.Anonymous
February 21, 2008
If you have a table with clustered index, and with existing data, say you have 30 million in the table. When you bulk import 10 million more, each one would be fully logged, correct? What's the best way to achieve minimal logging in this scenario, drop the clustered index first, then add the clustered index back after bulk import? For a large table, creating a clustered index might take a while. But is it still a better option than having a clustered index in place when importing?Anonymous
February 21, 2008
I recommend reading the following white paper that dealt with exactly these kinds of questions. This is still applicable for SQL2005. I will follow up this thread with some changes we have made for minimal logging in SQL Server 2008. http://www.microsoft.com/technet/prodtechnol/sql/2000/maintain/incbulkload.mspxAnonymous
December 09, 2009
Hello Vishal It is a very good article. Can You Elabrate on the Below.I have picked the lines from your article. "This is because during the transaction log backup, the state of the pages involved in the bulk import may have changed between the time bulk import was done and the transaction log back is initiated. "Anonymous
December 14, 2009
The pages containing the minimally logged operations are backed up as part of log backup. When these log records are restored as part of restore, we don't have the UNDO information in case the changes in these pages need to be undone.Anonymous
June 03, 2010
Nice artice..... I have tried the above scripts ...But in the following order After creating the table .....I done the Bulk Insert With( TabLock )...I can see more ...LOP_MODIFY_ROW LCX_PFS Records (am not using top 5 option.) Can you tell me why more LOP_MODIFY_ROW LCX_PFS records ?Anonymous
June 04, 2010
Aljo, PFS pages need to be changed to track allocation/deallocation. Similarly GAM pages. Key thing about minimal logging is that we don't want to log the 'data row' thanks SunilAnonymous
June 06, 2010
Sunil, Thanks for clarifying this. AljoAnonymous
October 29, 2011
it is possible that logging can be turned off temporarily to support bulk imports?Anonymous
April 14, 2015
The below statement does not appear to hold true any more: " If database backup command is being run concurrently, the minimal logging is temporarily disabled but it does not stop bulk import. So you may see that part of bulk import is fully logged." Has this behavior changed somewhere between SQL 2005 and SQL 2014?