Interesting observation with table > 1TB

We have a project using SQL 2005 with a table that is several terabytes in size. This table has a varbinary(max) column with an average length of 150 KB.

We needed a second copy of the table on a smaller test server to do some other tests and decided that SELECT INTO was our best option because it is minimally logged. The smaller test server has 4 CPUs and 8 GB RAM. Since we had all the SQL products installed on the test server, only 4 GB was allocated to SQL Server at first. There was plenty of disk space available.
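For reference, the statement looked something like the following sketch (the server, database, and table names here are hypothetical):

    -- SELECT INTO creates the destination table as part of the statement;
    -- the four-part name pulls the rows across the linked server.
    SELECT *
    INTO   dbo.BigTableCopy
    FROM   [SOURCESERVER].[SourceDB].[dbo].[BigTable];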

So we ran the SELECT INTO from the test server via a linked server connection to pull the data across. It failed with error 802 (insufficient memory in the buffer pool). After some investigation we discovered that this is due to the extent allocations needing an entry in the lock manager, which uses space in the buffer pool. Regardless of the hints or isolation level you set, extent locks are still acquired, but they are only held if you have a large-value field (text, image, varchar(max), or varbinary(max)) that spans into another extent. In other words, if you don't have such a field, or all the data fits in the same extent, the extent locks are released immediately.
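If you want to watch this happen while the statement runs, here is one way to do it (a sketch using DMVs available in SQL 2005; the memory clerk name is an assumption on my part):

    -- Memory consumed by the lock manager (this comes out of the buffer pool)
    SELECT SUM(single_pages_kb + multi_pages_kb) AS lock_memory_kb
    FROM   sys.dm_os_memory_clerks
    WHERE  type = 'OBJECTSTORE_LOCK_MANAGER';

    -- Extent locks currently held
    SELECT COUNT(*) AS extent_locks
    FROM   sys.dm_tran_locks
    WHERE  resource_type = 'EXTENT';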

In my case, with an average value size of 150 KB, every row required new extent allocations. To calculate how many lock entries you will need for this scenario:

lock memory needed (bytes) = total data size / 64 KB * 128

Data writes are done in 64 KB chunks and each lock entry takes 128 bytes. So a 1 TB table needs 2 GB of buffer pool space just to hold the locks for a SELECT INTO. The lock manager can consume at most 60% of the total memory available to SQL Server, so you would need a minimum of 4 GB in the buffer pool. Leaving some room for the OS and other memory used by SQL Server, a machine with 8 GB RAM would barely be enough.
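Worked through for the 1 TB case (this just restates the arithmetic above):

    -- 1 TB written in 64 KB chunks, one lock entry of 128 bytes per chunk
    DECLARE @data_bytes bigint, @locks bigint;
    SET @data_bytes = 1099511627776;                   -- 1 TB
    SET @locks = @data_bytes / 65536;                  -- 16,777,216 lock entries
    SELECT @locks * 128 / 1048576 AS lock_memory_mb;   -- 2,048 MB = 2 GB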

You might think that the lock escalations so well explained by Sunil Agarwal in this blog https://blogs.msdn.com/sqlserverstorageengine/archive/2006/05/17/Lock-escalation.aspx would apply. They don't, because the locks in my situation are not normal table or page locks; they are the locks acquired when allocating a new extent, and SQL Server holds them until a commit occurs. A SELECT INTO gives you no control over batch or commit size; it all happens as one transaction. In addition, the NOLOCK hint does nothing to the destination table, only the source table.

As a side note, the data writes are done in 64 KB chunks with SELECT INTO, and it doesn't matter whether the text data is stored in row or out of row. Out of row means the row just contains a pointer to the first page of text, which points to the next, and so on. Since this situation had a varbinary(max) column with an average length of 150 KB and SQL Server uses 8 KB pages, we had a lot of pages chained together for one row. Fortunately, in the source database the read-ahead manager kicks in and puts the pages in memory before they are needed. You can monitor this by starting Perfmon and watching the counter SQL Server Buffer Manager: Readahead pages/sec.
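The same counter can also be read from T-SQL instead of Perfmon (a sketch; the exact counter name string may differ slightly between versions):

    SELECT [object_name], counter_name, cntr_value
    FROM   sys.dm_os_performance_counters
    WHERE  counter_name LIKE 'Readahead pages/sec%';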

I had an interesting idea that partitioning the table might break the work into smaller chunks, but there is no way to create a partitioned table with a SELECT INTO command. The table cannot exist before you start the command, and there is no syntax that lets the resulting table end up partitioned. It would make no difference if the source table were partitioned, because the memory is consumed by locks that track the new extents being allocated while the data is written to the new table.

There were several other solutions we could have used, like SELECT INTO several staging tables, one per partition, then SWITCH each one into a final partitioned table (sketched below). Or we could have used an SSIS bulk insert task, though the result would have been the same if we had tried to do it all in one batch; making the batch size smaller in SSIS would cause more frequent commits and release the memory used by the locks after each commit.
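Here is a minimal sketch of the SWITCH approach, assuming a final table dbo.BigTableFinal already exists and is partitioned on a hypothetical Id column, with placeholder range boundaries:

    -- Load one range per staging table; each SELECT INTO is its own
    -- transaction, so its extent locks are released when it commits.
    SELECT *
    INTO   dbo.Stage_P1
    FROM   [SOURCESERVER].[SourceDB].[dbo].[BigTable]
    WHERE  Id >= 1 AND Id < 10000000;

    -- Before the switch, the staging table must match the partitioned table's
    -- schema and indexes and carry a CHECK constraint matching the partition
    -- boundary.
    ALTER TABLE dbo.Stage_P1 SWITCH TO dbo.BigTableFinal PARTITION 1;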

Instead, we chose to turn off all other services, increase SQL Server's max memory to 7.5 GB, and run it again. It worked, but a larger table would have failed.

One last comment on a special situation: when you have the text in row option turned on, anticipating that most of your text will fit in an 8 KB page. The allocation locks for a row contained in a single page are released immediately. The allocation locks for rows that span more than one page are also released immediately, unless the allocation crosses an extent boundary. Any multi-page insert that crosses an extent boundary needs a new extent allocation, and its lock will be held until the end of the transaction.
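For reference, this option is set per table with sp_tableoption (the table name here is hypothetical; 7000 bytes is the maximum in-row limit):

    -- Keep values up to 7000 bytes on the data page instead of in text pages
    EXEC sp_tableoption 'dbo.BigTableCopy', 'text in row', '7000';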

SQL 2008:

This locking behavior does not change in SQL 2008: text/image/varchar(max) fields that cross extent boundaries will still cause the extent locks to be held until the transaction completes. However, there are more solutions in SQL 2008. Under the right circumstances you can get a large INSERT command to be minimally logged, which was our original goal in using SELECT INTO.

In SQL 2008 any insert can be minimally logged if the database is in bulk-logged or simple recovery mode, the table is not replicated, and you use the TABLOCK hint. So the solution for my task would be to first run INSERT newtable WITH (TABLOCK) SELECT FROM sourcetable WHERE (range 1 goes here), then repeat the INSERT with different ranges until done (sketched below). This achieves minimal logging and does not hold as many extent locks, since they are released after each batch. NOTE: The new WITH (TABLOCK) syntax does not work in CTP6 but should be completed by the final release.
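Spelled out as a sketch (same hypothetical names and placeholder ranges as above):

    -- Database in bulk-logged or simple recovery, table not replicated:
    -- each batch is minimally logged and releases its extent locks at commit.
    INSERT dbo.BigTableCopy WITH (TABLOCK)
    SELECT *
    FROM   [SOURCESERVER].[SourceDB].[dbo].[BigTable]
    WHERE  Id >= 1 AND Id < 10000000;
    -- Repeat with the next range until the whole table has been copied.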

Thanks to Srikumar Rangarajan from the SQL Product Team for helping me dive deeper into this subject and to Lubor Kollar for showing me the new WITH (TABLOCK) syntax in SQL 2008.

Kevin Cox

Comments

  • Anonymous
    April 13, 2008
    It might be interesting to try this with filestream. Since the data is out of the database you should end up with significantly fewer allocations. That said, 'select into' might still mean spooling to temp but still, you should see a significant difference in behavior. (Whether positive or not remains to be seen ;)).