Fragmentation (part 4): what are heaps?
Ok - really catching up with the various blog post series I started back in June/July - time to bang out the next few posts in the series on index fragmentation. Remember I'm starting from first principles and covering
- What are records?
- What are pages?
- What are extents?
- What is a heap?
- What is a clustered index?
- What is a non-clustered index?
before getting into the details of fragmentation.
So what is a heap?
- A heap is the simplest storage arrangement and is an unordered table (or you could think of it as a collection of unordered pages).
- This means that rows will be inserted into the heap anywhere where there is space (don't let this statement confuse you - for a table that has nothing but inserts, records will always be appended to the last allocated page).
- A heap consists entirely of data pages - as there is no b-tree, there are no index pages.
- As a heap is unordered, the data pages are not linked in any way. (Ah - there are some exceptions. In SQL Server 2000 and 2005 (and maybe in 7.0 - I don't remember), the sysfiles1 table is a linked-heap. This table contains the locations of the files comprising the database and so for safety's sake we want the pages to be linked, but not with the complexity of having a b-tree because we read the pages directly at startup time. In SQL Server 2005, some other system tables are also linked-heaps).
- If you want to do a singleton lookup from a heap, the whole heap has to be scanned to find the row(s) matching the search predicate.
- Doing a select (*) of the contents of a heap will not guarantee to return the rows in the order they were inserted (as it did in 6.5 when all heaps were linked as sysfiles1 is and the space from singleton deletes was not reclaimed). This is because the pages are accessed in allocation order.
- Non-clustered index records contain the physical RID (record ID or record locator) of the corresponding heap record.
Heaps have one interesting feature - forwarded records. If a record needs to be updated, the updated record size is greater than the current record size, and there is no space on the page to fit the new record in then we have two choices:
- move the record to a new page and change all the non-clustered index records that point to it to point to the new location of the record
- move the record to a new page and leave a forwarding record in the original location to point to the new location
Guess which approach we took? Right, #2. Approach #1 has enormous performance implications because of the extra IOs and logging involved. The forwarding record has the physical location of the new record and the forwarded record has a back-pointer to the forwarding record (so that if its deleted, the forwarding record can be deleted too).
This is one drawback of using heaps - all the extra space that's "wasted" with the forwarding/forwarded records. Another drawback is that when scanning through the heap, forwarding records have to followed immediately (as opposed to ignoring them and just reading the forwarded records when they're encountered). This is to vastly reduce the possiblity of read anomalies (such as non-repeatable reads or missed rows if a row moves before the scan point during a scan).
And that's it. Next up is clustered indexes...
Comments
Anonymous
November 03, 2006
Greg Linwood, a fellow SQL Server MVP, has started a series of articles in which he attempts to prove...Anonymous
November 21, 2006
The comment has been removedAnonymous
December 03, 2006
Hi Gent, The KB article really only applies when there is free space in the pages that comprise the heap. For a heap that is append-only, with no deletes, the free space will always be in the last allocated page - so there's nothing to optimize - the only place to insert the record is in the last allocated page. The same applies to the insert/select example you give. ThanksAnonymous
December 05, 2006
The comment has been removedAnonymous
December 07, 2006
Yes, when the records are randomly sized then it will make use of any free space in earlier pages.Anonymous
January 07, 2007
Greg Linwood, a fellow SQL Server MVP, has started a series of articles in which he attempts to proveAnonymous
July 05, 2007
Hi, Paul Very nice blog on those fragmentation series..but where is you next one on the clustered index and non-clustered index? My another question, if all the rows from a page are deleted, that page gets deallocated and reclaimed by the database and if all pages are empty due to range deleteion in an extent, the extent will get deallocated and reclaimed by the database. Is that correct statement? Is that the same for the heap? I dont think so as "empty" page(due to deletion) do not reclaimed and deallocated for the heap. Can you explain in that area in respect to the when a range deletion occurs on the page, what happens to the page? You can ping me at billkan@microsoft.com Thank you, BillAnonymous
July 05, 2007
Hi Bill, I haven't done them yet - I'll probably redo most of the series at the end of the summer. When pages and extents are deallocated is complicated, and the subject of a whole blog post. I'll get to it... ThanksAnonymous
October 26, 2007
MVP Hugo Kornelius once reported that he encountered a situation in which it was possible to performAnonymous
June 10, 2015
Adding some piece of Information that could be useful for someone else DELETE (Transact-SQL) msdn.microsoft.com/.../ms189835(v=sql.105).aspx Deleting Rows from a Heap When rows are deleted from a heap the Database Engine may use row or page locking for the operation. ----> As a result, the pages made empty by the delete operation remain allocated to the heap. ----> When empty pages are not deallocated, the associated space cannot be reused by other objects in the database. To delete rows in a heap and deallocate pages, use one of the following methods. • Specify the TABLOCK hint in the DELETE statement. Using the TABLOCK hint causes the delete operation to take an exclusive lock on the table instead of a row or page lock. This allows the pages to be deallocated. For more information about the TABLOCK hint, see Table Hints (Transact-SQL). • Use TRUNCATE TABLE if all rows are to be deleted from the table. • Create a clustered index on the heap before deleting the rows. You can drop the clustered index after the rows are deleted. This method is more time consuming than the previous methods and uses more temporary resources. For more information about locking, see Locking in the Database Engine.