Share via


How to Partition an Existing Table in MS SQL Server

Imagine you have a table with millions of records. It would be really  difficult to manage and query such a huge amount of data. Partitioning is a database process, introduced in SQL Server 2005, where these tables and indexes are divided into smaller parts or technically a single table is spread over multiple partitions so that the ETL/DML  queries against these tables finishes quickly. 
See benefits of partitioning in below link
http://technet.microsoft.com/en-us/library/ms190787.aspx

Before SQL Server 2012  number of partitions was limited to 1,000. Starting from SQL Server 2012 it was lifted up to 15,000 by default. Partitioning can be achieved in two different ways:

  1. With an existing table.
  2. With a new table.

In our example, we are going to partition a table, that is already existing in our database. Since we don't have huge amounts of data available for this example, we are considering the Person table of AdventureWorks2008 database. Partitioning can be achieved by creating the following

  1. Partition Range Function
  2. Partition Schemes
  3. Dropping and recreating the Clustered Index on the table to be partitioned.

Let us have a closer look of these steps in a bit.
**
Step 1:** To create the partitioning in a table, let us consider a table named "Person" with all information like Firstname, Lastname and other related data with a Primary Key column called BusinessEntityID (which is an identity column). We have a total of 19972 records in this table. I am going to partition on the BusinessEntityID column of this table.
**
Step 2:** By default, SQL Server allocates one partition when the table is created. To view the current partition information on the table let us make use of the DMV's in SQL Server.

SELECT o.name AS ObjectName,i.name AS IndexName, partition_id, partition_number, [rows] 
FROM sys.partitions p 
INNER JOIN sys.objects o ON o.object_id=p.object_id 
INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id 
WHERE o.name = 'Person' 

Once the above query is executed it should provide you the information about the indexes and partitions in your table as shown below. http://sqlsaga.com/wp-content/uploads/2014/01/p11.jpg
**
Step 3:** Creating a Partition Range Function. By creating a Partition Range Function, we specify the range of values that are to be stored in different partitions. This is where we define which partition should have which data. A Partition Range Function can be created as shown below.

CREATE PARTITION FUNCTION [myRange] (INT) 
AS RANGE RIGHT FOR VALUES (5000, 15000,50000) 
GO 

We can specify two types of Ranges while creating the partition function.

  1. RIGHT
  2. LEFT

In our example we are considering RANGE RIGHT, which means if BusinessEntityID falls less than 5000 move that to partition1, which falls between 5000-14999 falls in partition2 and 15000-49999 falls into partition3 and the rest if any present will go to partition4. With RANGE LEFT, if BusinessEntityID is less than or equal to 5000 it will be moved to partition1, 5001-15000 will be considered as partition2 and 15001-50000 will be partition3 and the rest will be partition4. Step 4: Creating the Partition Scheme. The Partitioning Function we created just now is to be used as the Partition Scheme. Use the following code to create a partitioning Scheme.

CREATE PARTITION SCHEME PartitionScheme  
AS PARTITION myRange ALL TO ([PRIMARY])  
GO 

Step 5: At this point if you are not sure what was the Range you have specified for partitioning, you can use a simple query as below to find that.

SELECT ps.name AS [Partition Scheme Name],pf.name AS [Partition Function Name],boundary_id,value 
FROM sys.partition_schemes ps 
INNER JOIN sys.partition_functions pf ON pf.function_id=ps.function_id 
INNER JOIN sys.partition_range_values prf ON pf.function_id=prf.function_id 
GO 

Step 6: DROP AND RECREATE THE CLUSTERED INDEX. This is a very important part of our scenario. If you are working with a brand new table to partition, it is a highly recommended step. If you are partitioning an already existing table, it is a required step. In our example, since we are considering the same column as a CLUSTERED KEY column so if we just drop and re create it, it is fine. If we consider a different column to be a partitioned, then remember you have to drop the CLUSTERED INDEX and make it the new column as the CLUSTERED INDEX Column. You can restore the other index as a NON CLUSTERED INDEX. In our case since we have considered an already existing table, I will drop the Clustered index and re create it one more time. For doing that, I use the below query.

IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'Person') AND name = N'PK_Person_BusinessEntityID') 
ALTER TABLE Person DROP CONSTRAINT [PK_Person_BusinessEntityID] 
GO 
  
ALTER TABLE Person ADD CONSTRAINT [PK_Person_BusinessEntityID] PRIMARY KEY CLUSTERED  
( 
    [BusinessEntityID] ASC 
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) 
ON PartitionScheme([BusinessEntityID]) 
GO 

At the end of the above SQL statement, we wrote a reference to the Partition Scheme we have created and make partitions accordingly using that Scheme for the column BusinessEntityID.
**
Step 7:** Time to see the results(partitions). After following all the steps which means we have successfully created partitions to our table, its time to verify if the partitions were really created or not. We will use the same query we used earlier to find the partition information.

SELECT o.name AS ObjectName,i.name AS IndexName, partition_id, partition_number, [rows] FROM sys.partitions p INNER JOIN sys.objects o ON o.object_id=p.object_id INNER JOIN sys.indexes i ON i.object_id=p.object_id and p.index_id=i.index_id WHERE o.name = 'Person' 

When executed in SSMS, it should return data as shown below. http://sqlsaga.com/wp-content/uploads/2014/01/p21.jpg

Remember one thing, there will be no change in the way you insert the data or you query the data, but partitioning optimizes the data retrieval when you use the column you have partitioned on. I found a great article while reading about SQL Server Database Partitioning myths and truths.