다음을 통해 공유


SQL Server Database Index an Overview

Introduction

"A database index is a data structure that improves the speed of data retrieval operations on a database table at the
cost of additional  writes and the use of more storage space to maintain the extra copy of data. Indexes are used to quickly
locate data without having to search every row in a database table every time a database table is accessed. Indexes can
be created using one or more columns of a  database table, providing the basis for both rapid random lookups and efficient
access of ordered records."* Wikipedia

Index, the lighter side of life.

If you are reading a book. There are two sections that you can use to locate a topic. The table of contents (toc),
as we know it here in technet wiki, which is located at the front of the book and the index which is on the back
of the book.


Figure 1. Table of Contents at the front of the Book


Figure 2. Index at the back of the book

These two archaic technology for locating a particular topics in a book corresponds
to index. The table of contents corresponds to the clustered index and the index on figure 2
corresponds to the non clustered index.

Types of Index

A. Clustered Index

A clustered index determines the physical order of data in a table. A clustered index is analogous to a telephone directory,
which arranges data by last name. Because the clustered index dictates the physical storage order of the data in the table,
a table can contain only one clustered index. However, the index can comprise multiple columns (a composite index), like
the way a telephone directory is organized by last name and first name.

B. Non Clustered Index

Non clustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index
 key values and each key value entry has a pointer to the data row that contains the key value. The pointer from an index
 row in a non clustered index to a data row is called a row locator.

The structure of the row locator depends on whether the  data pages are stored in a heap or a clustered table. For a heap,
 a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.

C. Columnstore index.

Columnstore index is a new type of index introduced in SQl server 2012. It is an in-memory,  non-clustered, read-only,
 X-velocity index that you can use for large "star", "fact"  tables.   It is not intended to be used on small all dimension.
Columnstore  are design to do a very fast aggregation. It is design using  Vertifaq, the same technology used
by power pivot, which make it the first index that does is not stored or make use of the B-tree technology.

Columnstore consumes a considerable amount of space but SQL Server 2014 is headed to use compression among other
improvements.

For more design consideration for columnstore index please consult TechNet's documentation.

Covering Index

A covering index is an index that contains all the columns you need for your query.

For example if you have a select statement like

Select column1, colum2, column 3  from table 1 where column4=5

If you have  to build a non clustered for example that needs to cover that query you will have a single index with multiple
columns that includes column1, column2, column3 and column4. The order of the columns and the sorting of the index 
columns should be the same as that of the query

Since covering index is expected to include many column it is more often a non-clustered index. When covering index is
fully utilized there's no need to look up using the clustered index therefore returning the data much faster.

Spoiler.

So, if covering index is faster and its non clustered, can we just add a covering index on all the queries?

Index consumes space and it slows down insert. Indexing a database requires mastery of the index technology.
Reusability and compacting the index should be at the back of your mind. Also you don't want to do covering
index for queries you don't usually use. Based on Dan's suggestion index the queries you use very frequently
and less for others. Link to forum.

Included Columns

Harddisk is a cheap resource nowadays and you can easily buy and plug-in a 1 TB USB drive, But that is not the
case for SAN drives. And in the  case of enterprise application like CIO suite where we have more that 500
Line of business (LOB) application each of which having its own dev and staging environment, harddisk is
a scant resource. Although azure will be able to provide us with a pluggable storage its is still best practice to
conserve storage.

If your in the database business indexing strategy will provide you a better way not only to conserve your
Harddisk space but also your CPU utilization. Included columns were introduced in SQl Server 2005 with that in
mind. Prior to included columns to do a covering index you will need to make sure that all the columns are included
in the index. This in turn consume a lot of space.

To effectively use included columns, use only the key columns for the index and include the columns that
are most likely not going to be used the database engine such as widekeys. For more discussions on
included columns please refer to this TechNet link.

Non-clustered index can only allow you up to 16 columns but that's not going to be your reason to start using
included columns. Included columns are only for non-clustered index.

Filtered index.

The criteria for a good index design are as follows: optimized query plan, minimum index storage and maintainability.
Which bring forth another improvement to SQL Server 2005. This is known as the filtered index.

For example you are maintaining a database of a university with more than 35,0000 students. And when  you are dealing
with students you would be most likely to be interested only on enrolled student. Instead of indexing the entire
student table you could probably just index the enrolled students. The way filtered index work is very same as the where
clause without the where.

For more design consideration please see the following link

Indexed views

Indexed views or materialized views has been around since SQL Server 2000. It is basically a view and you put an index
on it.  In the example above, we are using the enrollment system and true enough most of the times the queries' point
of origin will be most likely from the student table. What if you applied all your index and aligned on that direction?
What if all of a sudden you were ask to designed a query that originates from the subjects going back to the students.
Let's say for example you were asked to print a grading sheet per subject with students arranged based on last name
or their ranking based on their average grades.

This type of query will involved multiple table tables and the best way to optimize it is to use indexed view. Indexed view
however has a tradeoff with insert. It will slow down the insert plus materializing the view will definitely consume
storage space. However for returning the query indexed view is extremely fast.

For more information on indexed view please see design guideline in TechNet

See Also