Store and Index Documents in SQL Server 2012 - An End-to-End Walkthrough
SQL Server 2012 introduces two new features that make it both easier and more valuable to store your documents and files in the database:
- FileTables let you save files into the database quickly and conveniently by dragging and dropping them in Windows Explorer, or saving them from your everyday Windows applications. FileTables build on top of the FILESTREAM feature that was introduced in a previous version of SQL Server.
- Semantic Search helps you get more value out of your documents by tagging documents with the most important key phrases that they contain, and by identifying similar and related documents automatically. Semantic Search builds on top of the Full-Text Search feature that was introduced in a previous version of SQL Server.
This article captures a timed walkthrough of the steps required to configure a SQL Server 2012 document storage solution from scratch. It includes a screen shot of every step.
What we'll achieve in this walkthrough
- Configure an instance of SQL Server 2012 from scratch for document storage.
- Create and configure a new database and a new table to store documents.
- Copy 137 files and folders totaling 105 MB into the new table.
- Enable indexing of the stored documents.
- Demonstrate the success of our solution by running sample Full-Text and Semantic Search queries over the indexed documents.
We'll also see how quickly this solution can be completed.
Steps
- Configure document storage on the server.
- At the instance level: Enable and configure FILESTREAM; specify a folder.
- At the database level: Enable non-transactional access; specify a folder.
- At the table level: Create a new FileTable. This looks like a folder to users.
- Populate the table with documents by drag-and-drop.
- Index the documents.
- Create a new Full-Text and Semantic index.
- Check the status of indexing and wait for indexing to finish.
- Demonstrate the success of the indexing.
- Query the Full-Text index.
- Query the Semantic Search index.
Before I started the timer...
I installed this software on the computer:
- Installed SQL Server 2012.
- Installed, attached, and registered the Semantic Language Statistics database (an additional prerequisite for Semantic Search).
- Installed the Filter Pack for indexing Office 2010 documents and enabled the new filters in SQL Server.
I prepared these queries in SQL Server Management Studio:
- A query to show that the documents are stored in the database.
- A query to check the status of document indexing.
- A sample Full-Text Search query.
- A sample Semantic Search query.
I opened these windows on the desktop:
- SQL Server Configuration Manager.
- SQL Server Management Studio, with an open tab for each of the queries written in advance.
- Windows Explorer, with the source folder of the documents to be copied into the database.
Step 1. Configure document storage in SQL Server
Step 1.1. Configure document storage at the instance level
In SQL Server Configuration Manager, enable FILESTREAM on the instance and specify a folder name for the instance-level share. By default, the share has the same name as the instance.
In SSMS, check the corresponding instance-level server properties.
Let's open the instance-level share in Windows Explorer to check it.
Step 1.2. Configure document storage at the database level
Create a new database for this demo. We'll name it EndToEndFileSearch.
Enable non-transactional access to support FileTables in this database, and specify a folder name for the database-level share. By default, the share has the same name as the database.
Add a FILESTREAM filegroup to the new database. We'll call it EndToEndFileSearch.
Add a file to the new FILESTREAM filegroup. We'll call it EndToEndFileSearch_FS.
Let's open the database-level share in Windows Explorer to check it.
Step 1.3. Configure document storage at the table level
Create a new FileTable. By default, the folder name for the table-level share is the same as the name of the table itself. A FileTable has a fixed schema, so you don't have to specify a list of columns.
This action scripts a CREATE TABLE statement template to a new query window.
Fill in parameter values for the template, and run the script to create the new FileTable.
Here's the newly-created FileTable in SSMS Object Explorer.
Let's open the table-level share in Windows Explorer to check it.
Step 2. Populate the table with documents
In Windows Explorer, let's drag and drop 137 files and folders totaling 105 MB into the new FileTable. (You cannot convert an existing file system folder to a FileTable.) This corpus of documents includes all white papers published to the MSDN Library by the SQL Server team.
Let's query the FileTable to confirm that the documents are now stored in SQL Server.
Step 3. Index the documents
Step 3.1. Create a new Full-Text and Semantic index
Launch the Full-Text Indexing Wizard for the new FileTable.
Select the system-defined unique index on the FileTable's rowguidcol as the unique index to be used by the full-text index. This selection corresponds to the KEY INDEX clause in the Transact-SQL CREATE FULLTEXT INDEX statement.
Select the file_stream column that contains the documents as the column to be indexed. Specify the file_type column as the type column, and check the box under Statistical Semantics to enable Semantic Search.
Enable automatic change-tracking.
Create a new full-text catalog for the new database and set it as the default catalog. This catalog will be used for the new full-text index. (The catalog is only a logical container.)
Finish the wizard and finish creating the full-text and semantic indexes.
Step 3.2. Check the status of indexing and wait for indexing to finish
Query two dynamic management views to check the status of full-text and semantic indexing.
Indexing is finished! (When you specify automatic population, the status of the indexing remains at "Starting" while waiting to process more updates.)
Step 4. Demonstrate the success of the indexing
First, let's query the full-text index for a list of white papers that mention "SSIS" and sort them in descending order by rank.
Now, let's query the semantic index for a list of white papers in which "ETL" is a key phrase and sort them in descending order by score.
How long did all this take?
Four minutes and twenty seconds.
What have we achieved in this walkthrough?
- Configured an instance of SQL Server 2012 from scratch for document storage.
- Created and configured a new database and a new table to store documents.
- Copied 137 files and folders totaling 105 MB into the new table.
- Enabled indexing of the stored documents.
- Demonstrated the success of our solution by running sample Full-Text and Semantic Search queries over the indexed documents.
** How long did all this take? Less than 5 minutes!**
For more information
For more information about these new features in SQL Server 2012, see:
- FileTables in SQL Server 2012 Books Online
- Semantic Search in SQL Server 2012 Books Online
Also, see this blog post: