Using Full Text Searching SQL Server 2008
I have now got SQL Server 2008 RC0 running in my Hyper-V 64-bit virtual machine, complete with the sample adventure works database from connect. Now it’s time to have a look at querying full text indexes. One of the common ways of doing this is to use the contains function as part of a where clause and combine this with a filter on the structured e.g
SELECT
PM.Name AS ModelName,
PC.Name AS Culture,
P.Name AS ProductName,
PD.Description,
P.ProductNumber
FROM
Production.ProductModel PM
INNER JOIN
Production.ProductModelProductDescriptionCulture PDC
ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
Production.ProductDescription PD
ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
Production.Product P
ON PM.ProductModelID = P.ProductModelID
INNER JOIN
Production.Culture PC
ON PDC.CultureID = PC.CultureID
WHERE
CONTAINS(PD.Description, 'Performance')
AND (P.ProductNumber LIKE N'FR%')
One thing to note about this query is that it will be much faster in SQL Server 2008 as it will yield a single execution plan:
with a table valued function for the full text match. BTW the issues I was having getting Full text to work in CTP6 have gone away and all of this just worked after install.
The contains function can support all sorts of searching operators for proximity matching, applying different weight to different search terms as well as the ability to search from a list of columns in the same table, provided they are in the same language i.e. they have been indexed with the same language. Check Books On line for more information here
The same syntax is also good if the column you’re searching is actually a document in Varbinary(MAX), and the new Filestream data type. So if I index the documents table (as I did on Thursday ) then I can run a query like this…
SELECT
FileName,
[Document]
FROM
Production.[Document]
WHERE
CONTAINS(Production.[Document].[Document], 'red')
and get back this
of course you will want to write some code to stream that document back to the user in a more readable form, but the point is that the search has got inside the document and realised it contains the word ‘red’.
If you want to search across multiple tables for a term then your query will look more like this (BTW I setup the product table for full text searching before running this):
DECLARE @SearchTerm Nvarchar(50) = N'road'
SELECT
PM.Name AS ModelName,
PC.Name AS Culture,
P.Name AS ProductName,
PD.Description,
P.ProductNumber
FROM
Production.ProductModel PM
INNER JOIN
Production.ProductModelProductDescriptionCulture PDC
ON PM.ProductModelID = PDC.ProductModelID
INNER JOIN
Production.ProductDescription PD
ON PDC.ProductDescriptionID = PD.ProductDescriptionID
INNER JOIN
Production.Product P
ON PM.ProductModelID = P.ProductModelID
INNER JOIN
Production.Culture PC
ON PDC.CultureID = PC.CultureID
WHERE
CONTAINS(PD.Description , @SearchTerm )
OR
CONTAINS((P.Name, P.ProductLine) , @SearchTerm )
I have highlighted the multiple column clause here in orange , as I included both of these columns when I setup full text indexing ion the Product table.
The query plan for this statement looks like this:
This is not such a good performance story and books on line recommends here that you don’t have multiple contains statements because of this. In this example we are stuffed as we are full text searching across multiple tables. If you need to do alot of full text searching on a large index , then design your database with full text searching in mind i.e. try and keep the number of tables that you will want to combine in a text search to a minimum. If you think about this it’s no different from using T-SQL to search lots of columns for the same thing, and so this limitation is not as much of a constraint as it initially appears.
So contains is no substitute for good database design it’s just another tool, to quote Abraham Maslow…
“If the only tool you have is a hammer, then you tend to see every problem as a nail”.
Technorati Tags: SQL Server,contains function,full text search
Comments
Anonymous
June 12, 2008
Hi, Thanks for the post. I'm currently looking at using SQL server full text indexing on a new project.. But... I need to retrieve the actual text where the search terms were found.. I need excerpts from the searched text, highlighting (google style) hit counts, ranking and so forth.. Am I missing something or does SQL server not support this.. I can't see the point of all this technology if all I can get back are rows.. Maybe I need to be looking at Lucene.NET? Thanks for any pointers..Anonymous
August 06, 2008
i am using full text search for an application for searching for specific adverts and displaying the details of the advertisers.Am i doing the right thing or i should also be looking at lucene though i am a little biased towards SQL Server.Anonymous
February 05, 2010
Hello.. I'm trying to run a full text search on PDFs but the PDF files are not showing in the item count in the catalog. My table also includes DOCs and they are showing in the count and is being indexed/searched properly but not PDFs. I'm lost and need help. I installed the Adobe PDF iFilter for 64-bit v9.0. I'm using SQL Server 2008. Why isn't PDFs being indexed? Could it be a 64-bit or a SQL Server 2008 issue? Thanks in advance..Anonymous
February 08, 2010
My only suggestion is.. Have you got the ifilters setup for the pdf format, i.e. the ifilter is installed and there is column in your table that indicates that the document typre (ifilter to use). AndrewAnonymous
March 24, 2010
Hi, I also have problems with the Adobe iFilter for 64-bits platform on a 64-bits SQL Server running on Win2008x64. Documents are stored in an image column type. I have checked the fulltext_document_types and added the bin path of the iFilter dll to the PATH system variable. Still, no PDFs are being indexed... The crawl logs report no errors. Today I uninstalled the Adobe iFilter and installed the trial version of Foxit iFilter. It indexed all the PDFs without any problem! I have googled around a lot but I can't find any possible explanation to why I'm not able to use the Adobe iFilter.Anonymous
March 30, 2010
Sivert, This is obviously down to the ifilter rather than SQL Server so a good idea to se foxit, whihc I will pass on. I will try and find out why adobe doesn't work and get back to you.