Behavior Changes to Full-Text Search
This topic describes behavior changes in full-text search. Behavior changes affect how features work or interact in SQL Server 2012 as compared to earlier versions of SQL Server.
Behavior Changes in Full-Text Search in SQL Server 2012
SQL Server 2012 installs a new version of the word breakers and stemmers for US English (LCID 1033) and UK English (LCID 2057). However you can switch to the previous version of these components if you want to retain the previous behavior. For more information, see Change the Word Breaker Used for US English and UK English.
New Word Breakers and Stemmers Installed
SQL Server 2012 updates all the word breakers and stemmers used by Full-Text Search and Semantic Search. For consistency between the contents of indexes and the results of queries, we recommend that you repopulate existing full-text indexes.
There are new word breakers for English. If you have to retain the previous behavior, see Change the Word Breaker Used for US English and UK English.
The third-party word breakers for Danish, Polish, and Turkish that were included with previous releases of SQL Server have been replaced with Microsoft components. The new components are enabled by default.
There are new word breakers for Czech and Greek. Previous releases of SQL Server Full-Text Search did not include support for these two languages.
Behavior Changes of New Word Breakers and Stemmers
The new components might return different results than the older components when you populate and query full-text indexes. The following tables demonstrate some of the differences that can be expected in English results.
If you have to retain the previous behavior of the word breakers and stemmers, see the following topics:
In some cases, the new components return more results:
Term |
Results with previous word breaker and stemmer |
Results with new word breaker and stemmer |
---|---|---|
cat-dog |
cat dog |
cat cat-dog dog |
cat@dog.com |
cat com dog |
cat cat@dog.com com dog |
12/11/2011 (where the term is a date) |
12/11/2011 dd20111211 |
11 12 12/11/2011 2011 dd20111211 |
In some cases, the new components return similar results:
Term |
Results with previous word breaker and stemmer |
Results with new word breaker and stemmer |
---|---|---|
100$ |
100$ nn100$ |
100$ nn100usd |
022 |
022 nn022 |
022 nn22 |
10:49AM (where the term is a time) |
10:49am tt1049 |
10:49am tt24104900 |
In some cases the new components return fewer results or results that may be unexpected by applications:
Term |
Results with previous word breaker and stemmer |
Results with new word breaker and stemmer |
---|---|---|
jěˊÿqℭžl (where the terms are not valid English characters) |
‘jěˊÿqℭžl’ |
je yq zl |
table's |
table’s table |
table’s |
cat- |
cat cat- |
cat |
v-z(where v and z are noise words) |
(no results) |
v-z |
$100 000 USD |
$100 000 nn000 nn100$ usd |
$100 000 usd nn100000usd |
beautiful U.S land |
beautiful land u.s us |
beautiful land |
Mt. Kent and Mt Challenger |
challenger kent mt mt. |
mt kent challenger |
Behavior Changes in Full-Text Search in SQL Server 2008
In SQL Server 2008 and later versions, the Full-Text Engine is integrated as a database service into the relational database as part of the server query and storage engine infrastructure. The new full-text search architecture achieves the following goals:
Integrated storage and management—Full-text search is now integrated directly with the inherent storage and management features of SQL Server, and the MSFTESQL service no longer exists.
Full-text indexes are stored inside the database filegroups, rather than in the file system. Administrative operations on a database, such as creating a backup, automatically affect its full-text indexes.
A full-text catalog is now a virtual object that does not belong to any filegroup; it is a logical concept that refers to a group of full-text indexes. Therefore, many catalog-management features have been deprecated, and deprecation has created breaking changes for some features. For more information, see Deprecated Database Engine Features in SQL Server 2012 and Breaking Changes to Full-Text Search.
Note
SQL Server 2005 Transact-SQL DDL statements that specify full-text catalogs work correctly.
Integrated query processing—The new full-text search query processor is part of the Database Engine and is fully integrated with the SQL Server Query processor. This means that, the query optimizer recognizes full-text query predicates and automatically executes them as efficiently as possible.
Enhanced administration and troubleshooting—Integrated full-text search provides tools to help you analyze search structures such as the full-text index, the output of a given word breaker, stopword configuration, and so forth.
Stopwords and stoplists have replaced noise words and noise-word files. A stoplist is a database object that facilitates manageability tasks for stopwords and improves the integrity between different server instances and environments. For more information, see Configure and Manage Stopwords and Stoplists for Full-Text Search.
SQL Server 2008 and later versions include new word breakers for many of the languages that exist in SQL Server 2005. Only the word breakers for English, Korean, Thai, and Chinese (all forms) remain the same. For other languages, if a full-text catalog was imported when a SQL Server 2005 database was upgraded to SQL Server 2008 or a later version, one or more languages used by the full-text indexes in full-text catalog might now be associated with new word breakers that might behave slightly differently from the imported word breakers. For more information about how to ensure consistency between queries and the full-text index content, see Upgrade Full-Text Search from SQL Server 2005.
A new FDHOST Launcher (MSSQLFDLauncher) service has been added. For more information, see Get Started with Full-Text Search.
Full-text indexing works with a FILESTREAM column in the same way that it does with a varbinary(max) column. The FILESTREAM table must have a column that contains the file name extension for each FILESTREAM BLOB. For more information, see Query with Full-Text Search,Configure and Manage Filters for Search, and sys.fulltext_document_types (Transact-SQL).
The full-text engine indexes the contents of the FILESTREAM BLOBs. Indexing files such as images might not be useful. When a FILESTREAM BLOB is updated it is reindexed.
See Also
Concepts
Upgrade Full-Text Search from SQL Server 2005
Get Started with Full-Text Search