Share via


Enterprise Search SQL Query Language Information

Enterprise Search in Microsoft Office SharePoint Server 2007 provides content crawling and search features that support full-text searching. The query language used by Enterprise Search extends the standard SQL-92 and SQL-99 database query syntax to enhance its usefulness with text-based searches.

Identifiers

Identifiers specify the names of columns (sometimes referred to as properties), catalogs, and aliases. Literals, by contrast, specify string and numeric values.

You can create identifiers that are up to 128 characters in length, and in one of two types, distinguished by the characters used in the identifier name:

  • **Regular identifiers   **Contain only the characters A-Z, a-z, 0-9, and underscore, and begin with a letter. You do not need to enclose regular identifiers in double quotation marks.

  • Delimited identifiers   Can contain any valid Unicode character, and must be enclosed in double quotation marks.

Identifiers that you create as Uniform Resource Names (URNs) must contain special characters, are always delimited identifiers, and must be enclosed in double quotation marks.

Note

You can use the asterisk (*) as a special column identifier when you want to specify that Enterprise Search includes all of the indexed properties in the query. Although it is not a regular identifier, it does not require double quotation marks.

Literals

A literal is a string of characters that represents a value in a query statement. You use literals to compare column values or to specify search terms. Enterprise Search in Microsoft Office SharePoint Server 2007 supports the following types of literals.

String

String literals are not limited in length, and can contain either American National Standard Institute (ANSI) or Unicode characters. You must enclose string literals in single quotation marks. To include a single quotation mark inside a string literal, use two single quotation marks. Represent an empty string as two consecutive single quotation marks ('').

Numeric

Numeric literals represent numbers-including positive and negative integers, decimal numbers, and currency values-and can contain the digits 0-9, a period, and the letter E (or e). You can define numeric literals by using scientific notation (for example, 2.3E-05). Do not enclose a numeric literal in single quotation marks or it is interpreted as a string literal and is compared by using string comparison techniques. Currency values cannot contain currency symbols.

Hexadecimal

A hexadecimal literal represents an unsigned integer specified in hexadecimal notation, and can contain the digits 0-9 and the letters A-F and a-f. Hexadecimal literals must begin with 0x.

Note

The SQL-92 standard requires that hexadecimal literals be enclosed in single quotation marks, however, Enterprise Search does not support that notation.

Boolean

Boolean literals represent logical values, and can be either TRUE or FALSE. Do not enclose a Boolean literal in single quotation marks, or it is interpreted as a string literal.

Date

Date literals represent specific dates, time stamps, or relative times, and are enclosed in single quotation marks. You must put dates in the form year/month/day or year-month-day, where the month, day, and year are numbers. Specify the year with a four-digit value, for example, 2004. Time values must be in the form hours:minutes:seconds. Relative time syntax is based on the DATEADD Function in Enterprise Search SQL Syntax.

Case Sensitivity

Enterprise Search in Microsoft Office SharePoint Server 2007 queries are not case-sensitive. This includes query keywords (SELECT is identical to Select, select, and sELect) and search terms. If an identifier or literal uses Unicode characters that have case-mapping semantics, then the search engine considers that all the cases are equivalent.

Accent Insensitivity

Enterprise Search queries are not sensitive to case, nor are they sensitive to accents when using either the FREETEXT predicate or CONTAINS predicate. A search for the term "resume" also returns documents that contain "résumé". If an identifier or literal contains accented Unicode characters, the search query also matches entries that contain Unicode combined characters that use the base character of the accented character in the identifier or literal.

Note

This is not the case for the character ranges 0x2e81-f8ff and 0x1100-0x11ff.

Understanding Relevance Values

When used in a relational database, rows that are returned by a search query must meet all the conditions called for by the query. In contrast, an Enterprise Search query can return documents that meet the search conditions to varying degrees.

For example, a search for the term "program" in a relational database produces records that contain that specific spelling of the word. Whether a record contains one or one hundred instances of the word has no impact on the results. In contrast, Enterprise Search returns a relevance value associated with the matching documents. The relevance of documents having "program" in the title is higher than those that contain the word only in the last paragraph. Similarly, documents containing variations of the search term, for example "programs" and "programming", also match and are returned by the query.

Enterprise Search queries return integer relevance values in the column named "rank".

In addition:

  • Rank values returned by the query are integers ranging from 0 to 1000.

  • Higher rank values indicate documents that match the search conditions better than others.

  • Rank values apply only to the current query, so they cannot be compared for results across queries.

  • Rank values are relative to the other documents matching the query. Therefore, the rank value of a particular document depends on the other documents that also match the query.

  • The rank value for items matching a purely relational predicate is 1000.

Using Localized Searches

Enterprise Search in Microsoft Office SharePoint Server 2007 supports full Unicode searching. You can index documents that use Unicode, and create search queries that use Unicode identifiers and literals. However, the language setting of an application or the operating system can have far-reaching effects on how Enterprise Search works.

Document and System Locale Settings

When the operating system—or even an application—is set to use a particular language and locale, many settings are affected. These settings include numeric format, date format, currency format, uppercase and lowercase mapping, dictionary sort ordering, tokenization, and others. Although these settings help Enterprise Search provide excellent localized support, unexpected results can occur when documents from one locale are searched by using a system set to another locale.

For example, the list of noise words (words discarded during indexing and from queries because they give no meaning or context) in each language is very different. In German, the word die is equivalent to the English word the. If you index a German document and then search for "die" by using an English query system, documents might be returned, even though the word should be ignored. A German system that is given the same query would return an error stating that the search query contained only noise words.

When the IFilter object processes a document's text properties and content, it reports the language of that document to the content indexer. By using this information, Enterprise Search can apply the appropriate word breaker and noise words list.

Specifying Languages

You can specify the language used in search queries. Both the FREETEXT and CONTAINS predicates for the WHERE clause support specifying a language. You can indicate the query language by providing a numeric locale identifier (LCID) in the CONTAINS or FREETEXT predicate, by using the following syntax:

CONTAINS | FREETEXT

([<column_identifier>,]'<content_search_condition>' [,LCID])

For more information, see the syntax for the CONTAINS and FREETEXT predicates.

In This Section

SQL Extensions in Enterprise Search

SQL Features Unavailable in Enterprise Search

SQL Syntax Changes from SharePoint Portal Server 2003

See Also

Reference

Enterprise Search Keyword Syntax Reference

Concepts

Enterprise Search SQL Syntax Reference
Enterprise Search URL Syntax Reference
Enterprise Search Query Object Model Overview
Enterprise Search Query Web Service Overview