Manage stop word files (SharePoint Server 2010)
Applies to: SharePoint Server 2010
Topic last modified: 2015-03-09
A stop word, or noise word, is a word that the search system ignores in end-user search queries. A word might be designated as a stop word because it occurs in the language so frequently that it is unlikely to be helpful for identifying or narrowing search results. Articles such as “an” and “the” are typically specified as stop words for English, for example. If a user types the English query “the highest mountain”, “the” is removed from the query if it is a stop word, so that the query becomes “highest mountain”. Potentially offensive words are also sometimes specified as stop words.
In this article:
Understanding stop word files
Edit a stop word file
Stop word files by language
Understanding stop word files
The stop words for a given language are listed in the stop word file for that language. The Microsoft SharePoint Server 2010 installation program automatically installs one stop word file for each language that the product supports. Following installation, many of the stop word files contain some typical stop words for the associated language. For example, by default the U.S. English stop word file (noiseenu.txt) contains the words a, and, is, in, it, of, the, to. At any time after product installation, the search administrator can add or remove words in a stop word file to improve relevance of search results or to meet organization standards. For information about adding or removing words in a stop word file, see Edit a stop word file later in this article. For information about supported languages, see Stop word files by language later in this article.
At query time, the word breaker for the language of the query identifies individual words in the search query by determining word boundaries based on the lexical rules of the language. The word breaker then removes any words from the query that are listed in the stop word file.
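The query-time behavior described above can be illustrated with a minimal sketch. It is not the product's word breaker: `break_words` is a hypothetical stand-in that splits on non-word characters, and the case-insensitive comparison is an assumption. The stop word set is the default U.S. English list quoted earlier in this article.

```python
import re

# Default U.S. English stop words quoted in this article (noiseenu.txt).
DEFAULT_ENU_STOP_WORDS = {"a", "and", "is", "in", "it", "of", "the", "to"}


def break_words(query):
    """Hypothetical stand-in for a word breaker: split on non-word characters."""
    return re.findall(r"\w+", query)


def remove_stop_words(query, stop_words=DEFAULT_ENU_STOP_WORDS):
    """Return the query with every stop word dropped.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    kept = [word for word in break_words(query) if word.lower() not in stop_words]
    return " ".join(kept)
```

With this sketch, `remove_stop_words("the highest mountain")` yields `"highest mountain"`, matching the example at the start of this article.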
By default, the stop word files for all supported languages are installed at %ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Config. When a farm administrator creates a Search service application, the search system automatically copies the stop word files from the installation location (including any stop word files there that a search administrator has edited) to %ProgramFiles%\Microsoft Office Servers\14.0\Data\Applications\GUID-query-n\Config, where GUID is the GUID of the new Search service application and query-n is the query component that is created when the index component is built. The search system performs the same operation on every query server that is running the new Search service application. In this way, there is a copy of each stop word file on each query server that is running that Search service application.
Note
It is not a good practice to directly edit such a copy of a stop word file, because if you change the search topology, or if you create a mirror of the query component, the copy of the stop word file will automatically be overwritten with the stop word file from the installation location.
Edit a stop word file
If you edit a stop word file in the installation location, the system automatically propagates the edited stop word file to Search service applications that are created afterward. However, the edited stop word file is not automatically propagated to existing Search service applications. For each existing Search service application to which you want the changes to apply, you must manually copy the edited file to the Search service application folder on each query server that is running that Search service application.
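The manual copy step could be scripted along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, the folder naming follows the GUID-query-n pattern described earlier in this article, and the script would have to be run on each query server, with suitable permissions, for each Search service application that should receive the change.

```python
import shutil
from pathlib import Path


def find_query_config_dirs(applications_root, guid):
    """List Config folders whose parent folder is named '<GUID>-query-<n>'."""
    root = Path(applications_root)
    return sorted(p for p in root.glob(f"{guid}-query-*/Config") if p.is_dir())


def propagate_stop_word_file(edited_file, applications_root, guid):
    """Copy the edited stop word file into each matching Config folder.

    Returns the list of files that were written.
    """
    copied = []
    for config_dir in find_query_config_dirs(applications_root, guid):
        target = config_dir / Path(edited_file).name
        shutil.copy2(edited_file, target)  # overwrites the existing copy
        copied.append(target)
    return copied
```

Here `applications_root` would be the Applications folder under the installation's Data directory, and `guid` the GUID of the Search service application. A restart of the search service is still needed afterward for the change to take effect.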
Note
- If you delete a stop word file, the search system might consider all single characters as stop words and remove them from search results. A stop word file must contain at least one entry, even if the entry is merely a period (.) character.
- If you delete a stop word file and then restart the SharePoint Server Search 14 service, the search system automatically replaces the file by copying the file of the same name from %ProgramFiles%\Microsoft Office Servers\14.0\Data\Office Server\Config to the folder where the file was deleted.
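Because a missing or empty stop word file changes how single characters are handled, it can be worth checking a file before restarting the service. The following is a minimal sketch, not a product tool: `has_entry` is a hypothetical helper, and it assumes that the file is stored as UTF-16 text (the "Unicode" encoding that this article recommends) with one entry per line.

```python
def has_entry(path):
    """Return True if the file exists and holds at least one non-blank line.

    Assumes UTF-16 text; a file in another encoding may raise UnicodeError.
    """
    try:
        with open(path, "r", encoding="utf-16") as handle:
            return any(line.strip() for line in handle)
    except FileNotFoundError:
        return False
```

A file that contains only a period passes this check, which matches the minimum content described in the note above.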
Use the following procedure to edit a stop word file.
To edit a stop word file
Verify that the user account that is performing this procedure is a member of the local server Administrators group.
Open the stop word file in a text editor. For information about locating and identifying the appropriate stop word file, see Understanding stop word files earlier in this article.
Edit the file so that it includes only the words that you want the search system to ignore in search queries.
Save the stop word file.
Note
When you save a stop word file, always use the default Encoding value, which is Unicode.
Restart the SharePoint Server Search 14 service by following these steps:
Click Start, point to Administrative Tools, and then click Services.
Right-click SharePoint Server Search 14, and then click Restart.
Stop word changes take effect after the SharePoint Server Search 14 service restarts.
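If the file is generated by a script rather than saved from a text editor, the encoding has to be set explicitly. The sketch below assumes that "Unicode" in the Save dialog means UTF-16 little-endian with a byte order mark, as it does in Windows Notepad, and that the file lists one stop word per line. `save_stop_words` is a hypothetical helper, not part of SharePoint.

```python
import codecs


def save_stop_words(path, words):
    """Write one stop word per line as UTF-16 LE with a byte order mark."""
    if not words:
        # The article states that a stop word file needs at least one entry.
        raise ValueError("a stop word file must contain at least one entry")
    text = "\r\n".join(words) + "\r\n"  # Windows line endings
    with open(path, "wb") as handle:
        handle.write(codecs.BOM_UTF16_LE + text.encode("utf-16-le"))
```

Writing the byte order mark and the little-endian bytes explicitly keeps the output identical regardless of the platform on which the script runs. The service restart described in the procedure is still required afterward.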
Note
In Microsoft Office SharePoint Server 2007, the search system excluded stop words from queries and from the index. Therefore, after an administrator removed a word from a stop word file, it was necessary to perform a full crawl to index any instances of that stop word that the crawler might encounter. In contrast, in SharePoint Server 2010, the search system excludes stop words from queries, but by design it does not exclude stop words from the index. Therefore, in SharePoint Server 2010, if you remove a word from a stop word file, it is not necessary to perform a new crawl because the stop word is already in the index if it was encountered during a crawl. (If you add a word to a stop word file, it is not necessary to perform a new crawl either, because the search system does not look for stop words in the index.)
Stop word files by language
When you install SharePoint Server 2010, stop word files are installed for the following languages. If a stop word file does not exist for a language, the search system uses the neutral stop word file noiseneu.txt.
Language | Stop word file name |
---|---|
Arabic | noiseara.txt |
Bengali | noiseben.txt |
Bulgarian | noisebul.txt |
Catalan | noisecat.txt |
Czech | noiseces.txt |
Chinese (Simplified) | noisechs.txt |
Chinese (Traditional) | noisecht.txt |
Croatian | noisecro.txt |
Danish | noisedan.txt |
Dutch (Netherlands) | noisenld.txt |
English (United Kingdom) | noiseeng.txt |
English (United States) | noiseenu.txt |
Finnish | noisefin.txt |
French | noisefra.txt |
German | noisedeu.txt |
Greek | noisegrc.txt |
Gujarati | noiseguj.txt |
Hebrew | noiseheb.txt |
Hindi | noisehin.txt |
Hungarian | noisehun.txt |
Icelandic | noiseice.txt |
Indonesian | noiseind.txt |
Italian | noiseita.txt |
Japanese | noisejpn.txt |
Kannada | noisekan.txt |
Korean | noisekor.txt |
Language neutral | noiseneu.txt |
Latvian | noiselav.txt |
Lithuanian | noiselit.txt |
Malay | noisemal.txt |
Malayalam | noisemly.txt |
Marathi | noisemar.txt |
Norwegian (Bokmal) | noisenor.txt |
Polish | noiseplk.txt |
Portuguese (Portugal) | noisepor.txt |
Portuguese (Brazil) | noiseptb.txt |
Punjabi | noisepun.txt |
Romanian | noiserom.txt |
Russian | noiserus.txt |
Serbian (Cyrillic) | noisesbc.txt |
Serbian (Latin) | noisesbl.txt |
Slovak | noisesvk.txt |
Slovenian | noiseslo.txt |
Spanish | noiseesn.txt |
Swedish | noisesve.txt |
Tamil | noisetam.txt |
Telugu | noisetel.txt |
Thai | noisetha.txt |
Turkish | noisetur.txt |
Ukrainian | noiseurk.txt |
Urdu (Pakistan) | noiseurd.txt |
Vietnamese | noisevie.txt |
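The fallback to the neutral stop word file can be expressed as a simple lookup with a default. The sketch below is illustrative only: the dictionary holds a handful of rows from the table above, and `stop_word_file_for` is a hypothetical helper, not a SharePoint API.

```python
NEUTRAL_STOP_WORD_FILE = "noiseneu.txt"

# A few rows from the table above; extend with the remaining languages as needed.
STOP_WORD_FILES = {
    "English (United States)": "noiseenu.txt",
    "English (United Kingdom)": "noiseeng.txt",
    "French": "noisefra.txt",
    "German": "noisedeu.txt",
    "Japanese": "noisejpn.txt",
}


def stop_word_file_for(language):
    """Return the stop word file for a language, or the neutral file if none is listed."""
    return STOP_WORD_FILES.get(language, NEUTRAL_STOP_WORD_FILE)
```

For a language that is missing from the mapping, the function returns `noiseneu.txt`, mirroring the behavior described at the start of this section.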