crawlerconsistency.exe reference
Applies to: FAST Search Server 2010 for SharePoint
Use the crawlerconsistency tool to verify and repair the consistency of the crawler item and metadata structures on disk. You can also use the tool to verify and maintain internal crawler store consistency, or to recover a damaged crawl store.
By default, the tool detects and attempts to repair the following inconsistencies:
Items referenced in metadatabases, but not found in the item store.
Invalid items in the item store.
Unreferenced items in the item store (requires the docrebuild mode).
Checksums in the duplicate database that are not found in the metadatabases.
Multiple checksums assigned to the same URI in the duplicate database.
These inconsistencies are corrected automatically by running the doccheck or docrebuild mode, followed by the metacheck mode. Any inconsistent URIs are logged, and a delete operation is issued to the indexer to keep the index synchronized (you can disable indexer deletes with the -n option).
In a multi-node crawler environment, you can also use the tool to rebuild a duplicate server from the contents of per-node scheduler post-process checksum databases by using the ppduprebuild mode. Since this mode builds the duplicate server from scratch, you can also use it to change the number of duplicate servers that are used, by first changing the configuration and then rebuilding.
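For example, a minimal duplicate-server rebuild (a sketch that assumes only the required -M and -O options, described below, and the default data location) could look like this:
<FASTSearchFolder>\bin\crawlerconsistency -M ppduprebuild -O <FASTSearchFolder>\var\log\crawler\consistency\
If your goal is to change the number of duplicate servers, update the duplicate server configuration first and then run this command.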
Note
To use a command-line tool, verify that you meet the following minimum requirement: you are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.
Syntax
<FASTSearchFolder>\bin\crawlerconsistency [options]
Parameters
Parameter | Description |
---|---|
<FASTSearchFolder> | The path of the folder where you have installed FAST Search Server 2010 for SharePoint, for example C:\FASTSearch. |
crawlerconsistency options
Option | Value | Required | Description |
---|---|---|---|
-M | <mode>[,<mode>,...,<mode>] | Yes | Specifies one or more modes to run the tool in. The modes used in this article are doccheck and docrebuild (verify the item store against the metadatabases; docrebuild also detects unreferenced items), metacheck (verifies the metadatabase structures), routecheck (verifies that items are routed to the correct node), and ppduprebuild (rebuilds a duplicate server from the per-node post-process checksum databases). The routecheck and ppduprebuild modes apply to multi-node crawlers only. Additionally, you can add modifiers such as updatestat (updates statistics counters). Include additional modifiers in the comma-separated <mode> list. For example: -M doccheck,docrebuild,updatestat |
-O | <path> | Yes | The folder for all output logs. The tool creates a subfolder named with the current date: <year><month><day>. If the subfolder already exists, a number is appended to the name (for example, ".1"). |
-d | <path> | No | The folder that contains crawl data, run-time configuration, and logs in subfolders. Default: data |
-U | | No | Indicates that you are running the tool on the multi-node scheduler, whose data is in a different folder: data\crawler\config\multinode instead of data\crawler\config\node. Applies to the routecheck mode. |
-C | <crawl_collection>[,<crawl_collection>,...,<crawl_collection>] | No | A comma-separated list of crawl collections to check. Default: all collections |
-c | <cluster>[,<cluster>,...,<cluster>] | No | A comma-separated list of clusters to check. Applies to the doccheck and docrebuild modes. Default: all clusters |
-S | <crawl_site>[,<crawl_site>,...,<crawl_site>] | No | Only process the specified sites. Applies to the doccheck mode. Default: all sites |
-z | | No | Compresses items in the item store when you run the docrebuild mode. If specified, this overrides the collection-level compression option. Default: off |
-i | | No | Skips free disk space checks. Normally the tool checks the free disk space periodically; if it drops below 1 GB, the tool stops the operation and exits. Warning: Use this option with caution. |
-n | | No | Indicates that delete operations are not submitted to the indexer; they are only logged to files. To ensure that deleted items are not left in the index, manually delete those items or refeed the collection into an empty index. |
-F | <file> | No | Loads the crawler global configuration from <file>. Any conflicting options on the command line override values in the file. |
-T | | No | Runs the tool in test mode. The tool does not delete anything from disk or issue any delete operations to the indexer. See the test-mode example in the Examples section. |
-h | | No | Displays help. |
-v | | No | Displays version information. |
-l | <log-level> | No | Specifies the kind of information to log. |
Examples
The following example verifies and repairs item store and metastore consistency, and updates statistics counters.
<FASTSearchFolder>\bin\crawlerconsistency -M doccheck,metacheck,updatestat -O <FASTSearchFolder>\var\log\crawler\consistency\ -C MyCollection
This command verifies each metadatabase entry and any corresponding item content in the crawler store, and logs inconsistencies to files in the specified log folder.
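To preview the repairs before applying them, you can add the -T option to the same command; the collection name MyCollection is a placeholder:
<FASTSearchFolder>\bin\crawlerconsistency -M doccheck,metacheck -O <FASTSearchFolder>\var\log\crawler\consistency\ -C MyCollection -T
Because -T neither deletes anything from disk nor issues deletes to the indexer, you can inspect the generated log files first and then rerun the command without -T to apply the repairs. If you want the on-disk repairs but prefer to synchronize the index yourself, use -n instead, which logs delete operations to files without submitting them to the indexer.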
Remarks
The tool generates the following log files. A log file is only created when the first URI is written to it.
Log file name | Description |
---|---|
<mode>_ok.txt | Lists URIs that were found and not removed as inconsistencies. The output from the metacheck mode lists every URI with a unique checksum, which is useful for comparing against the index. Note: Items may have been dropped by the pipeline, leaving URIs in this file that are not in the index. You can safely remove URIs from the index that are not in this file. |
<mode>_deleted.txt | Lists URIs deleted by the tool. Unless you disabled indexer deletes with the -n option, the URIs were also removed from the index. Because these URIs were deleted as crawler inconsistencies, they may still exist on the Web servers and should be indexed again. Recrawl this list of URIs with the crawleradmin tool by using the --addurifile option (also use the --force option to speed up crawling), as shown in the example after this table. |
<mode>_deleted_reasons.txt | Identical to the <mode>_deleted.txt file, except that each URI also includes an error code that identifies the reason for the delete. |
<mode>_wrongnode.txt | Used for multi-node crawls. Lists all URIs that were removed from a node because of incorrect routing; these URIs should be crawled by a different master node. The URIs are logged, but not deleted from the index. |
<mode>_refeed.txt | Lists URIs whose URI equivalence class was updated by running the tool. To bring the index in sync, use postprocess refeed with the -i option to refeed the contents of this file (see the example after this table), or perform a full refeed. |
Note
Always redirect the stdout and stderr output to a log file on disk.
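For example, using the Windows command shell redirection operators:
<FASTSearchFolder>\bin\crawlerconsistency -M doccheck,metacheck -O <FASTSearchFolder>\var\log\crawler\consistency\ -C MyCollection > consistency_stdout.log 2> consistency_stderr.log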