beconfig.xml reference
Applies to: FAST Search Server 2010
Use beconfig.xml to configure options for the browser engine component in Microsoft FAST Search Server 2010 for SharePoint. For example, use beconfig.xml to alter browser engine cache sizes or time-out settings.
The browser engine reads the beconfig.xml file in <FASTSearchFolder>\etc on startup.
Customizing beconfig.xml
Note
To modify a configuration file, verify that you meet the following minimum requirements: You are a member of the FASTSearchAdministrators local group on the computer where FAST Search Server 2010 for SharePoint is installed.
Use a text editor (e.g. Notepad), not a general purpose XML editor, to change beconfig.xml.
To edit this file:
Edit beconfig.xml in a text editor to specify settings. Use the existing file in <FASTSearchFolder>\etc\ as a starting point. Do not remove any attribute sections from the file.
Run nctrl.exe restart browserengine to restart the browser engine process so that the new options take effect.
beconfig.xml quick reference
The following table contains a list of the elements in beconfig.xml. These elements can appear in any order, but must occur inside other elements as specified in this table.
Element | Description |
---|---|
<browserengine> | Identifies this as a browser engine configuration file. |
<browser> | Specifies options for the virtual Web browser window. Can only occur inside a browserengine element. |
<proxy> | Specifies options for the internal proxy server. Can only occur inside a browserengine element. |
<process> | Specifies options that affect the processing of individual items. Can only occur inside a browserengine element. |
<excludes> | Contains one or more regexp elements, which specify regular expression rules that are used to exclude particular URIs from processing. Can only occur inside a browserengine element. |
<regexp> | Specifies a regular expression exclude rule. Can only occur inside an excludes element. |
<pipeline> | Specifies the processing pipeline options, and the pipeline steps to be performed on each item that is processed. Contains one or more extractor elements. Can only occur inside a browserengine element. |
<extractor> | Specifies an extractor. Must contain both a type and an assembly element, and may contain a parameters element. Can only occur inside a pipeline element. Note: The list of extractors and their sub-elements, as provided in <FASTSearchFolder>\etc\beconfig.xml, must not be altered. |
beconfig.xml file format
XML elements in beconfig.xml begin with < and end with />.
The basic element format is as follows:
<element_name [attribute_name="value"] [attribute_name="value"] … />
For example:
<process maxOperations="1000" maxMemoryMB="1024" timeout="300" />
Element and attribute names are case-sensitive. Attribute values must be enclosed in quotation marks (" ") and are not case-sensitive.
An element definition can span multiple lines. Spaces, carriage returns, line feeds, and tab characters are ignored in an element definition. For example:
<process
maxOperations="1000"
maxMemoryMB="1024"
timeout="300"
/>
For long element definitions, position attributes on separate lines and use indentation to make the file easier to read.
The basic structure of the beconfig.xml file is as follows:
<?xml version="1.0"?>
<browserengine>
<browser ... />
<proxy ... />
<process ... />
<excludes>
...
</excludes>
<pipeline>
...
</pipeline>
</browserengine>
Comments can be added anywhere and are delimited by <!-- and -->.
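As a small illustration of the format rules above, the following fragment combines a comment with a multi-line element definition. The value 500 is an assumed example, not a recommended setting, and the comment is placed before the element because standard XML does not allow comments inside a tag's attribute list.

```xml
<!-- Illustrative change: restart after 500 pages instead of the default 1000 -->
<process
    maxOperations="500"
    maxMemoryMB="1024"
    timeout="300"
/>
```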
browserengine element
The top-level element. All other elements in beconfig.xml occur inside it.
Attributes
None
browser element
This element specifies options to the embedded Web browser component within the browser engine. Use this element to adjust the Web page item loading time-out period. For example, increase the time-out value if Web pages frequently time out during item loading.
Attributes
Attribute | Value | Description |
---|---|---|
width | <pixels> | Web pages are rendered in an invisible Web browser window. This option specifies the width of this window in pixels. Default: 1280 |
height | <pixels> | Specifies the height of the invisible Web browser window in pixels. Default: 1024 |
visible | true or false | true: Makes the Web browser window visible during processing. Use for debugging only. false: Makes the Web browser window invisible during processing. Default: false |
images | true or false | true: Specifies that the browser engine should load the images contained on Web pages. Use for debugging only. false: Specifies that the browser engine should not load the images contained on Web pages. Default: false |
timeout | <seconds> | Specifies the time-out period, in seconds, for the browser engine to load the Web page being processed. If a Web page takes longer to load, it will be discarded. This option does not account for the time taken to run the processing pipeline after loading is completed. Default: 60 |
Example
<browser width="1280" height="1024" visible="false" images="false" timeout="60"/>
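If Web pages frequently time out during loading, the timeout attribute can be raised. The following sketch doubles the default; the value 120 is an assumption chosen for illustration, and a suitable value depends on the sites being crawled.

```xml
<!-- Allow pages up to 120 seconds to load (illustrative value; the default is 60) -->
<browser width="1280" height="1024" visible="false" images="false" timeout="120"/>
```

For a debugging session, visible and images could be set to "true" in the same way and returned to "false" afterward.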
proxy element
This element specifies options for the internal Web proxy and memory cache used by the browser engine. Use this element to adjust the cache size and maximum age of JavaScripts in the cache.
Attributes
Attribute | Value | Description |
---|---|---|
maxsize | <bytes> | Specifies the maximum size of a single JavaScript that will be downloaded from the Web or the Web crawler. Items that exceed this threshold will be discarded. Default: 10485760 |
timeout | <timeout> | Specifies the time-out period, in seconds, for any JavaScript or Web page downloaded from the Web or the Web crawler. If a download exceeds this time-out, it will be discarded. Default: 60 |
cacheSize | <megabytes> | Specifies the maximum size of the JavaScript cache within the browser engine. It is used for keeping frequently used JavaScripts available without re-downloading them. Default: 25 |
cacheTTL | <seconds> | Specifies the maximum age, in seconds, of JavaScripts in the cache before they are evicted. A JavaScript may be evicted earlier if the cache fills up. Default: 3600 |
Example
<proxy maxsize="10485760" timeout="60" cacheSize="25" cacheTTL="3600"/>
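As a sketch of a cache adjustment, the following fragment doubles both the cache size and the maximum age of cached JavaScripts. The values 50 and 7200 are assumptions for illustration only. Note that maxsize is given in bytes (10485760 bytes is 10 MB), while cacheSize is given in megabytes.

```xml
<!-- Illustrative values: 50 MB script cache, scripts kept for up to 7200 seconds (2 hours) -->
<proxy maxsize="10485760" timeout="60" cacheSize="50" cacheTTL="7200"/>
```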
process element
This element specifies options that relate to the processing of Web items in the browser engine. Use this element to adjust the maximum memory usage and the pipeline time-out period.
Attributes
Attribute | Value | Description |
---|---|---|
maxOperations | <operations> | Specifies the maximum number of Web pages to be processed before the browser engine automatically restarts. This is useful to handle potential memory leaks and stuck processing that may be caused by some Web pages. Default: 1000 |
maxMemoryMB | <megabytes> | Specifies the maximum memory usage, in MB, before the browser engine automatically restarts. This is useful to handle potential memory leaks and stuck processing that may be caused by Web pages. Default: 1024 |
timeout | <timeout> | Specifies the time-out period, in seconds, for extracting hyperlinks from any specific Web page. This time-out is required to handle cases in which, for example, a JavaScript prevents the processing pipeline from completing processing of a Web page. Default: 300 |
Example
<process maxOperations="1000" maxMemoryMB="1024" timeout="300"/>
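On a server with limited memory, the restart threshold might be lowered so that the browser engine restarts sooner. The following sketch assumes a 512 MB limit, which is an illustrative value and not a tested recommendation.

```xml
<!-- Illustrative value: restart at 512 MB of memory usage instead of the default 1024 MB -->
<process maxOperations="1000" maxMemoryMB="512" timeout="300"/>
```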
excludes element
This element specifies one or more regular expression rules used to prevent the download of specific JavaScript and cascading style sheet URIs. A typical use excludes known advertising scripts to speed up Web page processing and to prevent the scripts from appearing in the content index.
Attributes
None
Example
<excludes>
<regexp value="http://ads\."/>
</excludes>
regexp element
This element specifies a single regular expression exclude rule and can only occur inside an excludes element. This element can occur multiple times.
Attributes
Attribute | Value | Description |
---|---|---|
value | <regexp> | Specifies a regular expression that is matched against all external JavaScript and cascading style sheet URIs discovered while processing the Web item. URIs matching the regular expression are not downloaded or included during Web page processing. Default: See <FASTSearchFolder>\etc\beconfig.xml for the default value. |
Example
See excludes element example.
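Because the regexp element can occur multiple times, several exclude rules can be listed together. In the following sketch the second rule and its host name (www.example.com) are hypothetical. The regular expression dialect and whether a rule must match the whole URI are not stated here, so test any new rule before relying on it, and keep the rules already present in the existing file.

```xml
<excludes>
    <!-- Rule from the excludes example: URIs that contain http://ads. -->
    <regexp value="http://ads\."/>
    <!-- Hypothetical rule: scripts and style sheets under /tracking/ on www.example.com -->
    <regexp value="http://www\.example\.com/tracking/"/>
</excludes>
```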
pipeline element
This element specifies the set of extractors that are executed on each Web page during processing in the browser engine. An extractor performs a set of operations, such as extracting a certain kind of hyperlink or HTTP cookies, or generating a checksum and the final item HTML used for content indexing.
Attributes
Attribute | Value | Description |
---|---|---|
name | default | Specifies the name of the pipeline. Only a single pipeline is supported and the name must be "default". |
maxFrameLevels | <levels> | Specifies the number of HTML frame levels to process. Normally this option is set to 1, which means that only the top level frame and its immediate child frames (the frameset) are processed. Increasing this number will recursively process multiple frame sets. Default: 1 |
timeout | <seconds> | Specifies the maximum time that the processing pipeline can run on a single Web page before it is stopped. Increasing this value will decrease browser engine throughput, but can help reduce Web page processing timeouts. Decreasing the value may improve throughput at the expense of possibly more timeouts. Default: 300 |
iterations | 1 | Specifies the number of iterations to run the pipeline on each Web page. Only one iteration is supported. |
abortOnFailure | true or false | true: Specifies that the processing of a Web page should be stopped if any single extractor fails. false: Specifies that the processing of a Web page should continue even if some extractors fail. This may improve link extraction, but can (in the worst case) lead to partial items being sent to the content index. |
default | true | Specifies that this pipeline is the default pipeline. Because only one pipeline is supported, this value must always be set to "true". |
Example
<pipeline name="default" maxFrameLevels="1" timeout="180" iterations="1" abortOnFailure="true" default="true">
..
</pipeline>
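As a further sketch, the following variant processes one additional level of nested frame sets and continues when an individual extractor fails. The values are assumptions for illustration. The name, iterations, and default attributes keep their only supported values, and the comment stands in for the extractor elements, which must remain exactly as shipped.

```xml
<pipeline name="default" maxFrameLevels="2" timeout="300" iterations="1" abortOnFailure="false" default="true">
    <!-- Placeholder: keep the extractor elements from the existing beconfig.xml here, unchanged -->
</pipeline>
```

Setting abortOnFailure to "false" trades completeness for coverage: more links may be extracted, but partial items can reach the content index.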
extractor element
This element specifies a single extractor in the pipeline. The list of extractors as provided in <FASTSearchFolder>\etc\beconfig.xml must not be altered.