

How to Create a FAST Search for SharePoint Test Document Using XMLMapper

FAST Search Server 2010 for SharePoint (FAST Search for SharePoint) is an alternative to the out-of-box SharePoint Server search (the built-in enterprise search solution in SharePoint Server 2010). If you want to understand the differences between SharePoint Server search and FAST Search for SharePoint, see [[How FAST Search for SharePoint fits into SharePoint 2010]].

One of its useful capabilities is a flexible item processing architecture, which also includes XML processing.

Using synthetic items (documents) for testing

The simplest way to test a search index is to crawl a set of items from your SharePoint site. If you want to test specific capabilities of your search deployment, however, it can be useful to have a set of synthetic items with well-defined content and metadata. The advantage is that you control all properties of the indexed items. If you want to test ranking features, for example, it is often simpler to use small, synthetic items.

Using XML is a convenient way to create synthetic items. 
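As a starting point, a short script can write a set of synthetic items in the XML format used in this article (the same element names as the test document doc1.xml and the XMLMapper.xml configuration). The following Python sketch is an illustration, not a FAST Search for SharePoint tool; the values are derived from the item number so that you can predict which items should match a query and how they should sort.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

WORDS = ["alpha", "bravo", "charlie", "delta", "echo", "foxtrot", "golf", "hotel"]


def make_item(n: int) -> ET.Element:
    """Build synthetic item n (1-8) whose values depend predictably on n."""
    if not 1 <= n <= len(WORDS):
        raise ValueError("n must be between 1 and 8")
    doc = ET.Element("Document")
    ET.SubElement(doc, "Title").text = f"Document {n}"
    ET.SubElement(doc, "Date").text = f"2011-01-{n:02d}T08:00:00Z"
    ET.SubElement(doc, "Size").text = str(128 * n)
    # Item n contains the first n words, so term matches are easy to predict.
    ET.SubElement(doc, "Body").text = f"Test document {n}. " + " ".join(WORDS[:n])
    tags = ET.SubElement(doc, "Tags")
    for word in WORDS[:min(n, 3)]:
        ET.SubElement(tags, "Tag").text = word
    for i in range(1, 5):
        ET.SubElement(doc, f"Int{i}").text = str(n * i)
    for i in range(1, 5):
        ET.SubElement(doc, f"Text{i}").text = " ".join(WORDS[:i])
    return doc


def write_items(folder: Path, count: int) -> list[Path]:
    """Write doc1.xml ... doc<count>.xml into folder and return their paths."""
    folder.mkdir(parents=True, exist_ok=True)
    paths = []
    for n in range(1, count + 1):
        path = folder / f"doc{n}.xml"
        ET.ElementTree(make_item(n)).write(path, encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths
```

For example, write_items(Path(r"C:\XMLMapper\generated"), 8) writes eight items into a hypothetical subfolder, so the hand-written doc1.xml is not overwritten. The generated items deliberately differ from doc1.xml.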

Prepare the item processing pipeline

You can use the XMLMapper custom processing stage to map the elements of XML documents to crawled properties. In this example we create a configuration to handle synthetic items with the following crawled properties (and the corresponding managed property mappings):

  • mytitle: Title of the item. Mapped to Title.
  • mybody: Main body text of the item. Mapped to body.
  • mysize: Size of the document. Overrides the detected size of the item. Mapped to Size.
  • mydate: The update date of the item. Mapped to Write.
  • mytags: Contains some metadata tags for the item. Mapped to Tags.
  • myint1 - myint4: Four general purpose integer properties. Mapped to managed properties with the same name.
  • mytext1 - mytext4: Four general purpose text properties. Mapped to managed properties with the same name. 
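Because these crawled properties follow a regular naming pattern, the XMLMapper.xml configuration shown in the steps below can also be generated from a table. The following Python sketch is a hypothetical helper, not part of FAST Search for SharePoint. It uses the type codes from that configuration (31 = string, 3 = integer, 64 = date/time, which correspond to the Windows variant types VT_LPWSTR, VT_I4 and VT_FILETIME) and takes the GUID of your crawled property category as a parameter.

```python
import xml.etree.ElementTree as ET

# (crawled property, XML element, type code) for each property listed above.
PROPERTIES = (
    [("mytitle", "Title", 31), ("mybody", "Body", 31), ("mysize", "Size", 3),
     ("mydate", "Date", 64), ("mytags", "Tags", 31)]
    + [(f"myint{i}", f"Int{i}", 3) for i in range(1, 5)]
    + [(f"mytext{i}", f"Text{i}", 31) for i in range(1, 5)]
)


def build_xmlmapper_config(propset_guid: str, default_type: int = 31) -> str:
    """Return XMLMapper.xml content that maps //Element to each crawled property."""
    root = ET.Element("XMLPropertiesCreator")
    ET.SubElement(root, "propset").text = propset_guid
    ET.SubElement(root, "type").text = str(default_type)
    mappings = ET.SubElement(root, "XMLMappings")
    for attr, element, type_code in PROPERTIES:
        mapping = ET.SubElement(mappings, "Mapping", attr=attr, path=f"//{element}")
        # Only properties that differ from the default type need a type attribute.
        if type_code != default_type:
            mapping.set("type", str(type_code))
    ET.indent(root, space="   ")  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")
```

Writing the returned string to XMLMapper.xml gives the same mappings as the hand-written configuration; adding a property then only means adding one row to the table.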

The following steps configure the item processing pipeline. For more information, see Customizing item processing (MSDN).
Unless otherwise specified, you should apply the commands in a PowerShell window on the FAST Search for SharePoint administration server.

  1. Create a new crawled property category for the new crawled properties. All text crawled properties will be mapped to the default fulltext index:

  2. The following three configuration files control the item processing pipeline. You must update these files on all servers that run item processing in your deployment (<document-processor> in deployment.xml). 

    1. Edit C:\FASTSearch\etc\config_data\DocumentProcessor\formatdetector\user_converter_rules.xml to override the default document type detection for .xml documents. For details, see Format detection and item parsing.

    2. Edit C:\FASTSearch\etc\config_data\DocumentProcessor\optionalprocessing.xml to activate the XMLMapper stage and the FFDDumper stage (for pipeline debugging). Change the following two elements to have active="yes":

      <processor name="XMLMapper" active="yes" />
      <processor name="FFDDumper" active="yes" />

      Note: Only activate 'FFDDumper' on a test deployment, and not when you crawl a large number of documents. For more information about the FFDDumper output format, see [[How To Identify Crawled Properties and Their Values]].

    3. Create C:\FASTSearch\etc\config_data\DocumentProcessor\XMLMapper.xml. This file configures the mapping from XML elements to crawled properties. For the mappings listed above, use this configuration:

      <XMLPropertiesCreator>
         <propset>[GUID of the crawled property category created in step 1]</propset>
         <type>31</type>
         <XMLMappings>
            <Mapping attr="mytitle" path="//Title"/>
            <Mapping attr="mybody" path="//Body"/> 
            <Mapping attr="mysize" path="//Size" type="3"/> 
            <Mapping attr="mydate" path="//Date" type="64"/> 
            <Mapping attr="mytags" path="//Tags"/> 
            <Mapping attr="myint1" path="//Int1" type="3"/> 
            <Mapping attr="myint2" path="//Int2" type="3"/> 
            <Mapping attr="myint3" path="//Int3" type="3"/> 
            <Mapping attr="myint4" path="//Int4" type="3"/> 
            <Mapping attr="mytext1" path="//Text1"/> 
            <Mapping attr="mytext2" path="//Text2"/> 
            <Mapping attr="mytext3" path="//Text3"/> 
            <Mapping attr="mytext4" path="//Text4"/>
         </XMLMappings>
      </XMLPropertiesCreator>

      For more information, see XML mapper schema (MSDN).

  3. Reload the item processing configuration on all FAST Search for SharePoint servers by running the following command on any one of them:

    psctrl reset
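Because the optionalprocessing.xml edit must be made on every item processing server, it can help to check each copy before running psctrl reset. The following Python sketch is a hypothetical helper; it assumes only that the file is well-formed XML and that stages appear as <processor> elements with name and active attributes, as in the snippet in step 2.

```python
import xml.etree.ElementTree as ET

# Stages this article asks you to activate in optionalprocessing.xml.
REQUIRED_STAGES = ("XMLMapper", "FFDDumper")


def inactive_stages(config_path, required=REQUIRED_STAGES):
    """Return the required stages that are missing or not set to active="yes"."""
    root = ET.parse(config_path).getroot()
    # root.iter() visits every element, so the nesting depth of the
    # <processor> elements in the file does not matter.
    active = {p.get("name"): p.get("active") for p in root.iter("processor")}
    return [name for name in required if active.get(name) != "yes"]
```

For example, inactive_stages(r"C:\FASTSearch\etc\config_data\DocumentProcessor\optionalprocessing.xml") returns an empty list when both stages are active on that server.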

Create and submit a test item

  1. Create the test document. The following simple XML document can serve as the initial test document. The commands below assume that you save it as C:\XMLMapper\doc1.xml:

    <Document>
      <Title>Document 1</Title>
      <Date>2011-01-01T08:00:00Z</Date>
      <Size>128</Size>
      <Body>This is the first test document. alpha bravo charlie delta echo foxtrot golf hotel.</Body>
      <Tags>
        <Tag>alpha</Tag>
        <Tag>bravo</Tag>
        <Tag>charlie</Tag>
      </Tags>
      <Int1>1</Int1>
      <Int2>2</Int2>
      <Int3>3</Int3>
      <Int4>4</Int4>
      <Text1>alpha</Text1>
      <Text2>alpha bravo</Text2>
      <Text3>alpha bravo charlie</Text3>
      <Text4>alpha bravo charlie delta</Text4>
    </Document>

  2. Submit the document using 'docpush':

    docpush -c sp -u file:/// C:\XMLMapper\doc1.xml

    This command submits the XML document to the pipeline. When you have submitted this first document, the crawled properties are automatically created.

  3. Verify that the document was added by inspecting the processing log written by FFDDumper in C:\FASTSearch\data\ffd\. For more details, see [[How To Identify Crawled Properties and Their Values]].
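If a crawled property does not show up, it can help to preview locally which values the XMLMapper.xml mappings should pick out of doc1.xml. The following Python sketch is an approximation, not XMLMapper itself: it only handles paths of the form '//Element', uses the first matching element, and joins nested text (such as the <Tag> children of <Tags>) with single spaces. How XMLMapper actually combines nested elements is an assumption here. The configuration file must be well-formed, so the <propset> placeholder has to be replaced with the real GUID first.

```python
import xml.etree.ElementTree as ET
from datetime import datetime


def load_mappings(config_path):
    """Read (crawled property, path, type code) triples from XMLMapper.xml."""
    root = ET.parse(config_path).getroot()
    default_type = int(root.findtext("type", "31"))
    return [(m.get("attr"), m.get("path"), int(m.get("type", default_type)))
            for m in root.find("XMLMappings").iter("Mapping")]


def convert(text, type_code):
    """Convert extracted text using the type codes from this article's configuration."""
    if type_code == 3:   # integer
        return int(text)
    if type_code == 64:  # date/time; fromisoformat() before Python 3.11 rejects "Z"
        return datetime.fromisoformat(text.replace("Z", "+00:00"))
    return text          # 31: string


def preview(doc_path, mappings):
    """Return {crawled property: value} for the mappings that match doc_path."""
    doc = ET.parse(doc_path).getroot()
    result = {}
    for attr, path, type_code in mappings:
        name = path[2:]
        if not path.startswith("//") or not name or "/" in name:
            raise ValueError(f"this sketch only supports '//Element' paths: {path}")
        match = next(doc.iter(name), None)  # iter() also checks the root element
        if match is None:
            continue
        # Assumption: nested text is joined with single spaces.
        text = " ".join(" ".join(match.itertext()).split())
        result[attr] = convert(text, type_code)
    return result
```

For example, preview(r"C:\XMLMapper\doc1.xml", load_mappings(r"C:\FASTSearch\etc\config_data\DocumentProcessor\XMLMapper.xml")) lists the values you should expect to find for mytitle, mysize, mydate and the other crawled properties.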
     

Create managed properties and crawled property mappings

By submitting the initial test document you have created the necessary crawled properties. Now you need to create the custom managed properties and map the crawled properties to managed properties.
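Steps 1 and 3 below repeat the same three-line PowerShell pattern for every property. If you add more properties, a small script can generate those commands from a table. The following Python sketch is a hypothetical helper that only prints PowerShell text reproducing the commands in this section; it does not call FAST Search for SharePoint, so you still run its output in the PowerShell window on the administration server. The managed property type codes (1 = text, 2 = integer) are the ones used in step 3.

```python
# (managed property, crawled property) pairs for the existing managed properties.
EXISTING = [("body", "mybody"), ("Title", "mytitle"), ("Size", "mysize"), ("Write", "mydate")]

# New managed properties: name -> managed property type (1 = text, 2 = integer).
NEW = {**{f"mytext{i}": 1 for i in range(1, 5)}, **{f"myint{i}": 2 for i in range(1, 5)}}

# Properties that also get sorting and query refinement enabled.
SORT_AND_REFINE = {"mytext1", "myint1"}


def mapping_commands(managed_expr, crawled):
    """Return the three lines that map one crawled property to a managed property."""
    return [f"$mp = {managed_expr}",
            f"$cp = Get-FASTSearchMetadataCrawledProperty -Name {crawled}",
            "New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp"]


def build_script():
    """Return the PowerShell commands for steps 1 and 3 as one string."""
    lines = []
    for managed, crawled in EXISTING:
        lines += mapping_commands(f"Get-FASTSearchMetadataManagedProperty -Name {managed}", crawled)
        lines.append("")
    for name, mp_type in NEW.items():
        lines += mapping_commands(f"New-FASTSearchMetadataManagedProperty -Name {name} -Type {mp_type}", name)
        if name in SORT_AND_REFINE:
            lines.append("Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -SortableType 1")
            lines.append("Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -RefinementEnabled 1")
        lines.append("")
    return "\n".join(lines)
```

Printing build_script() gives the commands in the same order as the steps below, so the output can be compared line by line with the article.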

  1. Map the new crawled properties to the existing managed properties:

    $mp = Get-FASTSearchMetadataManagedProperty -Name body
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mybody
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = Get-FASTSearchMetadataManagedProperty -Name Title
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mytitle
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = Get-FASTSearchMetadataManagedProperty -Name Size
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mysize
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = Get-FASTSearchMetadataManagedProperty -Name Write
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mydate
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    Note that mapping an XML element to body should only be done for testing. Such a mapping means that some standard item processing of the body, such as property extraction, will not take place. If you want to test such features, it is better to submit a plain text file using docpush.

  2. The managed properties Write, Size and Title already contain crawled property mappings, so these properties will also get default values for our synthetic documents. To override the default mappings, you must make sure that our new mappings come first in the list of crawled property mappings. This can be done in PowerShell, but it is more convenient in the GUI:

    1. On the Query SSA server, open Central Administration --> Manage service applications.
    2. Select the name of your Query SSA.
    3. Select FAST Search Administration --> Managed properties
    4. For each of the three properties:
      1. Search for the property name, and click the property.
      2. In Mappings to Crawled Properties, move the crawled property you created to the top of the list, and click OK.
  3. Create the new managed properties with associated mappings. 'myint1' and 'mytext1' also have the following additional features enabled:

    • Sorting enabled
    • Query refinement enabled

    $mp = New-FASTSearchMetadataManagedProperty -Name mytext1 -Type 1
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext1
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp
    Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -SortableType 1
    Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -RefinementEnabled 1

    $mp = New-FASTSearchMetadataManagedProperty -Name mytext2 -Type 1
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext2
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = New-FASTSearchMetadataManagedProperty -Name mytext3 -Type 1
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext3
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = New-FASTSearchMetadataManagedProperty -Name mytext4 -Type 1
    $cp = Get-FASTSearchMetadataCrawledProperty -Name mytext4
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = New-FASTSearchMetadataManagedProperty -Name myint1 -Type 2
    $cp = Get-FASTSearchMetadataCrawledProperty -Name myint1
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp
    Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -SortableType 1
    Set-FASTSearchMetadataManagedProperty -ManagedProperty $mp -RefinementEnabled 1

    $mp = New-FASTSearchMetadataManagedProperty -Name myint2 -Type 2
    $cp = Get-FASTSearchMetadataCrawledProperty -Name myint2
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = New-FASTSearchMetadataManagedProperty -Name myint3 -Type 2
    $cp = Get-FASTSearchMetadataCrawledProperty -Name myint3
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

    $mp = New-FASTSearchMetadataManagedProperty -Name myint4 -Type 2
    $cp = Get-FASTSearchMetadataCrawledProperty -Name myint4
    New-FASTSearchMetadataCrawledPropertyMapping -ManagedProperty $mp -CrawledProperty $cp

  4. Resubmit the test document using 'docpush' to populate the managed properties you have mapped.

  5. Run a test query in one of the following ways:

    1. Make a test query on the internal query interface on the query processing server (<query> role in deployment.xml). Use the following URL (assuming you use the default base port): http://localhost:13280/
      Search for 'xmlmapper'. You will see an internal XML query result with all managed properties that are returned in query results.
    2. Use the simple PowerShell script as described in this blog: http://blogs.msdn.com/b/knutbran/archive/2011/04/01/some-hints-on-testing-custom-managed-properties-and-queries.aspx
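Step 2 above relies on the order of the crawled property mappings. The following Python sketch models that behaviour under an assumption: the managed property takes its value from a single crawled property (it does not merge the values of all mapped crawled properties), and the first mapped crawled property that has a value wins. 'default_title' is a hypothetical stand-in for whichever built-in crawled property is already mapped to Title.

```python
from typing import Mapping, Optional, Sequence


def resolve_single_value(mapping_order: Sequence[str],
                         crawled_values: Mapping[str, Optional[str]]) -> Optional[str]:
    """Model of a non-merging managed property: the first crawled property
    in mapping order that has a non-empty value provides the value."""
    for crawled in mapping_order:
        value = crawled_values.get(crawled)
        if value:  # skips both missing (None) and empty values
            return value
    return None
```

With the default mapping first, resolve_single_value(["default_title", "mytitle"], values) returns the default value; after moving mytitle to the top, the value from the XML document wins. The model also shows why a synthetic document without a <Title> element still gets a title: the next mapping in the list supplies it.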