SharePoint 2010 - Improve Search Relevance
Introduction
I recently developed a SharePoint 2010 solution which includes an advanced search web part that allows users to perform enterprise searches and view the returned results in a graphically rich representation.
As a SharePoint developer / architect I want to ensure that the search results match what the user wanted to find and that the results on the first page are the most relevant, so the user does not have to look through several pages of results to find the best matches for their search. This is called Search Relevancy.
It is very important to realize the difference between Sorting and Ranking.
In my own words I would describe Sorting as the process of arranging objects according to a specific attribute of the item. An example would be the books in a library.
Ranking is where an item takes precedence over other items based on a combination of attributes. Examples of this would be how tools in a workshop are arranged, how equipment is arranged in an emergency room, how individuals are ranked based on the role they play in the military, or even how food items are arranged in the supermarket. One realizes from these examples that there is no single property which can be used to determine the ranking of such items and that Ranking is based on the importance an item has in a given situation.
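The difference can be sketched in a few lines of C#. This is only an illustration: the Doc type, its two numeric attributes and the weights 0.7 and 0.3 are all made up for the example and have nothing to do with how SharePoint stores or scores items.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical item type, used only for this illustration.
class Doc
{
    public string Title { get; set; }
    public double TitleMatch { get; set; }  // 0..1, how well the title matches the query
    public double Popularity { get; set; }  // 0..1, how often the item is opened
}

static class SortVersusRank
{
    // Sorting: order by one attribute only (here the title).
    public static List<string> SortByTitle(IEnumerable<Doc> docs)
    {
        return docs.OrderBy(d => d.Title, StringComparer.Ordinal)
                   .Select(d => d.Title)
                   .ToList();
    }

    // Ranking: order by a score that combines several attributes.
    // The weights 0.7 and 0.3 are arbitrary values chosen for the example.
    public static List<string> RankByScore(IEnumerable<Doc> docs)
    {
        return docs.OrderByDescending(d => 0.7 * d.TitleMatch + 0.3 * d.Popularity)
                   .Select(d => d.Title)
                   .ToList();
    }

    // Made-up sample data.
    public static List<Doc> Sample()
    {
        return new List<Doc>
        {
            new Doc { Title = "Demo Plan",     TitleMatch = 0.9, Popularity = 0.2 },
            new Doc { Title = "Annual Report", TitleMatch = 0.1, Popularity = 0.9 },
            new Doc { Title = "Product Demo",  TitleMatch = 0.8, Popularity = 0.8 }
        };
    }

    static void Main()
    {
        List<Doc> docs = Sample();
        Console.WriteLine("Sorted: " + string.Join(", ", SortByTitle(docs).ToArray()));
        Console.WriteLine("Ranked: " + string.Join(", ", RankByScore(docs).ToArray()));
    }
}
```

With this sample data the sorted list is alphabetical, while the ranked list puts "Product Demo" first because it does well on both attributes at once.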
The Role of the SharePoint Developer
Developers focus on ensuring that the search engine they developed works well. Performance and accuracy of search results are normally the main focus points. The developer wants to ensure the code works well, but providing a comprehensive search solution requires that the developer also consider the search experience from the end-user point of view.
http://lh5.ggpht.com/_SX0lCwhJEzc/TdZ6-4y7yiI/AAAAAAAAAhk/vKjcCV82eK0/s1600/clip_image002%5B5%5D.jpg
The end-user is not really concerned about what happens on the server and will most likely not appreciate the development effort which goes into building robust search engines.
Managers spend up to 27% of their time looking for information, and when they use the search functionality in your SharePoint solution they will ask only one question: “Did I find what I was searching for on the first page of the results?”
If the item the user searched for is not among the first 10 search results, then the user will most likely believe the item does not exist in the system.
Fortunately, SharePoint includes a Search Ranking Model which developers can use to enhance the search experience.
The SharePoint Search Ranking Model
Relevance is about how closely the search results that are returned to the user match what the user wanted to find.
SharePoint uses a formula to determine which items are most relevant, but end users may have a different view on what should be considered the most relevant items. Even different users of the same solution might have different opinions about this. (To be honest, I can relate to this: at home I will sometimes arrange our DVD collection in a particular order while another family member would prefer a completely different order.)
As an example, when the user performs a search using the keyword ‘Demo’, the SharePoint search engine returns the items in the following order:
http://lh5.ggpht.com/_SX0lCwhJEzc/TdZ7ArTYzlI/AAAAAAAAAhs/TX-a41U3hBw/s1600/clip_image003%5B6%5D.png
SharePoint evaluates all properties and even the contents of the documents returned in the search results, and uses a complex algorithm to determine which items are most relevant and which items are least relevant.
Users are not interested in algorithms and might not even consider the fact that the search engine also takes into account less obvious factors like “Click Distance”, “Document Location (URL Depth)”, “Document Popularity”, “Language Detection”, “File Type”, etc. In actual fact the algorithm is very complex and it is unfortunate that very few end users appreciate how clever the search engine is.
Some users might feel that the search results should be ranked in a way so that the Title of the item is considered the most important determining factor.
In this example we executed the same search we tested earlier, but because a different ranking preference was specified, two of the result items were given a lower rank and items whose Title field is closer to the search criteria moved up in the ranking.
http://lh4.ggpht.com/_SX0lCwhJEzc/TdZ7CZoQvgI/AAAAAAAAAh0/sgbN_S6wVMY/s1600/clip_image004%5B5%5D.png
There might even be a requirement to rank the search results based on a custom field like ‘Client Name’.
You can see from the same search results below that the item where the search criteria was not found in the ‘Client Name’ field moved down in the ranking order, and items which contain the search criteria in the ‘Client Name’ field moved up in the rank.
http://lh3.ggpht.com/_SX0lCwhJEzc/TdZ7DzfBM9I/AAAAAAAAAh8/8LnANm_CtUM/s1600/clip_image005%5B4%5D.png
SharePoint Enterprise Search includes a ranking engine developed in collaboration with Microsoft Research. It is specifically tuned for the unique requirements of searching enterprise content.
The great news is that it is possible for IT Professionals to customize the way SharePoint ranks search results.
This can be done by:
- creating a custom model (ranking model) and
- instructing SharePoint to use the custom model in a particular area of the solution
- or even setting the new ranking model as the default for SharePoint Search.
This article will provide you with the necessary background on SharePoint Ranking Models and will guide you through the process of creating and implementing your own custom ranking models.
Structure
As a developer I used to hate the theory part of an article: the stuff which can be very boring. But believe me, if you really want to become one of the best SharePoint professionals you have to consider the ground rules. Even if they are not always applicable to the immediate problem which you are trying to solve, you have to understand how the pieces fit together.
So here we go….
Ranking Models are based on an XML schema which contains a unique identifier, a name, a description, and the specifics of the components used in the formula that calculates the numeric scores indicating which search results are more relevant than others. (For those who are interested in the formula, please research the BM25 ranking function.)
The formula to calculate relevance uses two areas of ranking called Static and Dynamic ranking.
Dynamic Ranking (query-dependent ranking) is where the property values or content of an item affect the ranking score. As an example, the ‘Title’ field can be evaluated against the search criteria, and the more important we consider the field to be, the higher the ranking score will be for items where the value in the ‘Title’ field is a closer match to the search criteria.
Static Ranking (query-independent ranking) is where the content or property values of an item do not determine the ranking of the item.
Dynamic Ranking contains the following components:
- Property Weighting is used to assign a weight to a property so that it counts more heavily in the relevance calculation. Configure this setting to a value between 1 and 75.
- Property Length Normalization - because the properties of an item vary in length, their values cannot be treated equally, and the rank of a content item needs to be adjusted based on the length of the property and the length normalization setting. This only applies to properties that contain text. The range of possible values for this setting is 0 to 1. For managed properties that hold long text, you usually want to set this to a value near 0.7, which is the approximate setting for the body property. For properties that contain a small amount of text, use a value of approximately 0.5.
- Title Extraction - only performed on Microsoft Office files. In scenarios where the ‘Title’ field of an Office file does not accurately reflect the contents of the item (for example, when the title of a file is ‘Presentation 1’ or ‘Document 1’), Enterprise Search detects another candidate for the title within the body of the content item, and includes this value with the actual title when calculating relevance. (I think this is really cool!)
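To see how property weighting and length normalization work together, here is a minimal sketch of the textbook BM25F idea. It is not SharePoint's internal code: the PropertyStats type, the K1 value of 1.2 and the sample numbers are all assumptions made for the illustration, and the term-rarity (IDF) part of the formula is left out to keep it short.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical statistics for one property of one item (illustration only).
class PropertyStats
{
    public double Weight { get; set; }               // property weighting
    public double LengthNormalization { get; set; }  // between 0 and 1
    public int TermFrequency { get; set; }           // occurrences of the query term
    public double Length { get; set; }               // words in this property for this item
    public double AverageLength { get; set; }        // average words in this property over all items
}

static class Bm25fSketch
{
    // Saturation constant; 1.2 is a common textbook value, assumed here.
    const double K1 = 1.2;

    // Term frequency adjusted for how long the property is compared with the average.
    // A setting of 0 ignores length; a setting of 1 applies the full length penalty.
    public static double NormalizedTf(PropertyStats p)
    {
        double b = p.LengthNormalization;
        double lengthFactor = (1.0 - b) + b * (p.Length / p.AverageLength);
        return p.TermFrequency / lengthFactor;
    }

    // Weighted sum over all properties, then saturated so that extra
    // occurrences of the term add less and less to the score.
    public static double Score(IEnumerable<PropertyStats> properties)
    {
        double weighted = 0.0;
        foreach (PropertyStats p in properties)
        {
            weighted += p.Weight * NormalizedTf(p);
        }
        return weighted / (K1 + weighted);
    }

    static void Main()
    {
        // Same item scored twice; only the Title weight differs.
        var lowTitleWeight = new List<PropertyStats>
        {
            new PropertyStats { Weight = 1,  LengthNormalization = 0.5, TermFrequency = 1, Length = 2,   AverageLength = 4 },
            new PropertyStats { Weight = 1,  LengthNormalization = 0.7, TermFrequency = 3, Length = 200, AverageLength = 100 }
        };
        var highTitleWeight = new List<PropertyStats>
        {
            new PropertyStats { Weight = 25, LengthNormalization = 0.5, TermFrequency = 1, Length = 2,   AverageLength = 4 },
            new PropertyStats { Weight = 1,  LengthNormalization = 0.7, TermFrequency = 3, Length = 200, AverageLength = 100 }
        };
        Console.WriteLine("Title weight 1:  " + Score(lowTitleWeight));
        Console.WriteLine("Title weight 25: " + Score(highTitleWeight));
    }
}
```

The point to take away is that a match in a heavily weighted property pushes the score up, while a match in a property that is much longer than average counts for less as the length normalization setting approaches 1.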
Static Ranking contains the following components:
- Click Distance is the number of links between a content item and an "expert" page linking to the content item. The more links that the crawler must travel from an authoritative page to the content item, the lower the relevance score. If there are multiple paths to a content item, relevance is calculated based on the shortest path.
- URL Depth refers to how many levels deep within a site the content item is found. The level is determined by reviewing the number of slash ("/") characters in the URL; the greater the number of slash characters in the URL path, the deeper the URL is for that content item. A large URL depth number can lower the relevance of that content.
- Automatic Language Detection determines the user's language based on the "Accept-Language" headers from the browser they are using. Content that is retrieved in the user's language is considered more relevant than content in other languages, with the exception of English language content.
- File Type Biasing - in most search scenarios, certain file types are more relevant than others. For example, HTML pages and Word documents are usually more relevant to a user's search than an Excel spreadsheet or a plain text file.
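Two of these static features are easy to picture in code. The sketch below is my own illustration and not how the crawler is implemented: it assumes URL depth is counted on the path part of the URL only, and it models the site as a small hypothetical link graph in which click distance is the shortest number of links from an authoritative page.

```csharp
using System;
using System.Collections.Generic;

static class StaticFeatures
{
    // URL depth: count the slash characters in the path part of the URL
    // (ignoring the "//" after the scheme is an assumption of this sketch).
    public static int UrlDepth(string url)
    {
        string path = new Uri(url).AbsolutePath;
        int depth = 0;
        foreach (char c in path)
        {
            if (c == '/') depth++;
        }
        return depth;
    }

    // Click distance: breadth-first search from the authoritative page, so the
    // first time the target is reached is also the shortest path to it.
    // Returns -1 when the target cannot be reached.
    public static int ClickDistance(IDictionary<string, string[]> links, string authoritativePage, string target)
    {
        var distance = new Dictionary<string, int>();
        var queue = new Queue<string>();
        distance[authoritativePage] = 0;
        queue.Enqueue(authoritativePage);

        while (queue.Count > 0)
        {
            string page = queue.Dequeue();
            if (page == target) return distance[page];

            string[] outgoing;
            if (!links.TryGetValue(page, out outgoing)) continue;

            foreach (string next in outgoing)
            {
                if (distance.ContainsKey(next)) continue;  // already reached by a shorter path
                distance[next] = distance[page] + 1;
                queue.Enqueue(next);
            }
        }
        return -1;
    }

    // Hypothetical link graph: page name -> pages it links to.
    public static Dictionary<string, string[]> SampleLinks()
    {
        return new Dictionary<string, string[]>
        {
            { "home",    new[] { "hr", "docs" } },
            { "hr",      new[] { "policy" } },
            { "docs",    new[] { "archive" } },
            { "archive", new[] { "policy" } }
        };
    }

    static void Main()
    {
        Console.WriteLine(UrlDepth("http://server/sites/hr/docs/policy.docx"));
        Console.WriteLine(ClickDistance(SampleLinks(), "home", "policy"));
    }
}
```

In the sample graph the policy page can be reached through the archive in three clicks or through the hr page in two, and the shorter path is the one that counts.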
Tutorial (…the exciting part)
What you need
- SharePoint Central Administration
- SharePoint Site with a document library
- 5 Sample documents (no other content in site)
- A Search Scope to the particular SharePoint site which you want to search in.
- An understanding of managed metadata properties (only if you want to search on custom site columns).
- PowerShell
- Utility or source code to test our ranking model
Test Harness
Before we build a new custom ranking model let us first develop a small utility to help us test our ranking model.
You can also export the standard SharePoint Core Search Results Web Part and modify the XML to specify the ranking model to use, but in most cases I prefer to use a test harness instead of making changes to SharePoint until I have finished quality assurance of my custom code.
Create a new Visual Studio console application and add the following code to your Program.cs file:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Microsoft.Office.Server.Search;
using Microsoft.Office.Server.Search.Query;
using Microsoft.SharePoint;

class Program
{
    static void Main(string[] args)
    {
        // Replace with your web URL.
        using (SPSite site = new SPSite("http://yourwebURL"))
        {
            using (SPWeb web = site.OpenWeb())
            {
                ResultType resultType = ResultType.RelevantResults;

                // VERY IMPORTANT: use DefaultProperties instead of *.
                // Replace 'SPSearchDemoSearchScope' with your Search Scope name.
                string query = "SELECT RANK, Filename, Title FROM SCOPE() WHERE FREETEXT(DefaultProperties, '*demo* ') AND \"SCOPE\" = 'SPSearchDemoSearchScope' ORDER BY \"Rank\" DESC ";

                FullTextSqlQuery fullTextSqlQuery = new FullTextSqlQuery(site);
                fullTextSqlQuery.QueryText = query;
                fullTextSqlQuery.ResultTypes = resultType;

                // This is where we will later specify the custom ranking model.
                //fullTextSqlQuery.RankingModelId = "";

                ResultTableCollection resultTableCollection = fullTextSqlQuery.Execute();
                ResultTable resultTable = resultTableCollection[resultType];

                // Print the rank and title of every result row.
                while (resultTable.Read())
                {
                    Console.WriteLine("Rank:" + resultTable["RANK"].ToString() + " Title:" + resultTable["TITLE"].ToString());
                }

                Console.ReadKey();
            }
        }
    }
}
Please note the following:
- Your namespace will be different.
- Replace http://yourwebURL/ with a valid web URL
- Replace 'SPSearchDemoSearchScope' with your Search Scope name.
- I included a placeholder (//fullTextSqlQuery.RankingModelId = "";) to later specify our custom ranking model to use.
Compile the application and run it.
If there are no problems you should see the search results returned and ranked using the default SharePoint Ranking model.
You can also refer to: Best Practices: Writing SQL Syntax Queries for Relevant Results in Enterprise Search
Now that we have a test harness for testing different ranking models, we can proceed to build and implement a new custom ranking model.
We need the following values in order to construct our XML:
Ranking model attributes:
- Name
  Description: The name of your custom ranking model.
  How to find value: Any value you want to use.
- Id
  Description: The unique identifier for the ranking model.
  How to find value: Use the GUIDGen tool to create a new GUID.
- Description
  Description: A description of your ranking model.
  How to find value: Any text you want to use.
One or more queryDependentFeatures, each with the following attributes:
- pid
  Description: The property ID of a managed property in the search schema.
  How to find value: Run the following PowerShell command to export all the PIDs to a text file:
  Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchMetadataManagedProperty >> C:\PID.txt
- name
  Description: The name of a managed property.
  How to find value: Run the following PowerShell command to export all the property names to a text file:
  Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchMetadataManagedProperty >> C:\PID.txt
- weight
  Description: The weight setting for a managed property.
  How to find value: Any value between 0 and infinity. If the value is 0 the property is ignored. Normally the value is between 1 and 75.
The XML schema for a ranking model is structured in the following way:
<rankingModel name="string" id="GUID" description="string" xmlns="http://schemas.microsoft.com/office/2009/rankingModel">
  <queryDependentFeatures>
    <queryDependentFeature pid="PID" name="string" weight="weightValue" lengthNormalization="lengthNormalizationSetting" />
  </queryDependentFeatures>
  <queryIndependentFeatures>
    <categoryFeature pid="PID" default="defaultValue" name="string">
      <category value="categoryValue" name="string" weight="weightValue" />
    </categoryFeature>
    <languageFeature pid="PID" name="string" default="defaultValue" weight="weightValue" />
    <queryIndependentFeature pid="PID" name="string" default="defaultValue" weight="weightValue">
      <transformRational k="value" />
      <transformInvRational k="value" />
      <transformLinear max="maxValue" />
    </queryIndependentFeature>
  </queryIndependentFeatures>
</rankingModel>
The following link describes the different elements in the XML: http://msdn.microsoft.com/en-us/library/ee558793.aspx
So, in order to create a new ranking model, open an XML editor and use the following example:
<?xml version="1.0" encoding="utf-8"?>
<rankingModel name="DemoRankingModel" id="302A9E0E-F8B9-4b21-8180-C327ECCBBA94" description="Demo Custom Ranking Model" xmlns="http://schemas.microsoft.com/office/2009/rankingModel">
  <queryDependentFeatures>
    <queryDependentFeature pid="56" name="Filename" weight="75" lengthNormalization="10" />
    <queryDependentFeature pid="2" name="Title" weight="25" lengthNormalization="10" />
  </queryDependentFeatures>
</rankingModel>
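If you prefer to generate the XML instead of typing it by hand, a minimal sketch using LINQ to XML could look like the one below. The RankingModelBuilder class and its method names are my own invention for this illustration; the element names, attribute names and namespace are taken from the example above, and Guid.NewGuid() is used as an alternative to the GUIDGen tool.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper (not part of SharePoint) that builds the ranking model XML.
static class RankingModelBuilder
{
    public static readonly XNamespace Ns = "http://schemas.microsoft.com/office/2009/rankingModel";

    // One queryDependentFeature element for a managed property.
    public static XElement Feature(int pid, string name, double weight, double lengthNormalization)
    {
        return new XElement(Ns + "queryDependentFeature",
            new XAttribute("pid", pid),
            new XAttribute("name", name),
            new XAttribute("weight", weight),
            new XAttribute("lengthNormalization", lengthNormalization));
    }

    // The rankingModel root element with its query-dependent features.
    public static XElement Build(Guid id, string name, string description, params XElement[] features)
    {
        return new XElement(Ns + "rankingModel",
            new XAttribute("name", name),
            new XAttribute("id", id.ToString()),
            new XAttribute("description", description),
            new XElement(Ns + "queryDependentFeatures", features));
    }

    static void Main()
    {
        // Same values as the hand-written example, but with a freshly generated GUID.
        XElement model = Build(Guid.NewGuid(), "DemoRankingModel", "Demo Custom Ranking Model",
            Feature(56, "Filename", 75, 10),
            Feature(2, "Title", 25, 10));

        Console.WriteLine("Features: " + model.Descendants(Ns + "queryDependentFeature").Count());

        // DisableFormatting keeps the XML on one line, which is handy for pasting.
        Console.WriteLine(model.ToString(SaveOptions.DisableFormatting));
    }
}
```

Remember to note the generated GUID, because you will need it later when you tell the query which ranking model to use.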
Next, run the following PowerShell command to upload your new custom ranking model:
Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel -RankingModelXML '{your xml pasted as a string}'
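Pasting XML into a quoted PowerShell string goes wrong as soon as the XML itself contains a single quote (for example in the description). The small helper below is a sketch of one way around that: it wraps the XML in a PowerShell single-quoted string, where a single quote is written as two single quotes, and prints the finished command. The PowerShellCommand class is hypothetical; only the cmdlet and parameter names come from the command above.

```csharp
using System;

// Hypothetical helper that prepares the upload command as text.
static class PowerShellCommand
{
    // Wraps text in a PowerShell single-quoted string. Inside such a string
    // the only character that needs escaping is the single quote, which is doubled.
    public static string Quote(string text)
    {
        return "'" + text.Replace("'", "''") + "'";
    }

    // Builds the complete command line for uploading a ranking model.
    public static string NewRankingModelCommand(string rankingModelXml)
    {
        return "Get-SPEnterpriseSearchServiceApplication | "
             + "New-SPEnterpriseSearchRankingModel -RankingModelXML "
             + Quote(rankingModelXml);
    }

    static void Main()
    {
        // Shortened sample XML; note the single quote in the description.
        string xml = "<rankingModel name=\"DemoRankingModel\" description=\"Bob's model\" />";
        Console.WriteLine(NewRankingModelCommand(xml));
    }
}
```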
Using the new Ranking Model
There are two ways of using the custom ranking model:
- Search logic implemented through the object model (shown in this tutorial)
- Extending the Core Results Web Part http://msdn.microsoft.com/en-us/sp2010devtrainingcourse_extendingsharepointsearchlab_topic4#_Toc280092619
You can use the test harness described in the beginning of this tutorial to test your new ranking model. To do that, uncomment the following line and supply the GUID of your new ranking model:
//This is where we will later specify the custom ranking model to use.
//Remember to replace the ID value with the GUID of your ranking model.
fullTextSqlQuery.RankingModelId = "302A9E0E-F8B9-4b21-8180-C327ECCBBA94";
You will notice a change in the way your test harness returns the results.
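When there are more than a handful of results it is hard to spot the changes by eye. A minimal sketch of a comparison helper is shown below; it assumes you have copied the titles from the two runs of the test harness into two lists, and the RankComparison class and sample titles are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for comparing two result orders.
static class RankComparison
{
    // For each title in the custom order, returns how many positions it moved
    // compared with the default order (positive = moved up, negative = moved down).
    // Titles that are missing from the default order are skipped.
    public static Dictionary<string, int> Movement(IList<string> defaultOrder, IList<string> customOrder)
    {
        var moves = new Dictionary<string, int>();
        for (int i = 0; i < customOrder.Count; i++)
        {
            int before = defaultOrder.IndexOf(customOrder[i]);
            if (before >= 0)
            {
                moves[customOrder[i]] = before - i;
            }
        }
        return moves;
    }

    static void Main()
    {
        // Made-up titles; replace them with the output of your own two runs.
        var defaultOrder = new List<string> { "Demo Plan", "Annual Report", "Product Demo" };
        var customOrder = new List<string> { "Product Demo", "Demo Plan", "Annual Report" };

        foreach (KeyValuePair<string, int> move in Movement(defaultOrder, customOrder))
        {
            Console.WriteLine(move.Key + ": " + move.Value);
        }
    }
}
```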
The following diagram shows how your code and the SharePoint Foundation will use your custom Ranking Model to provide different results:
http://lh5.ggpht.com/_SX0lCwhJEzc/TdZ7HBdKypI/AAAAAAAAAiU/k78wXS5hlpc/s1600/clip_image007%5B7%5D.jpg
Additional PowerShell Commands
List all of the Managed Properties in SharePoint Search:
Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchMetadataManagedProperty >> C:\PID.txt
Add a new ranking model to SharePoint:
Get-SPEnterpriseSearchServiceApplication | New-SPEnterpriseSearchRankingModel -RankingModelXML '{Ranking model XML PASTED AS A STRING}'
List the ranking models:
Get-SPEnterpriseSearchServiceApplication | Get-SPEnterpriseSearchRankingModel
Delete a ranking model:
Remove-SPEnterpriseSearchRankingModel -Identity 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' -SearchApplication (Get-SPEnterpriseSearchServiceApplication)
I hope that you will make full use of SharePoint capabilities like custom ranking models to unlock your full potential in providing world-class SharePoint solutions.
Enjoy!
References
Search Relevance & Ranking Model Schema:
Enterprise Search Relevance Architecture Overview
Ranking Model Schema
FullTextSqlQuery Members
Query.RankingModelId Property
PowerShell:
Use Windows PowerShell cmdlets to administer and configure search in SharePoint 2013
SQL Syntax Queries:
Best Practices: Writing SQL Syntax Queries for Relevant Results in Enterprise Search