SharePoint 2013 Search Ranking and Relevancy Part 1: Let’s compare to FS14
I’m very happy to do some “guest” blogging for my good friend Leo and continue diving into various search-related topics. In this and upcoming posts, I’d like to jump right into something that interests me very much, and that is taking a look at what makes some documents more relevant than others as well as what factors influence rank score calculations.
Since SharePoint 2013 is already out, I’d like to touch upon a question that comes up often when someone is considering moving from FAST ESP or FAST for SharePoint 2010 (FS14) to SharePoint 2013: “So how are rank scores calculated in SharePoint 2013 Search as opposed to previous FAST versions?”
In upcoming posts, I will go deeper into the “internals” of the current SharePoint 2013 ranking model and introduce the basics of relevancy calculation concepts that apply across many search engines and are not specific to FAST or SharePoint Search.
There are some excellent blog posts out there that go in-depth on how SharePoint 2013 Search rank models work, including the ones below from Alexey Kozhemiakin and Mikael Svenson.
https://powersearching.wordpress.com/2013/03/29/how-sharepoint-2013-ranking-models-work/
https://techmikael.blogspot.com/2013/04/rank-models-in-2013main-differences.html
To avoid being repetitive, I’ve tried to create an easy-to-read chart comparing the factors that influence rank calculations in FS14 with those in SharePoint 2013 Search. I may update this chart in the future to include FAST ESP, although the main factors in ESP and FS14 are fairly similar to each other, whereas the SharePoint 2013 Search model is more closely related to the SharePoint 2010 Search model.
One of the main differences is that SharePoint 2013 Search uses a two-stage process for rank calculations: a linear ranking model as the 1st stage and a Neural Network as the 2nd stage. The 1st stage is “light”, so we can afford to apply it to every document in a result set, and its rank features are evaluated for all of them. The top 1000 documents (candidates) by Stage 1 rank become the input to Stage 2. That stage is more performance-intensive and re-computes the rank score for each document it receives, which is why it is applied only to a limited set. It consists of all the same rank features as Stage 1 plus 4 additional Proximity features.
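To make the two-stage idea concrete, here is a minimal Python sketch. It is not SharePoint's implementation: the feature names, weights, and the `two_stage_rank` helper are all hypothetical, and the 2nd stage is passed in as a plain function (here a second linear model with an extra proximity feature, standing in for the Neural Network). Keeping re-ranked candidates ahead of the untouched tail is also an assumption of the sketch.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def linear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of features; a feature missing from a document counts as 0."""
    return sum(w * features.get(name, 0.0) for name, w in weights.items())


def two_stage_rank(
    docs: Dict[str, Features],
    stage1_weights: Dict[str, float],
    stage2_score: Callable[[Features], float],
    rerank_limit: int = 1000,
) -> List[Tuple[str, float]]:
    """Score every document cheaply, then re-score only the top candidates."""
    # Stage 1: light linear model applied to every document in the result set.
    stage1 = sorted(
        ((doc_id, linear_score(f, stage1_weights)) for doc_id, f in docs.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    head, tail = stage1[:rerank_limit], stage1[rerank_limit:]

    # Stage 2: more expensive scorer, applied only to the top candidates.
    reranked = sorted(
        ((doc_id, stage2_score(docs[doc_id])) for doc_id, _ in head),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Assumption: re-ranked candidates stay ahead of the untouched tail.
    return reranked + tail


# Hypothetical documents and weights, for illustration only.
docs = {
    "a": {"title_match": 1.0, "body_match": 0.2, "proximity": 0.1},
    "b": {"title_match": 0.5, "body_match": 0.9, "proximity": 0.9},
    "c": {"title_match": 0.1, "body_match": 0.1, "proximity": 0.0},
}
stage1_weights = {"title_match": 2.0, "body_match": 1.0}
stage2_weights = {**stage1_weights, "proximity": 3.0}

ranked = two_stage_rank(
    docs,
    stage1_weights,
    lambda f: linear_score(f, stage2_weights),
    rerank_limit=2,
)
print([doc_id for doc_id, _ in ranked])
```

With a limit of 2, document “a” leads after Stage 1, but “b” overtakes it once the proximity feature is taken into account in Stage 2, while “c” never reaches the second stage at all.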
For my comparison below, I mainly used a model called “Search Ranking Model with Two Linear Stages”, which was introduced with the August 2013 CU. This model is the recommended template when creating custom rank models, as it gives you proximity without a Neural Network.
Rank Factor | FS14 | SP2013 Search |
Rank Models | 1 OOTB rank model | 16 Rank Models |
Freshness | Available OOTB and customizable | N/A OOTB, but can be configured |
Dynamic Ranking (field weighting/managed properties) | Context Boost: Title, DocSubject, Keywords, DocKeywords, urlkeywords, Description, Author, CreatedBy, ModifiedBy, MetadataAuthor, WorkEmail, Body, crawledpropertiescontent | Document MPs + Usage/Social data: Title, QLogClickedText, SocialTag, Filename, Author, AnchorText, body |
FileType | Field-Boost weight/Managed Property Boost (OOTB -4000 points). Format: Unknown Format, XML, XLS. FileExtension: CSV, TXT, MSG, OFT, ZIP, VSD, RTF. IsEmptyList, IsListItem | FileType rank feature: PPT, SharePoint site, DOC, HTML, ListItems, Image, Message, XLS, TXT |
Language | N/A | Dynamic Rank (query-based). The LCID, i.e. locale ID, is used. |
Social Distance | N/A | Static Rank (colleague relationship to the person issuing the query). Bucket 0: no colleague relationship. Bucket 1: first-level (direct) relationship. Bucket 2: second-level (indirect) relationship |
Static Rank Boost (Query-Independent) | Quality Weight components: hwboost, docrank, siterank, urldepthrank. Authority Weight: partial and complete | Now part of the Analytics Processing Component. Static rank features calculated with Search and Usage Analytics: QLogClicks, QLogSkips, QLogLastClicks, EventRate |
Proximity | Enabled by default | MinSpan (Neural Network 2nd stage; parameters for proximity minimal span) |
Anchortext (Query-Dependent) | Extnumocc = part of Dynamic Rank calculations, query-time hits in anchortext | AnchortextComplete |
URLDepth (Query-Dependent) | N/A – in FS14, this was a static rank feature. | UrlDepth – depth of the document URL (number of slashes) |
Click-Through Weight (Query-Dependent) | Query-Authority weight: click-through weight, dynamic rank | N/A. Now part of the static rank features used in the Analytics Processing Component (QLogClicks, etc.) |
Rank Tuning | FS14 | SP2013 Search |
GUI-based applications. Ease of tuning rank calculations and user-friendliness | N/A. Rank calculations and scores can be seen either via ranklog output or via CodePlex tools such as FS4SP Query Logger. However, there isn’t a user-friendly tool to help you make the changes and push them live, or preferably see them in “Preview” mode offline. A separate ‘spreladmin’ tool is needed for click analysis. | Rank Tuning App (coming soon). A GUI-based and user-friendly way to tune/customize ranking and impact relevancy. Includes a “preview”, i.e. offline, mode. |
Rank logging availability | Server-side: Ranklog is available via QRServer output. However, it is server-side and only available to admins with local access to QRServer port 13280. Client-side: N/A | Server-side: Rank tuning app/ULS logs. Client-side: ExplainRank template available to clients (https://powersearching.wordpress.com/2013/01/25/explain-rank-in-sharepoint-2013-search/). |
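As a small illustration of the UrlDepth rank feature from the first chart (depth of the document URL, measured as a number of slashes), here is a rough Python sketch. How SharePoint actually counts is not covered here, so the choices below are assumptions: only slashes in the path are counted (not the `//` after the scheme), and a trailing slash is ignored. The `url_depth` helper and the sample URLs are hypothetical.

```python
from urllib.parse import urlsplit


def url_depth(url: str) -> int:
    """Count slashes in the URL path, ignoring any trailing slash (assumed rule)."""
    path = urlsplit(url).path.rstrip("/")
    return path.count("/")


for url in (
    "http://server/",
    "http://server/sites/team/",
    "http://server/sites/team/doc.docx",
):
    print(url, url_depth(url))
```

Under these assumptions, the site root has depth 0 and a document two folders down has depth 3, which gives a model a simple number to use when favoring or penalizing content by how deep its URL is.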