

SharePoint 2013 Search: People Search – “Why are my results so bad?” Understanding Relevancy, the Rank Model, Full-Text Index, and Fuzzy Matching

I’ve heard this question come up enough now that I think it warrants deeper examination.  Several customers have complained about “bad results” in people search.  They search for Katherine Doe, but the number-one result is Cathy Smith, ranked above Katherine Doe herself.  While this looks like bad relevancy, if we step back and understand why the relevance scores and the resulting ranking are technically correct, we can make informed choices that bring us closer to the desired results.

The first step is understanding the difference between Recall and Precision.  The second is understanding how the default People Search Rank Model and the People Full-Text Index, called PeopleIdx, contribute to Recall and Precision. 

Recall and Precision

In the context of information retrieval, or search, Recall and Precision measure the completeness and accuracy of a query’s results.  For our purposes, Recall defines the set of all documents that match a query, and Precision is the ordering of that recalled set by relevancy score.  In SharePoint 2013, we can equate Recall with the Full-Text Index and Precision with the Ranking Model.
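To make the distinction concrete, here is a toy sketch in Python (not SharePoint code).  The people, property values, and weights are hypothetical, and the score is a simple sum of weights rather than SharePoint’s actual ranking formula.  `recall` returns every document that contains a query term in any indexed property.  `precision_order` then sorts only that recalled set by score.

```python
# Toy illustration of Recall vs. Precision; data and weights are hypothetical.
INDEX = {
    "Katherine Doe": {"PreferredName": "katherine doe", "JobTitle": "executive"},
    "Cathy Smith": {"PreferredName": "cathy smith", "JobTitle": "executive assistant"},
    "John Roe": {"PreferredName": "john roe", "JobTitle": "engineer"},
}

# Stand-in for a rank model: one weight per property.
WEIGHTS = {"PreferredName": 1.0, "JobTitle": 2.0}


def matches(text: str, term: str) -> bool:
    return term in text.split()


def recall(terms: list) -> set:
    """Recall: every document with at least one query term in any indexed property."""
    return {
        doc
        for doc, props in INDEX.items()
        if any(matches(text, t) for text in props.values() for t in terms)
    }


def score(doc: str, terms: list) -> float:
    """Relevance: sum the weight of each property that contains each term."""
    return sum(
        WEIGHTS.get(prop, 0.0)
        for prop, text in INDEX[doc].items()
        for t in terms
        if matches(text, t)
    )


def precision_order(terms: list) -> list:
    """Precision: order only the recalled set, highest score first (ties by name)."""
    return sorted(recall(terms), key=lambda d: (-score(d, terms), d))


if __name__ == "__main__":
    print(precision_order(["katherine", "executive"]))  # ['Katherine Doe', 'Cathy Smith']
```

Changing `WEIGHTS` reorders the results but never adds or removes anyone.  In this model, only the index contents and the query terms decide who is recalled.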

Full-Text Index

In SharePoint 2013, when users create a Managed Property and set the Searchable property to “true”, they have the opportunity to assign the property to a Full-Text Index in the Advanced Searchable Settings.  SharePoint 2013 has several Full-Text Indexes, but only two are used by default: 

  1. Default: the Full-Text Index for the “Local SharePoint Results” result source, which backs the “Everything” vertical
  2. PeopleIdx: the Full-Text Index for the “Local People Results” result source, which backs the “People” vertical.  All the people-relevant Managed Properties are assigned to the PeopleIdx Full-Text Index.

Some simple PowerShell reveals all Managed Properties assigned to PeopleIdx:

> $ssa = Get-SPEnterpriseSearchServiceApplication
> Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa | ? {$_.FullTextIndex -eq "PeopleIdx"} | ft Name, FullTextIndex, Context

        Name               FullTextIndex Context
        ----               ------------- -------
        AccountName        PeopleIdx           9
        AMEXFirstName      PeopleIdx          15
        AMEXLastName       PeopleIdx          15
        CombinedName       PeopleIdx          14
        ContentsHidden     PeopleIdx           5
        JobTitle           PeopleIdx           2
        Memberships        PeopleIdx           6
        NgramPhoneNumbers  PeopleIdx           0
        PreferredName      PeopleIdx           1
        Pronunciations     PeopleIdx           8
        RankingWeightHigh  PeopleIdx           7
        RankingWeightLow   PeopleIdx           4
        RankingWeightName  PeopleIdx          13
        Responsibilities   PeopleIdx           3
        SipAddress         PeopleIdx          12
        UserName           PeopleIdx          10
        WorkEmail          PeopleIdx          11

These are all the Managed Properties used for Recall in a people search.  The coupling between a Full-Text Index and a Rank Model is loose: a Rank Model may not list every Managed Property assigned to the Full-Text Index.  Don’t forget that the missing properties still contribute to recall, just with no weighting.  Also, any Managed Property that shares a context with a property weighted in the Rank Model receives the same weight during rank score calculation.  So if you add a Managed Property named Interests and assign it to context 3, and the Rank Model weights Responsibilities (context 3, e.g., w="1.0"), Interests receives the same weight despite not being named in the Rank Model.
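The context rule can be modeled with a small lookup, sketched here in Python under stated assumptions.  The context IDs come from the PeopleIdx listing above.  The weights are the `w` values from the default People Search rank model excerpt later in this post.  `Interests` is the hypothetical custom property from the example.  The function mirrors the behavior described in this paragraph, not SharePoint’s internal implementation.  AccountName gets 0.0 here only because it is absent from that excerpt.

```python
# Context IDs from the PeopleIdx listing (subset), plus one hypothetical property.
PROPERTY_CONTEXT = {
    "PreferredName": 1,
    "JobTitle": 2,
    "Responsibilities": 3,
    "RankingWeightLow": 4,
    "Memberships": 6,
    "AccountName": 9,
    "Interests": 3,  # hypothetical custom Managed Property assigned to context 3
}

# Properties named in the rank model, with w values from the default model excerpt.
RANK_MODEL_WEIGHTS = {
    "PreferredName": 1.0,
    "JobTitle": 2.0,
    "Responsibilities": 1.0,
    "RankingWeightLow": 0.2,
    "Memberships": 0.25,
}

# Weight per context, derived from the properties the rank model names.
CONTEXT_WEIGHT = {PROPERTY_CONTEXT[name]: w for name, w in RANK_MODEL_WEIGHTS.items()}


def effective_weight(prop: str) -> float:
    """Weight applied at ranking time; 0.0 means recall-only in this toy model."""
    return CONTEXT_WEIGHT.get(PROPERTY_CONTEXT[prop], 0.0)


if __name__ == "__main__":
    for prop in ("Responsibilities", "Interests", "AccountName"):
        print(prop, effective_weight(prop))
```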

People Search Rank Models

The default People Search Rank Model is named “People Search application ranking model” (https://technet.microsoft.com/library/7c8ddec1-c8ff-4a90-afae-387b27a653f1.aspx#Ranking_Models).  Rank Models provide Precision, or relevancy ranking, for a recalled result set.  Of the 16 out-of-the-box (OOB) rank models, six are specific to People Search, and one of them may serve your needs better than the default.  To choose the best model, you’ll need to look beyond each rank model’s description and understand its configuration.

PowerShell provides some nifty Cmdlets that export the rank model configuration XML (https://msdn.microsoft.com/en-us/library/office/dn169052(v=office.15).aspx#sp15_using_custom_ranking_model).  If you look at the XML configuration for the default People Search ranking model, you should notice a couple of things.  First, the model has two stages (<RankingModel2NN>).  The first stage is a lighter calculation used to sort all the results that were matched, or recalled, from the search index.  The second stage re-sorts the top 1000 results with a more complex algorithm that adds proximity weights to the calculation.  Not all ranking models have two stages; some (e.g., the People Search name ranking model) have only one.  Second, the ranking model names every field that will be weighted, along with its weight (w) and its b value (in BM25-style scoring, b controls length normalization).
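The two-stage flow can be sketched in Python as follows.  The scoring functions are hypothetical stand-ins for the cheap first-stage and the more expensive second-stage calculations.  Keeping results beyond the second-stage window in their first-stage order is an assumption of this sketch, not documented behavior.  The default width of 1000 mirrors the top-1000 re-sort described above.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def two_stage_rank(
    docs: Sequence[T],
    stage1: Callable[[T], float],
    stage2: Callable[[T], float],
    stage2_width: int = 1000,
) -> List[T]:
    """Sort everything with stage1, then re-sort only the top stage2_width with stage2."""
    first_pass = sorted(docs, key=stage1, reverse=True)
    head, tail = first_pass[:stage2_width], first_pass[stage2_width:]
    # Only the head is re-scored; the tail keeps its first-stage order (assumption).
    return sorted(head, key=stage2, reverse=True) + tail


if __name__ == "__main__":
    # (name, cheap score, expensive score); a window of 2 keeps the example small.
    docs = [("a", 3.0, 0.1), ("b", 2.0, 0.9), ("c", 1.0, 5.0)]
    ranked = two_stage_rank(docs, lambda d: d[1], lambda d: d[2], stage2_width=2)
    print([d[0] for d in ranked])  # ['b', 'a', 'c']
```

Document "c" has the best second-stage score but still ends up last, because the first stage never placed it inside the re-sort window.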

Here is an excerpt of the property definitions and weights from the default People Search rank model.

        <Properties>
          <Property name="RankingWeightName" w="0.5" b="0.5" propertyName="RankingWeightName" />
          <Property name="PreferredName" w="1.0" b="0.5" extractOccurrence="1" propertyName="PreferredName" />
          <Property name="JobTitle" w="2.0" b="0.5" extractOccurrence="1" propertyName="JobTitle" />
          <Property name="Responsibilities" w="1.0" b="0.85714285719" extractOccurrence="1" propertyName="Responsibilities" />
          <Property name="RankingWeightLow" w="0.2" b="0.85714285719" extractOccurrence="1" propertyName="RankingWeightLow" />
          <Property name="ContentsHidden" w="0.1" b="0.85714285719" extractOccurrence="1" propertyName="ContentsHidden" />
          <Property name="Memberships" w="0.25" b="0.85714285719" extractOccurrence="1" propertyName="Memberships" />
          <Property name="RankingWeightHigh" w="2.0" b="0.5" extractOccurrence="1" propertyName="RankingWeightHigh" />
          <Property name="Pronunciations" w="0.05" b="0.5" propertyName="Pronunciations" />
        </Properties>
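To see at a glance which properties a model weights most heavily, you can parse an exported model and sort its `Property` elements by `w`.  The Python sketch below runs against the excerpt above and uses only the standard library’s `xml.etree.ElementTree`.  It matches on the tag’s local name as a precaution, so the same function should also work if a full export declares an XML namespace.

```python
import xml.etree.ElementTree as ET

EXCERPT = """
<Properties>
  <Property name="RankingWeightName" w="0.5" b="0.5" propertyName="RankingWeightName" />
  <Property name="PreferredName" w="1.0" b="0.5" extractOccurrence="1" propertyName="PreferredName" />
  <Property name="JobTitle" w="2.0" b="0.5" extractOccurrence="1" propertyName="JobTitle" />
  <Property name="Responsibilities" w="1.0" b="0.85714285719" extractOccurrence="1" propertyName="Responsibilities" />
  <Property name="RankingWeightLow" w="0.2" b="0.85714285719" extractOccurrence="1" propertyName="RankingWeightLow" />
  <Property name="ContentsHidden" w="0.1" b="0.85714285719" extractOccurrence="1" propertyName="ContentsHidden" />
  <Property name="Memberships" w="0.25" b="0.85714285719" extractOccurrence="1" propertyName="Memberships" />
  <Property name="RankingWeightHigh" w="2.0" b="0.5" extractOccurrence="1" propertyName="RankingWeightHigh" />
  <Property name="Pronunciations" w="0.05" b="0.5" propertyName="Pronunciations" />
</Properties>
"""


def weights_by_property(xml_text: str) -> dict:
    """Map each Property's propertyName to its w value."""
    root = ET.fromstring(xml_text.strip())
    weights = {}
    for el in root.iter():
        # Compare the local tag name so "{namespace}Property" also matches.
        if el.tag.rsplit("}", 1)[-1] == "Property":
            weights[el.get("propertyName")] = float(el.get("w"))
    return weights


def heaviest_first(weights: dict) -> list:
    """Sort (name, w) pairs by descending weight, then by name."""
    return sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for name, w in heaviest_first(weights_by_property(EXCERPT)):
        print(f"{w:>5}  {name}")
```

Run against this excerpt, JobTitle and RankingWeightHigh (2.0) come out on top and Pronunciations (0.05) comes out last.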

“Why are my results so bad?”  

Back to the Katherine Doe/Cathy Smith scenario where searching for Katherine Doe produces results with Cathy Smith at the top.

In reality, the results aren’t bad.  They are exactly what they should be, and understanding the metadata and how the rank model and the Full-Text Index use it will explain why.  First, some background for our scenario.  Katherine Doe is an Executive and Cathy Smith is her Assistant.  Cathy Smith’s User Profile lists “Executive Assistant for Katherine Doe” under Responsibilities, and her Memberships show that she belongs to the “Katherine Doe Direct Reports” group.

If we refer to the default People Rank Model, we see that it weights Memberships and Responsibilities along with PreferredName.  So a search for Katherine Doe produces hits in Cathy Smith’s Memberships and Responsibilities Managed Properties, but no hits in the same Managed Properties for Katherine Doe.  Katherine’s name does not appear in those fields: she is not a member of her own direct reports group, and she is not her own Executive Assistant.  Hits in these non-name fields are easy to overlook, because the default Item_Person.html display template doesn’t show them and the Item_Person_HoverPanel.html display template shows them only partially.  Katherine does get hits on both her first and last name in the PreferredName field, but so does Cathy, at least on the first name.  Cathy is a nickname for Katherine, so Fuzzy Matching, which is enabled by default for People Searches, adds “Cathy” to the query as a nickname synonym for the term Katherine.
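The arithmetic behind this outcome can be approximated in a few lines of Python.  This is a deliberately simplified model, not SharePoint’s formula.  It uses only the `w` values from the default rank model excerpt and ignores `b`, term frequency, and the second stage.  It adds a property’s weight once per query term the property matches.  The nickname dictionary is hypothetical.  Even this crude model puts Cathy first (3.5 vs. 2.0).

```python
# Toy model of the Katherine Doe / Cathy Smith scenario (not SharePoint's formula).
# Weights: w values from the default rank model excerpt; b and term frequency are ignored.
WEIGHTS = {"PreferredName": 1.0, "Responsibilities": 1.0, "Memberships": 0.25}

# Hypothetical nickname dictionary; in SharePoint these mappings come from Name Suggestions.
NICKNAMES = {"katherine": {"cathy", "kathy"}}

PROFILES = {
    "Katherine Doe": {
        "PreferredName": "Katherine Doe",
        "Responsibilities": "",
        "Memberships": "",
    },
    "Cathy Smith": {
        "PreferredName": "Cathy Smith",
        "Responsibilities": "Executive Assistant for Katherine Doe",
        "Memberships": "Katherine Doe Direct Reports",
    },
}


def expand(term: str) -> set:
    """A query term plus any nickname synonyms for it."""
    return {term} | NICKNAMES.get(term, set())


def toy_score(profile: dict, query: str) -> float:
    """Add a property's weight once for every query term (or nickname) it contains."""
    term_groups = [expand(t) for t in query.lower().split()]
    total = 0.0
    for prop, text in profile.items():
        tokens = set(text.lower().split())
        for variants in term_groups:
            if tokens & variants:
                total += WEIGHTS[prop]
    return total


def rank(query: str) -> list:
    return sorted(PROFILES, key=lambda name: -toy_score(PROFILES[name], query))


if __name__ == "__main__":
    for name in rank("Katherine Doe"):
        print(name, toy_score(PROFILES[name], "Katherine Doe"))
    # Cathy Smith 3.5
    # Katherine Doe 2.0
```

In this toy model, setting the Responsibilities and Memberships weights to 0.0 flips the order (Katherine 2.0, Cathy 1.0).  This hints at why a model that concentrates its weight on name fields can behave differently.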

Fuzzy Matching

To be honest, the documentation on the Fuzzy Matching feature in SharePoint 2013 is quite… fuzzy.  Often phonetic matching is mentioned when talking about SP 2013 Fuzzy Matching components, but to be clear, there is no intrinsic phonetic matching in SP 2013.  It did exist in SharePoint 2010, but it is not in SP 2013. 

SharePoint 2013 Fuzzy Matching consists of three services: (1) Core Fuzzy Name Search, (2) Name Suggestions, and (3) Name Intent Search.  The last of these, Name Intent Search, is used only as part of a Query Rule condition that identifies People Queries issued from a “non-People” vertical.  Core Fuzzy Name Search is a spelling, or string-distance, algorithm that measures the similarity of terms.  Its results can look a lot like phonetic matches, but they’re not.  Most of the confusion comes from the now poorly named “EnablePhonetics” parameter, found in a Search web part’s DataProviderJSON configuration or in the Query Object Model.  This vestigial parameter enables Core Fuzzy Name Search, which, unfortunately, has no phonetic matching features.  Name Suggestions is the nickname synonym service: names found in the nickname dictionary have their nickname mappings added to the query.
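SharePoint doesn’t document the exact algorithm behind Core Fuzzy Name Search, so the Python sketch below swaps in a standard stand-in, Levenshtein edit distance, purely to illustrate what “measuring the similarity of terms” means.  The vocabulary and the `max_distance` threshold are hypothetical.  Catherine and Katharine land one edit away from Katherine, while Cathy is six edits away.  A spelling-distance match alone therefore can’t explain Cathy in our scenario; the nickname mapping does.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: minimum inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_candidates(term: str, vocabulary: list, max_distance: int = 2) -> list:
    """Vocabulary words within max_distance edits of term (case-insensitive)."""
    term = term.lower()
    return sorted(w for w in vocabulary if levenshtein(term, w.lower()) <= max_distance)


if __name__ == "__main__":
    names = ["Catherine", "Katharine", "Cathy", "Kevin"]
    print(fuzzy_candidates("Katherine", names))  # ['Catherine', 'Katharine']
    print(levenshtein("katherine", "cathy"))     # 6
```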

Use the following PowerShell to see what nicknames are handled by the service:

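> $owner = New-Object Microsoft.Office.Server.Search.Administration.SearchObjectOwner -ArgumentList "Ssa"
> Get-SPEnterpriseSearchLanguageResourcePhrase -type NickName -Language en-US -SearchApplication $ssa -Owner $owner | ft Phrase, Mapping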
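If you’d rather work with the nickname list outside the console, one option is to replace `ft Phrase, Mapping` with `Select-Object Phrase, Mapping | Export-Csv nicknames.csv -NoTypeInformation` and load the file elsewhere.  That export step is an assumption, not part of the original walkthrough.  The Python sketch below reads such a CSV (the rows shown are hypothetical sample data) and rewrites a query the way Name Suggestions is described above, adding each name’s mappings as alternatives.  The `(a OR b)` output format is illustrative only; it is not the internal query SharePoint generates.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample rows in the shape Export-Csv would produce for Phrase/Mapping.
SAMPLE_CSV = '''"Phrase","Mapping"
"katherine","cathy"
"katherine","kathy"
"robert","bob"
'''


def load_nicknames(csv_text: str) -> dict:
    """Build phrase -> set of nickname mappings from Phrase/Mapping rows."""
    table = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        table[row["Phrase"].lower()].add(row["Mapping"].lower())
    return dict(table)


def expand_query(query: str, nicknames: dict) -> str:
    """Replace each name that has mappings with an OR group of the name and its nicknames."""
    parts = []
    for term in query.lower().split():
        mappings = sorted(nicknames.get(term, ()))
        if mappings:
            parts.append("(" + " OR ".join([term] + mappings) + ")")
        else:
            parts.append(term)
    return " ".join(parts)


if __name__ == "__main__":
    nicknames = load_nicknames(SAMPLE_CSV)
    print(expand_query("Katherine Doe", nicknames))  # (katherine OR cathy OR kathy) doe
```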

Conclusion

So what can you do about the Katherine Doe/Cathy Smith scenario?  First, decide on the purpose of your “People” search vertical.  Is it a people-name directory lookup?  An expertise search?  Should the sorting be influenced by social distance or by your organizational chart?  Then pick your Ranking Model.  You might find that the “People Name search ranking model” suits your needs better than the default “People Search application ranking model.”  Take time to look at which Managed Properties each ranking model weights most heavily.  You might even notice that the “People Search expertise ranking model” is very poorly named.