

SharePoint 2013 Search: People Search – “Why are my results so bad?” Understanding Relevancy, the Rank Model, Full-Text Index, Fuzzy Matching, and Social Distance

I’ve heard this question come up enough now that I think it warrants deeper examination.  Several customers have complained about “bad results” in people search.  They search for Katherine Doe, but the number-one result is for Cathy Smith and it’s ranked higher than the Katherine Doe result.  While this seems like bad relevancy, if we step back and understand why the relevancy and ranked results are technically correct, we can make some informed choices that will bring us closer to the desired results.

The first step is understanding the difference between Recall and Precision.  The second is understanding how the default People Search Rank Model and the People Full-Text Index, called PeopleIdx, contribute to Recall and Precision.

Recall and Precision

In the context of information retrieval, or search, Recall and Precision measure the completeness and accuracy of a query’s results.  Formally, Recall is the fraction of all relevant documents that a query returns, and Precision is the fraction of returned documents that are actually relevant.  For our purposes, think of Recall as the set of all documents that match a query, and Precision as how well that recalled set is sorted by relevancy score.  In SharePoint 2013, we can equate Recall to the Full-Text Index and Precision to the Ranking Model.
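A quick worked example helps separate the two measures.  The Python sketch below is only an illustration, not anything SharePoint runs.  It scores the Katherine Doe search discussed later in this post, assuming Katherine Doe is the only relevant profile and both profiles are returned: recall is perfect, yet the top result is wrong, which is exactly the "bad results" complaint.

```python
def precision_recall(retrieved, relevant):
    """Classic set-based precision and recall (ordering is ignored)."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked results that are relevant (ordering matters)."""
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k


# Both profiles are recalled, but the ranking puts the wrong one first.
ranked = ["Cathy Smith", "Katherine Doe"]
relevant = {"Katherine Doe"}
print(precision_recall(ranked, relevant))   # (0.5, 1.0)
print(precision_at_k(ranked, relevant, 1))  # 0.0
```

Recall says nothing is missing; only an order-aware measure such as precision at 1 exposes the problem, which is why the fix lives in the ranking model rather than the index.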

Full-Text Index

In SharePoint 2013, when users create a Managed Property and set the Searchable property to “true”, they have the opportunity to assign the property to a Full-Text Index in the Advanced Searchable Settings.  SharePoint 2013 has several Full-Text Indexes, but only two are used by default:

  1. Default: the Full-Text Index for the "Local SharePoint Results" result source or the “Everything” vertical
  2. PeopleIdx: the Full-Text Index for the “Local People Results” result source or the “People” vertical.  All the people-relevant Managed Properties are assigned to the PeopleIdx Full-Text Index

Some simple PowerShell reveals all Managed Properties assigned to PeopleIdx:

> $ssa = Get-SPEnterpriseSearchServiceApplication   # assumes a single Search Service Application
> Get-SPEnterpriseSearchMetadataManagedProperty -SearchApplication $ssa | ? {$_.FullTextIndex -eq "PeopleIdx"} | ft Name, FullTextIndex, Context

Name                FullTextIndex Context
----                ------------- -------
AccountName         PeopleIdx           9
MyFirstName         PeopleIdx          15
MyLastName          PeopleIdx          15
CombinedName        PeopleIdx          14
ContentsHidden      PeopleIdx           5
JobTitle            PeopleIdx           2
Memberships         PeopleIdx           6
NgramPhoneNumbers   PeopleIdx           0
PreferredName       PeopleIdx           1
Pronunciations      PeopleIdx           8
RankingWeightHigh   PeopleIdx           7
RankingWeightLow    PeopleIdx           4
RankingWeightName   PeopleIdx          13
Responsibilities    PeopleIdx           3
SipAddress          PeopleIdx          12
UserName            PeopleIdx          10
WorkEmail           PeopleIdx          11

These are all the Managed Properties that will be used for Recall in a people search.  The coupling between a Full-Text Index and a Rank Model is loose: a Rank Model may not list every Managed Property assigned to the Full-Text Index, but every Managed Property in the Full-Text Index still contributes to recall, just without any weighting.  Also, any Managed Property that shares a context with a property weighted in the Rank Model receives the same weighting during rank score calculation.  So if you add a Managed Property named Interests and assign it to context 3, and the Rank Model weights Responsibilities (context 3, e.g., w="1.0"), Interests receives the same weighting despite not being named in the Rank Model.
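To make the context rule concrete, here is a small Python mental model (not SharePoint code) that resolves rank-model weights by context.  The context numbers come from the PeopleIdx listing above and the weights from the default model's property definitions shown later in this post; only a few of those properties are included.  Interests is the hypothetical custom property from the example, and treating unweighted contexts as weight 0.0 reflects the "recall, but no weighting" behavior described above.

```python
# Context numbers from the PeopleIdx listing; "Interests" is the hypothetical
# custom property from the example, mapped to context 3 like Responsibilities.
PROPERTY_CONTEXT = {
    "PreferredName": 1,
    "JobTitle": 2,
    "Responsibilities": 3,
    "Memberships": 6,
    "WorkEmail": 11,
    "Interests": 3,
}

# A subset of the weights (w) from the default People Search rank model.
RANK_MODEL_WEIGHTS = {
    "PreferredName": 1.0,
    "JobTitle": 2.0,
    "Responsibilities": 1.0,
    "Memberships": 0.25,
}


def weights_by_context(property_context, rank_model_weights):
    """Re-key each weighted property's weight by its context number."""
    return {property_context[name]: w for name, w in rank_model_weights.items()}


def effective_weight(prop, property_context, context_weights):
    """Weight a property receives at ranking time; 0.0 means recall-only."""
    return context_weights.get(property_context[prop], 0.0)


ctx = weights_by_context(PROPERTY_CONTEXT, RANK_MODEL_WEIGHTS)
print(effective_weight("Interests", PROPERTY_CONTEXT, ctx))  # 1.0, shared with Responsibilities
print(effective_weight("WorkEmail", PROPERTY_CONTEXT, ctx))  # 0.0, not weighted in this subset
```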

People Search Rank Models

The default People Search Rank Model is named “People Search application ranking model” (https://technet.microsoft.com/library/7c8ddec1-c8ff-4a90-afae-387b27a653f1.aspx#Ranking_Models).  Rank Models provide Precision, or Relevancy Ranking, for a recalled search result set.  Note that of the 16 OOB rank models, six are specific to People Search, and one of them may serve your needs better than the default.  To choose the best one, you’ll need to look beyond each rank model’s description and understand its configuration.

PowerShell provides some nifty cmdlets that export the rank model configuration XML (https://msdn.microsoft.com/en-us/library/office/dn169052(v=office.15).aspx#sp15_using_custom_ranking_model).  If you look at the XML configuration for the default People Search ranking model, you should notice a couple of things.  First, the model has two stages (two <RankingModel2NN> elements).  The first stage is a lighter calculation used to sort all the results that were matched, or recalled, from the search index.  The second stage re-sorts the top 1000 results using a more complex algorithm that adds proximity weights to the calculation.  Not all ranking models have two stages; some (e.g., the People Search name ranking model) have only one.  Second, the ranking model names every field that will be weighted, along with its weight (w) and its length-normalization value (b).
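The two-stage idea can be sketched in a few lines of Python.  This is a simplified model under stated assumptions, not SharePoint's implementation: the scoring functions are placeholders, `rerank_top` stands in for the 1000-result limit mentioned above, and keeping the remaining results in stage-one order below the re-ranked head is an assumption of the sketch.

```python
def two_stage_rank(recalled, stage1_score, stage2_score, rerank_top=1000):
    """Cheap score for every recalled item, expensive re-score for the head only."""
    # Stage 1: lightweight score over everything that was recalled.
    by_stage1 = sorted(recalled, key=stage1_score, reverse=True)
    head, tail = by_stage1[:rerank_top], by_stage1[rerank_top:]
    # Stage 2: costlier features (e.g. proximity) are computed only for the head.
    head = sorted(head, key=stage2_score, reverse=True)
    # Assumption: items outside the head keep their stage-1 order below it.
    return head + tail


# Tiny example with placeholder scores and a head size of 2.
stage1 = {"a": 4, "b": 3, "c": 2, "d": 1}.get
stage2 = {"a": 1, "b": 5}.get
print(two_stage_rank(["d", "c", "b", "a"], stage1, stage2, rerank_top=2))
# ['b', 'a', 'c', 'd']
```

The design point is cost: the expensive features only ever touch a bounded number of items, no matter how large the recalled set is.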

Here is an excerpt of the property definitions and weightings from the default People Search rank model:

    <Properties>
      <Property name="RankingWeightName" w="0.5" b="0.5" propertyName="RankingWeightName" />
      <Property name="PreferredName" w="1.0" b="0.5" extractOccurrence="1" propertyName="PreferredName" />
      <Property name="JobTitle" w="2.0" b="0.5" extractOccurrence="1" propertyName="JobTitle" />
      <Property name="Responsibilities" w="1.0" b="0.85714285719" extractOccurrence="1" propertyName="Responsibilities" />
      <Property name="RankingWeightLow" w="0.2" b="0.85714285719" extractOccurrence="1" propertyName="RankingWeightLow" />
      <Property name="ContentsHidden" w="0.1" b="0.85714285719" extractOccurrence="1" propertyName="ContentsHidden" />
      <Property name="Memberships" w="0.25" b="0.85714285719" extractOccurrence="1" propertyName="Memberships" />
      <Property name="RankingWeightHigh" w="2.0" b="0.5" extractOccurrence="1" propertyName="RankingWeightHigh" />
      <Property name="Pronunciations" w="0.05" b="0.5" propertyName="Pronunciations" />
    </Properties>
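Once you export a full model, reading the weights programmatically is easier than scanning the XML by eye.  A minimal Python sketch using the standard library's ElementTree, assuming the exported XML contains `<Property>` elements with `name`, `w` and `b` attributes as in the excerpt above (matching on the local tag name is a precaution in case the export declares a default namespace):

```python
import xml.etree.ElementTree as ET

# The excerpt above, pasted verbatim.
EXCERPT = """
<Properties>
  <Property name="RankingWeightName" w="0.5" b="0.5" propertyName="RankingWeightName" />
  <Property name="PreferredName" w="1.0" b="0.5" extractOccurrence="1" propertyName="PreferredName" />
  <Property name="JobTitle" w="2.0" b="0.5" extractOccurrence="1" propertyName="JobTitle" />
  <Property name="Responsibilities" w="1.0" b="0.85714285719" extractOccurrence="1" propertyName="Responsibilities" />
  <Property name="RankingWeightLow" w="0.2" b="0.85714285719" extractOccurrence="1" propertyName="RankingWeightLow" />
  <Property name="ContentsHidden" w="0.1" b="0.85714285719" extractOccurrence="1" propertyName="ContentsHidden" />
  <Property name="Memberships" w="0.25" b="0.85714285719" extractOccurrence="1" propertyName="Memberships" />
  <Property name="RankingWeightHigh" w="2.0" b="0.5" extractOccurrence="1" propertyName="RankingWeightHigh" />
  <Property name="Pronunciations" w="0.05" b="0.5" propertyName="Pronunciations" />
</Properties>
"""


def property_weights(root):
    """Return (name, w, b) for every <Property> element, heaviest weight first."""
    rows = []
    for elem in root.iter():
        # Compare the local tag name so a default XML namespace does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "Property":
            rows.append((elem.get("name"), float(elem.get("w")), float(elem.get("b"))))
    # sorted() is stable, so equal weights keep their document order.
    return sorted(rows, key=lambda row: row[1], reverse=True)


for name, w, b in property_weights(ET.fromstring(EXCERPT.strip())):
    print(f"{name:<20} w={w:<5} b={b}")
```

For an exported model file, pass `ET.parse(path).getroot()` to `property_weights` instead of the pasted excerpt.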

Social Distance

An important note needs to be made here about Social Distance Ranking.  As you explore the default people ranking model, you will notice dynamic ranking features such as FirstLevelColleagues, SecondLevelColleagues and LevelsToTop.  These are part of the Social Distance ranking feature, and their intention is to boost the relevancy of colleagues and managers close to you in your company's organizational chart.  The issue is that, out of the box, it doesn't work correctly.**  I've heard conflicting reports about whether recent patches have fixed this issue in both SPO and on-premises SharePoint environments, and I have not been able to verify either way.  Please test thoroughly.

If it is not working, an examination of the ULS log for a "default" People search query will reveal the following entry:

07/17/2014 01:13:34.85  NodeRunnerQuery1-963b71f2-33dc- (0x1280)    0x1584  Search  Query Processing             aiziq                Monitorable                Microsoft.Office.Server.Search.Query.Pipeline.Processing.PersonalizationDataInjectionEvaluator : Field: PersonalizationData 'null'. Personalized Search queries will not work       0be6b1d5-3bd8-4aac-a1f1-08d7c75867cc

The web part is not setting the "PersonalizationData" parameter, and the null value prevents Social Distance Ranking from being calculated.  Fortunately, there are three workarounds; the last is the simplest to implement.  For all of them, verify that the Web Application hosting the Enterprise Search Center is associated with a UPSA proxy (this association should work OOB with a 2013 proxy, but not with a 2010 proxy).

  1. Use the Query REST API and add the "PersonalizationData" parameter with the search user's User Profile GUID as the value.
  2. Create a custom search web part and add the User Profile GUID.  SCOM uses the parameter name "QueryPersonalizationData".
  3. Edit the People Search results page, edit the People Search Core Results web part, and click Change Query:

Old:

{searchboxquery}

New:

{?(({searchboxquery} XRANK(cb=2) firstlevelcolleagues:{User.userprofile_guid}) XRANK(cb=1) secondlevelcolleagues:{User.userprofile_guid}) XRANK(cb=7.5) userprofile_guid:{User.userprofile_guid}}

Or if you want to get real fancy:

 {?((((((({searchboxquery} XRANK(cb=2) firstlevelcolleagues:{User.userprofile_guid}) XRANK(cb=1) secondlevelcolleagues:{User.userprofile_guid}) XRANK(cb=7.5) userprofile_guid:{User.userprofile_guid}) XRANK(cb=1) levelstotop:1) XRANK(cb=0.8) levelstotop:2) XRANK(cb=0.6) levelstotop:3) XRANK(cb=0.4) levelstotop:4) XRANK(cb=0.2) levelstotop:5}
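Because the "fancy" template is easy to get wrong by hand (every extra boost adds another level of parentheses), here is a small Python helper that generates both templates above from a list of (cb, clause) pairs.  The helper and its names are illustrative only; it reproduces the nesting pattern the templates above use, and its output would be pasted into Change Query as before.

```python
GUID = "{User.userprofile_guid}"  # query variable, kept literally in the template

# (cb, clause) pairs taken from the two templates above.
SOCIAL = [
    (2, f"firstlevelcolleagues:{GUID}"),
    (1, f"secondlevelcolleagues:{GUID}"),
    (7.5, f"userprofile_guid:{GUID}"),
]
ORG_CHART = [
    (cb, f"levelstotop:{level}")
    for level, cb in enumerate([1, 0.8, 0.6, 0.4, 0.2], start=1)
]


def build_xrank_template(base, boosts):
    """Apply XRANK boosts left to right, parenthesizing the running
    expression before each additional boost."""
    expr = base
    for i, (cb, clause) in enumerate(boosts):
        if i > 0:
            expr = f"({expr})"
        expr = f"{expr} XRANK(cb={cb:g}) {clause}"
    # Same outer {? ... } wrapper as the hand-written templates.
    return "{?" + expr + "}"


print(build_xrank_template("{searchboxquery}", SOCIAL))
print(build_xrank_template("{searchboxquery}", SOCIAL + ORG_CHART))
```

Keeping the boosts as data also makes it easy to drop the org-chart levels while you measure the query-latency impact described below.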

Note that the XRANK operator can have an impact on query latency, so test this (or any solution) before putting it into production.  Also, note that latency is directly tied to index size, so what performs well in a small test environment may perform poorly in a very large production environment.
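For the REST workaround, the request only needs the PersonalizationData parameter alongside the query text.  The Python sketch below just builds the request URL.  The site URL and GUID are hypothetical, it assumes the endpoint accepts `PersonalizationData` as described in the first workaround, and wrapping string values in single quotes (with embedded quotes doubled) follows the OData-style convention of the search REST API.  Looking up the user's profile GUID and authenticating the request are out of scope; you would normally also pass the `sourceid` of the Local People Results result source through the optional argument.

```python
from urllib.parse import quote


def rest_string(value):
    """Wrap a value in single quotes (doubling embedded ones), then percent-encode it."""
    return quote("'" + value.replace("'", "''") + "'", safe="")


def people_query_url(site_url, query_text, profile_guid, source_id=None):
    """Build a search REST URL that carries the searching user's profile GUID."""
    params = [("querytext", query_text), ("PersonalizationData", profile_guid)]
    if source_id is not None:
        params.append(("sourceid", source_id))
    query = "&".join(f"{name}={rest_string(value)}" for name, value in params)
    return f"{site_url.rstrip('/')}/_api/search/query?{query}"


url = people_query_url(
    "https://intranet.example.com/search",   # hypothetical Search Center site
    "katherine doe",
    "11111111-2222-3333-4444-555555555555",  # hypothetical User Profile GUID
)
print(url)
```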

**A big thanks to Mikkel Conradi for doing all the leg work on this issue.

“Why are my results so bad?”  

Back to the Katherine Doe/Cathy Smith scenario where searching for Katherine Doe produces results with Cathy Smith at the top.

In reality, the results aren’t bad.  They are exactly what they should be.  Understanding the metadata and how it’s used in the rank model and Full-Text Index will explain why.  First, some background for our scenario.  Katherine Doe is an Executive and Cathy Smith is her Assistant.  In Cathy Smith’s User Profile under Responsibilities it lists “Executive Assistant for Katherine Doe”, while under Memberships, it shows Cathy is a member of “Katherine Doe Direct Reports” group.

If we refer to the default People Rank Model, we see that, along with PreferredName, it weights Memberships and Responsibilities.  So a search for Katherine Doe produces hits in Cathy Smith’s Memberships and Responsibilities Managed Properties, but no hits in the corresponding Managed Properties for Katherine Doe (Katherine’s name does not appear in those fields: she is not a member of her own direct reports group, and she is not her own Executive Assistant).  Hits in the non-name fields are often overlooked because those fields are not displayed in the default Item_Person.html display template and are only partially displayed in the Item_Person_HoverPanel.html display template.  Katherine does get a hit on both first and last name in the PreferredName field, but so does Cathy, at least on the first name: Cathy is a nickname for Katherine, so Fuzzy Matching, which is enabled by default for People Search, adds “Cathy” to the query as a nickname synonym for the term Katherine.
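To see why Cathy can legitimately outrank Katherine, here is a toy BM25F-style calculation in Python.  It uses the textbook BM25F form (per-property length normalization with b, weighting with w, summed into one pseudo-frequency per query term, then saturated with k1), which is an assumption: SharePoint's exact formula, its k1 value, IDF, and the many other features in the model are not reproduced.  The w and b values come from the rank-model excerpt above; the average field lengths are hypothetical.

```python
# (w, b) for the three properties that matter in this scenario, from the excerpt.
PROPS = {
    "PreferredName": (1.0, 0.5),
    "Responsibilities": (1.0, 0.85714285719),
    "Memberships": (0.25, 0.85714285719),
}
# Hypothetical average token counts per property across the index.
AVG_LEN = {"PreferredName": 2, "Responsibilities": 5, "Memberships": 4}


def pseudo_tf(variants, profile):
    """Weighted, length-normalized frequency of one query term (any of its variants)."""
    total = 0.0
    for prop, (w, b) in PROPS.items():
        tokens = profile.get(prop, "").lower().split()
        if not tokens:
            continue
        tf = sum(tokens.count(v) for v in variants)
        norm = 1 - b + b * len(tokens) / AVG_LEN[prop]
        total += w * tf / norm
    return total


def score(query_terms, profile, k1=1.0):
    """Sum of saturated pseudo-frequencies; IDF is left out (treated as 1)."""
    return sum(tf / (k1 + tf) for tf in (pseudo_tf(v, profile) for v in query_terms))


# "katherine" carries the nickname synonym "cathy", as Fuzzy Matching adds it.
query = [{"katherine", "cathy"}, {"doe"}]
katherine = {"PreferredName": "Katherine Doe"}
cathy = {
    "PreferredName": "Cathy Smith",
    "Responsibilities": "Executive Assistant for Katherine Doe",
    "Memberships": "Katherine Doe Direct Reports",
}
print(round(score(query, katherine), 3))  # 1.0
print(round(score(query, cathy), 3))      # 1.248
```

In this toy model, removing "cathy" from the first term still leaves Cathy at about 1.111, ahead of Katherine's 1.0, so the hits in Responsibilities and Memberships do most of the work.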

Fuzzy Matching

To be honest, the documentation on the Fuzzy Matching feature in SharePoint 2013 is quite… fuzzy.  Often phonetic matching is mentioned when talking about SP 2013 Fuzzy Matching components, but to be clear, there is no intrinsic phonetic matching in SP 2013.  It did exist in SharePoint 2010, but it is not in SP 2013.

SharePoint 2013 Fuzzy Matching consists of three services: (1) Core Fuzzy Name Search, (2) Name Suggestions, and (3) Name Intent Search.  The last, Name Intent Search, is only used in a Query Rule condition that identifies People queries submitted from a “non-People” vertical.  Core Fuzzy Name Search is a spelling-distance algorithm that measures the similarity of terms.  Its results can look a lot like phonetic matches, but they’re not.  For example, the query 'katherine' returns results for what appears to be its phonetic match, 'catherine'.  Examining verbose ULS logs reveals the real story: the term 'catherine' is added to the query because of its spelling similarity to (small edit distance from) 'katherine', not because of phonetic equivalence.  And 'catherine' is not the only candidate considered; 'gathering' and 'katharine' are too.

06/04/2015 12:14:49.21 NodeRunnerQuery1-aaaaedc2-86d0- (0x0F24) 0x129C Search Fuzzy Name Search ajyg0 VerboseEx FuzzyNameSearcher : CoreCCAFuzzySearcher mined [Candidate:catherine GeometricSimilarity: 0.999740619082072 NormalizedConfidence:0.999740619082072] for Query:katherine IsComparable:true  531f0d9d-9c5c-c0a8-27b6-5ef65f3bbadb

06/04/2015 12:14:50.58 NodeRunnerQuery1-aaaaedc2-86d0- (0x0F24) 0x12B4 Search Query Processing aizf6 High Microsoft.Office.Server.Search.Query.Pipeline.Executors.LinguisticQueryProcessingExecutor : QSC: All Annotations: <Annotation ID="1" Name="token" Range="[0,9)" Attributes={normalizedForm="katherine"} NumericalAttributes={}/>,<Annotation ID="2" Name="querysegment" Range="[0,9)" Attributes={} NumericalAttributes={}/>,<Annotation ID="2001" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="catherine",Score="0.888888888888889",dynamic="",value="5"} NumericalAttributes={}/>,<Annotation ID="2002" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="gathering",Score="0.777777777777778",dynamic="",value="5"} NumericalAttributes={}/>,<Annotation ID="2003" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="katharine",Score="0.888888888888889",dynamic="",value="6"} NumericalAttributes={}/>,<Annotation ID="2004" Name="spellcheck" Range="[0,9)" Attributes={Fuzzy="katherine",Score="1",dynamic="",value="5"} NumericalAttributes={}/>,<Annotation ID="4001" Name="qsc_known_word" Range="[0,9)" Attributes={value="9"} NumericalAttributes={}/> 531f0d9d-6cb7-c0a8-27b6-5022c0ff2a0d
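The Score values on the spellcheck annotations in the second log entry happen to equal 1 minus the Levenshtein edit distance divided by 9, the length of 'katherine' (one edit gives 8/9, two edits give 7/9).  That is an inference from these four numbers only, not documented SharePoint behavior, and the GeometricSimilarity in the first entry is clearly a different measure.  The Python sketch below simply reproduces that arithmetic with a standard dynamic-programming edit distance.

```python
def levenshtein(a, b):
    """Classic edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def similarity(query, candidate):
    """Edit distance normalized by the longer term's length, as a 0..1 score."""
    return 1 - levenshtein(query, candidate) / max(len(query), len(candidate))


for candidate in ["catherine", "gathering", "katharine", "katherine"]:
    print(candidate, round(similarity("katherine", candidate), 6))
```

Note that nothing here looks at pronunciation: 'gathering' scores well purely because it differs from 'katherine' in only two letters.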

Most of the confusion comes from the now poorly named “EnablePhonetics” parameter, found in a Search web part’s DataProviderJSON configuration or in the Query Object Model.  The vestigial “EnablePhonetics” parameter enables Core Fuzzy Name Searching, which unfortunately has no phonetic matching features.  Name Suggestions is the nickname synonym service.  Names found in the nickname dictionary will have their nickname mappings added to the query.

Use the following PowerShell to see what nicknames are handled by the service:

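> $ssa = Get-SPEnterpriseSearchServiceApplication
> $owner = Get-SPEnterpriseSearchOwner -Level SSA
> Get-SPEnterpriseSearchLanguageResourcePhrase -type NickName -Language en-US -SearchApplication $ssa -Owner $owner | ft Phrase, Mapping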
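The effect of a nickname mapping on recall can be pictured as query expansion.  This Python sketch is only a mental model, not SharePoint's internal query rewrite, and its nickname dictionary is hypothetical (the cmdlet output above shows the real mappings for your farm).

```python
# Hypothetical nickname mappings for illustration only.
NICKNAMES = {"katherine": ["cathy", "kathy", "kate"]}


def expand_query(query, nicknames):
    """Rewrite each term as (term OR nickname ...) when the dictionary maps it."""
    parts = []
    for term in query.lower().split():
        variants = [term] + [n for n in nicknames.get(term, []) if n != term]
        parts.append(term if len(variants) == 1 else "(" + " OR ".join(variants) + ")")
    return " ".join(parts)


print(expand_query("Katherine Doe", NICKNAMES))
# (katherine OR cathy OR kathy OR kate) doe
```

Once "cathy" is part of the query, a PreferredName of "Cathy Smith" is a legitimate first-name hit, which is how Cathy's profile enters the recalled set in the scenario above.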

Conclusion

So what to do about the Katherine Doe > Cathy Smith scenario?  First, decide on the purpose of your “People” search vertical.  Is it a people-name directory lookup?  An expertise search?  Should the sorting be influenced by social distance or your organizational chart?  Then pick your Ranking Model.  You might find that the “People Name search ranking model” suits your needs better than the default “People Search application ranking model.”  Take time to look at which Managed Properties each ranking model weighs most heavily.  You might even notice that the “People Search expertise ranking model” is very poorly named.

Comments

  • Anonymous
    August 06, 2015
    Can the name suggestions be displayed on the Everything vertical (which is a non-people vertical)?  I have created a query rule to bring through people results, but unfortunately it does not work.

  • Anonymous
    August 06, 2015
    There is an OOTB Query Rule that does this.  Try patterning your rule after it. In the SSA admin pages, look at the Query Rules and select the "Local SharePoint Results" result source.  In the list of rules you will see the "People Name in SharePoint Search" rule.  This rule is triggered if the query exactly matches an entry in the People dictionary and the action is to search the index using the template "{subjectTerms} ContentClass=urn:content-class:SPSPeople".  Note that the name must match exactly how it is stored in the dictionary.  Typically this is first and last name. If you are not seeing this rule triggered, make sure your "people" content is being successfully crawled, the People dictionary is being populated and query terms submitted match a People dictionary entry.

  • Anonymous
    September 03, 2015
    Very detailed post on SP 2013 search. Thank you!


  • Anonymous
    March 30, 2017
    What about the case where you search for Katherine Doe and get multiple Katherine Doe results back for a single person, meaning they all point to the same person or profile?  How do I find where or what is causing the duplicates?

    • Anonymous
      March 30, 2017
      Good question. First, I'd check the URL of each "Katherine Doe" returned in the search results. Are they truly duplicates? Are the URLs different? Then I'd look at the Crawl Logs | URL View in your SSA Admin. Using the URLs, identify the crawled item. Does the same URL have different Item IDs? Every crawled item has an Item Id (referenced internally as a DocId or WorkId). The answers to these questions will help determine how you got the duplicates and what you need to do to address them. It may be a problem in AD (multiple user objects) or in the sync and/or User Profile creation (multiple User Profiles in SharePoint), or you may have orphaned duplicates in the index (identical URLs with different IDs). Depending on the cause, you may need to clean up the source (AD, User Profile) or delete content from the index and re-crawl. If you have orphaned duplicates, I recommend reaching out to MS Support Services for assistance.