When, why and how to deal with Custom Security Trimmer in Enterprise Search? - Part I

First of all, security trimming is very important in Enterprise Search. Users who have no rights to a document should not see its description in their search results. They should not be aware of those items at all.

When building Enterprise Search solutions with Microsoft Office SharePoint Server 2007 (MOSS), you will find that MOSS supports security trimming for file shares, SharePoint sites, and Lotus Notes out of the box. This means the protocol handler picks up the ACL at index time, and security trimming is applied at query time. The query behaves like a SQL statement:

SELECT * from scope() where freetext("keyword") and YourCurrentUserRight="True"  (This is not the real SQL statement, just to give you an idea)

Because the permission information is already in the index, query performance is not impacted.
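To make the idea concrete, here is a minimal sketch of index-time trimming, written in Python only for illustration (MOSS itself is not implemented this way). The names IndexedItem, allowed, and search are hypothetical. The point is that the permission data sits next to the indexed text, so the filter is just one more condition in the query.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class IndexedItem:
    url: str
    text: str
    allowed: FrozenSet[str]  # principals captured from the ACL at crawl time (hypothetical field)


def search(index: List[IndexedItem], keyword: str, principals: Set[str]) -> List[str]:
    """Return URLs that match the keyword AND are readable by one of the user's principals."""
    return [
        item.url
        for item in index
        if keyword in item.text and item.allowed & principals
    ]


index = [
    IndexedItem("file://share/a.doc", "funny jokes", frozenset({"alice", "staff"})),
    IndexedItem("file://share/b.doc", "more jokes", frozenset({"bob"})),
    IndexedItem("file://share/c.doc", "budget", frozenset({"alice"})),
]
visible = search(index, "jokes", {"alice", "staff"})
```

No call to the source system is made while the query runs, which is why this kind of trimming is cheap.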

But what about other content, such as a website, a database, or a custom content source?

A Custom Security Trimmer (CST) is used in MOSS to support security trimming for such content. The behavior of a CST is quite different from the built-in security trimmer. It also runs at query time, but because the YourCurrentUserRight value is not in the index, the CST has to access the target system to retrieve this value after the search results come back. It checks the items one by one. For example, say there are 4,000 items in the results for "jokes", but you can only access 100 of them. The process changes to:

1. Do a search for "jokes". This is like SELECT * from scope() where freetext("keyword"), with no security trimming applied. 4,000 items are returned in the result. This result is not displayed to the user.

2. Because the CST is registered with "crawl rules" (this is one of the worst naming examples I have ever seen from Microsoft; wth, a rule applied at query time is called a CRAWL RULE?), if the path of an item matches the rule, the CST is launched to check whether the current user has permission to read that item. If so, the CST reports "True". Note that multiple CST instances are launched at the same time to check different items, and it seems you cannot control this number. I think it's around 4-5.

3. Once enough items have been reported "True" to fill one page, for example 10 items after checking about 200, the first page of results is displayed.
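The three steps above can be sketched roughly as follows. This is a simplified, sequential illustration in Python, not the MOSS implementation (which runs several checks in parallel); trim_first_page and check_access are hypothetical names, and check_access stands in for the call to the target system.

```python
from typing import Callable, Iterable, List, Tuple


def trim_first_page(
    results: Iterable[str],
    check_access: Callable[[str], bool],
    page_size: int = 10,
) -> Tuple[List[str], int]:
    """Check untrimmed results one by one until a full page of readable items is found.

    Returns the page and the number of access checks that were needed.
    """
    page: List[str] = []
    checked = 0
    for item in results:
        checked += 1
        if check_access(item):  # hypothetical stand-in for asking the target system
            page.append(item)
            if len(page) == page_size:
                break
    return page, checked


# 4,000 untrimmed hits; in this made-up example the user can read every 20th one.
results = [f"doc{i}" for i in range(4000)]
page, checked = trim_first_page(results, lambda d: int(d[3:]) % 20 == 19)
```

In this made-up example, filling a page of 10 takes 200 checks, which is the ratio used in the steps above.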

Let's do some basic calculation. What happens if a bad CST is applied? The key point is how much time the CST needs to check a permission. If a CST needs one second to check one item and 4 CSTs run at the same time, 200 items take 50 seconds to check. This means you have to wait 50 seconds for the search results to show up in your browser!
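The same estimate as a tiny helper, in case you want to plug in your own numbers. This is back-of-the-envelope only: it assumes every check takes the same time and that the parallel instances are always busy, and estimate_wait_seconds is a hypothetical name.

```python
import math


def estimate_wait_seconds(items_to_check: int, seconds_per_check: float, parallel_checks: int) -> float:
    """Rough time before the first page can be shown, assuming uniform check times."""
    rounds = math.ceil(items_to_check / parallel_checks)  # batches of simultaneous checks
    return rounds * seconds_per_check


wait = estimate_wait_seconds(items_to_check=200, seconds_per_check=1.0, parallel_checks=4)
```

With the numbers from the paragraph above (200 items, 1 second each, 4 at a time) this gives 50 seconds.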

Terrible, right? Even worse, if you have 100,000 items in a result set and you only have permission for 4 of them, the search service will crash because of the timeout.

That's why a CheckLimit, a cap on how many items get checked for one query, is also needed in the implementation of a CST.
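Here is a minimal sketch of the check-limit idea, again in Python purely for illustration. A real MOSS trimmer is .NET code, and the names below (CheckLimitExceeded, trim_with_limit, check_limit) are hypothetical, not the SDK's API. The trimmer gives up after a fixed number of access checks instead of grinding through the whole result set.

```python
from typing import Any, Callable, Iterable, List


class CheckLimitExceeded(Exception):
    """Raised when too many items were checked without filling a page."""


def trim_with_limit(
    results: Iterable[Any],
    check_access: Callable[[Any], bool],
    page_size: int = 10,
    check_limit: int = 1000,
) -> List[Any]:
    page: List[Any] = []
    for checked, item in enumerate(results, start=1):
        if checked > check_limit:
            # Stop early and tell the user something useful instead of timing out.
            raise CheckLimitExceeded(
                f"Stopped after {check_limit} access checks; please refine your query."
            )
        if check_access(item):
            page.append(item)
            if len(page) == page_size:
                break
    return page


calls: List[int] = []


def can_read_only_four(item: int) -> bool:
    calls.append(item)  # record how many checks were actually performed
    return item < 4


try:
    trim_with_limit(range(100_000), can_read_only_four, page_size=10, check_limit=1000)
    limit_hit = False
except CheckLimitExceeded:
    limit_hit = True
```

With 100,000 hits and only 4 readable items, this sketch stops after 1,000 checks rather than 100,000.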

Now, the best practices when you want to create a CST:

1. Reduce the time needed to check a permission. You can use some tricks to make it faster. For example, store the permission mapping in a local SQL table first, and have the CST check the local table instead of the remote system, so you avoid the network delay.

2. Set CheckLimit correctly. Return a more user-friendly message when the limit is reached.
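The first practice can be sketched like this. It is only an illustration under assumptions: SQLite stands in for the "local SQL table", the table and column names are made up, and how you pull the mapping from the remote system is left out. Keep in mind that a local copy can go stale, so it needs to be refreshed on some schedule.

```python
import sqlite3
from typing import Iterable, Tuple


def create_permission_store(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per (user, item) pair the user may read.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS permissions ("
        "user_name TEXT NOT NULL, item_url TEXT NOT NULL, "
        "PRIMARY KEY (user_name, item_url))"
    )


def sync_permissions(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Replace the local copy with a fresh snapshot taken from the remote system."""
    with conn:  # one transaction: readers never see a half-updated table
        conn.execute("DELETE FROM permissions")
        conn.executemany(
            "INSERT INTO permissions (user_name, item_url) VALUES (?, ?)", rows
        )


def can_read(conn: sqlite3.Connection, user_name: str, item_url: str) -> bool:
    """Local lookup used by the trimmer instead of a network round trip."""
    row = conn.execute(
        "SELECT 1 FROM permissions WHERE user_name = ? AND item_url = ?",
        (user_name, item_url),
    ).fetchone()
    return row is not None


conn = sqlite3.connect(":memory:")
create_permission_store(conn)
sync_permissions(conn, [
    ("alice", "http://example.test/jokes/1"),
    ("alice", "http://example.test/jokes/2"),
    ("bob", "http://example.test/jokes/2"),
])
```

The primary key doubles as the lookup index, so each check is a single indexed read on the local machine.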

To implement a CST, you can refer to the SDK, or to these articles:

https://msdn2.microsoft.com/en-us/library/aa981236.aspx
https://msdn2.microsoft.com/en-us/library/aa981563.aspx

These are not very good articles; some of the information is misleading. As far as I know, some much better articles are on the way, but I don't know exactly when they will be published on MSDN. In a later post I will go through the code to explain more.