SharePoint 2010/2013/2016: Best Practices for Indexing File Share Content
File Share Best Practices
Problem Case 1
Some content on a file share does not appear in the search results. You have verified that the root folder grants access to the correct crawl account and that most of the file share content is searchable. Further investigation shows that a set of documents is not searchable because it has never been indexed.
Why this happens
Users are able to change the inheritance rules on folders, and this can cause problems for the search account that crawls the content. Typically, the search account is granted access to the root folder on the expectation that this right will be inherited by every folder below the root. In practice, the account has access to all folders only until it reaches a folder that blocks inheritance. The content of that folder and its entire subfolder tree is then not indexed.
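The inheritance problem can be illustrated with a small model. The following Python sketch is hypothetical and does not read real ACLs. Each folder either inherits its parent's entries or has inheritance disabled, and the sketch assumes that disabling inheritance drops the parent's entries instead of copying them. The account and folder names are made up.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Folder:
    """Hypothetical folder: its own explicit ACL entries plus an inheritance flag."""
    name: str
    explicit: set[str] = field(default_factory=set)
    inherits: bool = True  # False models a folder where inheritance was disabled
    children: list[Folder] = field(default_factory=list)


def unreadable_folders(root: Folder, crawl_account: str) -> list[str]:
    """Return the paths of folders whose effective ACL lacks crawl_account."""
    missing: list[str] = []

    def walk(folder: Folder, inherited: set[str], path: str) -> None:
        # Assumption: with inheritance disabled, the parent's entries no longer apply.
        effective = (inherited if folder.inherits else set()) | folder.explicit
        if crawl_account not in effective:
            missing.append(path)
        for child in folder.children:
            walk(child, effective, path + "\\" + child.name)

    walk(root, set(), root.name)
    return missing


share = Folder("C:\\Root", explicit={"CONTOSO\\svc_crawl"}, children=[
    Folder("Public"),
    Folder("HR", explicit={"CONTOSO\\HR"}, inherits=False,
           children=[Folder("Payroll")]),
])
print(unreadable_folders(share, "CONTOSO\\svc_crawl"))
# ['C:\\Root\\HR', 'C:\\Root\\HR\\Payroll']
```

The HR folder breaks inheritance, so both it and Payroll below it are invisible to the crawl account, even though Payroll itself never had its permissions changed.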
Resolution Overview
The solution is to ensure that the search account has read access to every folder in the directory tree. Two tools can help with this task: CACLS.exe and ICACLS.exe. Both are included in a standard Windows installation. CACLS is the deprecated predecessor of ICACLS, but it is a little easier to work with in some respects.
Resolution Technical Details
The official documentation is here:
ICACLS – http://technet.microsoft.com/en-us/library/cc753525(WS.10).aspx
CACLS – http://technet.microsoft.com/en-us/library/cc732245(WS.10).aspx (deprecated)
CACLS is a little easier to use when setting permissions for a specific user. For example, to grant the Read permission to a specific user on a folder and all of its subfolders:
cacls .\start_dir /T /E /G <domain>\<username>:r
The /G switch grants rights and /R revokes them (/R is only valid together with /E).
Example
cacls c:\Root /t /e /g group1:R
This will:
- Process the folder c:\Root
- Process all subdirectories (/t)
- Edit the ACL (/e) instead of replacing it (do NOT forget the /e, or the existing ACL is replaced and you will have to reassign all the other permissions)
- Grant (/g) the group "group1" read access (R). If it's a global group, you might have to add the domain name in front of the group name: YourDomain\group1.
Note
The cacls command has been deprecated, and icacls is its replacement. One limitation of cacls is that it does not handle file paths longer than 256 characters. Other useful tools are Security Explorer/sxpbackup for setting permissions and FolderSizes for finding paths longer than 256 characters.
Example
icacls c:\Root /grant <SID>:R /t
icacls c:\Root /grant group1:R /t
icacls c:\Root /grant user1:R /t
This will:
- Process the folder c:\Root
- Process all subdirectories (/t)
- Add to the existing ACL by default instead of replacing it (use /grant:r to replace the explicit permissions previously granted to that SID)
- Grant <SID> read access (R). SIDs can be in either numerical or friendly-name form. If you use the numerical form, prefix the SID with the wildcard character *.
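As an alternative to a dedicated tool such as FolderSizes, a short Python script can list paths that exceed the 256-character limit mentioned above. This is a sketch. The limit is a parameter, and the share path in the usage comment is hypothetical. On Windows, the walk itself may need the long-path prefix (\\?\) to descend into folders that are already too deep.

```python
from __future__ import annotations

import os


def long_paths(root: str, limit: int = 256) -> list[str]:
    """Return file and folder paths under root that are longer than `limit` characters."""
    found = []
    # Note: os.walk silently skips folders it cannot open unless onerror is given.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full_path = os.path.join(dirpath, name)
            if len(full_path) > limit:
                found.append(full_path)
    return found


# Example (hypothetical share path):
# for p in long_paths(r"\\fileserver\share"):
#     print(len(p), p)
```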
Summary
The cacls (or icacls) command grants a user or group read access to all folders in a directory tree. Run it before the first file share crawl to ensure that all of the content is searchable. (This does not expose content to other users: the ACL information is captured with each document, so only users with the appropriate permissions can view the content and retrieve it in the search results.) Set up a procedure to run the command periodically so that future changes to folder inheritance rules are continuously resolved. The command must be run from the command line by a user who has administrative rights to the file share content.
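For the periodic procedure, one option is a small Python wrapper that a scheduled task (for example, in Windows Task Scheduler) runs under an administrative account. This is a sketch, not a supported tool. It builds an icacls /grant command with the simple read right (R), /t to recurse, and /c to continue past file errors, then runs it. The root path and account name in the usage comment are hypothetical.

```python
from __future__ import annotations

import subprocess


def icacls_grant_read_args(root: str, principal: str) -> list[str]:
    """Build icacls arguments granting `principal` read (R) on root and everything below it."""
    return [
        "icacls", root,
        "/grant", f"{principal}:R",  # adds to existing entries; /grant:r would replace this account's explicit entries
        "/t",  # apply to all matching files and subfolders
        "/c",  # continue past file errors instead of stopping
    ]


def grant_read(roots: list[str], principal: str) -> dict[str, int]:
    """Run icacls (Windows only) for each root and return its exit code per root."""
    codes: dict[str, int] = {}
    for root in roots:
        result = subprocess.run(icacls_grant_read_args(root, principal),
                                capture_output=True, text=True)
        codes[root] = result.returncode
    return codes


# Example (hypothetical path and account):
# print(grant_read([r"C:\Root"], r"CONTOSO\svc_crawl"))
```

A nonzero exit code for a root is a signal to review that run's output, which the sketch captures but does not log.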
Problem Case 2
Documents whose file names start with ~ are not indexed. This is expected, because these are temporary files that are created while a user is editing a document.
Problem Case 3
We do not want to index .exe and .dll files. These files are often very large and do not add value to the index. We also have other file types that we want to exclude from the search results.
Resolution
This is solved by configuring an exclusion crawl rule (in this example, gr06 is the file server):
Set Path to:
file://gr06/.*((exe|dll))
Select the Use regular expression syntax for matching this rule check box.
Select the Exclude all items in this path option.
See this blog post for more examples of crawl rule regex support.
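Before saving the rule, it can help to preview which URLs the pattern matches. The Python sketch below approximates the rule with re.fullmatch. This assumes that the crawl rule must match the whole URL, and SharePoint's crawl rule engine is not Python's re module and may not support the same syntax, so treat the result as a rough check. The sample URLs are hypothetical. Note that the pattern does not require a dot before the extension, so any URL ending in the letters exe or dll is excluded.

```python
import re

# Pattern from the exclusion crawl rule above.
RULE = r"file://gr06/.*((exe|dll))"


def is_excluded(url: str, pattern: str = RULE) -> bool:
    """Approximate the crawl rule: True if the whole URL matches the pattern."""
    return re.fullmatch(pattern, url) is not None


samples = [
    "file://gr06/tools/setup.exe",
    "file://gr06/lib/helper.dll",
    "file://gr06/docs/report.docx",
    "file://gr06/archive/olddll",  # no extension, but still matches
]
for url in samples:
    print(is_excluded(url), url)
# True file://gr06/tools/setup.exe
# True file://gr06/lib/helper.dll
# False file://gr06/docs/report.docx
# True file://gr06/archive/olddll
```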
Problem Case 4
Security changes are not reflected after an incremental crawl. Security changes are only captured when a full crawl is performed.
See the TechNet article Plan for crawling and federation (SharePoint Server 2010): http://technet.microsoft.com/en-us/library/cc262926.aspx
Problem Case 5
.lnk files are not indexed. This is expected, because they are shortcut files.
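When reconciling the number of files on a share with the number of indexed items, it can help to list the files that Problem Cases 2 and 5 describe as expected to be skipped. This Python sketch covers only those two rules (names starting with ~ and .lnk shortcuts). It does not reproduce the crawler's full filtering logic, and the share path in the usage comment is hypothetical.

```python
from __future__ import annotations

import os


def expected_skips(root: str) -> list[str]:
    """List files under root that start with "~" (temp files) or end in ".lnk" (shortcuts)."""
    skipped = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Case-insensitive extension check, since Windows file names are case-insensitive.
            if name.startswith("~") or name.lower().endswith(".lnk"):
                skipped.append(os.path.join(dirpath, name))
    return sorted(skipped)


# Example (hypothetical share path):
# print(len(expected_skips(r"\\fileserver\share")))
```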
Please note: also see the SharePoint 2010 Best Practices overview page at http://social.technet.microsoft.com/wiki/contents/articles/8666.sharepoint-2010-best-practices-en.aspx
Problem Case 6
Users can see documents in the search results, but when they click a document they are denied access. This can be a serious security hole. A common cause is share permissions that differ from the file system permissions. The crawler navigates through the share, and the search results use the share path to display a clickable link. However, the crawl records the permissions set on the file system, and those are what determine who sees a document in the results. If the file system permissions are broader than the share permissions, users will see documents in the search results that they cannot open.
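The mismatch can be expressed as simple set arithmetic. In the following hypothetical Python model, each document's file system ACL is the set of users who see it in search results, and opening it through the share requires being allowed by both the share and the file system. The document and user names are made up, and real ACLs (groups, deny entries) are more complex.

```python
from __future__ import annotations


def exposed_users(ntfs_readers: set[str], share_readers: set[str]) -> set[str]:
    """Users who see the document in results (NTFS ACL) but cannot open it via the share."""
    can_open = ntfs_readers & share_readers  # both layers must allow access
    return ntfs_readers - can_open


def audit(share_readers: set[str], docs: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each document path to its exposed users, omitting documents with none."""
    report = {}
    for path, ntfs_readers in docs.items():
        exposed = exposed_users(ntfs_readers, share_readers)
        if exposed:
            report[path] = exposed
    return report


docs = {"plan.docx": {"alice", "bob"}, "memo.docx": {"alice"}}
print(audit({"alice"}, docs))  # {'plan.docx': {'bob'}}
```

Here bob appears in the file system ACL of plan.docx but not in the share permissions, so he sees the document in results and is denied when he clicks it.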
Resolution
The recommendation is to make the share permissions match the permissions on the root of the file system and to ensure that those permissions are inherited throughout the entire folder structure. It is not uncommon for users to change a folder's permission settings so that it no longer inherits, which has two negative side effects. First, the search account may lose access, so none of the folder's contents are indexed. Second, the permissions may be widened beyond what the share allows, so users see search results for documents they cannot open. We recommend a corporate practice that regularly audits and fixes any such discrepancies.
Problem Case 7
File share crawls take much longer than expected.
Resolution
Some customers have noticed that the crawl progresses much faster when the path uses the file server's fully qualified domain name (FQDN).
Problem Case 8
How can we prevent the crawler from sending certain files to the index?
Resolution
One option is to flag the files you do not want crawled as offline: http://support.microsoft.com/kb/980085/EN-US
Problem Case 9
How can we prevent the crawler from sending certain files to the index?
Resolution
Another option is to use the Content Enrichment Web Service to drop documents based on a managed property value. The items are still crawled, and the relevant crawled properties must still be mapped to managed properties, but the documents never reach the index.