Content Sources Overview
For Enterprise Search in Microsoft Office SharePoint Server 2007, content sources represent the content that the search service should crawl, along with information about how the crawl for that content source is configured. Crawl configuration is unchanged from Microsoft Office SharePoint Portal Server 2003 Search; however, there are changes in how content sources work and in the classes you use to configure them.
The major changes are the following:
A single content source can now have multiple start addresses. This simplifies content source management by reducing the number of content sources required.
By default, one content source is configured in Search, the Local Office SharePoint Server Sites content source. This content source includes all content that is stored in the sites within the server or server farm, as well as user profiles.
Two new content source types have been added:
SharePointContentSource: This type simplifies content source configuration for SharePoint sites; all you need to specify is the start address. The SharePointContentSource type automatically includes or excludes the appropriate content from the crawl, without additional configuration.
BusinessDataContentSource: This type allows you to configure Enterprise Search to crawl content from back-end server applications, such as SAP or Siebel. To use this content source type, you must first configure the Business Data Catalog to access the data in those applications. For more information, see Business Data Catalog.
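To make the effect of multiple start addresses concrete, the following Python sketch models a content source as one crawl configuration shared by several start addresses. It is not the SharePoint object model (which is a .NET API): the ContentSourceModel class, the consolidate function, the plain-string crawl_settings field, and the example URLs are all hypothetical. The sketch assumes that single-address content sources with identical crawl settings can be merged into one content source.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class ContentSourceModel:
    """Hypothetical model: one crawl configuration shared by several start addresses."""
    name: str
    crawl_settings: str  # stand-in for the crawl configuration (schedule, etc.)
    start_addresses: list[str] = field(default_factory=list)


def consolidate(pairs: list[tuple[str, str]]) -> list[ContentSourceModel]:
    """Merge (start_address, crawl_settings) pairs that share the same settings.

    Each input pair stands for a single-address content source; the result
    holds one content source per distinct crawl configuration.
    """
    grouped: dict[str, list[str]] = defaultdict(list)
    for address, settings in pairs:
        if address not in grouped[settings]:  # skip duplicate start addresses
            grouped[settings].append(address)
    return [
        ContentSourceModel(name=f"Content source {i}", crawl_settings=settings,
                           start_addresses=addresses)
        for i, (settings, addresses) in enumerate(grouped.items(), start=1)
    ]


if __name__ == "__main__":
    single = [
        ("http://intranet.example.com/", "nightly"),
        ("http://teams.example.com/", "nightly"),
        ("http://www.example.org/", "weekly"),
    ]
    for source in consolidate(single):
        print(source.name, source.crawl_settings, source.start_addresses)
```

Here three single-address sources collapse into two content sources, because two of them share the same crawl settings.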
Content Source Object Model
The Enterprise Search Microsoft.Office.Server.Search.Administration namespace contains several classes to represent the different content source types in the content source object model, as shown in the following figure.
The following table describes the content source types.
| Content Source Type | Comments |
|---|---|
| ContentSource | Base class for all content source types. |
| WebContentSource | Used to include Web content. |
| SharePointContentSource | Includes all Windows SharePoint Services content. |
| BusinessDataContentSource | Used to include content from applications configured in the Business Data Catalog. |
| HierarchicalContentSource | Base class. |
| | Used to include file share content. |
| ExchangePublicFolderContentSource | Used to include Microsoft Exchange Server public folder content. |
| LotusNotesContentSource | Used to include Lotus Notes content. Not configured by default. |
| | Used to include content from custom content sources. |
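The relationships in the table can be illustrated with a small Python class hierarchy. This is a hypothetical mirror for explanation only, not the Microsoft.Office.Server.Search.Administration classes themselves; the constructors, the of_type helper, and the sample source names other than Local Office SharePoint Server Sites are assumptions. The sketch also assumes that ExchangePublicFolderContentSource and LotusNotesContentSource derive from HierarchicalContentSource, and it covers only the types named in the table.

```python
class ContentSource:
    """Base class for all content source types (hypothetical mirror)."""

    def __init__(self, name: str, start_addresses: tuple[str, ...] = ()):
        self.name = name
        self.start_addresses = list(start_addresses)

    def __repr__(self) -> str:
        return f"{type(self).__name__}({self.name!r})"


class WebContentSource(ContentSource):
    """Web content."""


class SharePointContentSource(ContentSource):
    """Windows SharePoint Services content."""


class BusinessDataContentSource(ContentSource):
    """Content from applications configured in the Business Data Catalog."""


class HierarchicalContentSource(ContentSource):
    """Intermediate base class."""


# Assumption: these two types derive from HierarchicalContentSource.
class ExchangePublicFolderContentSource(HierarchicalContentSource):
    """Microsoft Exchange Server public folder content."""


class LotusNotesContentSource(HierarchicalContentSource):
    """Lotus Notes content (not configured by default)."""


def of_type(sources, kind):
    """Return the sources that are instances of `kind`, including its subclasses."""
    return [s for s in sources if isinstance(s, kind)]


if __name__ == "__main__":
    sources = [
        SharePointContentSource("Local Office SharePoint Server Sites"),
        WebContentSource("Partner site", ("http://www.example.org/",)),
        ExchangePublicFolderContentSource("Public folders"),
        LotusNotesContentSource("Notes databases"),
    ]
    print(of_type(sources, HierarchicalContentSource))
```

Because of_type uses isinstance, asking for HierarchicalContentSource also returns its subclasses, while asking for ContentSource returns every source.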
Scheduling Crawls
The crawl schedule is linked to the content source, so you use the content source classes to manage the crawl schedule for that particular set of content.
To configure the crawl schedule for a content source, you can choose from four schedule types. (All schedule types inherit from the Schedule base class.)
DailySchedule: Use to specify the number of days between crawls.
WeeklySchedule: Use to specify the number of weeks between crawls.
MonthlySchedule: Use to specify the days of the month and months of the year when the crawl should occur.
MonthlyDayOfWeekSchedule: Use to specify the days of the week, weeks of the month, and months of the year when the crawl should occur.
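The following Python sketch shows one way the four schedule types could decide whether a crawl falls on a given date. The class names follow the list above, but the fields (begin, days_interval, weeks_interval, days_of_month, days_of_week, weeks_of_month, months_of_year) and the next_dates helper are hypothetical and are not the SharePoint API. The sketch assumes that WeeklySchedule crawls on the weekday of its begin date, interprets MonthlyDayOfWeekSchedule as selecting the nth weekday of a month (for example, the first Monday), and ignores time of day.

```python
from dataclasses import dataclass
from datetime import date, timedelta

ALL_MONTHS = frozenset(range(1, 13))


class Schedule:
    """Hypothetical base class: decides whether a crawl falls on a given date."""

    def occurs_on(self, day: date) -> bool:
        raise NotImplementedError

    def next_dates(self, start: date, count: int, horizon_days: int = 3660) -> list[date]:
        """Return up to `count` crawl dates on or after `start` within the horizon."""
        found = []
        for offset in range(horizon_days):
            day = start + timedelta(days=offset)
            if self.occurs_on(day):
                found.append(day)
                if len(found) == count:
                    break
        return found


@dataclass
class DailySchedule(Schedule):
    begin: date
    days_interval: int = 1  # number of days between crawls

    def occurs_on(self, day: date) -> bool:
        delta = (day - self.begin).days
        return delta >= 0 and delta % self.days_interval == 0


@dataclass
class WeeklySchedule(Schedule):
    begin: date
    weeks_interval: int = 1  # number of weeks between crawls, on begin's weekday

    def occurs_on(self, day: date) -> bool:
        delta = (day - self.begin).days
        return delta >= 0 and delta % (7 * self.weeks_interval) == 0


@dataclass
class MonthlySchedule(Schedule):
    days_of_month: frozenset               # e.g. frozenset({1, 15})
    months_of_year: frozenset = ALL_MONTHS  # 1 = January ... 12 = December

    def occurs_on(self, day: date) -> bool:
        return day.day in self.days_of_month and day.month in self.months_of_year


@dataclass
class MonthlyDayOfWeekSchedule(Schedule):
    days_of_week: frozenset                # 0 = Monday ... 6 = Sunday, as in date.weekday()
    weeks_of_month: frozenset              # 1 = days 1-7, 2 = days 8-14, and so on
    months_of_year: frozenset = ALL_MONTHS

    def occurs_on(self, day: date) -> bool:
        week_of_month = (day.day - 1) // 7 + 1
        return (day.weekday() in self.days_of_week
                and week_of_month in self.weeks_of_month
                and day.month in self.months_of_year)


if __name__ == "__main__":
    # Crawl on the first Monday of every month, starting from 1 January 2007.
    first_monday = MonthlyDayOfWeekSchedule(frozenset({0}), frozenset({1}))
    print(first_monday.next_dates(date(2007, 1, 1), 3))
```

Checking each day against occurs_on keeps all four schedule types behind a single interface, at the cost of scanning day by day, which is adequate for a short horizon.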
See Also
Tasks
How to: Retrieve the Content Sources for a Shared Services Provider
How to: Add a Content Source
How to: Delete a Content Source
How to: Programmatically Manage the Crawl of a Content Source
How to: Programmatically Configure a Crawl Schedule for a Content Source
Concepts
Getting Started with the Enterprise Search Administration Object Model