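How to: Programmatically Configure a Crawl Schedule for a Content Source
In Enterprise Search in Microsoft Office SharePoint Server 2007, you indicate what content the search index service should crawl through the content sources configured for the search service's Shared Services Provider (SSP). You can add new content sources to the SSP's content source collection by using the Enterprise Search Administration object model. For more information, see How to: Add a Content Source.
Adding a content source is only part of the task. For content to be included in the content index, the search index component must also crawl it.
You can manually start a full or incremental crawl of a content source (and pause, resume, or stop a crawl) by calling the appropriate methods of the [ContentSource] class. For more information, see How to: Programmatically Manage the Crawl of a Content Source.
However, if you want the content in a content source to be crawled on a regular, ongoing basis, we recommend that you set up a crawl schedule. You can also do this by using the Enterprise Search Administration object model.
The following procedures show how to:
Set up a console application to use the Enterprise Search Administration object model.
Configure a full crawl schedule for a content source by using the [WeeklySchedule] class.
Configure an incremental crawl schedule for a content source by using the [DailySchedule] class.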
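To set up your application to use the Enterprise Search Administration object model
In your application, set references to the following DLLs: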
Microsoft.SharePoint.dll
Microsoft.Office.Server.dll
Microsoft.Office.Server.Search.dll
In the class file of your console application, add the following using statements near the top of the code, with the other namespace directives:
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;
Create a function to write usage information to the console window.
private static void Usage()
{
    Console.WriteLine("Configure Crawl Schedule");
    Console.WriteLine("Usage: ConfigureCrawlSchedule.exe <ContentSourceName>");
    Console.WriteLine("<ContentSourceName> - Specify the content source name.");
}
In the Main() function of the console application, add code to check the number of items in the args[] parameter. If the count is less than 1, no value was specified to identify the content source, so the code calls the Usage() function defined in step 3.
if (args.Length < 1)
{
    Usage();
    return;
}
Following the code from step 4, add the following code to retrieve the search context for the SSP.
/* Replace <SiteName> with the name of a site using the SSP */
string strURL = "http://<SiteName>";
SearchContext context;
using (SPSite site = new SPSite(strURL))
{
    context = SearchContext.GetContext(site);
}
To create a crawl schedule by using the DailySchedule class
Create an instance of the DailySchedule class.
DailySchedule daily = new DailySchedule(context);
To indicate when crawling of the content source starts and how often it is crawled, configure the DailySchedule properties. For example:
// Indicates the schedule starts on the 15th day of the month.
daily.BeginDay = 15;
// Indicates the schedule starts in January.
daily.BeginMonth = 1;
// Indicates that the schedule starts in 2007.
daily.BeginYear = 2007;
// The next two lines of code indicate that the schedule starts at 2:30 in the morning.
daily.StartHour = 2;
daily.StartMinute = 30;
// Indicates that the content should be crawled every day.
daily.DaysInterval = 1;
Retrieve the collection of content sources configured for the SSP's search service.
Content sspContent = new Content(context); ContentSourceCollection sspContentSources = sspContent.ContentSources;
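The start values in step 2 are plain integers, so it can help to confirm that they describe a real calendar date and time before assigning them to the schedule. The following is a minimal sketch that uses only System.DateTime and does not call the SharePoint object model. The class name ScheduleStartCheck and the helper TryBuildStart are hypothetical and are not part of the Enterprise Search Administration object model.

```csharp
using System;

static class ScheduleStartCheck
{
    // Hypothetical helper: returns true and the combined start time when the
    // five values describe a real calendar date and time.
    public static bool TryBuildStart(int year, int month, int day,
                                     int hour, int minute, out DateTime start)
    {
        start = DateTime.MinValue;
        if (year < 1 || year > 9999) return false;
        if (month < 1 || month > 12) return false;
        if (day < 1 || day > DateTime.DaysInMonth(year, month)) return false;
        if (hour < 0 || hour > 23) return false;
        if (minute < 0 || minute > 59) return false;
        start = new DateTime(year, month, day, hour, minute, 0);
        return true;
    }

    static void Main()
    {
        DateTime start;
        // Same values as the DailySchedule example: January 15, 2007, 2:30 A.M.
        if (TryBuildStart(2007, 1, 15, 2, 30, out start))
        {
            Console.WriteLine("Valid start: " + start.ToString("s"));
        }
        // February 30 does not exist, so this combination is rejected.
        if (!TryBuildStart(2007, 2, 30, 2, 30, out start))
        {
            Console.WriteLine("Invalid start values.");
        }
    }
}
```

If the check succeeds, the same integers can be assigned to BeginYear, BeginMonth, BeginDay, StartHour, and StartMinute as shown in step 2.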
To create a crawl schedule by using the WeeklySchedule class
Create an instance of the WeeklySchedule class.
WeeklySchedule weekly = new WeeklySchedule(context);
To indicate when crawling of the content source starts and how often it is crawled, configure the WeeklySchedule properties. For example:
// Indicates the schedule starts on the 1st day of the month.
weekly.BeginDay = 1;
// Indicates the schedule starts in January.
weekly.BeginMonth = 1;
// Indicates that the schedule starts in 2007.
weekly.BeginYear = 2007;
// The next two lines of code indicate that the schedule starts at 11:15 at night.
weekly.StartHour = 23;
weekly.StartMinute = 15;
// Indicates that the content should be crawled every week.
weekly.WeeksInterval = 1;
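To make the interval properties concrete, the following sketch prints the first few dates implied by the two example schedules: a daily schedule with DaysInterval = 1 and a weekly schedule with WeeksInterval = 1. It is plain DateTime arithmetic under the assumption that each crawl recurs at a fixed interval from the configured start. It does not call the SharePoint object model and does not model any other schedule settings the search service may apply. The class IntervalSketch and the helper Occurrence are hypothetical names.

```csharp
using System;

static class IntervalSketch
{
    // Hypothetical helper: the n-th occurrence (0-based) of a schedule that
    // repeats every intervalDays days, counted from start.
    public static DateTime Occurrence(DateTime start, int intervalDays, int n)
    {
        return start.AddDays(intervalDays * n);
    }

    static void Main()
    {
        // Start values taken from the DailySchedule and WeeklySchedule examples.
        DateTime dailyStart = new DateTime(2007, 1, 15, 2, 30, 0);
        DateTime weeklyStart = new DateTime(2007, 1, 1, 23, 15, 0);
        int daysInterval = 1;   // mirrors daily.DaysInterval
        int weeksInterval = 1;  // mirrors weekly.WeeksInterval

        for (int n = 0; n < 3; n++)
        {
            Console.WriteLine("Incremental crawl: "
                + Occurrence(dailyStart, daysInterval, n).ToString("s"));
            Console.WriteLine("Full crawl:        "
                + Occurrence(weeklyStart, 7 * weeksInterval, n).ToString("s"));
        }
    }
}
```

Changing daysInterval to 2 or weeksInterval to 2 in the sketch shows how the same start values spread out when the corresponding schedule property is increased.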
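To configure the content source to use the new schedules
Retrieve the value specified in the args[0] parameter, and verify that the SSP's content source collection contains a content source with that name.
string strContentSourceName = args[0];
if (sspContentSources.Exists(strContentSourceName))
{
    <…>
}
else
{
    Console.WriteLine("Content source does not exist.");
}
Retrieve the content source with the specified name, and set its FullCrawlSchedule and IncrementalCrawlSchedule properties to the new schedules.
ContentSource cs = sspContentSources[strContentSourceName];
cs.IncrementalCrawlSchedule = daily;
cs.FullCrawlSchedule = weekly;
cs.Update();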
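When the specified name is not found, it can be useful to show which content sources the SSP does contain. The following is a small sketch, assuming the project references listed in this topic and the sspContentSources collection retrieved earlier. The class ContentSourceListing and the method PrintNames are hypothetical names introduced for illustration. The code runs only on a server where Office SharePoint Server 2007 is installed.

```csharp
using System;
using Microsoft.Office.Server.Search.Administration;

static class ContentSourceListing
{
    // Hypothetical helper: writes the name of every content source
    // in the collection to the console window.
    public static void PrintNames(ContentSourceCollection sources)
    {
        foreach (ContentSource source in sources)
        {
            Console.WriteLine(source.Name);
        }
    }
}
```

One possible use is to call ContentSourceListing.PrintNames(sspContentSources) in the else branch, after the "Content source does not exist." message.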
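Example
Following is the complete code for the sample console application described in this topic.
Prerequisites
- Ensure that a Shared Services Provider is already created.
Project References
Add the following project references in your console application code project before running this sample: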
Microsoft.SharePoint
Microsoft.Office.Server
Microsoft.Office.Server.Search
using System;
using System.Collections.Generic;
using System.Text;
using Microsoft.SharePoint;
using Microsoft.Office.Server.Search.Administration;

namespace ManageCrawlStatus
{
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                if (args.Length < 1)
                {
                    Usage();
                    return;
                }

                /*
                Replace <SiteName> with the name of a site using the Shared Services Provider
                */
                string strURL = "http://<SiteName>";
                SearchContext context;
                using (SPSite site = new SPSite(strURL))
                {
                    context = SearchContext.GetContext(site);
                }

                DailySchedule daily = new DailySchedule(context);
                // Indicates the schedule starts on the 15th day of the month.
                daily.BeginDay = 15;
                // Indicates the schedule starts in January.
                daily.BeginMonth = 1;
                // Indicates that the schedule starts in 2007.
                daily.BeginYear = 2007;
                // The next two lines of code indicate that the schedule starts at 2:30 in the morning.
                daily.StartHour = 2;
                daily.StartMinute = 30;
                // Indicates that the content should be crawled every day.
                daily.DaysInterval = 1;

                WeeklySchedule weekly = new WeeklySchedule(context);
                // Indicates the schedule starts on the 1st day of the month.
                weekly.BeginDay = 1;
                // Indicates the schedule starts in January.
                weekly.BeginMonth = 1;
                // Indicates that the schedule starts in 2007.
                weekly.BeginYear = 2007;
                // The next two lines of code indicate that the schedule starts at 11:15 at night.
                weekly.StartHour = 23;
                weekly.StartMinute = 15;
                // Indicates that the content should be crawled every week.
                weekly.WeeksInterval = 1;

                string strContentSourceName = args[0];
                Content sspContent = new Content(context);
                ContentSourceCollection sspContentSources = sspContent.ContentSources;
                if (sspContentSources.Exists(strContentSourceName))
                {
                    ContentSource cs = sspContentSources[strContentSourceName];
                    // Use the daily schedule for incremental crawls and the weekly schedule for full crawls.
                    cs.IncrementalCrawlSchedule = daily;
                    cs.FullCrawlSchedule = weekly;
                    cs.Update();
                }
                else
                {
                    Console.WriteLine("Content source does not exist.");
                }
            }
            catch (Exception e)
            {
                // Write the exception details to the console instead of discarding them.
                Console.WriteLine(e.ToString());
            }
        }

        private static void Usage()
        {
            Console.WriteLine("Configure Crawl Schedule");
            Console.WriteLine("Usage: ConfigureCrawlSchedule.exe <ContentSourceName>");
            Console.WriteLine("<ContentSourceName> - Specify the content source name.");
        }
    }
}