Differences Between MOSS Content Sources

Several of my MOSS Search customers have asked me this question recently: what is the difference between setting up a "SharePoint Sites" MOSS Content Source and a "Web Sites" MOSS Content Source?

Here are some of my thoughts on the two:

  • Both Content Sources let you enter a Name that describes the Content Source, so that you can identify it for tracking and other Search-related tasks.

  • They both let you enter Start Addresses for crawling the Content Source. In a Web Sites Content Source, these can point to any web content, from a single web page to an entire web site. In a SharePoint Sites Content Source, these can point to Office SharePoint Server sites and WSS sites.

  • The Crawl Settings are where the two differ. In a Web Sites Content Source, you can choose to crawl only the server of each Start Address, to crawl only the first page of each Start Address, or (my favorite) to set up custom crawl settings. With custom settings you can specify Server Hops and Page Depths. These two options are not available in a SharePoint Sites Content Source.

  • Page Depths are the number of links to follow on the same host name. So for a Web Sites Content Source with a Page Depth of 1, the crawler will follow the links on the home page and then stop.

  • Server Hops are the number of host name changes that the crawler will make. For example, with a Server Hops value of 1, the crawler will follow a link from your site to any other host name, but it will not follow links from there on to yet another host name.

  • One additional difference is that the SharePoint Sites Content Source allows users to crawl a single WSS site collection, which is not possible in a Web Sites Content Source. That is, if you want to crawl only a site collection, you have to put its URL in a SharePoint Sites Content Source, like https://myserver.com/sites/mikesitecollection, and select the radio button to “Crawl only the SharePoint site of each start address”. If you put the same start address in a Web Sites Content Source, it will go all the way to the top (https://myserver.com) and start crawling, because that is the default for all SharePoint content.

  • Also, the Web Sites Content Source can figure out from the response header during the crawl that the start address is a SharePoint site, and then switch protocol handlers for crawling.

  • Of course, both content sources allow Full and Incremental Crawls and they both allow you to create schedules.
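
To make the Page Depth and Server Hops settings a bit more concrete, here is a small Python sketch that walks a made-up link graph under both limits. It is only an illustration of the idea described above, not how the MOSS crawler is implemented. The `LINKS` graph, the `crawl` function, and the `example` host names are all hypothetical, and the sketch assumes that the page depth count starts over after each host name change.

```python
from collections import deque
from urllib.parse import urlsplit

# Hypothetical link graph: page URL -> URLs that page links to.
LINKS = {
    "http://a.example/": ["http://a.example/p1", "http://b.example/"],
    "http://a.example/p1": ["http://a.example/p2"],
    "http://a.example/p2": [],
    "http://b.example/": ["http://b.example/q1", "http://c.example/"],
    "http://b.example/q1": [],
    "http://c.example/": [],
}


def crawl(start, max_page_depth, max_server_hops, links=LINKS):
    """Return the set of URLs reached from start under the two limits."""
    seen = {start}
    # Each entry: (url, links followed on the current host, host name changes so far)
    queue = deque([(start, 0, 0)])
    while queue:
        url, depth, hops = queue.popleft()
        for target in links.get(url, []):
            if urlsplit(target).hostname == urlsplit(url).hostname:
                # Same host name: this counts against Page Depth.
                next_depth, next_hops = depth + 1, hops
            else:
                # Host name change: this counts against Server Hops.
                # Assumption: the page depth count starts over on the new host.
                next_depth, next_hops = 0, hops + 1
            if next_depth > max_page_depth or next_hops > max_server_hops:
                continue  # over one of the limits, do not follow this link
            if target not in seen:
                seen.add(target)
                queue.append((target, next_depth, next_hops))
    return seen


if __name__ == "__main__":
    # Page Depth 1, Server Hops 0: the home page plus the pages it links to on the same host.
    print(sorted(crawl("http://a.example/", 1, 0)))
    # Page Depth 1, Server Hops 1: additionally follows one host name change.
    print(sorted(crawl("http://a.example/", 1, 1)))
```

With both limits set to 0 the sketch returns only the start address, which corresponds to the "only crawl the first page" option. Tracking visited pages by URL alone is a simplification that is good enough for this tree-shaped toy graph.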

Hope that helps.

Thanks
Mike

Comments

  • Anonymous
    May 31, 2007
    Hello Mike, The above information is really helpful. But I am finding a little different issue than this, and not being able to find any answer anywhere. I am trying to integrate MELL with MOSS so that on moss site I want to search the content of MELL. Can you help me on this ? Thanks, Kalyan
