SharePoint 2010: How to Bypass the Firewall, TMG and Proxy During a Content Crawl in SharePoint Publishing Farm

1/17/2024

This article is about crawling SharePoint Server content that lives in a DMZ.

For my client I had to configure the Search Service Application so that a crawler in the internal domain could index content in the DMZ. But as explained in my previous posts, you cannot reach a DMZ directly: the traffic first has to get past the firewall, the proxy and TMG.

We have external contributors who upload content such as Word documents, Excel sheets and PDF files to the DMZ, and internal users should be able to search that content. So this is more infrastructure work than SharePoint work.

A comparable Microsoft example would be: one partner writes a custom document and another partner has to read this document by going through the “Microsoft Portal”.

http://gokanx.files.wordpress.com/2013/06/azer.png?w=480&h=392

The first step is to create a Search Service Application and configure it (http://technet.microsoft.com/en-us/library/gg502597.aspx).

To create a Search service application

1. Verify that the user account that is performing this procedure is a member of the Farm Administrators group for the farm for which you want to create the service application.
2. On the Central Administration home page, in the Application Management section, click Manage service applications.
3. On the Manage Service Applications page, on the ribbon, click New, and then click Search Service Application.
4. On the Create New Search Service Application page, do the following:
  1. Accept the default value for Service Application name, or type a new name for the Search service application.
  2. In the Search Service Account list, select the managed account that you registered in the previous procedure to run the Search service.
  3. In the Application Pool for Search Admin Web Service section, do the following:
    1. Select the Create new application pool option, and then specify a name for the application pool in the Application pool name text box.
    2. In the Select a security account for this application pool section, select the Configurable option, and then from the list select the account that you registered to run the application pool for the Search Admin Web Service.
  4. In the Application Pool for Search Query and Site Settings Web Service section, do the following:
    1. Choose the Create new application pool option, and then specify a name for the application pool in the Application pool name text box.
    2. In the Select a security account for this application pool section, select the Configurable option, and then from the list select the account that you registered to run the application pool for the Search Query and Site Settings Web Service.

For our extranet site we have to create a content source and start a full crawl for the first time.

What is a Content Source? A content source is a set of options that you can use to specify what type of content is crawled, what URLs to crawl, and how deep and when to crawl. You must create at least one content source before a crawl can occur. After you create a content source, you can edit or delete it at any time.

What is a Full Crawl? When you perform a full crawl, all content specified by the content source is crawled even if the content already exists in the index. To perform a full crawl, you must crawl the content defined in a particular content source individually. Note that using the Start all crawls link on the Manage Content Sources page results in all content sources being crawled using an incremental crawl, unless the system detects that a full crawl is required.

But I received the following error:

http://gokanx.files.wordpress.com/2013/06/azer12.png?w=480&h=159

“This item could not be crawled because the repository did not respond within the specified timeout period. Try to crawl the repository at a later time, or increase the timeout value on the Proxy and Timeout page in search administration. You might also want to crawl this repository during off-peak usage times.”

Investigation

So the first thing to investigate is whether you can reach the page from the crawl server, either with your browser or with Telnet.

Open a Command Prompt and run telnet gokanx.wordpress.extranet 80. I received a “connection failed” message in the console window. This meant that I couldn’t reach my site from my internal server.

http://gokanx.files.wordpress.com/2013/06/asx1.png?w=480&h=73
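If the Telnet client is not installed on the server, the same reachability test can be scripted. What follows is only a minimal Python sketch and was not part of my original troubleshooting; the function name `can_connect` and the 5-second default timeout are assumptions I made for illustration.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake,
        # which is the part of the Telnet test that matters here.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused connections, timeouts and DNS failures all end up here.
        return False
```

Calling `can_connect("gokanx.wordpress.extranet", 80)` from the internal server corresponds to the Telnet test above: `False` means the site is not reachable on that port from where you are standing.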

What is Telnet? Telnet is a network protocol used on the Internet or local area networks to provide a bidirectional interactive text-oriented communication facility using a virtual terminal connection. User data is interspersed in-band with Telnet control information in an 8-bit byte oriented data connection over the Transmission Control Protocol (TCP).


What is a Browser? A web browser (commonly referred to as a browser) is a software application for retrieving, presenting and traversing information resources on the World Wide Web. An information resource is identified by a Uniform Resource Identifier (URI) and may be a web page, image, video or other piece of content. Hyperlinks present in resources enable users easily to navigate their browsers to related resources.

http://gokanx.files.wordpress.com/2013/06/azer3.png?w=133&h=91


Next stop: the security team. Ask them kindly whether they can create an exception or rule (name it as you want) on the F5/firewall to allow traffic between the internal server and the DMZ. After a while I got the answer that everything was done, so I could restart my full crawl.

One minute later, I got another problem:

“An unrecognized HTTP response was received when attempting to crawl this item. HTTP Status 407”.

http://gokanx.files.wordpress.com/2013/06/azer4.png?w=480
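To see the HTTP status for yourself outside the crawler, you can request the page from the crawl server with and without a proxy. The snippet below is a hedged Python sketch using only the standard library; `http_status` and its parameters are names I made up for illustration, and the SharePoint crawler uses its own proxy configuration, so the result is only an approximation of what the crawler sees.

```python
import urllib.error
import urllib.request


def http_status(url, proxy=None, timeout=10.0):
    """Return the HTTP status code for url, optionally going through a proxy."""
    # An empty mapping tells urllib to ignore any system-wide proxy settings;
    # passing a proxy URL routes both http and https requests through it.
    proxies = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers (for example 401 or 407) are raised as HTTPError.
        return err.code
```

A 407 returned only when the `proxy` argument is set points at the proxy rather than at SharePoint. Connection failures are deliberately not caught here, so a blocked firewall still shows up as an exception.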

Many contributors on the internet were quite sure that changing the LAN settings in IE would resolve my error, but it didn’t.

http://gokanx.files.wordpress.com/2013/06/azer5.png?w=480

Open the browser on the indexer and go to Tools > Internet Options > Connections > LAN settings.

Check the box under Proxy server and enter your proxy server. Check the box for Bypass proxy server for local addresses. Click Advanced and add *.yourdomain.com to the exceptions. (It may help in your case.)
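To get a feeling for which host names an exception such as *.yourdomain.com covers, here is a small Python sketch. It only approximates the wildcard matching with shell-style patterns and is not how Windows implements the exceptions list; `bypasses_proxy` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase


def bypasses_proxy(host, exceptions):
    """Return True if host matches one of the wildcard exception patterns."""
    host = host.lower()
    # Host names are case-insensitive, so compare everything in lower case.
    return any(fnmatchcase(host, pattern.lower()) for pattern in exceptions)


exceptions = ["*.yourdomain.com"]
print(bypasses_proxy("portal.yourdomain.com", exceptions))  # True
print(bypasses_proxy("yourdomain.com", exceptions))         # False
```

Note that in this sketch the bare domain does not match the pattern, because the pattern requires a dot before yourdomain.com. If you also crawl the bare domain, list it as a separate exception.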

What is a LAN? A local area network (LAN) is a computer network that interconnects computers in a limited area such as a home, school, computer laboratory, or office building using network media. The defining characteristics of LANs, in contrast to wide area networks (WANs), include their usually higher data-transfer rates, smaller geographic area, and lack of a need for leased telecommunication lines.

http://gokanx.files.wordpress.com/2013/06/azer6.png?w=480

What is a Proxy? A proxy server is a server (a computer system or an application) that acts as an intermediary for requests from clients seeking resources from other servers. A client connects to the proxy server, requesting some service, such as a file, connection, web page, or other resource available from a different server, and the proxy server evaluates the request as a way to simplify and control its complexity. Today, most proxies are web proxies, facilitating access to content on the World Wide Web.

http://gokanx.files.wordpress.com/2013/06/azer7.png?w=480

The error was clear enough for me. It talks about the proxy and access denied, which meant that I had to go to the proxy team and ask, “Is there a problem?”

Actually yes, there was a problem: nothing under *.extranet had been “whitelisted” on the proxy.

Great, finally! I started another full crawl and YEEEE… NO! Another error:

“Access Denied. Verify that either the default Content Access Account has access to this repository. If the repository being crawled is a SharePoint repository, verify that the account you are using has Full Read on the Web Application” HTTP status 401

http://gokanx.files.wordpress.com/2013/06/azer8.png?w=480

That is normal (!), because I was trying to crawl with an account that has no access to the web application in the DMZ.

Therefore, connect to Central Administration on the SharePoint platform in the INTRANET zone and, on the web application, grant the Search managed account Full Read permissions.

http://gokanx.files.wordpress.com/2013/06/azer9.png?w=480&h=343
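The three crawl errors in this article each pointed to a different team. As a memory aid, they can be written down as a simple lookup; the Python sketch below is purely illustrative, the names `CRAWL_HINTS` and `crawl_hint` are hypothetical, and the hints only reflect what happened in my environment.

```python
# Crawl error -> what fixed it in this article (not an exhaustive list).
CRAWL_HINTS = {
    "timeout": "Firewall: ask for a rule allowing traffic from the crawl "
               "server to the DMZ.",
    "407": "Proxy: ask for the extranet addresses to be whitelisted.",
    "401": "Permissions: give the Search account Full Read on the "
           "web application.",
}


def crawl_hint(error):
    """Return a troubleshooting hint for a crawl error key or HTTP status."""
    # str() lets callers pass either the integer 407 or the string "407".
    return CRAWL_HINTS.get(str(error), "Not covered here; check the crawl log.")


print(crawl_hint(407))
```

The order of the entries matches the order in which the errors appeared: each fix only exposed the next layer.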


Ah la la!!! Finally my content could be crawled, passing through TMG, the proxy and the firewall.

http://gokanx.files.wordpress.com/2013/06/azer10.png?w=480
