Crawling the website with noindex nofollow META tag will not provide any search results

02/10/2015

You can use a specific <META> tag to tell search engine robots not to index the content of a specific web page. As a result, crawling a site such as "https://xyz.com" whose pages carry the noindex, nofollow META tag with the FAST ESP 5.3 enterprise crawler will not return any search results.

Robots META tags can be embedded in the "head" section of any HTML web page. For a META tag named "robots", the content value indicates which actions to take or to avoid. A page without such tags is parsed to find new URIs and is then indexed; the possible settings can prevent either or both of these actions by a crawler.
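The effect of a content value on these two actions can be sketched in a few lines of Python. This is only an illustration of the commonly used directives (index, noindex, follow, nofollow, all, none); the function name robots_actions is hypothetical and is not part of FAST ESP.

```python
def robots_actions(content: str) -> dict:
    """Map a robots META content value to the two crawler actions.

    Hypothetical helper for illustration; not a FAST ESP API.
    """
    # Directives are comma-separated and treated as case-insensitive.
    directives = {d.strip().lower() for d in content.split(",") if d.strip()}

    # With no restrictions, a page is both indexed and parsed for new URIs.
    actions = {"index": True, "follow": True}

    # "none" is shorthand for "noindex,nofollow".
    if "noindex" in directives or "none" in directives:
        actions["index"] = False
    if "nofollow" in directives or "none" in directives:
        actions["follow"] = False
    return actions


if __name__ == "__main__":
    for value in ("noindex,nofollow", "noindex", "nofollow", ""):
        print(repr(value), robots_actions(value))
```

Only the noindex,nofollow combination (or its shorthand, none) disables both actions, which is the case this article describes.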

In the following example, the page is effectively blocked from further processing by any crawler that downloads it.

Example:

<html>
<head>
<meta name="robots" content="noindex,nofollow" />
</head>
</html>

Robots META Tags Settings

If the page contains the setting <meta name="robots" content="noindex,nofollow" />, the crawler will not index the page or follow its links. You can also check the crawler fetch logs to confirm that noindex, nofollow is being reported for the page.
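Independently of the fetch logs, you can check the downloaded HTML of a page for the tag yourself. The following is a minimal sketch using only the Python standard library; the class RobotsMetaFinder and the function find_robots_meta are hypothetical names chosen for this example and are not part of FAST ESP.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the content values of all <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased from HTMLParser.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            self.values.append(attr.get("content") or "")


def find_robots_meta(html_text: str) -> list:
    """Return every robots META content value found in the HTML text."""
    finder = RobotsMetaFinder()
    finder.feed(html_text)
    finder.close()
    return finder.values


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<meta name="robots" content="noindex,nofollow" />'
        "</head><body></body></html>"
    )
    print(find_robots_meta(page))
```

An empty result means the page carries no robots META tag, so missing search results would have to be explained by something else.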

You can bypass this setting in ESP by changing the Check meta robots value to No. Verify that the check_meta_robots option has been updated to "no" in the config file.

Restart the crawler service and crawl the website again.

Published by - Prasad Joshi
