Search index created for a website contains only one page

VK 0 Reputation points
2023-12-07T02:14:40.9933333+00:00

We use AI Search to index our website <website removed>, but when we create the index, only the home page gets indexed.

How do we make it crawl the entire website?

Thanks.

Azure AI Search
An Azure search service with built-in artificial intelligence capabilities that enrich information to help identify and explore relevant content at scale.

1 answer

  1. Amira Bedhiafi 28,066 Reputation points
    2023-12-09T17:38:01.49+00:00

    I think you should consider the following points:

    • Sitemaps
    • robots.txt
    • Website Structure and Navigation
    • Internal Linking
    • Meta tags
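As a rough way to check the first two points, the sketch below tests whether a URL is permitted by a robots.txt body and lists the page URLs declared in a sitemap. It uses only the Python standard library; the function names and the sample values in the usage note are illustrative assumptions, not part of Azure AI Search.

```python
import urllib.robotparser
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def allowed_by_robots(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt text lets user_agent fetch url."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value found in a sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]
```

For example, `allowed_by_robots("User-agent: *\nDisallow: /private/\n", "https://example.com/private/x")` reports whether a crawler would be turned away from that path, and `sitemap_urls(...)` gives a list of pages that should end up in the index.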

    See this older thread:

    https://learn.microsoft.com/en-us/answers/questions/1294445/azure-cognitive-search-indexing-and-external-websi

    1. Create a Cognitive Search service in the Azure portal if you haven't already done so.
    2. Create a search index and define the schema for the data you want to index.
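    3. Create a data source for the content of the external website you want to index. As far as I know, indexers read from supported stores such as Azure Blob Storage, Azure SQL Database, or Azure Cosmos DB rather than from a website URL, so the pages usually need to be fetched and saved to one of those stores first.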
    4. Create a skillset with a custom skill that extracts the data you need from each page. The custom skill runs your own code, which can use an open-source library such as BeautifulSoup or Scrapy to parse the HTML of the website.
    5. Add the custom skill to the skillset and configure it to extract the data you want to index.
    6. Create an indexer that uses the data source, skillset, and search index you created earlier. The indexer will automatically extract data from the website and add it to your search index.
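Since the original problem is that only the home page is indexed, the part that usually has to be written by hand is the link-following crawl. The sketch below is one minimal way to do it with the Python standard library instead of BeautifulSoup or Scrapy: it starts at one URL, follows `<a href>` links on the same host breadth-first, and returns the HTML of each page. The names `LinkCollector`, `fetch_html`, and `crawl`, and the `max_pages` limit, are assumptions made for illustration.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url: str) -> str:
    """Download a page over HTTP; replace with your own fetcher if needed."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(
    start_url: str,
    fetch: Callable[[str], str] = fetch_html,
    max_pages: int = 50,
) -> dict[str, str]:
    """Breadth-first crawl of same-host pages, returning {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages: dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue  # skip pages that fail to load
        pages[url] = html
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            # Resolve relative links and drop #fragments before comparing.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Calling `crawl("https://example.com/")` (a placeholder URL) returns a dictionary with one entry per reachable page, which can then be stored or pushed to the index. Pages that are only reachable through JavaScript-generated links will not be found by this approach.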

    Here are some resources that can help you get started:

    Another way to achieve this is to combine Azure Cognitive Search, Azure Functions, and Azure Blob Storage. See this similar Stack Overflow thread for more details.
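A minimal sketch of the Blob Storage half of that approach is shown below, assuming the crawled pages are already available as a `{url: html}` dictionary and that the `azure-storage-blob` package is installed. The helper names, the blob naming scheme, and the `source_url` metadata key are hypothetical choices; the code could run inside an Azure Function on a timer, with a blob indexer then picking up the uploaded files.

```python
import hashlib
from urllib.parse import urlparse


def blob_name_for(url: str) -> str:
    """Derive a stable blob name from a page URL (hypothetical naming scheme)."""
    parsed = urlparse(url)
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:16]
    return f"{parsed.netloc}/{digest}.html"


def upload_pages(pages: dict[str, str], connection_string: str, container: str) -> None:
    """Upload crawled HTML pages to a Blob Storage container."""
    # Imported here so the naming helper works without the Azure SDK installed.
    from azure.storage.blob import BlobServiceClient, ContentSettings

    service = BlobServiceClient.from_connection_string(connection_string)
    container_client = service.get_container_client(container)
    for url, html in pages.items():
        container_client.upload_blob(
            name=blob_name_for(url),
            data=html.encode("utf-8"),
            overwrite=True,  # re-crawls replace the previous copy
            # Assumes the URL is plain ASCII, as blob metadata values must be.
            metadata={"source_url": url},
            content_settings=ContentSettings(content_type="text/html"),
        )
```

Hashing the URL keeps blob names short and free of characters that are awkward in paths, while the metadata entry preserves the original address so it can be mapped to an index field.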
