I think you should consider the following points:
- Sitemaps
- robots.txt
- Website Structure and Navigation
- Internal Linking
- Meta tags
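
As a rough illustration of the first two points, here is a minimal Python sketch that checks which sitemap URLs a crawler is actually allowed to fetch according to robots.txt. It uses only the standard library. The function name `allowed_urls` and the sample `ROBOTS_TXT` / `SITEMAP_XML` content are made up for this example; in practice you would download both files from the site.

```python
from urllib.robotparser import RobotFileParser
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sample content, standing in for the site's real files.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

SITEMAP_XML = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/docs/intro</loc></url>
  <url><loc>https://example.com/private/admin</loc></url>
</urlset>
"""


def allowed_urls(robots_txt: str, sitemap_xml: str, user_agent: str = "*") -> list[str]:
    """Return the sitemap URLs that robots.txt permits user_agent to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    root = ET.fromstring(sitemap_xml)
    urls = [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]
    return [url for url in urls if rules.can_fetch(user_agent, url)]


if __name__ == "__main__":
    for url in allowed_urls(ROBOTS_TXT, SITEMAP_XML):
        print(url)
```

If a page you expect to be indexed is missing from the result, either the sitemap does not list it or robots.txt blocks it.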
Check this old thread:
- Create a Cognitive Search service in the Azure portal if you haven't already done so.
- Create a search index and define the schema for the data you want to index.
- Create a data source for the external website you want to index. You can use the "Web" data source type and specify the URL of the website.
- Create a skillset that defines the custom skill you want to use to extract data from the website. You can use an open-source library like BeautifulSoup or Scrapy to extract data from the HTML of the website.
- Add the custom skill to the skillset and configure it to extract the data you want to index.
- Create an indexer that uses the data source, skillset, and search index you created earlier. The indexer will automatically extract data from the website and add it to your search index.
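
To make the custom skill step a bit more concrete, here is a minimal sketch of the logic such a skill could run. A custom Web API skill receives a JSON body with a `values` array of records (each with a `recordId` and a `data` object) and must return the same shape with its outputs. The input name `html`, the output names `title` and `content`, and the function name `run_skill` are assumptions for this example and must match whatever you declare in your skillset. I used the standard library's `html.parser` instead of BeautifulSoup only to keep the sketch dependency-free; you would still need to host this behind an HTTP endpoint (for example an Azure Function).

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> and visible text, skipping <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.chunks: list[str] = []
        self._skip_depth = 0
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "title":
            self._in_title = False

    def handle_data(self, data):
        text = data.strip()
        if not text or self._skip_depth:
            return
        if self._in_title:
            self.title += text
        else:
            self.chunks.append(text)


def run_skill(payload: dict) -> dict:
    """Process a custom-skill request body and build the response body."""
    results = []
    for record in payload.get("values", []):
        out = {"recordId": record["recordId"], "data": {}, "errors": [], "warnings": []}
        html = record.get("data", {}).get("html")  # "html" is an assumed input name
        if html is None:
            out["errors"].append({"message": "Missing 'html' input."})
        else:
            extractor = _TextExtractor()
            extractor.feed(html)
            extractor.close()
            out["data"] = {
                "title": extractor.title,
                "content": " ".join(extractor.chunks),
            }
        results.append(out)
    return {"values": results}
```

Each output record keeps the `recordId` of its input record, which is how the indexer matches results back to documents.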
Here are some resources that can help you get started:
- Indexers in Azure Cognitive Search
- Pulling data into an index
- Example: Create a custom skill using the Bing Entity Search API
- Add a custom skill to an Azure Cognitive Search enrichment pipeline
Another way to achieve this is to combine Azure Cognitive Search, Azure Functions, and Azure Blob Storage. See this similar Stack Overflow thread for more details.
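
A rough sketch of what the Functions + Blob Storage approach could look like: a function downloads each page and writes it to a blob container, and a blob indexer in Cognitive Search then picks the files up. This is only my assumption of the general shape, not the code from that thread. The names `crawl_to_storage` and `blob_name_for` are hypothetical, and the upload step is passed in as a callable so the sketch runs without the Azure SDK.

```python
import hashlib
from typing import Callable, Iterable
from urllib.request import urlopen


def blob_name_for(url: str) -> str:
    """Derive a stable, storage-safe blob name from a page URL."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"


def _http_get(url: str) -> bytes:
    """Default fetcher: plain HTTP GET with a timeout."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def crawl_to_storage(
    urls: Iterable[str],
    upload: Callable[[str, bytes], None],
    fetch: Callable[[str], bytes] = _http_get,
) -> dict[str, str]:
    """Fetch each URL and hand the body to upload(name, data).

    Returns a mapping of URL -> blob name for pages that were saved.
    Pages that fail to download are skipped.
    """
    saved: dict[str, str] = {}
    for url in urls:
        try:
            body = fetch(url)
        except OSError:  # urllib's URLError is a subclass of OSError
            continue
        name = blob_name_for(url)
        # In a real function, upload could wrap something like
        # ContainerClient.upload_blob(name, data, overwrite=True)
        # from the azure-storage-blob package.
        upload(name, body)
        saved[url] = name
    return saved
```

Inside an Azure Function you would typically run this on a timer trigger and feed it the URLs from the site's sitemap.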