

Overview of Azure Search


Azure provides search as a managed service. The service enables developers and business teams to build search-based applications. It offers capabilities such as type-ahead suggestions, hit highlighting, and faceted navigation, and it supports linguistic features appropriate to the language of the content.

Azure Search as a Service

Azure Search is built on Elasticsearch and is exposed as an API-based service. It stores data in an index that can be searched through full-text queries. The index schema can be created either in the Azure Portal or programmatically, using the client libraries or the REST API.

An administrator can set up an Azure Search service quickly through the Azure management portal and can easily scale it up or down there. Business teams and developers can therefore focus on their business goals, connecting to Azure Search quickly while keeping fine-grained control over search ranking. Azure Search also simplifies natural language processing and supports many languages.

Developers use the service to build search-driven applications, and administrators use it to manage the search service itself. Azure hides much of the complexity of running a search system, such as managing the service, scaling it, and fine-tuning search results. Azure Search is available in 8 of the 17 Azure regions. The service can be administered either through the Azure management portal or through the REST API.

Search Application Scenarios

Search-driven applications are now widely used and can solve business problems, especially in online retail and e-commerce. When an application holds a large amount of data, a good design approach is to expose that data through a search service rather than querying the application's own data store directly. A line-of-business application that contains a large volume of data is a typical example.

An Azure Search service is secured by two types of keys:

Query keys: Multiple query keys can be generated for querying the service. These keys can be shared with multiple application development teams.

Administration keys: Two keys can be generated; they are used mainly for administration.
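The split between the two key types can be sketched as follows: a client picks the least-privileged key for each operation and sends it in the REST API's `api-key` request header. This is a minimal sketch; the key values and the `build_headers` helper are hypothetical, the set of read-only operations is an assumption for illustration, and no request is actually sent.

```python
# Hypothetical placeholder keys; real keys are copied from the Azure portal.
QUERY_KEY = "<query-key>"
ADMIN_KEY = "<admin-key>"

# Assumption: these operations only read from an index, so a query key suffices.
# Anything that creates or changes indexes or documents needs an admin key.
READ_ONLY_OPERATIONS = {"search", "suggest", "lookup"}


def build_headers(operation: str) -> dict:
    """Return REST request headers carrying the least-privileged key for the operation."""
    key = QUERY_KEY if operation in READ_ONLY_OPERATIONS else ADMIN_KEY
    return {"Content-Type": "application/json", "api-key": key}


print(build_headers("search")["api-key"])
print(build_headers("index")["api-key"])
```

Keeping admin keys out of client applications and handing out separate query keys to each team means a leaked query key can only read data, never modify the index.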

Azure Search Index and Supported Data Types

An Azure Search index consists of fields and documents. In database terms, a field is like a column and a document is like a row in a table. Azure Search queries are executed against these indexes. Fields can be marked as searchable and facetable, which gives developers full control over the schema and lets them decide which fields are exposed to end users. For example, a GUID field does not need to be searchable, because there is no point in showing GUID values to end users. A search service can host multiple indexes, with limits that depend on the pricing tier.
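As an illustration of per-field control, the sketch below builds an index definition in the JSON shape accepted by the REST API's create-index request (a POST to `/indexes?api-version=2015-02-28` with an admin key). The `products` index and all field names are hypothetical; only the definition is built, nothing is sent.

```python
import json

# Hypothetical "products" index; field names are illustrative only.
index_definition = {
    "name": "products",
    "fields": [
        # The GUID is the document key: retrievable so documents can be looked up
        # and updated, but neither searchable nor facetable.
        {"name": "productId", "type": "Edm.String", "key": True,
         "searchable": False, "facetable": False, "retrievable": True},
        # Free text that users search against.
        {"name": "name", "type": "Edm.String", "searchable": True,
         "facetable": False, "retrievable": True},
        # Searchable, and also usable for faceted navigation and filtering.
        {"name": "category", "type": "Edm.String", "searchable": True,
         "facetable": True, "filterable": True, "retrievable": True},
        # Internal data that is never returned to end users.
        {"name": "internalCode", "type": "Edm.String", "searchable": False,
         "facetable": False, "retrievable": False},
    ],
}


def exposed_fields(definition: dict) -> list:
    """Names of fields returned in search results (retrievable defaults to true)."""
    return [f["name"] for f in definition["fields"] if f.get("retrievable", True)]


body = json.dumps(index_definition, indent=2)  # request body for creating the index
print(exposed_fields(index_definition))
```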

Supported data types in Azure Search for indexing

Azure Search supports the following data types for index fields:

1. String

2. Double

3. Collection

4. Date time

5. Integer

6. Geography point
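In the REST API these types are written as Entity Data Model (EDM) names. The sketch below maps the six types listed above and serializes a sample document, assuming the usual wire formats: date-time values as ISO 8601 strings with an offset, and geography points as GeoJSON points whose coordinates are ordered [longitude, latitude]. Integer is mapped to `Edm.Int32` here; `Edm.Int64` is also available for larger values. The document's field names and values are hypothetical.

```python
import json
from datetime import datetime, timezone

# The data types listed above and their EDM names in the REST API.
EDM_TYPES = {
    "String": "Edm.String",
    "Double": "Edm.Double",
    "Collection": "Collection(Edm.String)",
    "Date time": "Edm.DateTimeOffset",
    "Integer": "Edm.Int32",
    "Geography point": "Edm.GeographyPoint",
}


def geography_point(latitude: float, longitude: float) -> dict:
    """GeoJSON point; GeoJSON orders coordinates as [longitude, latitude]."""
    return {"type": "Point", "coordinates": [longitude, latitude]}


# A hypothetical store document with one field of each type.
store = {
    "storeId": "42",                                     # Edm.String
    "rating": 4.5,                                       # Edm.Double
    "tags": ["electronics", "outlet"],                   # Collection(Edm.String)
    "openedOn": datetime(2015, 3, 1, tzinfo=timezone.utc).isoformat(),  # Edm.DateTimeOffset
    "floorCount": 2,                                     # Edm.Int32
    "location": geography_point(47.6062, -122.3321),     # Edm.GeographyPoint
}

print(json.dumps(store))
```

The longitude-first order is the most common mistake with geography points, which is why the helper takes named latitude and longitude arguments.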

 

Pull Model

Developers can index data through either a push model or a pull model. The pull model is provided through indexers, which can be configured for on-demand or scheduled updates. Indexers let developers easily ingest data, and changes to that data, from Azure DocumentDB, Azure SQL Database, or SQL Server hosted in an Azure VM.
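A pull-model setup can be sketched as two JSON resources: a data source that points at the database, and an indexer that copies from that data source into an index on a schedule or on demand. This sketch assumes the REST API's data source and indexer shapes (`azuresql` type, ISO 8601 `interval`, and a `run` endpoint for on-demand execution); the service URL, resource names, table name, and connection string are hypothetical, and field names should be checked against the API reference for the api-version in use.

```python
import json

SERVICE = "https://my-service.search.windows.net"  # hypothetical service URL
API_VERSION = "2015-02-28"

# Data source pointing at a hypothetical Azure SQL table.
data_source = {
    "name": "products-sql",
    "type": "azuresql",
    "credentials": {"connectionString": "<connection-string>"},
    "container": {"name": "Products"},
}

# Indexer that pulls from the data source into the index every two hours.
indexer = {
    "name": "products-indexer",
    "dataSourceName": data_source["name"],
    "targetIndexName": "products",
    "schedule": {"interval": "PT2H"},  # ISO 8601 duration: two hours
}


def indexer_run_url(name: str) -> str:
    """URL for triggering an on-demand indexer run (sent as POST with an admin key)."""
    return f"{SERVICE}/indexers/{name}/run?api-version={API_VERSION}"


print(json.dumps(indexer))
print(indexer_run_url(indexer["name"]))
```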

Push Model

The other option is the push model, in which developers write a scheduled job (using Azure Scheduler or an Azure Worker Role) or a simple console application that feeds data into the index through the service API (the REST API or the .NET client library). This model takes a little more work, but it is worth it, because developers get full control over the data-feeding process. Developers can reduce load by sending documents in batches; each batch can contain up to 1000 documents. They can also avoid the latency that comes with scheduled indexing: for example, on Black Friday a developer can update the index to reflect current inventory without waiting for a scheduler to run. The push model gives that degree of precision.
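The batching step of the push model can be sketched as below, assuming the REST API's index-batch format: the body of a POST to `/indexes/<index>/docs/index` is a JSON object whose `value` array holds documents, each tagged with `@search.action`. The `productId` and `stock` fields are hypothetical, and the HTTP call itself (made with an admin key) is left out.

```python
import json
from typing import Iterable, Iterator, List

MAX_BATCH_SIZE = 1000  # maximum number of documents per indexing batch


def batches(documents: Iterable[dict], size: int = MAX_BATCH_SIZE) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most `size` documents."""
    batch: List[dict] = []
    for doc in documents:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # the final, possibly partial, batch
        yield batch


def batch_payload(docs: List[dict], action: str = "upload") -> str:
    """JSON request body that tags every document in the batch with the same action."""
    return json.dumps({"value": [{"@search.action": action, **doc} for doc in docs]})


# 2,500 hypothetical inventory updates become three requests.
inventory = [{"productId": str(i), "stock": i % 7} for i in range(2500)]
sizes = [len(b) for b in batches(inventory)]
print(sizes)  # [1000, 1000, 500]
```

Because `batches` is a generator, a console application can stream documents from its source without holding the whole data set in memory.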

Types of Operations During Indexing

Each indexing operation is one of upload, merge, mergeOrUpload, or delete. When feeding data for the first time, developers can use upload. When existing documents in the index need to be updated, developers can use merge. mergeOrUpload updates a document if it already exists in the index and otherwise creates it. delete removes a document from the index. Once data is indexed, the documents become available for search within a few seconds.
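A single batch can mix the four operations. The sketch below builds one entry per action, assuming the REST API's `@search.action` field; the `index_action` helper and the product fields are hypothetical.

```python
import json

VALID_ACTIONS = {"upload", "merge", "mergeOrUpload", "delete"}


def index_action(action: str, key_field: str, key: str, **fields) -> dict:
    """Build one entry of an index batch for the given action."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unknown action: {action}")
    if action == "delete":
        # delete only needs the document key.
        return {"@search.action": action, key_field: key}
    # upload creates or replaces the whole document; merge updates only the
    # given fields of an existing document; mergeOrUpload merges if the key
    # exists and uploads otherwise.
    return {"@search.action": action, key_field: key, **fields}


batch = {"value": [
    index_action("upload", "productId", "1", name="Laptop", stock=10),
    index_action("merge", "productId", "2", stock=0),  # only the stock changes
    index_action("mergeOrUpload", "productId", "3", name="Tablet", stock=5),
    index_action("delete", "productId", "4"),
]}
print(json.dumps(batch, indent=2))
```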

Searching Data in Azure Search Through the Service

The Azure Search service lets developers search the data in an index through the service API, with a range of query options. The following parameters are the ones most frequently used in search-driven applications.

api-version=[string]: Required. The current version is api-version=2015-02-28. See Azure Search Service Versioning for details and alternative versions.

search=[string]: Optional. The text to search for.

searchMode=any|all: Optional. Defaults to any. Specifies whether any or all of the search terms must be matched for a document to count as a match.

searchFields=[string]: Optional. A comma-separated list of field names in which to search for the specified text. Target fields must be marked as searchable.

$count=true|false: Optional. Defaults to false. Specifies whether to fetch the total count of results.

$orderby=[string]: Optional. A comma-separated list of expressions to sort the results by.

$select=[string]: Optional. A comma-separated list of fields to retrieve. If unspecified, all fields marked as retrievable in the schema are included.

facet=[string]: Optional. Zero or more fields to facet by.

$filter=[string]: Optional. A structured search expression in standard OData syntax.

highlight=[string]: Optional. A comma-separated list of field names used for hit highlighting. Only searchable fields can be used for hit highlighting.

The complete list of parameters is available in the Azure Search REST API documentation.
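A search request is a GET against the index's `docs` endpoint, with the parameters above in the query string and a query key in the `api-key` header. The sketch below only builds the URL; the service URL, index name, field names, and the `search_url` helper are hypothetical. Spaces are encoded as `%20`, and `$` and `,` are left readable.

```python
from urllib.parse import quote, urlencode

SERVICE = "https://my-service.search.windows.net"  # hypothetical service URL
API_VERSION = "2015-02-28"


def search_url(index: str, **params: str) -> str:
    """Build a GET URL for the search endpoint of an index.

    OData parameter names such as "$filter" are not valid Python keywords,
    so they are passed through a dict: search_url("products", **{"$filter": ...}).
    """
    query = {"api-version": API_VERSION, **params}
    return f"{SERVICE}/indexes/{index}/docs?" + urlencode(query, safe="$,", quote_via=quote)


url = search_url(
    "products",
    search="wifi router",
    searchMode="all",
    **{
        "$count": "true",
        "$select": "name,category",
        "facet": "category",
        "highlight": "name",
        "$filter": "category eq 'Networking'",
    },
)
print(url)
```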

Type-ahead Suggestions

Type-ahead suggestions help users find results from a partially typed search term. Azure Search supports both infix suggestions (matching any part of the content) and fuzzy-matching suggestions; fuzzy suggestions tolerate spelling mistakes. Developers can retrieve up to 100 suggestions per search term. When requesting suggestions from a search index, developers can use the following query parameters:

search=[string]: The search text to use to suggest queries. Must be at least 3 characters and no more than 25 characters.

suggesterName=[string]: The name of the suggester as specified in the suggesters collection that is part of the index definition. A suggester determines which fields are scanned for suggested query terms.

fuzzy=[boolean]: Optional. Default = false. When set to true, the API finds suggestions even if there is a substituted or missing character in the search text. While this provides a better experience in some scenarios, it comes at a performance cost, because fuzzy suggestion searches are slower and consume more resources.

searchFields=[string]: Optional. A comma-separated list of field names in which to search for the specified search text. Target fields must be enabled for suggestions.

$top=#: Optional. Default = 5. The number of suggestions to retrieve. The value must be a number between 1 and 100.

$filter=[string]: Optional. An expression that filters the documents considered for suggestions.

$orderby=[string]: Optional. A comma-separated list of expressions to sort the results by.

$select=[string]: Optional. A comma-separated list of fields to retrieve. If unspecified, only the document key and suggestion text are returned.

api-version=[string]: Required. The current version is api-version=2015-02-28.
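A suggestion request goes to the `docs/suggest` endpoint of an index. The sketch below builds that URL and enforces the limits listed above (3 to 25 characters of search text, `$top` between 1 and 100) on the client before sending anything. It assumes the index defines a suggester named `sg`; that name, the service URL, and the `suggest_url` helper are hypothetical.

```python
from urllib.parse import quote, urlencode

SERVICE = "https://my-service.search.windows.net"  # hypothetical service URL
API_VERSION = "2015-02-28"


def suggest_url(index: str, text: str, suggester: str,
                fuzzy: bool = False, top: int = 5) -> str:
    """Build a GET URL for the suggest endpoint, enforcing the documented limits."""
    if not 3 <= len(text) <= 25:
        raise ValueError("search text must be 3 to 25 characters long")
    if not 1 <= top <= 100:
        raise ValueError("$top must be between 1 and 100")
    query = {
        "api-version": API_VERSION,
        "search": text,
        "suggesterName": suggester,
        "fuzzy": "true" if fuzzy else "false",  # fuzzy tolerates typos but is slower
        "$top": str(top),
    }
    return f"{SERVICE}/indexes/{index}/docs/suggest?" + urlencode(query, safe="$", quote_via=quote)


print(suggest_url("products", "lapt", "sg", fuzzy=True, top=8))
```

Validating locally avoids a round trip for every keystroke shorter than three characters, which is common in type-ahead boxes.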

 

Capacity and scale features

An Azure Search service can be scaled up and down through the Azure management portal. Each service has a minimum of one replica and one partition. By default, one replica and one partition support 15 queries per second and 15 million documents.

Number of search units = replicas × partitions

Replicas provide high availability, whereas partitions provide capacity for large volumes of index data. Each replica holds one copy of an index, so adding a replica adds one more copy of the index that can service query requests. Currently, the rule of thumb is that you need at least 3 replicas for high availability.

Most applications need more replicas rather than more partitions, because most search workloads fit easily into a single partition, which can hold up to 15 million documents.
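The search-unit formula and the figures above can be combined into a rough sizing sketch. It assumes, purely for illustration, that query throughput scales linearly with replicas (15 queries per second each) and document capacity with partitions (15 million each); real throughput depends on the queries and the data, and the sketch does not check the replica and partition combinations a pricing tier actually allows. The `plan` helper is hypothetical.

```python
import math

# Baseline figures quoted above for one replica and one partition.
QPS_PER_REPLICA = 15
DOCS_PER_PARTITION = 15_000_000
MIN_REPLICAS_FOR_HA = 3  # rule of thumb from the text


def search_units(replicas: int, partitions: int) -> int:
    """Search units: replicas multiplied by partitions."""
    return replicas * partitions


def plan(documents: int, peak_qps: float, high_availability: bool = True) -> dict:
    """Rough sizing, ASSUMING capacity scales linearly with replicas and partitions."""
    partitions = max(1, math.ceil(documents / DOCS_PER_PARTITION))
    replicas = max(1, math.ceil(peak_qps / QPS_PER_REPLICA))
    if high_availability:
        replicas = max(replicas, MIN_REPLICAS_FOR_HA)
    return {"replicas": replicas, "partitions": partitions,
            "search_units": search_units(replicas, partitions)}


print(plan(documents=2_000_000, peak_qps=40))
```

For two million documents at 40 queries per second, one partition is enough, and three replicas cover both the query load and the high-availability rule of thumb, which matches the observation that most applications grow replicas rather than partitions.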

Ranking and Tuning Using Scoring Profiles

Each item in the search results has a score that indicates the item's relevance to the current search operation: the higher the score, the more relevant the item. Results are ordered from the highest score to the lowest. Azure Search computes scores with its default scoring, which developers can customize to meet business needs.

By default, a search score is computed from statistical properties of the data and the query. Azure Search ranks documents using a TF-IDF (term frequency-inverse document frequency) approach. Score values can repeat within a result set. When several items have the same score, their relative order is not defined: given two items with identical scores, there is no guarantee which one appears first.
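To make the TF-IDF idea concrete, here is a textbook TF-IDF scorer over a toy corpus. It is a simplified illustration of the general approach, not the exact formula Azure Search uses; the `tf_idf_scores` helper and the documents are hypothetical. A term that appears often in a document raises its score, while a term that appears in every document (here "shoes") contributes nothing.

```python
import math
from collections import Counter


def tf_idf_scores(query: str, documents: dict) -> dict:
    """Score each document against the query with a basic sum of TF-IDF terms."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in documents.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term at all.
            df = sum(1 for toks in tokenized.values() if term in toks)
            if df == 0:
                continue
            tf = counts[term] / len(tokens)  # frequent in this document -> higher
            idf = math.log(n_docs / df)      # rare across the corpus -> higher
            score += tf * idf
        scores[doc_id] = score
    return scores


docs = {
    "a": "red running shoes",
    "b": "red red running shoes",
    "c": "blue running shoes",
    "d": "blue running shoes",  # identical to "c", so it gets an identical score
}
scores = tf_idf_scores("red shoes", docs)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Documents "c" and "d" tie. Python's `sorted` is stable, so here they keep their input order, but the service makes no such guarantee, so application code should not rely on the order of equal-score results.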

Custom Scoring

Developers can define custom scoring when the default ranking behavior doesn't meet business requirements. For example, a customer might want to promote certain products during a festive or holiday season. An example of a custom scoring schema is available in the Azure Search documentation.
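A holiday promotion could be expressed as a scoring profile like the one sketched below. It assumes the REST API's scoring-profile shape: `text.weights` to weight fields and a `magnitude` function that boosts documents by a numeric field. The profile name, the `name` field, and the numeric `promotionLevel` field (set higher for featured products) are hypothetical. The profile is stored in the index definition's `scoringProfiles` array and selected per query with the `scoringProfile` parameter.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical profile for a "products" index with a numeric promotionLevel field.
holiday_profile = {
    "name": "holidayPromotions",
    # Matches in the product name are weighted 3 (the default weight is 1).
    "text": {"weights": {"name": 3}},
    "functions": [
        {
            "type": "magnitude",
            "fieldName": "promotionLevel",
            "boost": 5,
            "interpolation": "linear",
            "magnitude": {
                "boostingRangeStart": 1,
                "boostingRangeEnd": 10,
                "constantBoostBeyondRange": True,
            },
        }
    ],
}

# Fragment of the index definition (the fields array is omitted here; the full
# definition is sent when the index is updated).
index_update = {"name": "products", "scoringProfiles": [holiday_profile]}

# The profile is chosen per query with the scoringProfile parameter.
query = urlencode({"api-version": "2015-02-28", "search": "gift",
                   "scoringProfile": holiday_profile["name"]}, quote_via=quote)
print(json.dumps(index_update, indent=2))
print(query)
```

Keeping the boost in a separate field means marketing can raise or lower `promotionLevel` for individual products with a merge operation, without changing the profile itself.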