ai_similarity function
Applies to: Databricks SQL and Databricks Runtime
Important
This feature is in Public Preview.
In the preview:
- The underlying language model can handle several languages, but this function is tuned for English.
- There is rate limiting for the underlying Foundation Model APIs. See Foundation Model APIs limits to update these limits.
- Due to rate limiting, this function is designed for testing on small datasets with fewer than 100 rows. For use cases with more than 100 rows of data, Databricks recommends using ai_query and a provisioned throughput endpoint. See Perform batch LLM inference using ai_query.
The ai_similarity() function invokes a state-of-the-art generative AI model from Databricks Foundation Model APIs to compare two strings and compute their semantic similarity score, all from SQL.
Requirements
Important
The underlying models that might be used at this time are licensed under the Apache 2.0 License, Copyright © The Apache Software Foundation, or the LLAMA 3.1 Community License, Copyright © Meta Platforms, Inc. All rights reserved. Customers are responsible for ensuring compliance with applicable model licenses.
Databricks recommends reviewing these licenses to ensure compliance with any applicable terms. If models emerge in the future that perform better according to Databricks’s internal benchmarks, Databricks might change the model (and the list of applicable licenses provided on this page).
Currently, GTE Large (English) is the underlying model that powers this AI function.
- This function is only available on workspaces in regions that support AI Functions using Foundation Model APIs.
- This function is not available on Azure Databricks SQL Classic.
- For pricing, check the Databricks SQL pricing page.
Note
In Databricks Runtime 15.1 and above, this function is supported in Databricks notebooks, including notebooks that are run as a task in a Databricks workflow.
Syntax
ai_similarity(expr1, expr2)
Arguments
expr1: A STRING expression.
expr2: A STRING expression.
Returns
A FLOAT value representing the semantic similarity between the two input strings. The output score is relative and should only be used for ranking. A score of 1 means the two texts are identical.
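Because the score is relative, only the ordering it produces is meaningful. The following is a minimal sketch in plain Python that illustrates the point with a stand-in scorer: `difflib.SequenceMatcher` computes a character-level ratio and is not the model behind ai_similarity. The names `toy_similarity` and `rank_by_similarity`, and the sample company names, are hypothetical. The sketch mimics an `ORDER BY ... DESC LIMIT n` query and shows that a strictly increasing rescaling of the scores leaves the ranking unchanged.

```python
from difflib import SequenceMatcher


def toy_similarity(a: str, b: str) -> float:
    """Stand-in scorer (assumption): character-level ratio in [0, 1].

    This is NOT the model behind ai_similarity; it only supplies numbers to rank.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rank_by_similarity(candidates, query, score=toy_similarity, limit=10):
    """Mimic ORDER BY score DESC LIMIT n: highest-scoring candidates first."""
    ordered = sorted(candidates, key=lambda c: score(c, query), reverse=True)
    return ordered[:limit]


# Hypothetical sample data standing in for a customers table.
companies = ["Acme Corp", "Databricks Inc.", "Snowflake", "Brickworks Ltd"]

ranked = rank_by_similarity(companies, "Databricks", limit=3)

# A strictly increasing transform changes the score values but not their order.
rescaled = rank_by_similarity(
    companies,
    "Databricks",
    score=lambda a, b: toy_similarity(a, b) ** 3,
    limit=3,
)

print(ranked)
print(ranked == rescaled)
```

The ranking survives the rescaling, but any fixed cutoff (for example, "keep rows scoring above 0.5") would select different rows before and after it. That is the practical reason to treat a relative score as a sort key rather than as an absolute threshold.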
Examples
> SELECT ai_similarity('Apache Spark', 'Apache Spark');
1.0
> SELECT
    company_name
  FROM
    customers
  ORDER BY ai_similarity(company_name, 'Databricks') DESC
  LIMIT 10;
Databricks Inc.