Generate vector embeddings in Azure Database for PostgreSQL - Flexible Server with locally deployed LLM (Preview)
Prerequisites
An Azure Database for PostgreSQL Flexible Server instance running on a memory optimized VM SKU. Learn more about Azure memory optimized VMs here: Azure VM sizes - Memory - Azure Virtual Machines
You may want to enable the vector extension, as it provides functionality to store and efficiently index text embeddings in your database.
Enable the extension
Before you can enable azure_local_ai on your Azure Database for PostgreSQL flexible server instance, you need to allowlist the azure_local_ai
extension as described in allow an extension.
Important
Hosting language models in the database requires a large memory footprint. To support this requirement, azure_local_ai
is only supported on memory-optimized Azure VM SKUs with a minimum of 4 vCores.
If you are using a VM that does not meet the minimum requirements, the azure_local_ai
extension will not appear in the list of allowed values for the azure.extensions
server parameter.
Once the extension is allowlisted, you can follow the instructions provided in create extensions to install the extension in each database from where you want to use it.
Note
Enabling Azure Local AI will deploy the multilingual-e5-small model to your Azure Database for PostgreSQL Flexible Server instance. The linked documentation provides licensing terms from the e5 team. Additional third-party open-source models might become available for installation on an ongoing basis.
Installing the extension azure_local_ai
creates a schema called azure_local_ai
in which the extension stores tables, functions, and any other SQL-related objects it requires to implement and expose its functionality.
Important
You may want to enable the vector extension, as it is required to store text embeddings in your PostgreSQL database.
Functions provided by the extension
The azure_local_ai
extension provides a set of functions. These functions allow you to create vector embeddings from text data, making it easier to develop generative AI applications. The extension offers functions for creating embeddings, getting settings, and more. By using these functions, you can simplify the development process and reduce latency by eliminating the need for additional remote API calls to AI embedding models hosted outside of the PostgreSQL boundary.
Schema | Name | Result data type | Argument data types |
---|---|---|---|
azure_local_ai | create_embeddings | TABLE(embedding real[]) | model_uri text, inputs text[], batch_size bigint DEFAULT 128, timeout_ms integer DEFAULT 3600000 |
azure_local_ai | create_embeddings | real[] | model_uri text, input text, timeout_ms integer DEFAULT 3600000 |
azure_local_ai | get_setting | jsonb | keys text[] DEFAULT ARRAY[]::text[], timeout_ms integer DEFAULT 3600000 |
azure_local_ai | get_setting | text | key text, timeout_ms integer DEFAULT 3600000 |
azure_local_ai | model_metadata | jsonb | model_uri text |
These functions can be displayed using the following psql meta-command:
\df azure_local_ai.*
azure_local_ai.create_embeddings
The azure_local_ai
extension allows you to create and update embeddings both in scalar and batch format, invoking the locally deployed LLM.
azure_local_ai.create_embeddings(model_uri text, input text, batch_size bigint DEFAULT 128, timeout_ms integer DEFAULT 3600000);
azure_local_ai.create_embeddings(model_uri text, array[inputs [text]], batch_size bigint DEFAULT 128, timeout_ms integer DEFAULT 3600000);
Arguments
model_uri
text
name of the text embedding model invoked to create the embedding.
input
text
or text[]
single text or array of texts, depending on the overload of the function used, for which embeddings are created.
batch_size
bigint DEFAULT 128
number of records to process at a time (only available for the overload of the function for which parameter input
is of type text[]
).
timeout_ms
integer DEFAULT 3600000
timeout in milliseconds after which the operation is stopped.
azure_local_ai.get_setting
Used to obtain current values of configuration options.
SELECT azure_local_ai.get_setting(key TEXT)
azure_local_ai
supports reviewing the configuration parameters of ONNX Runtime thread-pool within the ONNX Runtime Service. Changes aren't allowed at this time. See ONNX Runtime performance tuning.
Arguments
Key
Valid values are:
intra_op_parallelism
: Sets total number of threads used for parallelizing single operator by ONNX Runtime thread-pool. By default, we maximize the number of intra ops threads as much as possible as it improves the overall throughput much (all available cpus by default).inter_op_parallelism
: Sets total number of threads used for computing multiple operators in parallel by ONNX Runtime thread-pool. By default, we set it to minimum possible thread, which is 1. Increasing it often hurts performance due to frequent context switches between threads.spin_control
: Switches ONNX Runtime thread-pool's spinning for requests. When disabled, it uses less cpu and hence causes more latency. By default, it's set to true (enabled).
Return type
TEXT
representing the current value of the selected setting.
Examples
Create embeddings from existing texts
Following is an example that can be used in your own environment to test embedding generation with the locally deployed multilingual-e5 model.
-- Create documents table
CREATE TABLE documents(doc_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, doc_contents TEXT NOT NULL, last_update TIMESTAMPTZ DEFAULT now());
--Insert data into the docs table
INSERT INTO documents(doc_contents) VALUES
('Create in-database embeddings with azure_local_ai extension.'),
('Enable RAG patterns with in-database embeddings and vectors on Azure Database for PostgreSQL - Flexible server.'),
('Generate vector embeddings in PostgreSQL with azure_local_ai extension.'),
('Generate text embeddings in PostgreSQL for retrieval augmented generation (RAG) patterns with azure_local_ai extension and locally deployed LLM.'),
('Use vector indexes and Azure OpenAI embeddings in PostgreSQL for retrieval augmented generation.');
-- Add a vector column and generate vector embeddings from locally deployed model
ALTER TABLE documents
ADD COLUMN doc_vector vector(384) -- multilingual-e5 embeddings are 384 dimensions
GENERATED ALWAYS AS (azure_local_ai.create_embeddings('multilingual-e5-small:v1', doc_contents)::vector) STORED; -- TEXT string sent to local model
--View floating point entries in the doc_vector column
SELECT doc_vector FROM documents;
-- Add a single record to the documents table and the vector embedding using azure_local_ai and locally deployed model will be automatically generated
INSERT INTO documents(doc_contents) VALUES
('Semantic Search with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI');
--View all document entries, their contents, embeddings and last time the row was updated
SELECT doc_contents, doc_vector, last_update FROM documents;
-- The following command leverages the overload of azure_local_ai.create_embeddings function which accepts and array of TEXT
-- and produces a table for which each row contains the embedding of one element in the input array
SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', array['Recommendation System with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI.', 'Generative AI with Azure Database for PostgreSQL - Flexible Server.']);
Generate embeddings upon insertion of new text
Following is an example that can be used in your own environment to test embedding generation with the locally deployed multilingual-e5 model.
-- Create documents table
CREATE TABLE documents(doc_id INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, doc_contents TEXT NOT NULL, last_update TIMESTAMPTZ DEFAULT now(), doc_vector vector(384) GENERATED ALWAYS AS (azure_local_ai.create_embeddings('multilingual-e5-small:v1', doc_contents)::vector) STORED);
-- Insert data into the documents table
INSERT INTO documents(doc_contents) VALUES
('Create in-database embeddings with azure_local_ai extension.'),
('Enable RAG patterns with in-database embeddings and vectors on Azure Database for PostgreSQL - Flexible server.'),
('Generate vector embeddings in PostgreSQL with azure_local_ai extension.'),
('Generate text embeddings in PostgreSQL for retrieval augmented generation (RAG) patterns with azure_local_ai extension and locally deployed LLM.'),
('Use vector indexes and Azure OpenAI embeddings in PostgreSQL for retrieval augmented generation.');
-- Query embedding text, list results by descending similarity score
WITH all_documents AS (
SELECT doc_id, doc_contents, doc_vector FROM documents
),
target_documents AS (
SELECT azure_local_ai.create_embeddings('multilingual-e5-small:v1', 'Generate text embeddings in PostgreSQL.') doc_vector
)
SELECT all_documents.doc_id, all_docs.doc_contents , 1 - (all_documents.doc_vector::vector <=> target_documents.doc_vector::vector) AS similarity
FROM target_documents, all_documents
ORDER BY similarity DESC
LIMIT 2;
Related content
- Integrate Azure Database for PostgreSQL - Flexible Server with Azure Cognitive Services.
- Integrate Azure Database for PostgreSQL with Azure Machine Learning Services.
- Generate vector embeddings with Azure OpenAI in Azure Database for PostgreSQL - Flexible Server.
- Azure AI extension in Azure Database for PostgreSQL - Flexible Server.
- Generative AI with Azure Database for PostgreSQL - Flexible Server.
- Recommendation System with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI.
- Semantic Search with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI.
- Enable and use pgvector in Azure Database for PostgreSQL - Flexible Server.