Muokkaa

Jaa


Recommendation System with Azure Database for PostgreSQL - Flexible Server and Azure OpenAI

APPLIES TO: Azure Database for PostgreSQL - Flexible Server

This hands-on tutorial shows you how to build a recommender application using Azure Database for PostgreSQL flexible server and Azure OpenAI Service. Recommendations have applications in different domains – service providers frequently tend to provide recommendations for products and services they offer based on prior history and contextual information collected from the customer and environment.

There are different ways to model recommendation systems. This article explores the simplest form – recommendation based one product corresponding to, say, a prior purchase. This tutorial uses the recipe dataset used in the Semantic Search article and the recommendation is for recipes based on a recipe a customer liked or searched for before.

Prerequisites

  1. Create an OpenAI account and request access to Azure OpenAI Service.
  2. Grant access to Azure OpenAI in the desired subscription.
  3. Grant permissions to create Azure OpenAI resources and to deploy models.

Create and deploy an Azure OpenAI Service resource and a model, deploy the embeddings model text-embedding-ada-002. Copy the deployment name as it is needed to create embeddings.

Enable the azure_ai and pgvector extensions

Before you can enable azure_ai and pgvector on your Azure Database for PostgreSQL flexible server instance, you need to add them to your allowlist as described in how to use PostgreSQL extensions and check if correctly added by running SHOW azure.extensions;.

Then you can install the extension, by connecting to your target database and running the CREATE EXTENSION command. You need to repeat the command separately for every database you want the extension to be available in.

CREATE EXTENSION azure_ai;
CREATE EXTENSION vector;

Configure OpenAI endpoint and key

In the Azure AI services under Resource Management > Keys and Endpoints you can find the endpoint and the keys for your Azure AI resource. Use the endpoint and one of the keys to enable azure_ai extension to invoke the model deployment.

select azure_ai.set_setting('azure_openai.endpoint','https://<endpoint>.openai.azure.com');
select azure_ai.set_setting('azure_openai.subscription_key', '<API Key>');

Download & Import the Data

  1. Download the data from Kaggle.
  2. Connect to your server and create a test database, and in it create a table in which you'll import the data.
  3. Import the data.
  4. Add an embedding column to the table.
  5. Generate the embeddings.
  6. Search.

Create the table

CREATE TABLE public.recipes(
    rid integer NOT NULL,
    recipe_name text,
    prep_time text,
    cook_time text,
    total_time text,
    servings integer,
    yield text,
    ingredients text,
    directions text,
    rating real,
    url text,
    cuisine_path text,
    nutrition text,
    timing text,
    img_src text,
    PRIMARY KEY (rid)
);

Import the data

Set the following environment variable on the client window, to set encoding to utf-8. This step is necessary because this particular dataset uses the WIN1252 encoding.

Rem on Windows
Set PGCLIENTENCODING=utf-8;
# on Unix based operating systems
export PGCLIENTENCODING=utf-8

Import the data into the table created; note that this dataset contains a header row:

psql -d <database> -h <host> -U <user> -c "\copy recipes FROM <local recipe data file> DELIMITER ',' CSV HEADER"

Add a column to store the embeddings

ALTER TABLE recipes ADD COLUMN embedding vector(1536);

Generate embeddings

Generate embeddings for your data using the azure_ai extension. In the following, we vectorize a few different fields, concatenated:

WITH ro AS (
    SELECT ro.rid
    FROM
        recipes ro
    WHERE
        ro.embedding is null
        LIMIT 500
)
UPDATE
    recipes r
SET
    embedding = azure_openai.create_embeddings('text-embedding-ada-002', r.recipe_name||' '||r.cuisine_path||' '||r.ingredients||' '||r.nutrition||' '||r.directions)
FROM
    ro
WHERE
    r.rid = ro.rid;

Repeat the command, until there are no more rows to process.

Tip

Play around with the LIMIT. With a high value, the statement might fail halfway through due to throttling imposed by Azure OpenAI. If it fails, wait for at least one minute and execute the command again.

Create a search function in your database for convenience:

create function
    recommend_recipe(sampleRecipeId int, numResults int)
returns table(
            out_recipeName text,
            out_nutrition text,
            out_similarityScore real)
as $$
declare
    queryEmbedding vector(1536);
    sampleRecipeText text;
begin
    sampleRecipeText := (select
                            recipe_name||' '||cuisine_path||' '||ingredients||' '||nutrition||' '||directions
                        from
                            recipes where rid = sampleRecipeId);

    queryEmbedding := (azure_openai.create_embeddings('text-embedding-ada-002',sampleRecipeText));

    return query
    select
        distinct r.recipe_name,
        r.nutrition,
        (r.embedding <=> queryEmbedding)::real as score
    from
        recipes r
    order by score asc limit numResults; -- cosine distance
end $$
language plpgsql;

Now just invoke the function to search for the recommendation:

select out_recipename, out_similarityscore from recommend_recipe(1, 20); -- search for 20 recipe recommendations that closest to recipeId 1

And explore the results:

            out_recipename             | out_similarityscore
---------------------------------------+---------------------
 Apple Pie by Grandma Ople             |                   0
 Easy Apple Pie                        |          0.05137232
 Grandma's Iron Skillet Apple Pie      |         0.054287136
 Old Fashioned Apple Pie               |         0.058492836
 Apple Hand Pies                       |          0.06449003
 Apple Crumb Pie                       |          0.07290977
 Old-Fashioned Apple Dumplings         |         0.078374185
 Fried Apple Pies                      |          0.07918481
 Apple Pie Filling                     |         0.084320426
 Apple Turnovers                       |          0.08576391
 Dutch Apple Pie with Oatmeal Streusel |          0.08779895
 Apple Crisp - Perfect and Easy        |          0.09170883
 Delicious Cinnamon Baked Apples       |          0.09384012
 Easy Apple Crisp with Pie Filling     |          0.09477234
 Jump Rope Pie                         |          0.09503954
 Easy Apple Strudel                    |         0.095167875
 Apricot Pie                           |          0.09634114
 Easy Apple Crisp with Oat Topping     |          0.09708358
 Baked Apples                          |          0.09826993
 Pear Pie                              |         0.099974394
(20 rows)

Share your suggestions and bugs with the Azure Database for PostgreSQL product team.