Python tutorial: Deploy a model to categorize customers with SQL machine learning

Artikel
04/17/2023

Applies to: SQL Server 2017 (14.x) and later Azure SQL Managed Instance

In part four of this four-part tutorial series, you'll deploy a clustering model, developed in Python, into a database using SQL Server Machine Learning Services or on Big Data Clusters.

In part four of this four-part tutorial series, you'll deploy a clustering model, developed in Python, into a database using SQL Server Machine Learning Services.

In part four of this four-part tutorial series, you'll deploy a clustering model, developed in Python, into a database using Azure SQL Managed Instance Machine Learning Services.

In order to perform clustering on a regular basis, as new customers are registering, you need to be able call the Python script from any App. To do that, you can deploy the Python script in a database by putting the Python script inside a SQL stored procedure. Because your model executes in the database, it can easily be trained against data stored in the database.

In this section, you'll move the Python code you just wrote onto the server and deploy clustering.

In this article, you'll learn how to:

Create a stored procedure that generates the model
Perform clustering on the server
Use the clustering information

In part one, you installed the prerequisites and restored the sample database.

In part two, you learned how to prepare the data from a database to perform clustering.

In part three, you learned how to create and train a K-Means clustering model in Python.

Prerequisites

Part four of this tutorial series assumes you have fulfilled the prerequisites of part one, and completed the steps in part two and part three.

Create a stored procedure that generates the model

Run the following T-SQL script to create the stored procedure. The procedure recreates the steps you developed in parts one and two of this tutorial series:

classify customers based on their purchase and return history
generate four clusters of customers using a K-Means algorithm

USE [tpcxbb_1gb]
GO

DROP procedure IF EXISTS [dbo].[py_generate_customer_return_clusters];
GO

CREATE procedure [dbo].[py_generate_customer_return_clusters]
AS

BEGIN
    DECLARE

-- Input query to generate the purchase history & return metrics
     @input_query NVARCHAR(MAX) = N'
SELECT
  ss_customer_sk AS customer,
  CAST( (ROUND(COALESCE(returns_count / NULLIF(1.0*orders_count, 0), 0), 7) ) AS FLOAT) AS orderRatio,
  CAST( (ROUND(COALESCE(returns_items / NULLIF(1.0*orders_items, 0), 0), 7) ) AS FLOAT) AS itemsRatio,
  CAST( (ROUND(COALESCE(returns_money / NULLIF(1.0*orders_money, 0), 0), 7) ) AS FLOAT) AS monetaryRatio,
  CAST( (COALESCE(returns_count, 0)) AS FLOAT) AS frequency
FROM
  (
    SELECT
      ss_customer_sk,
      -- return order ratio
      COUNT(distinct(ss_ticket_number)) AS orders_count,
      -- return ss_item_sk ratio
      COUNT(ss_item_sk) AS orders_items,
      -- return monetary amount ratio
      SUM( ss_net_paid ) AS orders_money
    FROM store_sales s
    GROUP BY ss_customer_sk
  ) orders
  LEFT OUTER JOIN
  (
    SELECT
      sr_customer_sk,
      -- return order ratio
      count(distinct(sr_ticket_number)) as returns_count,
      -- return ss_item_sk ratio
      COUNT(sr_item_sk) as returns_items,
      -- return monetary amount ratio
      SUM( sr_return_amt ) AS returns_money
    FROM store_returns
    GROUP BY sr_customer_sk
  ) returned ON ss_customer_sk=sr_customer_sk
 '

EXEC sp_execute_external_script
      @language = N'Python'
    , @script = N'

import pandas as pd
from sklearn.cluster import KMeans

#get data from input query
customer_data = my_input_data

#We concluded in step 2 in the tutorial that 4 would be a good number of clusters
n_clusters = 4

#Perform clustering
est = KMeans(n_clusters=n_clusters, random_state=111).fit(customer_data[["orderRatio","itemsRatio","monetaryRatio","frequency"]])
clusters = est.labels_
customer_data["cluster"] = clusters

OutputDataSet = customer_data
'
    , @input_data_1 = @input_query
    , @input_data_1_name = N'my_input_data'
             with result sets (("Customer" int, "orderRatio" float,"itemsRatio" float,"monetaryRatio" float,"frequency" float,"cluster" float));
END;
GO

Perform clustering

Now that you've created the stored procedure, execute the following script to perform clustering using the procedure.

--Create a table to store the predictions in

DROP TABLE IF EXISTS [dbo].[py_customer_clusters];
GO

CREATE TABLE [dbo].[py_customer_clusters] (
    [Customer] [bigint] NULL
  , [OrderRatio] [float] NULL
  , [itemsRatio] [float] NULL
  , [monetaryRatio] [float] NULL
  , [frequency] [float] NULL
  , [cluster] [int] NULL
  ,
    ) ON [PRIMARY]
GO

--Execute the clustering and insert results into table
INSERT INTO py_customer_clusters
EXEC [dbo].[py_generate_customer_return_clusters];

-- Select contents of the table to verify it works
SELECT * FROM py_customer_clusters;

Use the clustering information

Because you stored the clustering procedure in the database, it can perform clustering efficiently against customer data stored in the same database. You can execute the procedure whenever your customer data is updated and use the updated clustering information.

Suppose you want to send a promotional email to customers in cluster 0, the group that was inactive (you can see how the four clusters were described in part three of this tutorial). The following code selects the email addresses of customers in cluster 0.

USE [tpcxbb_1gb]
--Get email addresses of customers in cluster 0 for a promotion campaign
SELECT customer.[c_email_address], customer.c_customer_sk
  FROM dbo.customer
  JOIN
  [dbo].[py_customer_clusters] as c
  ON c.Customer = customer.c_customer_sk
  WHERE c.cluster = 0

You can change the c.cluster value to return email addresses for customers in other clusters.

Clean up resources

When you're finished with this tutorial, you can delete the tpcxbb_1gb database.

Next steps

In part four of this tutorial series, you completed these steps:

Create a stored procedure that generates the model
Perform clustering on the server
Use the clustering information

To learn more about using Python in SQL machine learning, see:

Dela via