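Progressive rollout of MLflow models to online endpoints

Article
10/16/2024

In this article, you learn how to progressively update and deploy MLflow models to online endpoints without causing service disruption. You use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. This strategy lets you roll out the new version of the web service to a small subset of users or requests before rolling it out completely.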

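About this example

Online endpoints have two concepts: the endpoint and the deployment. An endpoint represents the API that customers use to consume the model, while a deployment is a specific implementation of that API. This distinction lets you decouple the API from the implementation and change the underlying implementation without affecting consumers. This example uses these concepts to update the deployed model in an endpoint without causing service disruption.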
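To make the endpoint/deployment split concrete, here is a toy model in plain Python. It does not use the Azure SDK: the `ToyEndpoint` class, its methods, and the deployment name `xgboost-model-2` are hypothetical names invented for illustration, and the weighted random choice is an assumption about how a percentage split behaves, not a description of the service's actual router.

```python
import random
from collections import Counter


class ToyEndpoint:
    """Hypothetical stand-in for an endpoint: one stable name, many deployments."""

    def __init__(self, name, seed=0):
        self.name = name
        self.deployments = {}  # deployment name -> callable that scores a request
        self.traffic = {}      # deployment name -> integer percentage
        self._rng = random.Random(seed)  # seeded so the example is repeatable

    def add_deployment(self, name, predict_fn):
        # A new deployment starts with no traffic until some is assigned.
        self.deployments[name] = predict_fn
        self.traffic.setdefault(name, 0)

    def route(self):
        # Pick a deployment with probability proportional to its traffic share.
        names = list(self.traffic)
        weights = [self.traffic[n] for n in names]
        return self._rng.choices(names, weights=weights, k=1)[0]

    def invoke(self, request):
        # Callers only know the endpoint; the deployment is chosen behind it.
        return self.deployments[self.route()](request)


endpoint = ToyEndpoint("heart-classifier")
endpoint.add_deployment("default", lambda request: 0)
endpoint.add_deployment("xgboost-model-2", lambda request: 1)
endpoint.traffic = {"default": 90, "xgboost-model-2": 10}

# Count which deployment serves each of 10,000 simulated requests.
hits = Counter(endpoint.route() for _ in range(10_000))
print(hits)
```

Consumers keep calling the same endpoint name throughout; only the `traffic` dictionary changes as the rollout progresses.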

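We use a model based on the UCI Heart Disease dataset. The database contains 76 attributes, but we use a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. The prediction is an integer, either 0 (no presence) or 1 (presence). The model was trained with an XGBoost classifier, and all the required preprocessing is packaged as a scikit-learn pipeline, which makes this model an end-to-end pipeline that goes from raw data to predictions.

The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without having to copy and paste files, clone the repository and then change directories to sdk/using-mlflow/deploy.

Follow along in Jupyter Notebook

You can follow along with this example in the following notebook. In the cloned repository, open the notebook: mlflow_sdk_online_endpoints_progresive.ipynb.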

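Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the owner or contributor role for the Azure Machine Learning workspace, or a custom role that allows Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.

Additionally, you need to:

Install the Azure CLI and the ml extension for the Azure CLI. For more information, see Install, set up, and use the CLI (v2).

Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow, azureml-mlflow.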
```
pip install mlflow azureml-mlflow
```
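If you aren't running in Azure Machine Learning compute, configure the MLflow tracking URI or the MLflow registry URI to point to the workspace you're working on. Learn how to configure MLflow for Azure Machine Learning.

Connect to your workspace

First, let's connect to the Azure Machine Learning workspace where we're going to work.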

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

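The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, we connect to the workspace where you'll perform the deployment tasks.

Import the required libraries: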

from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

Configure the workspace details and get a handle to the workspace:

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

Import the required libraries:

import json
import mlflow
import requests
import pandas as pd
from mlflow.deployments import get_deploy_client

Configure the MLflow client and the deployment client:

mlflow_client = mlflow.MlflowClient()
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

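Register the model in the registry

Make sure your model is registered in the Azure Machine Learning registry. Azure Machine Learning doesn't support deploying unregistered models. You can register a new model using the Azure CLI, the Azure Machine Learning SDK, or the MLflow SDK: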

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

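Create an online endpoint

Online endpoints are endpoints used for online (real-time) inferencing. An online endpoint contains deployments that are ready to receive data from clients and can send responses back in real time.

We take advantage of this functionality by deploying multiple versions of the same model under the same endpoint. However, the new deployment receives 0% of the traffic at the beginning. Once we're sure the new model works correctly, we progressively move traffic from one deployment to the other.

An endpoint requires a name, which must be unique within the same region. Let's make sure we create one that doesn't already exist: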

ENDPOINT_SUFFIX=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-5} | head -n 1)
ENDPOINT_NAME="heart-classifier-$ENDPOINT_SUFFIX"

import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "heart-classifier-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}")

Configure the endpoint

endpoint.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: heart-classifier-edp
auth_mode: key

endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="An endpoint to serve predictions of the UCI heart disease problem",
    auth_mode="key",
)

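We can configure the properties of this endpoint using a configuration file. In the following example, we set the authentication mode of the endpoint to "key":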

endpoint_config = {
    "auth_mode": "key",
    "identity": {
        "type": "system_assigned"
    }
}

Let's write this configuration to a JSON file:

endpoint_config_path = "endpoint_config.json"
with open(endpoint_config_path, "w") as outfile:
    outfile.write(json.dumps(endpoint_config))

Create the endpoint:

az ml online-endpoint create -n $ENDPOINT_NAME -f endpoint.yml

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

endpoint = deployment_client.create_endpoint(
    name=endpoint_name,
    config={"endpoint-config-file": endpoint_config_path},
)

Get the authentication secret for the endpoint.
```
ENDPOINT_SECRET_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME | jq -r ".primaryKey")
```
```
endpoint_secret_key = ml_client.online_endpoints.get_keys(
    name=endpoint_name
).primary_key
```
This functionality isn't available in the MLflow SDK. Go to Azure Machine Learning studio, browse to the endpoint, and retrieve the secret key from there.
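Once you have the key, a client can call the endpoint's REST scoring URI directly. The sketch below only builds the request and does not send it. It assumes that key authentication passes the key as a bearer token and that the optional `azureml-model-deployment` header pins a call to one deployment. The scoring URI and key are placeholders, and `build_scoring_request` is a hypothetical helper name.

```python
import json
import urllib.request


def build_scoring_request(scoring_uri, key, payload, deployment_name=None):
    """Build (but do not send) a POST request for an online endpoint."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: key auth sends the endpoint key as a bearer token.
        "Authorization": f"Bearer {key}",
    }
    if deployment_name is not None:
        # Assumption: this header routes the call to one named deployment,
        # bypassing the endpoint's traffic split.
        headers["azureml-model-deployment"] = deployment_name
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(scoring_uri, data=body, headers=headers, method="POST")


request = build_scoring_request(
    "https://my-endpoint.eastus.inference.ml.azure.com/score",  # placeholder URI
    key="<endpoint-secret-key>",                                # placeholder key
    payload={"input_data": {"columns": ["age"], "data": [[48]]}},
    deployment_name="default",
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it. Treat the key like a password and keep it out of source control.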

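Create a blue deployment

So far, the endpoint is empty: it has no deployments. Let's create the first one by deploying the same model we worked with earlier. We name this deployment "default", and it represents our "blue deployment".

Configure the deployment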

blue-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

blue_deployment_name = "default"

Configure the hardware requirements of the deployment:

blue_deployment = ManagedOnlineDeployment(
    name=blue_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the argument with_package=True:

blue_deployment = ManagedOnlineDeployment(
    name=blue_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    with_package=True,
)

blue_deployment_name = "default"

To configure the hardware requirements of the deployment, create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

Note

You can find the full specification of this configuration in the managed online deployment schema (v2).

Write the configuration to a file:

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the deployment

az ml online-deployment create --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

Tip

We set the flag --all-traffic in the create command, which assigns all the traffic to the new deployment.

ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

blue_deployment = deployment_client.create_deployment(
    name=blue_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

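Assign all the traffic to the deployment

So far, the endpoint has one deployment, but no traffic is assigned to it. Let's assign it.
This step isn't required in the Azure CLI, because we used --all-traffic during creation.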
```
endpoint.traffic = { blue_deployment_name: 100 }
```
```
traffic_config = {"traffic": {blue_deployment_name: 100}}
```
Write the configuration to a file:
```
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
```

Update the endpoint configuration:

This step isn't required in the Azure CLI, because we used --all-traffic during creation.

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
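It can help to validate a traffic configuration locally before sending it to the service. This is a minimal sketch with a hypothetical helper, `validate_traffic`. It assumes each share is an integer percentage between 0 and 100 and that the shares must total 100, which you should confirm against the service's rules for your scenario.

```python
def validate_traffic(traffic, known_deployments):
    """Return the traffic dict unchanged if it looks valid, else raise ValueError."""
    unknown = set(traffic) - set(known_deployments)
    if unknown:
        raise ValueError(f"Traffic references unknown deployments: {sorted(unknown)}")
    for name, share in traffic.items():
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(share, bool) or not isinstance(share, int) or not 0 <= share <= 100:
            raise ValueError(f"Share for {name!r} must be an integer in [0, 100]")
    total = sum(traffic.values())
    if total != 100:
        raise ValueError(f"Traffic shares must total 100, got {total}")
    return traffic


# Same shape as the traffic configuration used in this article.
traffic_config = {"traffic": validate_traffic({"default": 100}, ["default"])}
print(traffic_config)
```

Running the check first catches a misspelled deployment name or a split that doesn't add up before any update call is made.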

Create a sample input to test the deployment

sample.json

{
    "input_data": {
        "columns": [
            "age",
            "sex",
            "cp",
            "trestbps",
            "chol",
            "fbs",
            "restecg",
            "thalach",
            "exang",
            "oldpeak",
            "slope",
            "ca",
            "thal"
        ],
        "data": [
            [ 48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal" ]
        ]
    }
}

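The following code samples 5 observations from the training dataset, removes the target column (which the model predicts), and creates a request in the file sample.json that can be used with the model deployment.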

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

with open("sample.json", "w") as f:
    f.write(
        json.dumps(
            {"input_data": json.loads(samples.to_json(orient="split", index=False))}
        )
    )
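The request body has the same shape as the sample request shown earlier: an `input_data` object with `columns` and `data` in split orientation. The sketch below builds that body from plain Python rows, which can be handy when pandas isn't available. `build_payload` is a hypothetical helper, and the single row is the one from the sample request.

```python
import json

COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]


def build_payload(columns, rows):
    """Return a JSON string in the {"input_data": {"columns", "data"}} shape."""
    for i, row in enumerate(rows):
        # Every row must supply exactly one value per column.
        if len(row) != len(columns):
            raise ValueError(f"Row {i} has {len(row)} values, expected {len(columns)}")
    body = {"input_data": {"columns": list(columns), "data": [list(r) for r in rows]}}
    return json.dumps(body)


payload = build_payload(
    COLUMNS,
    [[48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal"]],
)
print(payload)
```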

The following code samples 5 observations from the training dataset, removes the target column (which the model predicts), and creates the request.

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

Test the deployment

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    df=samples
)

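Create a green deployment under the endpoint

Suppose the development team has created a new version of the model and it's ready for production. We can first try this model out and, once we're confident in it, update the endpoint to route traffic to it.

Register a new model version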

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

Let's get the version number of the new model:

VERSION=$(az ml model show -n heart-classifier --label latest | jq -r ".version")

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)
version = model.version

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

Configure the new deployment

green-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: xgboost-model
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

We name the deployment as follows:

GREEN_DEPLOYMENT_NAME="xgboost-model-$VERSION"

green_deployment_name = f"xgboost-model-{version}"

Configure the hardware requirements of the deployment:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the argument with_package=True:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    with_package=True,
)

green_deployment_name = f"xgboost-model-{version}"

To configure the hardware requirements of the deployment, create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

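Tip

We use the same hardware configuration that the deploy-config-file specified for the blue deployment. However, the two deployments aren't required to share a configuration. You can configure different hardware for different models, depending on their requirements.

Write the configuration to a file: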

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the new deployment

az ml online-deployment create -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

ml_client.online_deployments.begin_create_or_update(green_deployment).result()

new_deployment = deployment_client.create_deployment(
    name=green_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

Test the deployment without changing traffic

az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name $GREEN_DEPLOYMENT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=green_deployment_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    deployment_name=green_deployment_name, 
    df=samples
)

Tip

Notice how we now indicate the name of the deployment we want to invoke.
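One way to build confidence before shifting traffic is to send the same sample to both deployments and compare the predictions. The sketch below shows only the comparison step on plain lists. The prediction values are made-up placeholders rather than real model output, and `agreement_rate` is a hypothetical helper.

```python
def agreement_rate(blue_predictions, green_predictions):
    """Fraction of positions where both deployments returned the same label."""
    if len(blue_predictions) != len(green_predictions):
        raise ValueError("Both deployments must be scored on the same inputs")
    if not blue_predictions:
        raise ValueError("At least one prediction is required")
    matches = sum(b == g for b, g in zip(blue_predictions, green_predictions))
    return matches / len(blue_predictions)


# Placeholder labels for five requests; not real model output.
blue = [0, 1, 0, 0, 1]
green = [0, 1, 1, 0, 1]
rate = agreement_rate(blue, green)
print(f"Deployments agree on {rate:.0%} of the sample")
```

A low agreement rate isn't necessarily bad, since the new model may simply be better, but it's a signal to inspect the disagreements before moving traffic.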

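Progressively update the traffic

Once we're confident in the new deployment, we can update the traffic to route some of it to the new deployment. Traffic is configured at the endpoint level:

Configure the traffic:

This step isn't required in the Azure CLI.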

endpoint.traffic = {blue_deployment_name: 90, green_deployment_name: 10}

traffic_config = {"traffic": {blue_deployment_name: 90, green_deployment_name: 10}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=90 $GREEN_DEPLOYMENT_NAME=10"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
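A progressive rollout usually repeats this update several times with a growing share for the new deployment. The sketch below generates such a sequence of traffic dictionaries. The step percentages and the helper name `ramp_schedule` are illustrative assumptions, and `xgboost-model-2` is a made-up deployment name.

```python
def ramp_schedule(blue_name, green_name, green_steps=(10, 25, 50, 100)):
    """Yield traffic splits that move traffic from blue to green step by step."""
    previous = 0
    for green_share in green_steps:
        if not previous < green_share <= 100:
            raise ValueError("green_steps must be strictly increasing values in (0, 100]")
        previous = green_share
        # The two shares always total 100.
        yield {blue_name: 100 - green_share, green_name: green_share}


for traffic in ramp_schedule("default", "xgboost-model-2"):
    print(traffic)
```

Each dictionary could be assigned to `endpoint.traffic`, or written to the traffic configuration file, before updating the endpoint. In practice you would pause between steps to monitor the new deployment.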

If you decide to switch all the traffic to the new deployment, update the traffic accordingly:

This step isn't required in the Azure CLI.

endpoint.traffic = {blue_deployment_name: 0, green_deployment_name: 100}

traffic_config = {"traffic": {blue_deployment_name: 0, green_deployment_name: 100}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=0 $GREEN_DEPLOYMENT_NAME=100"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)

Since the old deployment no longer receives any traffic, you can safely delete it:

az ml online-deployment delete --endpoint-name $ENDPOINT_NAME --name default

ml_client.online_deployments.begin_delete(
    name=blue_deployment_name, 
    endpoint_name=endpoint_name
)

deployment_client.delete_deployment(
    blue_deployment_name, 
    endpoint=endpoint_name
)

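Tip

Notice that at this point the former "blue deployment" has been deleted, and the new "green deployment" has taken its place.

Clean up resources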

az ml online-endpoint delete --name $ENDPOINT_NAME --yes

ml_client.online_endpoints.begin_delete(name=endpoint_name)

deployment_client.delete_endpoint(endpoint_name)

Important

Notice that deleting the endpoint also deletes all the deployments under it.
