Progressive rollout of MLflow models to online endpoints

Article
09/03/2024

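In this article, you learn how to progressively update and deploy MLflow models to online endpoints without causing service disruption. You use blue-green deployment, also known as a safe rollout strategy, to introduce a new version of a web service to production. This strategy lets you roll out the new version to a small subset of users or requests before rolling it out completely.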

About this example

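Online endpoints have the concepts of endpoint and deployment. An endpoint represents the API that customers use to consume the model, while a deployment represents a specific implementation of that API. This distinction lets you decouple the API from the implementation and change the underlying implementation without affecting consumers. This example uses these concepts to update the model deployed behind an endpoint without introducing service disruption.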
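
The separation between an endpoint and its deployments can be pictured with a small, purely illustrative model. The names below (`Endpoint`, `model_v1`, `model_v2`, `heart-classifier-demo`) are hypothetical and are not part of any Azure SDK; the sketch only shows that callers address the endpoint while the implementation behind it changes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical stand-ins for two model versions; these are not Azure APIs.
def model_v1(rows: List[list]) -> List[int]:
    return [0 for _ in rows]


def model_v2(rows: List[list]) -> List[int]:
    return [1 for _ in rows]


@dataclass
class Endpoint:
    """A stable API surface that forwards requests to one active deployment."""

    name: str
    deployments: Dict[str, Callable[[List[list]], List[int]]] = field(default_factory=dict)
    active: str = ""

    def invoke(self, rows: List[list]) -> List[int]:
        # Callers only know the endpoint; the deployment is an internal detail.
        return self.deployments[self.active](rows)


endpoint = Endpoint(name="heart-classifier-demo")
endpoint.deployments["default"] = model_v1
endpoint.active = "default"
before = endpoint.invoke([[48, 0, 3]])

# Swap the implementation; the caller still invokes the same endpoint.
endpoint.deployments["xgboost-model-2"] = model_v2
endpoint.active = "xgboost-model-2"
after = endpoint.invoke([[48, 0, 3]])
```

The caller's code (`endpoint.invoke(...)`) is identical before and after the swap, which is the property the rollout relies on.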

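The model you deploy is based on the UCI Heart Disease Data Set. The database contains 76 attributes, but this example uses a subset of 14 of them. The model tries to predict the presence of heart disease in a patient. The prediction is an integer: 0 (no heart disease) or 1 (heart disease). The model was trained with an XGBoost classifier, and all the required preprocessing is packaged as a scikit-learn pipeline, which makes the model an end-to-end pipeline that goes from raw data to predictions.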

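The information in this article is based on code samples contained in the azureml-examples repository. To run the commands locally without copying and pasting files, clone the repository and then change directories to sdk/using-mlflow/deploy.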

Follow along in Jupyter Notebooks

You can follow along with this sample in a notebook. In the cloned repository, open the notebook mlflow_sdk_online_endpoints_progresive.ipynb.

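Prerequisites

Before following the steps in this article, make sure you have the following prerequisites:

An Azure subscription. If you don't have an Azure subscription, create a free account before you begin. Try the free or paid version of Azure Machine Learning.
Azure role-based access control (Azure RBAC) is used to grant access to operations in Azure Machine Learning. To perform the steps in this article, your user account must be assigned the Owner or Contributor role for the Azure Machine Learning workspace, or a custom role that allows Microsoft.MachineLearningServices/workspaces/onlineEndpoints/*. For more information, see Manage access to an Azure Machine Learning workspace.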

Additionally, you need to:

Install the Azure CLI and the ml extension for the Azure CLI. For more information, see Install, set up, and use the CLI (v2).

Install the MLflow SDK package mlflow and the Azure Machine Learning plug-in for MLflow, azureml-mlflow.
```
pip install mlflow azureml-mlflow
```
If you aren't running on Azure Machine Learning compute, configure the MLflow tracking URI or the MLflow registry URI to point to the workspace you're working in. For details, see Configure MLflow for Azure Machine Learning.

Connect to your workspace

First, connect to the Azure Machine Learning workspace where you're going to work.

az account set --subscription <subscription>
az configure --defaults workspace=<workspace> group=<resource-group> location=<location>

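The workspace is the top-level resource for Azure Machine Learning. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning. In this section, you connect to the workspace in which you perform the deployment tasks.

Import the required libraries: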

from azure.ai.ml import MLClient, Input
from azure.ai.ml.entities import ManagedOnlineEndpoint, ManagedOnlineDeployment, Model
from azure.ai.ml.constants import AssetTypes
from azure.identity import DefaultAzureCredential

Configure the workspace details and get a handle to the workspace:

subscription_id = "<subscription>"
resource_group = "<resource-group>"
workspace = "<workspace>"

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

Import the required libraries:

import json
import mlflow
import requests
import pandas as pd
from mlflow.deployments import get_deploy_client

Configure the MLflow client and the deployment client:

mlflow_client = mlflow.MlflowClient()
deployment_client = get_deploy_client(mlflow.get_tracking_uri())

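Register the model in the registry

Ensure your model is registered in the Azure Machine Learning registry. Deployment of unregistered models isn't supported in Azure Machine Learning. You can register a new model as follows, using the Azure CLI, the Azure Machine Learning SDK for Python, or the MLflow SDK: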

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

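Create an online endpoint

Online endpoints are endpoints used for online (real-time) inferencing. Online endpoints contain deployments that are ready to receive data from clients and can send responses back in real time.

This example takes advantage of that by deploying multiple versions of the same model under the same endpoint. However, the new deployment receives 0% of the traffic at the beginning. Once you're sure the new model works correctly, you progressively move traffic from one deployment to the other.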
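
To make the effect of traffic percentages concrete, here is a minimal local simulation of percentage-based routing. It does not reproduce how Azure Machine Learning routes requests internally; `pick_deployment` and `simulate` are hypothetical helpers, and the deployment names are placeholders.

```python
import random
from collections import Counter
from typing import Dict


def pick_deployment(traffic: Dict[str, int], rng: random.Random) -> str:
    """Pick a deployment name with probability proportional to its traffic share."""
    names = list(traffic)
    weights = [traffic[name] for name in names]
    return rng.choices(names, weights=weights, k=1)[0]


def simulate(traffic: Dict[str, int], n_requests: int, seed: int = 0) -> Counter:
    """Count how many simulated requests each deployment would receive."""
    rng = random.Random(seed)
    return Counter(pick_deployment(traffic, rng) for _ in range(n_requests))


# A new deployment that starts at 0% receives no routed requests.
initial = simulate({"default": 100, "green": 0}, n_requests=1000)

# After shifting 10% of the traffic, a minority of requests reach it.
shifted = simulate({"default": 90, "green": 10}, n_requests=1000)
```

With a 0% share the new deployment is only reachable when a caller names it explicitly, which is what makes it safe to test before any live traffic is moved.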

Endpoints require a name, which must be unique within the same region. Let's make sure to create one that doesn't already exist:

ENDPOINT_SUFFIX=$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w ${1:-5} | head -n 1)
ENDPOINT_NAME="heart-classifier-$ENDPOINT_SUFFIX"

import random
import string

# Creating a unique endpoint name by including a random suffix
allowed_chars = string.ascii_lowercase + string.digits
endpoint_suffix = "".join(random.choice(allowed_chars) for x in range(5))
endpoint_name = "heart-classifier-" + endpoint_suffix

print(f"Endpoint name: {endpoint_name}")

Configure the endpoint

endpoint.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineEndpoint.schema.json
name: heart-classifier-edp
auth_mode: key

endpoint = ManagedOnlineEndpoint(
    name=endpoint_name,
    description="An endpoint to serve predictions of the UCI heart disease problem",
    auth_mode="key",
)

You can configure the properties of this endpoint using a configuration file. The following example sets the endpoint's authentication mode to "key":

endpoint_config = {
    "auth_mode": "key",
    "identity": {
        "type": "system_assigned"
    }
}

Write this configuration to a JSON file:

endpoint_config_path = "endpoint_config.json"
with open(endpoint_config_path, "w") as outfile:
    outfile.write(json.dumps(endpoint_config))

Create the endpoint:

az ml online-endpoint create -n $ENDPOINT_NAME -f endpoint.yml

ml_client.online_endpoints.begin_create_or_update(endpoint).result()

endpoint = deployment_client.create_endpoint(
    name=endpoint_name,
    config={"endpoint-config-file": endpoint_config_path},
)

Get the authentication secret for the endpoint:
```
ENDPOINT_SECRET_KEY=$(az ml online-endpoint get-credentials -n $ENDPOINT_NAME | jq -r ".primaryKey")
```
```
endpoint_secret_key = ml_client.online_endpoints.get_keys(
    name=endpoint_name
).primary_key
```
This functionality isn't available in the MLflow SDK. Go to Azure Machine Learning studio, navigate to the endpoint, and retrieve the secret key from there.
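
With key authentication, the secret is sent as a bearer token in the `Authorization` header of the scoring request. The following sketch only builds such a request with the Python standard library and does not send it. The scoring URI and key are placeholders, and `build_scoring_request` is a hypothetical helper, not part of any SDK.

```python
import json
import urllib.request


def build_scoring_request(
    scoring_uri: str, secret_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but don't send) a POST request authenticated with the endpoint key."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {secret_key}",
        },
        method="POST",
    )


request = build_scoring_request(
    "https://example.invalid/score",  # placeholder; use the endpoint's scoring URI
    "<endpoint-secret-key>",  # placeholder; use the key retrieved above
    {"input_data": {"columns": ["age"], "data": [[48]]}},
)

# Sending the request would look like: urllib.request.urlopen(request)
```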

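Create a blue deployment

So far, the endpoint is empty. There are no deployments on it. Let's create the first one by deploying the same model we were working on before. We call this deployment "default", and it represents our "blue deployment".

Configure the deployment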

blue-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: default
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

blue_deployment_name = "default"

Configure the hardware requirements of your deployment:

blue_deployment = ManagedOnlineDeployment(
    name=blue_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the argument with_package=True:

blue_deployment = ManagedOnlineDeployment(
    name=blue_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    with_package=True,
)

blue_deployment_name = "default"

To configure the hardware requirements of your deployment, create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

Note

The full specification of this configuration can be found in the Managed online deployment schema (v2).

Write the configuration to a file:

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the deployment

az ml online-deployment create --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package --endpoint-name $ENDPOINT_NAME -f blue-deployment.yml --all-traffic

Tip

The --all-traffic flag on the create command assigns all the traffic to the new deployment.

ml_client.online_deployments.begin_create_or_update(blue_deployment).result()

blue_deployment = deployment_client.create_deployment(
    name=blue_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

Assign all the traffic to the deployment

So far, the endpoint has one deployment, but no traffic is assigned to it. Let's assign it.
This step isn't required in the Azure CLI because --all-traffic was used during creation.
```
endpoint.traffic = { blue_deployment_name: 100 }
```
```
traffic_config = {"traffic": {blue_deployment_name: 100}}
```
Write the configuration to a file:
```
traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))
```
Update the endpoint configuration:
This step isn't required in the Azure CLI because --all-traffic was used during creation.
```
ml_client.begin_create_or_update(endpoint).result()
```
```
deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)
```
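
Because a mistyped traffic split is easy to write, it can help to validate the mapping before sending it to the endpoint. The following sketch assumes the service expects non-negative integer percentages that add up to 100; `validate_traffic` is a hypothetical helper and not part of the Azure Machine Learning or MLflow SDKs.

```python
import json
from typing import Dict


def validate_traffic(traffic: Dict[str, int]) -> Dict[str, int]:
    """Return the mapping unchanged if it looks like a valid traffic split."""
    for name, share in traffic.items():
        if not isinstance(share, int) or not 0 <= share <= 100:
            raise ValueError(f"Traffic for {name!r} must be an integer in [0, 100]")
    total = sum(traffic.values())
    if total != 100:
        # Assumption: shares are expected to add up to exactly 100.
        raise ValueError(f"Traffic must add up to 100, got {total}")
    return traffic


# Same structure as the traffic_config dictionary used with the MLflow client.
traffic_config = {"traffic": validate_traffic({"default": 100})}
serialized = json.dumps(traffic_config)
```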

Create a sample input to test the deployment

sample.json

{
    "input_data": {
        "columns": [
            "age",
            "sex",
            "cp",
            "trestbps",
            "chol",
            "fbs",
            "restecg",
            "thalach",
            "exang",
            "oldpeak",
            "slope",
            "ca",
            "thal"
        ],
        "data": [
            [ 48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal" ]
        ]
    }
}
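
The same request body can be assembled programmatically. This is a minimal sketch using only the standard library; `build_payload` is a hypothetical helper, and the column list simply mirrors the sample above.

```python
import json
from typing import Any, Dict, List

# Column order copied from the sample request above.
COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]


def build_payload(rows: List[List[Any]]) -> Dict[str, Any]:
    """Wrap rows in the {"input_data": {"columns": ..., "data": ...}} structure."""
    for row in rows:
        if len(row) != len(COLUMNS):
            raise ValueError(f"Expected {len(COLUMNS)} values per row, got {len(row)}")
    return {"input_data": {"columns": COLUMNS, "data": rows}}


payload = build_payload([[48, 0, 3, 130, 275, 0, 0, 139, 0, 0.2, 1, 0, "normal"]])
request_body = json.dumps(payload, indent=4)

# To save it for use as a request file:
# with open("sample.json", "w") as f:
#     f.write(request_body)
```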

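The following code samples 5 observations from the training dataset, removes the target column (since the model predicts it), and creates a request in the file sample.json that can be used with the model deployment: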

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

with open("sample.json", "w") as f:
    f.write(
        json.dumps(
            {"input_data": json.loads(samples.to_json(orient="split", index=False))}
        )
    )

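The following code samples 5 observations from the training dataset, removes the target column (since the model predicts it), and keeps the result in memory to use as the request: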

samples = (
    pd.read_csv("data/heart.csv")
    .sample(n=5)
    .drop(columns=["target"])
    .reset_index(drop=True)
)

Test the deployment

az ml online-endpoint invoke --name $ENDPOINT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    df=samples
)

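Create a green deployment under the endpoint

Imagine that the development team has created a new version of the model and that it's ready to go to production. You can first try out this model and, once you're confident, update the endpoint to route traffic to it.

Register a new model version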

MODEL_NAME='heart-classifier'
az ml model create --name $MODEL_NAME --type "mlflow_model" --path "model"

Get the version number of the new model:

VERSION=$(az ml model show -n heart-classifier --label latest | jq -r ".version")

model_name = 'heart-classifier'
model_local_path = "model"

model = ml_client.models.create_or_update(
     Model(name=model_name, path=model_local_path, type=AssetTypes.MLFLOW_MODEL)
)
version = model.version

model_name = 'heart-classifier'
model_local_path = "model"

registered_model = mlflow_client.create_model_version(
    name=model_name, source=f"file://{model_local_path}"
)
version = registered_model.version

Configure a new deployment

green-deployment.yml

$schema: https://azuremlschemas.azureedge.net/latest/managedOnlineDeployment.schema.json
name: xgboost-model
endpoint_name: heart-classifier-edp
model: azureml:heart-classifier@latest
instance_type: Standard_DS2_v2
instance_count: 1

Name the deployment as follows:

GREEN_DEPLOYMENT_NAME="xgboost-model-$VERSION"

green_deployment_name = f"xgboost-model-{version}"

Configure the hardware requirements of your deployment:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
)

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the argument with_package=True:

green_deployment = ManagedOnlineDeployment(
    name=green_deployment_name,
    endpoint_name=endpoint_name,
    model=model,
    instance_type="Standard_DS2_v2",
    instance_count=1,
    with_package=True,
)

green_deployment_name = f"xgboost-model-{version}"

To configure the hardware requirements of your deployment, create a JSON file with the desired configuration:

deploy_config = {
    "instance_type": "Standard_DS2_v2",
    "instance_count": 1,
}

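Tip

This example uses the same hardware configuration indicated in the deployment-config-file. However, there's no requirement to use the same configuration. You can configure different hardware for different models depending on their requirements.

Write the configuration to a file: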

deployment_config_path = "deployment_config.json"
with open(deployment_config_path, "w") as outfile:
    outfile.write(json.dumps(deploy_config))

Create the new deployment

az ml online-deployment create -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

If your endpoint doesn't have egress connectivity, use model packaging (preview) by including the flag --with-package:

az ml online-deployment create --with-package -n $GREEN_DEPLOYMENT_NAME --endpoint-name $ENDPOINT_NAME -f green-deployment.yml

ml_client.online_deployments.begin_create_or_update(green_deployment).result()

new_deployment = deployment_client.create_deployment(
    name=green_deployment_name,
    endpoint=endpoint_name,
    model_uri=f"models:/{model_name}/{version}",
    config={"deploy-config-file": deployment_config_path},
)

Test the deployment without changing traffic

az ml online-endpoint invoke --name $ENDPOINT_NAME --deployment-name $GREEN_DEPLOYMENT_NAME --request-file sample.json

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=green_deployment_name,
    request_file="sample.json",
)

deployment_client.predict(
    endpoint=endpoint_name, 
    deployment_name=green_deployment_name, 
    df=samples
)

Tip

Notice how we now indicate the name of the deployment we want to invoke.
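
When calling the scoring URI directly over REST instead of through the CLI or the SDKs, a specific deployment can be targeted with the `azureml-model-deployment` request header. The sketch below only assembles the headers; `scoring_headers` is a hypothetical helper, and the key and deployment name are placeholders.

```python
from typing import Dict, Optional


def scoring_headers(secret_key: str, deployment_name: Optional[str] = None) -> Dict[str, str]:
    """Build request headers, optionally pinning the call to one deployment."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {secret_key}",
    }
    if deployment_name:
        # Without this header, the endpoint's traffic rules decide the target.
        headers["azureml-model-deployment"] = deployment_name
    return headers


routed_by_traffic = scoring_headers("<endpoint-secret-key>")
pinned_to_green = scoring_headers("<endpoint-secret-key>", "xgboost-model-2")
```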

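Update the traffic progressively

Once you're confident in the new deployment, you can update the traffic to route some of it to the new deployment. Traffic is configured at the endpoint level.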
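
A progressive rollout can be planned as a sequence of traffic splits that are applied one at a time, checking the new deployment between steps. The sketch below generates such a sequence. `rollout_schedule` is a hypothetical helper, the deployment names are placeholders, and the intermediate shares of 25 and 50 are illustrative; this article itself moves from 10 directly to 100.

```python
from typing import Dict, Iterable, List


def rollout_schedule(blue: str, green: str, green_shares: Iterable[int]) -> List[Dict[str, int]]:
    """Return one traffic mapping per step, moving traffic from blue to green."""
    schedule = []
    for share in green_shares:
        if not 0 <= share <= 100:
            raise ValueError(f"Traffic share must be in [0, 100], got {share}")
        # Whatever green doesn't receive stays on blue, so each step totals 100.
        schedule.append({blue: 100 - share, green: share})
    return schedule


steps = rollout_schedule("default", "xgboost-model-2", [10, 25, 50, 100])
first_step, last_step = steps[0], steps[-1]
```

Each dictionary has the same shape as the value assigned to `endpoint.traffic` or stored under the `"traffic"` key of the traffic configuration in the following steps.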

Configure the traffic:

This step isn't required in the Azure CLI.

endpoint.traffic = {blue_deployment_name: 90, green_deployment_name: 10}

traffic_config = {"traffic": {blue_deployment_name: 90, green_deployment_name: 10}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint:

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=90 $GREEN_DEPLOYMENT_NAME=10"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)

If you decide to switch all of the traffic to the new deployment, update the traffic accordingly:

This step isn't required in the Azure CLI.

endpoint.traffic = {blue_deployment_name: 0, green_deployment_name: 100}

traffic_config = {"traffic": {blue_deployment_name: 0, green_deployment_name: 100}}

Write the configuration to a file:

traffic_config_path = "traffic_config.json"
with open(traffic_config_path, "w") as outfile:
    outfile.write(json.dumps(traffic_config))

Update the endpoint:

az ml online-endpoint update --name $ENDPOINT_NAME --traffic "default=0 $GREEN_DEPLOYMENT_NAME=100"

ml_client.begin_create_or_update(endpoint).result()

deployment_client.update_endpoint(
    endpoint=endpoint_name,
    config={"endpoint-config-file": traffic_config_path},
)

Since the old deployment no longer receives any traffic, you can safely delete it:
```
az ml online-deployment delete --endpoint-name $ENDPOINT_NAME --name default
```
```
ml_client.online_deployments.begin_delete(
    name=blue_deployment_name, 
    endpoint_name=endpoint_name
)
```
```
deployment_client.delete_deployment(
    blue_deployment_name, 
    endpoint=endpoint_name
)
```
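
Before running a delete like the ones above, it can be worth confirming that the deployment really has no traffic assigned. This minimal check assumes the traffic split is available as a dictionary, such as the one assigned to `endpoint.traffic` earlier; `safe_to_delete` is a hypothetical helper.

```python
from typing import Dict


def safe_to_delete(traffic: Dict[str, int], deployment_name: str) -> bool:
    """Return True when the deployment receives no share of live traffic."""
    return traffic.get(deployment_name, 0) == 0


current_traffic = {"default": 0, "xgboost-model-2": 100}
can_delete_blue = safe_to_delete(current_traffic, "default")
can_delete_green = safe_to_delete(current_traffic, "xgboost-model-2")
```
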
Tip

At this point, the former "blue deployment" has been deleted and the new "green deployment" has taken its place as the "blue deployment".

Clean up resources

az ml online-endpoint delete --name $ENDPOINT_NAME --yes

ml_client.online_endpoints.begin_delete(name=endpoint_name)

deployment_client.delete_endpoint(endpoint_name)

Important

Deleting an endpoint also deletes all the deployments under it.
