Adapt your R script to run in production

Article
09/04/2024

This article explains how to take an existing R script and make the changes needed to run it as a job in Azure Machine Learning.

You'll need to make most, if not all, of the changes described in this article.

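Remove user interaction

Your R script must be designed to run unattended, because it's executed with the Rscript command inside a container. Make sure you remove any interactive input or output from the script.

Add parsing

If your script requires any sort of input parameter (most scripts do), pass the inputs to the script through the Rscript call.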

Rscript <name-of-r-script>.R
--data_file ${{inputs.<name-of-yaml-input-1>}} 
--brand ${{inputs.<name-of-yaml-input-2>}}

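In your R script, parse the inputs and perform the appropriate type conversions. We recommend that you use the optparse package.

The following snippet shows how to:

Initiate the parser
Add all your inputs as options
Parse the inputs with the appropriate data types

You can also add default values, which are handy for testing. We recommend that you add an --output parameter with a default value of ./outputs so that any output of the script is stored.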

library(optparse)

parser <- OptionParser()

parser <- add_option(
  parser,
  "--output",
  type = "character",
  action = "store",
  default = "./outputs"
)

parser <- add_option(
  parser,
  "--data_file",
  type = "character",
  action = "store",
  default = "data/myfile.csv"
)

parser <- add_option(
  parser,
  "--brand",
  type = "double",
  action = "store",
  default = 1
)
args <- parse_args(parser)

args is a named list. You can use any of these parameters later in your script.
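As a minimal sketch of how the parsed values behave, parse_args also accepts an explicit args vector, which lets you try the parser without a real command line. The argument values and the variable names brand, data_file, and output_dir below are made up for illustration.

```r
library(optparse)

# Build a parser with the same three options used in this article.
parser <- OptionParser()
parser <- add_option(parser, "--output", type = "character", default = "./outputs")
parser <- add_option(parser, "--data_file", type = "character", default = "data/myfile.csv")
parser <- add_option(parser, "--brand", type = "double", default = 1)

# Simulate `Rscript <script>.R --brand 3 --data_file data/other.csv`.
# These argument values are hypothetical.
args <- parse_args(parser, args = c("--brand", "3", "--data_file", "data/other.csv"))

brand <- args$brand          # numeric, converted by optparse from the string "3"
data_file <- args$data_file  # character
output_dir <- args$output    # not passed, so it falls back to the default
```

Because --output wasn't passed, output_dir takes the default value, which is what makes defaults convenient for local testing.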

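Source the `azureml_utils.R` helper script

You must source a helper script called azureml_utils.R, located in the same working directory as the R script that will be run. The helper script is required for the running R script to communicate with the MLflow server. Because the authentication token changes quickly in a running job, the helper script provides a way to retrieve it continuously. The helper script also lets you use the logging functions in the R MLflow API to log models, parameters, tags, and general artifacts.

Create your azureml_utils.R file with this code: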

# Azure ML utility to enable usage of the MLflow R API for tracking with Azure Machine Learning (Azure ML). This utility does the following:
# 1. Understands Azure ML MLflow tracking url by extending OSS MLflow R client.
# 2. Manages Azure ML Token refresh for remote runs (runs that execute in Azure Machine Learning). It uses the tcltk2 R library to schedule token refresh.
#    Token refresh interval can be controlled by setting the environment variable MLFLOW_TOKEN_REFRESH_INTERVAL_SECONDS and defaults to 30 seconds.

library(mlflow)
library(httr)
library(later)
library(tcltk2)

new_mlflow_client.mlflow_azureml <- function(tracking_uri) {
  host <- paste("https", tracking_uri$path, sep = "://")
  get_host_creds <- function () {
    mlflow:::new_mlflow_host_creds(
      host = host,
      token = Sys.getenv("MLFLOW_TRACKING_TOKEN"),
      username = Sys.getenv("MLFLOW_TRACKING_USERNAME", NA),
      password = Sys.getenv("MLFLOW_TRACKING_PASSWORD", NA),
      insecure = Sys.getenv("MLFLOW_TRACKING_INSECURE", NA)
    )
  }
  cli_env <- function() {
    creds <- get_host_creds()
    res <- list(
      MLFLOW_TRACKING_USERNAME = creds$username,
      MLFLOW_TRACKING_PASSWORD = creds$password,
      MLFLOW_TRACKING_TOKEN = creds$token,
      MLFLOW_TRACKING_INSECURE = creds$insecure
    )
    res[!is.na(res)]
  }
  mlflow:::new_mlflow_client_impl(get_host_creds, cli_env, class = "mlflow_azureml_client")
}

get_auth_header <- function() {
    headers <- list()
    auth_token <- Sys.getenv("MLFLOW_TRACKING_TOKEN")
    auth_header <- paste("Bearer", auth_token, sep = " ")
    headers$Authorization <- auth_header
    headers
}

get_token <- function(host, exp_id, run_id) {
    req_headers <- do.call(httr::add_headers, get_auth_header())
    token_host <- gsub("mlflow/v1.0","history/v1.0", host)
    token_host <- gsub("azureml://","https://", token_host)
    api_url <- paste0(token_host, "/experimentids/", exp_id, "/runs/", run_id, "/token")
    GET( api_url, timeout(getOption("mlflow.rest.timeout", 30)), req_headers)
}


fetch_token_from_aml <- function() {
    message("Refreshing token")
    tracking_uri <- Sys.getenv("MLFLOW_TRACKING_URI")
    exp_id <- Sys.getenv("MLFLOW_EXPERIMENT_ID")
    run_id <- Sys.getenv("MLFLOW_RUN_ID")
    sleep_for <- 1
    time_left <- 30
    response <- get_token(tracking_uri, exp_id, run_id)
    while (response$status_code == 429 && time_left > 0) {
        time_left <- time_left - sleep_for
        warning(paste("Request returned with status code 429 (Rate limit exceeded). Retrying after ",
                    sleep_for, " seconds. Will continue to retry 429s for up to ", time_left,
                    " seconds.", sep = ""))
        Sys.sleep(sleep_for)
        sleep_for <- min(time_left, sleep_for * 2)
        response <- get_token(tracking_uri, exp_id, run_id)
    }

    if (response$status_code != 200) {
        # str() prints its argument and returns NULL, so report the status code instead
        error_response <- paste("Error fetching token, will try again after some time. Status code:", response$status_code)
        warning(error_response)
    }

    if (response$status_code == 200){
        text <- content(response, "text", encoding = "UTF-8")
        json_resp <- jsonlite::fromJSON(text, simplifyVector = FALSE)
        Sys.setenv(MLFLOW_TRACKING_TOKEN = json_resp$token)
        message("Refreshing token done")
    }
}

clean_tracking_uri <- function() {
    tracking_uri <- httr::parse_url(Sys.getenv("MLFLOW_TRACKING_URI"))
    tracking_uri$query = ""
    tracking_uri <-httr::build_url(tracking_uri)
    Sys.setenv(MLFLOW_TRACKING_URI = tracking_uri)
}

clean_tracking_uri()
tcltk2::tclTaskSchedule(as.integer(Sys.getenv("MLFLOW_TOKEN_REFRESH_INTERVAL_SECONDS", 30))*1000, fetch_token_from_aml(), id = "fetch_token_from_aml", redo = TRUE)

# Set MLFlow related env vars
Sys.setenv(MLFLOW_BIN = system("which mlflow", intern = TRUE))
Sys.setenv(MLFLOW_PYTHON_BIN = system("which python", intern = TRUE))

Start your R script with the following line:

source("azureml_utils.R")

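Read data files as local files

When you run an R script as a job, Azure Machine Learning takes the data you specify in the job submission and mounts it on the running container. You can therefore read the data files as if they were local files on the running container.

Make sure your source data is registered as a data asset
Pass the data asset by name in the job submission parameters
Read the files as you would normally read a local file

Define the input parameter as shown in the "Add parsing" section. Use the data_file parameter to specify the whole path, so that you can read the data asset with read_csv(args$data_file).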
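The following is a minimal sketch of this pattern using base R's read.csv (the text above uses read_csv from readr; both read a mounted file the same way). A temporary CSV stands in for the mounted data asset, and the column names brand and sales are invented for the example.

```r
# Stand-in for the mounted data asset: write a small CSV to a temp file.
# In a real job, args$data_file would come from parse_args() and hold
# the path that Azure Machine Learning mounts in the container.
tmp_file <- tempfile(fileext = ".csv")
write.csv(
  data.frame(brand = c(1, 2, 1), sales = c(10.5, 20, 7.25)),
  tmp_file,
  row.names = FALSE
)
args <- list(data_file = tmp_file, brand = 1)

# Fail early with a clear message if the path is wrong.
if (!file.exists(args$data_file)) {
  stop("Input file not found: ", args$data_file)
}

# Read the file exactly as you would read any local file.
df <- read.csv(args$data_file)

# Use another parsed parameter to filter the data.
brand_df <- df[df$brand == args$brand, ]
```

Checking file.exists before reading is optional, but in an unattended job it tends to produce a clearer log message than a failed read.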

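Save job artifacts (images, data, etc.)

Important

This section doesn't apply to models. See the following two sections for model-specific saving and logging instructions.

You can store arbitrary script outputs generated by your R script, such as data files, images, and serialized R objects, in Azure Machine Learning. Create a ./outputs directory to store any generated artifacts (images, models, data, etc.). Any files saved to ./outputs are automatically included in the run and uploaded to the experiment when the run ends. Since you added a default value for the --output parameter in the "Add parsing" section, include the following code snippet in your R script to create the output directory.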

if (!dir.exists(args$output)) {
  dir.create(args$output)
}

After you create the directory, save your artifacts to that directory. For example:

# create and save a plot
library(ggplot2)

myplot <- ggplot(...)

ggsave(myplot, 
       filename = file.path(args$output,"forecast-plot.png"))


# save an rds serialized object
saveRDS(myobject, file = file.path(args$output,"myobject.rds"))
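The same pattern works with base R alone. The sketch below writes a CSV file and an RDS file and then lists what was saved. A temporary directory stands in for args$output, and the results data frame and file names are placeholders.

```r
# A temporary directory stands in for args$output ("./outputs" in a real job).
args <- list(output = file.path(tempdir(), "outputs"))

# recursive = TRUE also creates any missing parent directories.
if (!dir.exists(args$output)) {
  dir.create(args$output, recursive = TRUE)
}

# Placeholder results to save.
results <- data.frame(period = 1:3, forecast = c(101.2, 103.8, 99.5))

# Save a data file.
write.csv(results, file.path(args$output, "forecast.csv"), row.names = FALSE)

# Save a serialized R object and read it back to confirm the round trip.
rds_path <- file.path(args$output, "forecast.rds")
saveRDS(results, file = rds_path)
restored <- readRDS(rds_path)

# List what was written to the output directory.
saved_files <- sort(list.files(args$output))
```

Building every path with file.path(args$output, ...) keeps all artifacts under the directory that is uploaded with the run.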

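`crate` your models with the `carrier` package

The R MLflow API documentation specifies that your R models need to be of the crate model flavor.

If your R script trains a model and produces a model object, you need to crate it so that you can deploy it later with Azure Machine Learning.
When you use the crate function, use explicit namespaces when calling any package function you need.

Let's say you have a time series model object called my_ts_model, created with the fable package. To make this model callable when it's deployed, create a crate that takes the model object and a forecasting horizon expressed as a number of periods: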

library(carrier)
crated_model <- crate(function(x)
{
  fabletools::forecast(!!my_ts_model, h = x)
})

The crated_model object is the one you'll log.
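To see how a crate behaves before logging it, here is a minimal sketch that uses a small lm model on the built-in mtcars data instead of a fable model, so that it needs only the carrier package. The names fit, crated_lm, and new_data are illustrative and not part of the original example.

```r
library(carrier)

# A small linear model stands in for my_ts_model.
fit <- lm(mpg ~ wt, data = mtcars)

# !!fit embeds the fitted model inside the crate. stats::predict is called
# with an explicit namespace because the crate does not see attached packages.
crated_lm <- crate(function(new_data) {
  stats::predict(!!fit, newdata = new_data)
})

# The crate is called like an ordinary function.
prediction <- crated_lm(data.frame(wt = 3))
```

Calling the crate locally like this is a quick way to catch a missing namespace prefix before the model is deployed.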

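Log models, parameters, tags, or other artifacts with the R MLflow API

In addition to saving any generated artifacts, you can also log models, tags, and parameters for each run. Use the R MLflow API to do so.

When you log a model, you log the crated model that you created as described in the previous section.

Note

When you log a model, the model is also saved and added to the run artifacts. You don't need to save a model explicitly unless you didn't log it.

To log a model, a parameter, or both:

Start the run with mlflow_start_run()
Log artifacts with mlflow_log_model, mlflow_log_param, or mlflow_log_batch
Don't end the run with mlflow_end_run(). Skip this call, because it currently causes an error.

For example, to log the crated_model object created in the previous section, include the following code in your R script:

Tip

Use models as the value for artifact_path when you log a model. This is a best practice, although you can use a different name.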

mlflow_start_run()

mlflow_log_model(
  model = crated_model, # the crate model object
  artifact_path = "models" # a path to save the model object to
  )

mlflow_log_param(<key-name>, <value>)

# mlflow_end_run() - causes an error, do not include mlflow_end_run()

Script structure and example

Use these code snippets as a guide to structure your R script, following all the changes outlined in this article.

# BEGIN R SCRIPT

# source the azureml_utils.R script which is needed to use the MLflow back end
# with R
source("azureml_utils.R")

# load your packages here. Make sure that they are installed in the container.
library(...)

# parse the command line arguments.
library(optparse)

parser <- OptionParser()

parser <- add_option(
  parser,
  "--output",
  type = "character",
  action = "store",
  default = "./outputs"
)

parser <- add_option(
  parser,
  "--data_file",
  type = "character",
  action = "store",
  default = "data/myfile.csv"
)

parser <- add_option(
  parser,
  "--brand",
  type = "double",
  action = "store",
  default = 1
)
args <- parse_args(parser)

# your own R code goes here
# - model building/training
# - visualizations
# - etc.

# create the ./outputs directory
if (!dir.exists(args$output)) {
  dir.create(args$output)
}

# log models and parameters to MLflow
mlflow_start_run()

mlflow_log_model(
  model = crated_model, # the crate model object
  artifact_path = "models" # a path to save the model object to
  )

mlflow_log_param(<key-name>, <value>)

# mlflow_end_run() - causes an error, do not include mlflow_end_run()
## END OF R SCRIPT

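Create an environment

To run your R script, you use the ml extension for Azure CLI, also referred to as CLI v2. The ml command uses a YAML job definition file. For more information about submitting jobs with az ml, see Train models with Azure Machine Learning CLI.

The YAML job file specifies an environment. You need to create this environment in your workspace before you can run the job.

You can create the environment in Azure Machine Learning studio or with the Azure CLI.

Whichever method you use, you'll use a Dockerfile. To work on Azure Machine Learning, all Docker context files for R environments must have the following specification: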

FROM rocker/tidyverse:latest

# Install python
RUN apt-get update -qq && \
 apt-get install -y python3-pip tcl tk libz-dev libpng-dev

RUN ln -f /usr/bin/python3 /usr/bin/python
RUN ln -f /usr/bin/pip3 /usr/bin/pip
RUN pip install -U pip

# Install azureml-mlflow
RUN pip install azureml-mlflow
RUN pip install mlflow

# Install R packages required for logging with MLflow (these are necessary)
RUN R -e "install.packages('mlflow', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('carrier', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('optparse', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"
RUN R -e "install.packages('tcltk2', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"

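The base image is rocker/tidyverse:latest, which has many R packages and their dependencies already installed.

Important

You must install, in advance, any R packages that your script needs to run. Add more lines to the Docker context file as needed.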

RUN R -e "install.packages('<package-to-install>', dependencies = TRUE, repos = 'https://cloud.r-project.org/')"

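Additional suggestions

Some additional suggestions you might want to consider:

Use R's tryCatch function for exception and error handling
Add explicit logging for troubleshooting and debugging

Next steps

How to train R models in Azure Machine Learning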
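Both suggestions can be combined in a few lines of base R. The sketch below is one possible shape, not a prescribed pattern: log_msg, risky_step, and safe_run are hypothetical helpers invented for this example.

```r
# Hypothetical helper: write a timestamped message to stderr.
log_msg <- function(level, ...) {
  message(format(Sys.time(), "%Y-%m-%d %H:%M:%S"), " [", level, "] ", ...)
}

# Hypothetical step that fails when the input is not numeric.
risky_step <- function(x) {
  if (!is.numeric(x)) stop("input must be numeric")
  sqrt(x)
}

# Run the step, log the outcome, and return NULL on error instead of stopping.
safe_run <- function(x) {
  tryCatch(
    {
      result <- risky_step(x)
      log_msg("INFO", "step succeeded")
      result
    },
    error = function(e) {
      log_msg("ERROR", "step failed: ", conditionMessage(e))
      NULL
    },
    finally = log_msg("INFO", "step finished")
  )
}

ok <- safe_run(16)            # logs success and returns the result
failed <- safe_run("sixteen") # logs the error and returns NULL
```

Returning NULL keeps the script running after a failure. If a failed step should instead fail the whole job, the error handler could log the message and then call quit(status = 1).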
