Fine-tune Hugging Face models for a single GPU

03/22/2025

This article describes how to fine-tune a Hugging Face model with the Hugging Face transformers library on a single GPU. It also includes Databricks-specific recommendations for loading data from the lakehouse and logging models to MLflow, which lets you use and govern your models on Azure Databricks.

The Hugging Face transformers library provides the Trainer utility and Auto Model classes, which let you load and fine-tune Transformers models.

With simple modifications, you can use these tools for the following tasks:

- Load models to fine-tune.
- Configure the Hugging Face Transformers Trainer utility.
- Perform training on a single GPU.

See What are Hugging Face Transformers?

Requirements

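- A single-node cluster with one GPU on the driver.
- The GPU version of Databricks Runtime 13.0 ML or above.
  - This fine-tuning example requires the 🤗 Transformers, 🤗 Datasets, and 🤗 Evaluate packages, which are included in Databricks Runtime 13.0 ML and above.
- MLflow 2.3.
- Data prepared and loaded for fine-tuning a model with transformers.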

Tokenize a Hugging Face dataset

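Hugging Face Transformers models expect tokenized input, not the raw text of the downloaded data. To ensure compatibility with the base model, use an AutoTokenizer loaded from the base model. Hugging Face datasets lets you apply the tokenizer directly and consistently to both the training and test data.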

For example:

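```python
from transformers import AutoTokenizer

# base_model (model name or path) and train_test_dataset (a DatasetDict with
# "train" and "test" splits) are assumed to be defined earlier.
tokenizer = AutoTokenizer.from_pretrained(base_model)

def tokenize_function(examples):
    # Truncate long texts; padding is left to the data collator at batch time.
    return tokenizer(examples["text"], padding=False, truncation=True)

train_test_tokenized = train_test_dataset.map(tokenize_function, batched=True)
```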
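To show what `batched=True` means, the following pure-Python sketch imitates how `Dataset.map` passes a batch to `tokenize_function`. The function receives a dictionary that maps each column name to a list of values, and it returns new columns of the same length. The whitespace tokenizer, its vocabulary, and `map_batched` are hypothetical stand-ins for `AutoTokenizer` and `Dataset.map`. They exist only so the sketch runs without downloading a model.

```python
from typing import Callable

# Hypothetical toy vocabulary; a real AutoTokenizer uses the base model's own vocabulary.
VOCAB = {"[UNK]": 0, "mlflow": 1, "is": 2, "great": 3, "databricks": 4, "awesome": 5}
MAX_LENGTH = 4  # Illustrative truncation limit (truncation=True).


def toy_tokenizer(texts: list[str]) -> dict[str, list[list[int]]]:
    """Split on whitespace, map words to IDs, and truncate; no padding is added."""
    input_ids = [
        [VOCAB.get(word, VOCAB["[UNK]"]) for word in text.lower().split()][:MAX_LENGTH]
        for text in texts
    ]
    return {"input_ids": input_ids}


def tokenize_function(examples: dict[str, list]) -> dict[str, list]:
    # With batched=True, examples["text"] is a list of strings, not a single string.
    return toy_tokenizer(examples["text"])


def map_batched(columns: dict[str, list], fn: Callable, batch_size: int = 2) -> dict[str, list]:
    """Rough imitation of Dataset.map(fn, batched=True) for functions that return new columns."""
    n_rows = len(next(iter(columns.values())))
    result = {name: list(values) for name, values in columns.items()}
    for start in range(0, n_rows, batch_size):
        batch = {name: values[start:start + batch_size] for name, values in columns.items()}
        for name, values in fn(batch).items():
            result.setdefault(name, []).extend(values)
    return result


data = {"text": ["MLflow is great", "Databricks is awesome and fast", "hello"], "label": [1, 1, 0]}
tokenized = map_batched(data, tokenize_function)
print(tokenized["input_ids"])  # [[1, 2, 3], [4, 2, 5, 0], [0]]
```

The original columns are kept alongside `input_ids`, and the rows still have different lengths. Padding happens later, one batch at a time.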

Set up the training configuration

You can use Hugging Face training configuration tools to configure a Trainer. The Trainer class requires you to provide:

- Metrics
- A base model
- A training configuration

In addition to the default loss metric that the Trainer computes, you can configure evaluation metrics. The following example shows how to add accuracy as a metric:

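```python
import numpy as np
import evaluate

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    # eval_pred unpacks into the model's logits and the reference labels.
    logits, labels = eval_pred
    # The predicted class is the index of the largest logit in each row.
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
```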
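The `accuracy` metric compares the arg-max class of each row of logits with the reference label, and it returns a dictionary such as `{"accuracy": ...}`. The dependency-free sketch below reproduces that calculation on made-up logits. It can help you check what `compute_metrics` should return for a small batch. It illustrates the arithmetic only and is not the `evaluate` implementation.

```python
def argmax(row: list[float]) -> int:
    """Index of the largest logit; like np.argmax, the first index wins on ties."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_from_logits(logits: list[list[float]], labels: list[int]) -> dict[str, float]:
    """Fraction of rows whose arg-max class equals the reference label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(pred == ref) for pred, ref in zip(predictions, labels))
    return {"accuracy": correct / len(labels)}


# Made-up logits for four examples and two classes.
logits = [[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]]
labels = [0, 1, 0, 0]
print(accuracy_from_logits(logits, labels))  # {'accuracy': 0.75}
```

The third example is predicted as class 1 while its label is 0, so three of the four predictions are correct.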

Use the Auto Model classes for NLP to load the appropriate model for your task.

For text classification, use AutoModelForSequenceClassification to load a base model. When you create the model, provide the number of classes and the label mappings created during dataset preparation.

```python
from transformers import AutoModelForSequenceClassification

# label2id and id2label are the label mappings created during dataset preparation.
model = AutoModelForSequenceClassification.from_pretrained(
    base_model,
    num_labels=len(label2id),
    label2id=label2id,
    id2label=id2label,
)
```
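The code above assumes that `label2id` and `id2label` already exist from dataset preparation. The following is a minimal sketch of how such mappings might be built from string labels. The sorted ordering, the helper name `build_label_maps`, and the example labels are assumptions for illustration. The two dictionaries must be exact inverses of each other, and `num_labels` is the number of distinct labels.

```python
def build_label_maps(labels: list[str]) -> tuple[dict[str, int], dict[int, str]]:
    """Assign each distinct label an integer ID (sorted order) and build the inverse map."""
    label_names = sorted(set(labels))
    label2id = {name: idx for idx, name in enumerate(label_names)}
    id2label = {idx: name for name, idx in label2id.items()}
    return label2id, id2label


# Hypothetical string labels from a spam-detection dataset.
raw_labels = ["spam", "ham", "ham", "spam", "ham"]
label2id, id2label = build_label_maps(raw_labels)
encoded = [label2id[name] for name in raw_labels]  # integer labels for training

print(label2id)       # {'ham': 0, 'spam': 1}
print(len(label2id))  # 2, the value passed as num_labels
print(encoded)        # [1, 0, 0, 1, 0]
```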

Next, create the training configuration. The TrainingArguments class lets you specify the output directory, evaluation strategy, learning rate, and other parameters.

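```python
from transformers import TrainingArguments, Trainer
# Evaluate once per epoch (newer transformers releases call this argument eval_strategy).
training_args = TrainingArguments(output_dir=training_output_dir, evaluation_strategy="epoch")
```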

A data collator batches the inputs in the training and evaluation datasets. DataCollatorWithPadding provides good baseline performance for text classification.

```python
from transformers import DataCollatorWithPadding
# Pads each batch dynamically to the length of its longest sequence.
data_collator = DataCollatorWithPadding(tokenizer)
```
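The tokenization step used `padding=False`, so sequences reach the collator with different lengths. DataCollatorWithPadding pads each batch only to the longest sequence in that batch (dynamic padding) and adds an attention mask. The pure-Python sketch below shows that idea. A pad ID of 0, right-side padding, and the token IDs are assumptions. The real collator uses the tokenizer's own pad token and padding side, and it returns tensors rather than lists.

```python
def pad_batch(features: list[dict], pad_token_id: int = 0) -> dict[str, list]:
    """Pad input_ids to the longest sequence in this batch and build an attention mask."""
    max_len = max(len(f["input_ids"]) for f in features)
    input_ids, attention_mask = [], []
    for f in features:
        ids = f["input_ids"]
        n_pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)       # right padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = real token, 0 = padding
    batch = {"input_ids": input_ids, "attention_mask": attention_mask}
    if "label" in features[0]:
        # The model's forward pass expects the targets under the name "labels".
        batch["labels"] = [f["label"] for f in features]
    return batch


# Made-up token IDs for two tokenized examples of different lengths.
features = [
    {"input_ids": [101, 7592, 102], "label": 1},
    {"input_ids": [101, 102], "label": 0},
]
batch = pad_batch(features)
print(batch["input_ids"])       # [[101, 7592, 102], [101, 102, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
print(batch["labels"])          # [1, 0]
```

Padding per batch instead of to a fixed maximum length avoids spending computation on padding tokens when most texts are short.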

With all of these parameters constructed, you can now create the Trainer.

```python
trainer = Trainer(
    model=model,
    args=training_args,
    # Use the tokenized splits; the model needs input_ids, not raw text.
    train_dataset=train_test_tokenized["train"],
    eval_dataset=train_test_tokenized["test"],
    compute_metrics=compute_metrics,
    data_collator=data_collator,
)
```

Train and log the model to MLflow

Hugging Face integrates well with MLflow and automatically logs metrics during model training through the MLflowCallback. However, you must log the trained model yourself.

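Wrap training in an MLflow run. The following code trains the model inside the run, saves it to local disk, and builds a Transformers pipeline from the tokenizer and the trained model. Finally, it logs the model to MLflow with mlflow.transformers.log_model.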

```python
import mlflow
from transformers import AutoModelForSequenceClassification, pipeline

with mlflow.start_run() as run:
    trainer.train()
    # Save the fine-tuned model to local disk (model_output_dir is a local path you choose).
    trainer.save_model(model_output_dir)
    pipe = pipeline(
        "text-classification",
        model=AutoModelForSequenceClassification.from_pretrained(model_output_dir),
        batch_size=1,
        tokenizer=tokenizer,
    )
    model_info = mlflow.transformers.log_model(
        transformers_model=pipe,
        artifact_path="classification",
        input_example="Hi there!",
    )
```

If you don't need to create a pipeline, you can pass the components used in training as a dictionary:

```python
model_info = mlflow.transformers.log_model(
    transformers_model={"model": trainer.model, "tokenizer": tokenizer},
    task="text-classification",
    artifact_path="text_classifier",
    input_example=["MLflow is great!", "MLflow on Databricks is awesome!"],
)
```

Load the model for inference

Once your model is logged and ready, loading it for inference is the same as loading any MLflow-wrapped pretrained model.

```python
import mlflow

# model_artifact_path must match the artifact_path passed to log_model.
model_artifact_path = "classification"
logged_model = "runs:/{run_id}/{model_artifact_path}".format(run_id=run.info.run_id, model_artifact_path=model_artifact_path)

# Load model as a Spark UDF. Override result_type if the model does not return double values.
loaded_model_udf = mlflow.pyfunc.spark_udf(spark, model_uri=logged_model, result_type='string')

# test is a Spark DataFrame with text and label columns.
test = test.select(test.text, test.label, loaded_model_udf(test.text).alias("prediction"))
display(test)
```

For more information, see Deploy models using Mosaic AI Model Serving.

Troubleshoot common CUDA errors

This section describes common CUDA errors and how to resolve them.

OutOfMemoryError: CUDA out of memory

A common error when training large models is the CUDA out of memory error.

For example:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 14.76 GiB total capacity; 666.34 MiB already allocated; 17.75 MiB free; 720.00 MiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF.
```
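Reading the numbers in the example error message can suggest which fix to try first. In this message, PyTorch has reserved only about 54 MiB more than it has allocated, so the fragmentation hint is probably not the main issue. Meanwhile, roughly 14 GiB of the GPU is neither free nor reserved by PyTorch. That memory may be held by something outside this process's PyTorch allocator, such as another process or a stale context. This is an interpretation of one message, not a general rule. The `max_split_size_mb` value of 128 below is only an illustrative setting.

```python
import os

# Figures copied from the example error message above (GiB converted to MiB).
total_mib = 14.76 * 1024
allocated_mib = 666.34
reserved_mib = 720.00
free_mib = 17.75

# Memory that PyTorch has reserved but is not currently using for tensors.
reserved_unused_mib = round(reserved_mib - allocated_mib, 2)
# Memory that is neither free nor reserved by PyTorch (for example, other processes).
held_elsewhere_mib = round(total_mib - reserved_mib - free_mib, 2)
print(reserved_unused_mib, held_elsewhere_mib)  # 53.66 14376.49

# If reserved memory were much larger than allocated memory, fragmentation could be the cause.
# Set this before PyTorch's first CUDA allocation; 128 is only an example value.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"
```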

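To resolve this error, try the following recommendations:

- Reduce the batch size for training. You can reduce the per_device_train_batch_size value in TrainingArguments.
- Use lower-precision training. You can set fp16=True in TrainingArguments.
- Use gradient_accumulation_steps in TrainingArguments to effectively increase the overall batch size.
- Use the 8-bit Adam optimizer.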
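- Clean up GPU memory before training. Sometimes GPU memory is occupied by code that is no longer in use.
```python
from numba import cuda

# Reset the current CUDA device, releasing the memory this process allocated on it.
device = cuda.get_current_device()
device.reset()
```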
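Gradient accumulation trades speed for memory. Only `per_device_train_batch_size` examples are on the GPU at once, but the optimizer steps once every `gradient_accumulation_steps` micro-batches. On a single GPU, the effective batch size is therefore the product of the two. The toy sketch below uses a one-parameter least-squares model, an assumption chosen for illustration, with four micro-batches of 2 examples each. It shows that averaging their gradients gives the same gradient as one batch of 8. Scaling each micro-batch by `1 / gradient_accumulation_steps` mirrors common practice. The sketch does not show how the Trainer implements it internally. The equivalence holds here because the micro-batches are equal-sized and the loss is a per-example mean.

```python
import math


def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """Derivative with respect to w of mean((w * x - y) ** 2) over the given examples."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Toy data for a one-parameter model y = w * x (illustration only).
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9, 16.1]
w = 0.5

per_device_train_batch_size = 2
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps  # one GPU

# Gradient of one large batch, which would need room for all 8 examples at once.
full_batch_grad = grad_mse(w, xs[:effective_batch_size], ys[:effective_batch_size])

# Gradient accumulation: only 2 examples are processed at a time, and each micro-batch
# gradient is scaled by 1 / gradient_accumulation_steps before being summed.
accumulated_grad = 0.0
for step in range(gradient_accumulation_steps):
    start = step * per_device_train_batch_size
    end = start + per_device_train_batch_size
    accumulated_grad += grad_mse(w, xs[start:end], ys[start:end]) / gradient_accumulation_steps

same = math.isclose(full_batch_grad, accumulated_grad, rel_tol=1e-9)
print(effective_batch_size, same)  # 8 True
```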

CUDA kernel errors

You might encounter CUDA kernel errors when running training.

For example:

```
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.

For debugging, consider passing CUDA_LAUNCH_BLOCKING=1.
```

To troubleshoot:

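- Run the code on CPU to check whether the error is reproducible.
- Alternatively, set CUDA_LAUNCH_BLOCKING=1 to get a more accurate traceback:
```python
import os

# Set this before any CUDA work starts in the process (ideally at the top of the notebook).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
```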

Notebook: Fine-tune text classification on a single GPU

To get started quickly with example code, this notebook provides an end-to-end example of fine-tuning a model for text classification. The other sections of this article describe in more detail how to use Hugging Face for fine-tuning on Azure Databricks.

Fine-tuning Hugging Face text classification models notebook


Additional resources

Learn more about Hugging Face on Azure Databricks.

- What are Hugging Face Transformers?
- You can use Hugging Face Transformers models on Spark to scale out your NLP batch applications. See Model inference using Hugging Face Transformers for NLP.
