Hyperparameter tuning (preview)

Article
10/15/2024

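Hyperparameter tuning is the process of finding the best values for parameters that a machine learning model does not learn during training, but that the user sets before training begins. These parameters are called hyperparameters. Examples include the learning rate, the number of hidden layers in a neural network, the regularization strength, and the batch size.

The performance of a machine learning model is highly sensitive to the choice of hyperparameters, and the best set of hyperparameters can vary greatly with the specific problem and dataset. Hyperparameter tuning is therefore an important step in the machine learning pipeline, because it can significantly affect the model's accuracy and generalization performance.

In Fabric, data scientists can use FLAML, a lightweight Python library for efficient automation of machine learning and AI operations, for their hyperparameter tuning needs. Within Fabric notebooks, users can call flaml.tune for economical hyperparameter tuning.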

Important

This feature is in preview.

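Tuning workflow

Finishing a basic tuning task with flaml.tune takes three essential steps:

Specify the tuning objective with respect to the hyperparameters.
Specify a search space for the hyperparameters.
Specify tuning constraints, including constraints on the resource budget for the tuning, constraints on the configurations, and/or constraints on one or more particular metrics.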

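Tuning objective

The first step is to specify the tuning objective. To do this, first define the evaluation procedure for the hyperparameters in a user-defined function (evaluate_config in the example below). The function takes a hyperparameter configuration as input. It can return a metric value directly as a scalar, or return a dictionary of metric name and metric value pairs.

The following example defines an evaluation function for two hyperparameters named x and y.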

import time

def evaluate_config(config: dict):
    """evaluate a hyperparameter configuration"""
    score = (config["x"] - 85000) ** 2 - config["x"] / config["y"]


    faked_evaluation_cost = config["x"] / 100000
    time.sleep(faked_evaluation_cost)
    # we can return a single float as a score on the input config:
    # return score
    # or, we can return a dictionary that maps metric name to metric value:
    return {"score": score, "evaluation_cost": faked_evaluation_cost, "constraint_metric": config["x"] * config["y"]}
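Before handing an evaluation function to the tuner, it can help to call it by hand on one configuration and check the returned dictionary. The sketch below is a sleep-free variant of the function above. The name score_only and the sample values are illustrative assumptions, not part of FLAML.

```python
def score_only(config: dict) -> dict:
    """Sleep-free variant of the evaluation function (hypothetical helper)."""
    score = (config["x"] - 85000) ** 2 - config["x"] / config["y"]
    return {
        "score": score,
        "constraint_metric": config["x"] * config["y"],
    }


# Evaluate one hand-picked configuration.
result = score_only({"x": 85000, "y": 2})
print(result)
```

The keys of the returned dictionary are the metric names that the tuner can later optimize or constrain.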

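Search space

Next, we specify the search space for the hyperparameters. In the search space, you specify the valid values for each hyperparameter and how those values are sampled (for example, from a uniform distribution or a log-uniform distribution). The following example provides the search space for the hyperparameters x and y. The valid values for both are integers from 1 (inclusive) to 100,000 (exclusive). In this example, y is sampled uniformly over that range, while x is sampled log-uniformly (tune.lograndint), so smaller values of x are drawn more often.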

from flaml import tune

# construct a search space for the hyperparameters x and y.
config_search_space = {
    "x": tune.lograndint(lower=1, upper=100000),
    "y": tune.randint(lower=1, upper=100000)
}

# provide the search space to tune.run
tune.run(..., config=config_search_space, ...)
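To make the difference between the two sampling strategies concrete, the following standard-library sketch draws integers uniformly and log-uniformly from the same range. It is not FLAML's implementation. The helper names sample_randint and sample_lograndint are assumptions made for this illustration.

```python
import math
import random


def sample_randint(lower: int, upper: int, rng: random.Random) -> int:
    """Draw an integer uniformly from [lower, upper)."""
    return rng.randrange(lower, upper)


def sample_lograndint(lower: int, upper: int, rng: random.Random) -> int:
    """Draw an integer from [lower, upper) whose logarithm is uniform."""
    log_value = rng.uniform(math.log(lower), math.log(upper))
    # Clamp to guard against floating-point edge cases at the bounds.
    return min(upper - 1, max(lower, int(math.exp(log_value))))


rng = random.Random(0)
uniform_draws = [sample_randint(1, 100000, rng) for _ in range(10000)]
log_draws = [sample_lograndint(1, 100000, rng) for _ in range(10000)]

# Share of draws below 1,000: roughly 1% when uniform, roughly 60% when log-uniform.
share_uniform = sum(v < 1000 for v in uniform_draws) / len(uniform_draws)
share_log = sum(v < 1000 for v in log_draws) / len(log_draws)
print(share_uniform, share_log)
```

Log-uniform sampling suits hyperparameters whose useful values span several orders of magnitude, because each order of magnitude receives about the same share of trials.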

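With FLAML, users can customize the domain for a particular hyperparameter. This lets users specify the type and the valid range from which to sample the parameter. FLAML supports the following hyperparameter types: float, integer, and categorical. The following example shows commonly used domains: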

config = {
    # Sample a float uniformly between -5.0 and -1.0
    "uniform": tune.uniform(-5, -1),

    # Sample a float uniformly between 3.2 and 5.4,
    # rounding to increments of 0.2
    "quniform": tune.quniform(3.2, 5.4, 0.2),

    # Sample a float uniformly between 0.0001 and 0.01, while
    # sampling in log space
    "loguniform": tune.loguniform(1e-4, 1e-2),

    # Sample a float uniformly between 0.0001 and 0.1, while
    # sampling in log space and rounding to increments of 0.00005
    "qloguniform": tune.qloguniform(1e-4, 1e-1, 5e-5),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2
    "randn": tune.randn(10, 2),

    # Sample a random float from a normal distribution with
    # mean=10 and sd=2, rounding to increments of 0.2
    "qrandn": tune.qrandn(10, 2, 0.2),

    # Sample an integer uniformly between -9 (inclusive) and 15 (exclusive)
    "randint": tune.randint(-9, 15),

    # Sample an integer uniformly between -21 (inclusive) and 12 (inclusive (!)),
    # rounding to increments of 3 (includes 12)
    "qrandint": tune.qrandint(-21, 12, 3),

    # Sample an integer uniformly between 1 (inclusive) and 10 (exclusive),
    # while sampling in log space
    "lograndint": tune.lograndint(1, 10),

    # Sample an integer uniformly between 2 (inclusive) and 10 (inclusive (!)),
    # while sampling in log space and rounding to increments of 2
    "qlograndint": tune.qlograndint(2, 10, 2),

    # Sample an option uniformly from the specified choices
    "choice": tune.choice(["a", "b", "c"]),
}
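The quantized domains (the ones whose names start with q) round each sample to a multiple of the given increment. A minimal standard-library sketch of that idea follows. The helper sample_quniform is an assumption made for illustration and is not FLAML's own code.

```python
import random


def sample_quniform(lower: float, upper: float, q: float, rng: random.Random) -> float:
    """Draw uniformly from [lower, upper], then round to the nearest multiple of q."""
    value = rng.uniform(lower, upper)
    return round(value / q) * q


rng = random.Random(0)
draws = [sample_quniform(3.2, 5.4, 0.2, rng) for _ in range(1000)]

# Every draw lands on the 0.2 grid between 3.2 and 5.4.
print(sorted({round(v, 10) for v in draws}))
```

Quantization shrinks the number of distinct configurations, which can be useful when small differences in a hyperparameter do not matter.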

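To learn more about customizing domains within a search space, see the FLAML documentation on customizing search spaces.

Tuning constraints

The last step is to specify the constraints of the tuning task. One notable property of flaml.tune is that it can complete the tuning process within a required resource constraint. To do this, provide a resource constraint in terms of wall-clock time (in seconds) with the time_budget_s argument, or in terms of the number of trials with the num_samples argument.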

# Assumes `from flaml import tune`, as in the search space example.
# Set a resource constraint of 60 seconds wall-clock time for the tuning.
tune.run(..., time_budget_s=60, ...)

# Set a resource constraint of 100 trials for the tuning.
tune.run(..., num_samples=100, ...)

# Use at most 60 seconds and at most 100 trials for the tuning.
tune.run(..., time_budget_s=60, num_samples=100, ...)
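The interaction between the two budgets can be pictured as a loop that stops as soon as either limit is reached. The following sketch uses plain random search with the standard library only. It is a simplified illustration under that assumption and is not FLAML's search algorithm. The names random_search and toy_score are hypothetical.

```python
import random
import time


def random_search(evaluate, time_budget_s: float, num_samples: int, seed: int = 0):
    """Run trials until the time budget or the trial count is exhausted."""
    rng = random.Random(seed)
    start = time.monotonic()
    trials = []
    # A negative num_samples means there is no limit on the number of trials.
    while num_samples < 0 or len(trials) < num_samples:
        if time.monotonic() - start >= time_budget_s:
            break
        config = {"x": rng.randrange(1, 100000), "y": rng.randrange(1, 100000)}
        trials.append((config, evaluate(config)))
    return trials


def toy_score(config: dict) -> float:
    """Same formula as the example objective, without the artificial sleep."""
    return (config["x"] - 85000) ** 2 - config["x"] / config["y"]


trials = random_search(toy_score, time_budget_s=1.0, num_samples=100)
print(len(trials))
```

With num_samples=-1 the loop runs until the time budget alone ends it, which mirrors the "-1 means infinite" convention used in this article's tune.run examples.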

To learn more about adding configuration constraints, see the FLAML documentation on advanced tuning options.
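As a rough picture of what configuration and metric constraints mean, the sketch below checks constraints written as (function or metric name, sign, threshold) tuples. To our understanding, FLAML's config_constraints and metric_constraints arguments to tune.run take tuples of this shape, but treat that as an assumption to confirm in the FLAML documentation. The checker functions and the area helper are hypothetical and use only the standard library.

```python
import operator

# Supported inequality signs for this sketch.
OPS = {"<=": operator.le, ">=": operator.ge}


def satisfies_config_constraints(config, config_constraints):
    """Each constraint is (function_of_config, sign, threshold)."""
    return all(OPS[sign](fn(config), threshold) for fn, sign, threshold in config_constraints)


def satisfies_metric_constraints(result, metric_constraints):
    """Each constraint is (metric_name, sign, threshold)."""
    return all(OPS[sign](result[name], threshold) for name, sign, threshold in metric_constraints)


def area(config):
    """Hypothetical cost function that depends only on the configuration."""
    return config["x"] * config["y"]


config_constraints = [(area, "<=", 1000)]
metric_constraints = [("constraint_metric", "<=", 1000)]

config = {"x": 20, "y": 30}
result = {"score": 1.0, "constraint_metric": 600}
config_ok = satisfies_config_constraints(config, config_constraints)
metric_ok = satisfies_metric_constraints(result, metric_constraints)
print(config_ok, metric_ok)
```

A configuration constraint can be checked before a trial runs, because it depends only on the configuration. A metric constraint can only be checked after the evaluation function has returned its metrics.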

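Putting it together

After defining the tuning criteria, we can run the tuning trial. To track the results of the trial, we can use MLflow autologging to capture the metrics and parameters for each run. The following code captures the entire hyperparameter tuning trial and highlights each hyperparameter combination that FLAML explored.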

import mlflow
mlflow.set_experiment("flaml_tune_experiment")
mlflow.autolog(exclusive=False)

with mlflow.start_run(nested=True, run_name="Child Run: "):
    analysis = tune.run(
        evaluate_config,  # the function to evaluate a config
        config=config_search_space,  # the search space defined
        metric="score",
        mode="min",  # the optimization mode, "min" or "max"
        num_samples=-1,  # the maximal number of configs to try, -1 means infinite
        time_budget_s=10,  # the time budget in seconds
    )
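The metric and mode arguments together decide which trial counts as the best one. The following standard-library sketch shows that selection rule on a few made-up trial results. The function best_trial and the sample data are illustrative assumptions, not FLAML output.

```python
def best_trial(results, metric: str, mode: str):
    """Pick the result with the lowest (mode='min') or highest (mode='max') metric."""
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    return pick(results, key=lambda r: r[metric])


# Made-up trial results for illustration only.
results = [
    {"config": {"x": 10, "y": 5}, "score": 7.5},
    {"config": {"x": 40, "y": 2}, "score": -3.0},
    {"config": {"x": 90, "y": 9}, "score": 12.0},
]

lowest = best_trial(results, metric="score", mode="min")
highest = best_trial(results, metric="score", mode="max")
print(lowest["config"], highest["config"])
```

Choose mode="min" for losses and errors, and mode="max" for metrics such as accuracy.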

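Note

When MLflow autologging is enabled, metrics, parameters, and models should be logged automatically as the MLflow run executes. However, this varies by framework. Metrics and parameters for specific models might not be logged. For example, no metrics are logged for XGBoost, LightGBM, Spark, and SynapseML models. To learn which metrics and parameters are captured from each framework, see the MLflow autologging documentation.

Parallel tuning with Apache Spark

The flaml.tune functionality supports tuning both Apache Spark learners and single-node learners. In addition, when tuning single-node learners (for example, Scikit-Learn learners), you can parallelize the tuning to speed up the process by setting use_spark = True. On Spark clusters, FLAML by default launches one trial per executor. You can also customize the number of concurrent trials with the n_concurrent_trials argument.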


analysis = tune.run(
    evaluate_config,  # the function to evaluate a config
    config=config_search_space,  # the search space defined
    metric="score",
    mode="min",  # the optimization mode, "min" or "max"
    num_samples=-1,  # the maximal number of configs to try, -1 means infinite
    time_budget_s=10,  # the time budget in seconds
    use_spark=True,
)
print(analysis.best_trial.last_result)  # the best trial's result
print(analysis.best_config)  # the best config
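The idea of running several trials at once can be sketched locally with a thread pool. This is only an analogy for concurrent trials, not Spark and not FLAML's scheduler. The worker count, the toy_score function, and the sampled configurations are assumptions made for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def toy_score(config: dict) -> float:
    """Same formula as the example objective, without the artificial sleep."""
    return (config["x"] - 85000) ** 2 - config["x"] / config["y"]


rng = random.Random(0)
configs = [
    {"x": rng.randrange(1, 100000), "y": rng.randrange(1, 100000)}
    for _ in range(8)
]

# Local stand-in for the number of trials evaluated at the same time.
n_concurrent_trials = 4
with ThreadPoolExecutor(max_workers=n_concurrent_trials) as pool:
    scores = list(pool.map(toy_score, configs))

# Keep the configuration with the lowest score (mode="min").
best_score, best_config = min(zip(scores, configs), key=lambda pair: pair[0])
print(best_config, best_score)
```

Parallel trials pay off mainly when a single evaluation is expensive, as with model training. For a cheap function like this one, the overhead of scheduling outweighs the gain.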

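To learn more about how to parallelize your tuning trials, see the FLAML documentation on parallel Spark jobs.

Visualize results

The flaml.visualization module provides utility functions for plotting the optimization process with Plotly. With Plotly, users can interactively explore their AutoML experiment results. To use these plotting functions, provide your optimized flaml.AutoML or flaml.tune.tune.ExperimentAnalysis object as input.

You can use the following functions within your notebook: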

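plot_optimization_history: Plot the optimization history of all trials in the experiment.
plot_feature_importance: Plot the importance of each feature in the dataset.
plot_parallel_coordinate: Plot the high-dimensional parameter relationships in the experiment.
plot_contour: Plot the parameter relationships in the experiment as a contour plot.
plot_edf: Plot the EDF (empirical distribution function) of the experiment's objective values.
plot_timeline: Plot the timeline of the experiment.
plot_slice: Plot the parameter relationships in the experiment as a slice plot.
plot_param_importance: Plot the hyperparameter importance of the experiment.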
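For readers unfamiliar with the empirical distribution function that plot_edf draws, the following standard-library sketch computes one from a list of objective values. It only illustrates the quantity being plotted and is not how flaml.visualization computes it. The function name and sample values are assumptions.

```python
def empirical_distribution(values):
    """Return sorted values and, for each, the fraction of values at or below it."""
    ordered = sorted(values)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


# Made-up objective values from four trials.
scores = [4.0, 1.0, 3.0, 2.0]
xs, ys = empirical_distribution(scores)
print(list(zip(xs, ys)))
# → [(1.0, 0.25), (2.0, 0.5), (3.0, 0.75), (4.0, 1.0)]
```

When the objective is minimized, an EDF curve that rises steeply at low objective values means that many trials reached good results.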
