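Evaluate a machine learning model

10 minutes

You've trained a predictive model. How do you know whether it's any good?

To evaluate a model, you need to use validation data that you held back from training. For supervised machine learning models, this approach lets you compare the labels predicted by the model with the actual labels in the validation dataset. By comparing predicted and true label values, you can calculate a range of evaluation metrics that quantify the model's predictive performance.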
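
As a rough illustration of holding data back, the following sketch shuffles a list of rows and sets a fraction aside for validation. The function name `holdout_split`, the 30% fraction, and the seed are assumptions made for this example and don't come from the original text.

```python
import random

def holdout_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle rows and hold back a fraction of them for validation."""
    shuffled = list(rows)                   # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)   # seeded so the split is reproducible
    n_validation = round(len(shuffled) * validation_fraction)
    validation = shuffled[:n_validation]    # never shown to the model during training
    training = shuffled[n_validation:]
    return training, validation

rows = list(range(10))                      # stand-in for real labeled cases
training_rows, validation_rows = holdout_split(rows)
print(len(training_rows), len(validation_rows))
```

With a Spark DataFrame, the `randomSplit` method serves the same purpose.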

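Evaluating regression models

Regression models predict numeric values, so any evaluation of the model's predictive performance requires you to consider the differences between the predicted values and the actual label values in the validation dataset. Because the validation dataset includes multiple cases, some of which may have more accurate predictions than others, you need a way to aggregate the differences and determine an overall metric for performance. Typical metrics used to evaluate a regression model include:

Mean Squared Error (MSE): This metric is calculated by squaring the difference between each prediction and the actual value, adding the squared differences together, and calculating the mean (average). Squaring the values makes the differences absolute (ignoring whether the difference is negative or positive) and gives more weight to larger differences.
Root Mean Squared Error (RMSE): While MSE is a good indication of the level of error in the model's predictions, it isn't expressed in the label's unit of measurement. For example, in a model that predicts sales (in US dollars), the MSE value actually represents squared dollars. To evaluate how far off the predictions are in dollars, you calculate the square root of the MSE.
Coefficient of determination (R²): R² measures the proportion of the variance in the actual label values that the model's predictions explain. It's commonly calculated as 1 minus the ratio of the sum of squared errors to the total sum of squared differences between the labels and their mean. For a model that fits reasonably well the value falls between 0 and 1, and the closer it is to 1, the better the model predicts. On validation data, a model that does worse than always predicting the mean label produces a negative value.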
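
To make the three definitions above concrete, here is a minimal sketch that computes them by hand in plain Python. The function name `regression_metrics` and the sales figures are invented for illustration, and R² follows the common "1 minus residual sum of squares over total sum of squares" definition.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MSE, RMSE, R2) for paired actual and predicted values."""
    n = len(actual)
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    mse = sum(squared_errors) / n            # mean of the squared differences
    rmse = math.sqrt(mse)                    # back in the label's own unit
    mean_actual = sum(actual) / n
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1 - sum(squared_errors) / ss_total  # share of variance explained
    return mse, rmse, r2

# Hypothetical sales figures in dollars (illustrative values only)
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [110.0, 140.0, 210.0, 240.0]

mse, rmse, r2 = regression_metrics(actual, predicted)
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)
```

Every prediction here is off by 10 dollars, so the MSE is 100 squared dollars while the RMSE is 10 dollars, which shows why RMSE is easier to interpret.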

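Most machine learning frameworks provide classes that calculate these metrics for you. For example, the Spark MLlib library provides the RegressionEvaluator class, as shown in the following code example: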

from pyspark.ml.evaluation import RegressionEvaluator

# Infer predicted labels from validation data
predictions_df = model.transform(validation_df)

# Assume predictions_df includes a 'prediction' column with the predicted labels
# and a 'label' column with the actual known label values

# Use an evaluator to get metrics
evaluator = RegressionEvaluator()
evaluator.setPredictionCol("prediction")
mse = evaluator.evaluate(predictions_df, {evaluator.metricName: "mse"})
rmse = evaluator.evaluate(predictions_df, {evaluator.metricName: "rmse"})
r2 = evaluator.evaluate(predictions_df, {evaluator.metricName: "r2"})
print("MSE:", mse)
print("RMSE:", rmse)
print("R2:", r2)

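Evaluating classification models

Classification models predict categorical labels (classes) by calculating a probability value for each possible class and selecting the class label with the highest probability. The metrics used to evaluate a classification model reflect how often these class predictions were correct when compared to the actual known labels in the validation dataset. Typical metrics used to evaluate a classification model include:

Accuracy: A simple metric that indicates the proportion of class predictions made by the model that were correct. While this may seem like the obvious way to evaluate a classification model, consider a model that predicts whether a person commutes to work by car, bus, or tram. Suppose 95% of cases in the validation set use a car, 3% take the bus, and 2% catch a tram. A model that always predicts car is 95% accurate, even though it has no ability to discriminate between the three classes.
Per-class metrics:
- Precision: The proportion of predictions of the specified class that were correct. This is measured as the number of true positives (correct predictions of this class) divided by the total number of predictions of this class (including false positives).
- Recall: The proportion of actual instances of this class that were correctly predicted (true positives divided by the total number of instances of this class in the validation dataset, including false negatives, which are cases where the model incorrectly predicted a different class).
- F1 score: A metric that combines precision and recall (calculated as the harmonic mean of precision and recall).
Combined (weighted) precision, recall, and F1 metrics for all classes.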
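
The commuting scenario above can be worked through by hand. The following sketch uses plain Python, with a hypothetical helper `per_class_metrics`, to score a model that always predicts "car". Treating an undefined ratio (zero predictions or zero instances of a class) as 0.0 is an assumption of this sketch, as is weighting each class by its share of the validation cases.

```python
from collections import Counter

def per_class_metrics(actual, predicted, label):
    """Return (precision, recall, F1) for one class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == label and p == label)  # true positives
    fp = sum(1 for a, p in pairs if a != label and p == label)  # false positives
    fn = sum(1 for a, p in pairs if a == label and p != label)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0  # assumption: undefined -> 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)      # harmonic mean
    else:
        f1 = 0.0
    return precision, recall, f1

# 100 validation cases: 95 car, 3 bus, 2 tram; the model always predicts car
actual = ["car"] * 95 + ["bus"] * 3 + ["tram"] * 2
predicted = ["car"] * 100

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
print("Accuracy:", accuracy)

# Weight each class's metrics by its share of the validation cases
weighted = [0.0, 0.0, 0.0]
for label, count in Counter(actual).items():
    metrics = per_class_metrics(actual, predicted, label)
    print(label, metrics)
    for i, value in enumerate(metrics):
        weighted[i] += value * count / len(actual)
weighted_precision, weighted_recall, weighted_f1 = weighted
print("Weighted:", weighted_precision, weighted_recall, weighted_f1)
```

Accuracy looks strong, but recall for bus and tram is zero, which exposes the model's inability to identify those classes.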

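As with regression, most machine learning frameworks include classes that can calculate classification metrics. For example, the following code uses the MulticlassClassificationEvaluator in the Spark MLlib library.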

from pyspark.ml.evaluation import MulticlassClassificationEvaluator

# Infer predicted labels from validation data
predictions_df = model.transform(validation_df)

# Assume predictions_df includes a 'prediction' column with the predicted labels
# and a 'label' column with the actual known label values

# Use an evaluator to get metrics
evaluator = MulticlassClassificationEvaluator(labelCol="label", predictionCol="prediction")
accuracy = evaluator.evaluate(predictions_df, {evaluator.metricName:"accuracy"})
print("Accuracy:", accuracy)

labels = [0,1,2]
print("\nIndividual class metrics:")
for label in sorted(labels):
    print("Class %s" % label)
    precision = evaluator.evaluate(predictions_df, {evaluator.metricLabel:label,
                                                    evaluator.metricName:"precisionByLabel"})
    print("\tPrecision:", precision)
    recall = evaluator.evaluate(predictions_df, {evaluator.metricLabel:label,
                                                 evaluator.metricName:"recallByLabel"})
    print("\tRecall:", recall)
    f1 = evaluator.evaluate(predictions_df, {evaluator.metricLabel:label,
                                             evaluator.metricName:"fMeasureByLabel"})
    print("\tF1 Score:", f1)
    
overallPrecision = evaluator.evaluate(predictions_df, {evaluator.metricName:"weightedPrecision"})
print("Overall Precision:", overallPrecision)
overallRecall = evaluator.evaluate(predictions_df, {evaluator.metricName:"weightedRecall"})
print("Overall Recall:", overallRecall)
overallF1 = evaluator.evaluate(predictions_df, {evaluator.metricName:"weightedFMeasure"})
print("Overall F1 Score:", overallF1)

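Evaluating unsupervised clustering models

Unsupervised clustering models don't have known true label values. The goal of a clustering model is to group similar cases into clusters based on their features. To evaluate a cluster, you need a metric that indicates the level of separation between clusters. You can think of the clustered cases as plotted points in multidimensional space. Points in the same cluster should be close to one another and far away from points in a different cluster.

One such metric is the Silhouette measure, which computes squared Euclidean distance and provides an indication of consistency within clusters. Silhouette values range between -1 and 1; a value close to 1 indicates that the points in a cluster are close to the other points in the same cluster and far from the points of the other clusters.

The Spark MLlib library provides the ClusteringEvaluator class, which computes the Silhouette for the predictions made by a clustering model, as shown here: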
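
To show what the Silhouette measure captures, here is a minimal pairwise sketch in plain Python that uses squared Euclidean distance. The function names and the sample points are assumptions for illustration. It needs at least two clusters, and a library implementation may compute the value differently for efficiency.

```python
def squared_euclidean(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(p, q))

def silhouette_score(points, cluster_ids):
    """Mean silhouette over all points (requires at least two clusters)."""
    clusters = {}
    for point, cid in zip(points, cluster_ids):
        clusters.setdefault(cid, []).append(point)

    scores = []
    for point, cid in zip(points, cluster_ids):
        own = clusters[cid]
        if len(own) == 1:
            scores.append(0.0)   # convention: a singleton cluster scores 0
            continue
        # a: mean distance to the other points in the same cluster
        # (the point's distance to itself is 0, so divide by len - 1)
        a = sum(squared_euclidean(point, other) for other in own) / (len(own) - 1)
        # b: mean distance to the points of the nearest other cluster
        b = min(
            sum(squared_euclidean(point, other) for other in members) / len(members)
            for other_cid, members in clusters.items()
            if other_cid != cid
        )
        denominator = max(a, b)
        scores.append((b - a) / denominator if denominator else 0.0)
    return sum(scores) / len(scores)

# Two tight, well-separated groups of points (illustrative values only)
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
good_score = silhouette_score(points, [0, 0, 1, 1])  # clusters match the groups
bad_score = silhouette_score(points, [0, 1, 0, 1])   # clusters cut across groups
print(good_score, bad_score)
```

The assignment that matches the two groups scores close to 1, while the assignment that mixes them scores below 0.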

from pyspark.ml.evaluation import ClusteringEvaluator

# Infer predicted clusters from validation data
predictions_df = model.transform(validation_df)

# Assume predictions_df includes a 'prediction' column with the predicted cluster

# Use an evaluator to get metrics
evaluator = ClusteringEvaluator(predictionCol="prediction")
silhouetteVal = evaluator.evaluate(predictions_df)
print("Silhouette:", silhouetteVal)
