Work with feature tables in Workspace Feature Store (legacy)

Article
03/29/2025

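Note

This documentation covers the Workspace Feature Store. Databricks recommends using Feature Engineering in Unity Catalog. Workspace Feature Store will be deprecated in the future.

For information about working with feature tables in Unity Catalog, see Work with feature tables in Unity Catalog.

This page describes how to create and work with feature tables in the Workspace Feature Store.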

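Note

If your workspace is enabled for Unity Catalog, any table managed by Unity Catalog that has a primary key is automatically a feature table that you can use for model training and inference. All Unity Catalog capabilities, such as security, lineage, tagging, and cross-workspace access, are automatically available to these feature tables. For information about working with feature tables in a Unity Catalog-enabled workspace, see Work with feature tables in Unity Catalog.

For information about tracking feature lineage and freshness, see Discover features and track feature lineage in Workspace Feature Store (legacy).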

Note

Database and feature table names can contain only alphanumeric characters and underscores (_).
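The following minimal sketch illustrates this rule by checking a single database or table name before it is passed to the API. It assumes that "alphanumeric" means the ASCII letters and digits. `is_valid_name` is a hypothetical helper, not part of the Feature Store client.

```python
import re

# Assumption: "alphanumeric" means the ASCII letters A-Z, a-z and the digits 0-9.
_VALID_NAME = re.compile(r"[A-Za-z0-9_]+")

def is_valid_name(name: str) -> bool:
    """Hypothetical helper: True if a single database or feature table name
    contains only alphanumeric characters and underscores (and is non-empty)."""
    return _VALID_NAME.fullmatch(name) is not None

print(is_valid_name("customer_features"))  # True
print(is_valid_name("customer-features"))  # False: the hyphen is not allowed
```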

Create a database for feature tables

Before creating any feature tables, you must create a database to store them.

%sql CREATE DATABASE IF NOT EXISTS <database-name>

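Feature tables are stored as Delta tables. When you create a feature table with create_table (Feature Store client v0.3.6 and above) or create_feature_table (v0.3.5 and below), you must specify the database name. For example, the following argument creates a Delta table named customer_features in the database recommender_system.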

name='recommender_system.customer_features'
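The `name` argument is a qualified `<database>.<table>` string. The part before the dot names the database, and the part after it names the Delta table. The hypothetical helper below is not part of the Feature Store client. It splits such a name to make that mapping explicit, and it assumes that exactly one dot separates the two parts.

```python
from typing import Tuple

def split_table_name(qualified_name: str) -> Tuple[str, str]:
    """Hypothetical helper: split '<database>.<table>' into (database, table)."""
    database, sep, table = qualified_name.partition(".")
    # Reject names without a dot, with an empty part, or with more than one dot.
    if not sep or not database or not table or "." in table:
        raise ValueError(f"expected '<database>.<table>', got {qualified_name!r}")
    return database, table

database, table = split_table_name("recommender_system.customer_features")
print(database)  # recommender_system
print(table)     # customer_features
```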

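When you publish a feature table to an online store, the default table and database names are the ones you specified when you created the table. You can specify different names using the publish_table method.

The Databricks Feature Store UI shows the names of the table and database in the online store, along with other metadata.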

Create a feature table in Databricks Feature Store

Note

You can now register an existing Delta table as a feature table. See Register an existing Delta table as a feature table.

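The basic steps to create a feature table are:

Write the Python functions to compute the features. The output of each function should be an Apache Spark DataFrame with a unique primary key. The primary key can consist of one or more columns.
Create a feature table by instantiating a FeatureStoreClient and using create_table (v0.3.6 and above) or create_feature_table (v0.3.5 and below).
Populate the feature table using write_table.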
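Step 1 requires each primary key value, whether a single column or a combination of columns, to identify exactly one row. Below is a minimal pure-Python sketch of that check. It models rows as dictionaries instead of a Spark DataFrame, and `has_unique_primary_key` is hypothetical. On a real DataFrame, a comparable check is whether `df.count()` equals `df.select(*keys).distinct().count()`.

```python
from typing import Iterable, Mapping, Sequence

def has_unique_primary_key(rows: Iterable[Mapping], primary_keys: Sequence[str]) -> bool:
    """Hypothetical check: True if no two rows share the same primary key value.

    A composite key is treated as the tuple of all of its column values.
    """
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in primary_keys)
        if key in seen:
            return False
        seen.add(key)
    return True

rows = [
    {"customer_id": 1, "date": "2025-03-01", "purchases": 3},
    {"customer_id": 1, "date": "2025-03-02", "purchases": 5},
]
print(has_unique_primary_key(rows, ["customer_id"]))          # False
print(has_unique_primary_key(rows, ["customer_id", "date"]))  # True
```

The second call shows why a composite key such as `['customer_id', 'date']` is needed when a table stores one row per customer per day.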

For details about the commands and parameters used in the following examples, see the Feature Store Python API reference.

v0.3.6 and above

from databricks.feature_store import FeatureStoreClient

def compute_customer_features(data):
  ''' Feature computation code returns a DataFrame with 'customer_id' as primary key'''
  pass  # not shown

# `df` is the input DataFrame that the features are computed from (not defined here)
customer_features_df = compute_customer_features(df)

fs = FeatureStoreClient()

# Create a feature table keyed by customer_id,
# taking the schema from the DataFrame output by compute_customer_features
customer_feature_table = fs.create_table(
  name='recommender_system.customer_features',
  primary_keys='customer_id',
  schema=customer_features_df.schema,
  description='Customer features'
)

# An alternative is to use `create_table` and specify the `df` argument.
# This code automatically saves the features to the underlying Delta table.

# customer_feature_table = fs.create_table(
#  ...
#  df=customer_features_df,
#  ...
# )

# To use a composite key, pass all keys in the create_table call

# customer_feature_table = fs.create_table(
#   ...
#   primary_keys=['customer_id', 'date'],
#   ...
# )

# Use write_table to write data to the feature table
# Overwrite mode does a full refresh of the feature table

fs.write_table(
  name='recommender_system.customer_features',
  df=customer_features_df,
  mode='overwrite'
)

v0.3.5 and below

from databricks.feature_store import FeatureStoreClient

def compute_customer_features(data):
  ''' Feature computation code returns a DataFrame with 'customer_id' as primary key'''
  pass  # not shown

# `df` is the input DataFrame that the features are computed from (not defined here)
customer_features_df = compute_customer_features(df)

fs = FeatureStoreClient()

# Create a feature table keyed by customer_id,
# taking the schema from the DataFrame output by compute_customer_features
customer_feature_table = fs.create_feature_table(
  name='recommender_system.customer_features',
  keys='customer_id',
  schema=customer_features_df.schema,
  description='Customer features'
)

# An alternative is to use `create_feature_table` and specify the `features_df` argument.
# This code automatically saves the features to the underlying Delta table.

# customer_feature_table = fs.create_feature_table(
#  ...
#  features_df=customer_features_df,
#  ...
# )

# To use a composite key, pass all keys in the create_feature_table call

# customer_feature_table = fs.create_feature_table(
#   ...
#   keys=['customer_id', 'date'],
#   ...
# )

# Use write_table to write data to the feature table
# Overwrite mode does a full refresh of the feature table

fs.write_table(
  name='recommender_system.customer_features',
  df=customer_features_df,
  mode='overwrite'
)

Register an existing Delta table as a feature table

With v0.3.8 and above, you can register an existing Delta table as a feature table. The Delta table must exist in the metastore.

Note

To update a registered feature table, you must use the Feature Store Python API.

from databricks.feature_store import FeatureStoreClient
fs = FeatureStoreClient()

fs.register_table(
  delta_table='recommender.customer_features',
  primary_keys='customer_id',
  description='Customer features'
)

Control access to feature tables

See Control access to feature tables in Workspace Feature Store (legacy).

Update a feature table

You can update a feature table by adding new features or by modifying specific rows based on the primary key.

The following feature table metadata cannot be updated:

Primary key
Partition key
Name or type of an existing feature
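To catch such changes before a write, you could compare the current table definition with the one a job is about to produce. The sketch below is a hypothetical helper that models definitions as plain dictionaries, with column types as strings. It flags changes to the primary key, the partition key, and feature types. It cannot detect a renamed feature from the schemas alone, and the Feature Store client performs its own validation.

```python
def find_disallowed_changes(current, proposed):
    """Hypothetical helper: list changes that the feature table metadata rules forbid.

    Each argument is a dict with 'primary_keys', 'partition_columns',
    and 'schema' (a dict mapping column name to type name).
    """
    problems = []
    for field in ("primary_keys", "partition_columns"):
        if list(current[field]) != list(proposed[field]):
            problems.append(f"{field} cannot be changed")
    for column, current_type in current["schema"].items():
        proposed_type = proposed["schema"].get(column)
        # Columns missing from `proposed` are skipped: a write may omit existing features.
        if proposed_type is not None and proposed_type != current_type:
            problems.append(f"type of existing feature {column!r} cannot be changed")
    return problems

current = {"primary_keys": ["customer_id"], "partition_columns": [],
           "schema": {"customer_id": "int", "purchases_30d": "int"}}
proposed = {"primary_keys": ["customer_id"], "partition_columns": [],
            "schema": {"customer_id": "int", "purchases_30d": "double",
                       "avg_basket": "double"}}

# Adding avg_basket is allowed; changing the type of purchases_30d is not.
print(find_disallowed_changes(current, proposed))
# ["type of existing feature 'purchases_30d' cannot be changed"]
```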

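Add new features to an existing feature table

You can add new features to an existing feature table in one of two ways:

Update the existing feature computation function and run write_table with the returned DataFrame. This updates the feature table schema and merges new feature values based on the primary key.
Create a new feature computation function to calculate the new feature values. The DataFrame returned by this new function must contain the feature table's primary keys and partition keys (if defined). Run write_table with this DataFrame to write the new features to the existing feature table, using the same primary key.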
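The second approach can be modeled in plain Python to show how the merge uses the key columns. The sketch assumes a table keyed by primary-key tuples and does not reflect how the client implements the write. New feature columns are added to the row with the matching key, and existing feature values are left in place. A row that lacks the primary key or partition key columns is rejected. `merge_new_features` is hypothetical.

```python
def merge_new_features(table, new_rows, primary_keys, partition_columns=()):
    """Hypothetical model: merge rows of new feature values into `table`.

    `table` maps a primary-key tuple to a row dict and is updated in place.
    """
    required = list(primary_keys) + [c for c in partition_columns if c not in primary_keys]
    for row in new_rows:
        missing = [c for c in required if c not in row]
        if missing:
            raise ValueError(f"new rows must include key columns; missing: {missing}")
        key = tuple(row[c] for c in primary_keys)
        # Columns in `row` are set; other existing columns keep their values.
        table.setdefault(key, {}).update(row)
    return table

table = {(1,): {"customer_id": 1, "purchases_30d": 3}}
merge_new_features(table, [{"customer_id": 1, "avg_basket": 12.5}], ["customer_id"])
print(table[(1,)])  # {'customer_id': 1, 'purchases_30d': 3, 'avg_basket': 12.5}
```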

Update only specific rows in a feature table

Use write_table with mode="merge". Rows whose primary key does not appear in the DataFrame passed to write_table remain unchanged.

fs.write_table(
  name='recommender.customer_features',
  df=customer_features_df,
  mode='merge'
)
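The two write modes differ in what happens to rows that are absent from the DataFrame. The following plain-Python model is hypothetical and ignores Delta details such as schema handling and transactions. It treats the table as a dictionary keyed by primary key. In this model, `merge` updates or inserts the incoming rows and keeps the rest, while `overwrite` replaces the table contents with the incoming rows.

```python
def write_rows(table, rows, primary_keys, mode):
    """Hypothetical model of write_table modes on a dict keyed by primary-key tuple.

    Returns a new dict; `table` itself is not modified.
    """
    if mode == "overwrite":
        result = {}  # full refresh: existing rows are discarded
    elif mode == "merge":
        result = {key: dict(row) for key, row in table.items()}  # keep existing rows
    else:
        raise ValueError(f"unsupported mode: {mode!r}")
    for row in rows:
        key = tuple(row[column] for column in primary_keys)
        # Insert a new row, or update the columns of the row with the same key.
        result.setdefault(key, {}).update(row)
    return result

table = {
    (1,): {"customer_id": 1, "purchases_30d": 3},
    (2,): {"customer_id": 2, "purchases_30d": 7},
}
update = [{"customer_id": 2, "purchases_30d": 8}]

print(write_rows(table, update, ["customer_id"], "merge"))
# {(1,): {'customer_id': 1, 'purchases_30d': 3}, (2,): {'customer_id': 2, 'purchases_30d': 8}}
print(write_rows(table, update, ["customer_id"], "overwrite"))
# {(2,): {'customer_id': 2, 'purchases_30d': 8}}
```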

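Schedule a job to update a feature table

To ensure that features in feature tables always have the most recent values, Databricks recommends that you create a job that runs a notebook to update your feature table on a regular basis, such as every day. If you have already created a non-scheduled job, you can convert it to a scheduled job to keep the feature values up to date. See Orchestration using Databricks Jobs.

Code to update a feature table uses mode='merge', as shown in the following example.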

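from databricks.feature_store import FeatureStoreClient

fs = FeatureStoreClient()

# `data` is the input for the feature computation (not defined here)
customer_features_df = compute_customer_features(data)

fs.write_table(
  df=customer_features_df,
  name='recommender_system.customer_features',
  mode='merge'
)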

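Store past values of daily features

Define a feature table with a composite primary key that includes the date. For example, for a feature table store_purchases, you might use a composite primary key (date, user_id) and the partition key date for efficient reads.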

fs.create_table(
  name='recommender_system.customer_features',
  primary_keys=['date', 'customer_id'],
  partition_columns=['date'],
  schema=customer_features_df.schema,
  description='Customer features'
)

You can then write code that reads from the feature table and filters on date to the time period of interest.
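On the Spark DataFrame returned by `read_table`, this is typically a `filter` on the `date` column. Because `date` is also the partition column, Delta can skip partitions outside the range. The plain-Python sketch below shows only the filtering logic, using rows as dictionaries and an inclusive date range. `rows_in_period` and the sample values are hypothetical.

```python
from datetime import date

def rows_in_period(rows, start, end):
    """Hypothetical helper: keep rows whose 'date' lies in [start, end], inclusive."""
    return [row for row in rows if start <= row["date"] <= end]

rows = [
    {"date": date(2025, 3, 1), "customer_id": 1, "purchases": 3},
    {"date": date(2025, 3, 2), "customer_id": 1, "purchases": 5},
    {"date": date(2025, 3, 9), "customer_id": 1, "purchases": 2},
]
week = rows_in_period(rows, date(2025, 3, 1), date(2025, 3, 7))
print([row["purchases"] for row in week])  # [3, 5]
```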

You can also create a time series feature table by specifying the date column as a timestamp key using the timestamp_keys argument.

fs.create_table(
  name='recommender_system.customer_features',
  primary_keys=['date', 'customer_id'],
  timestamp_keys=['date'],
  schema=customer_features_df.schema,
  description='Customer timeseries features'
)

This enables point-in-time lookups when you use create_training_set or score_batch. The system performs an as-of timestamp join using the timestamp_lookup_key that you specify.
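The rule behind an as-of join can be sketched in plain Python. For each lookup, the join uses the most recent feature value whose timestamp is at or before the lookup timestamp, so values recorded later cannot leak into training data. This sketch models the semantics only, since Feature Store performs the join in Spark. `as_of_lookup` and the sample rows are hypothetical.

```python
from bisect import bisect_right
from datetime import date

def as_of_lookup(feature_rows, customer_id, lookup_date):
    """Hypothetical model of a point-in-time lookup: return the latest feature
    row for `customer_id` whose 'date' is <= `lookup_date`, or None."""
    history = sorted(
        (row for row in feature_rows if row["customer_id"] == customer_id),
        key=lambda row: row["date"],
    )
    dates = [row["date"] for row in history]
    # Number of rows with a timestamp at or before the lookup date.
    index = bisect_right(dates, lookup_date)
    return history[index - 1] if index > 0 else None  # None: no value existed yet

features = [
    {"customer_id": 1, "date": date(2025, 3, 1), "purchases_30d": 3},
    {"customer_id": 1, "date": date(2025, 3, 5), "purchases_30d": 6},
]
print(as_of_lookup(features, 1, date(2025, 3, 4))["purchases_30d"])  # 3
print(as_of_lookup(features, 1, date(2025, 2, 28)))                  # None
```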

To keep the feature table up to date, set up a regularly scheduled job to write features, or stream new feature values into the feature table.

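Create a streaming feature computation pipeline to update features

To create a streaming feature computation pipeline, pass a streaming DataFrame as an argument to write_table. In this case, the method returns a StreamingQuery object.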

def compute_additional_customer_features(data):
  ''' Returns Streaming DataFrame
  '''
  pass  # not shown

customer_transactions = spark.readStream.load("dbfs:/events/customer_transactions")
stream_df = compute_additional_customer_features(customer_transactions)

# With a streaming DataFrame, write_table returns a StreamingQuery object
streaming_query = fs.write_table(
  df=stream_df,
  name='recommender_system.customer_features',
  mode='merge'
)

Read from a feature table

Use read_table to read feature values.

from databricks import feature_store

fs = feature_store.FeatureStoreClient()
customer_features_df = fs.read_table(
  name='recommender.customer_features',
)

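Search and browse feature tables

Use the Feature Store UI to search for or browse feature tables.

In the sidebar, select Machine Learning > Feature Store to display the Feature Store UI.
In the search box, enter all or part of the name of a feature table, a feature, or a data source used for feature computation. You can also enter all or part of the key or value of a tag. Search text is case-insensitive.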
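The matching rule in the last step can be modeled as a case-insensitive substring test over those fields. This hypothetical sketch illustrates only that rule. It says nothing about how the UI ranks or displays results.

```python
def matches_search(query, table_name, feature_names, data_sources, tags):
    """Hypothetical model of the search rule: case-insensitive substring match
    on table, feature, and data source names, and on tag keys or values."""
    needle = query.lower()
    candidates = [table_name, *feature_names, *data_sources, *tags.keys(), *tags.values()]
    return any(needle in candidate.lower() for candidate in candidates)

print(matches_search("CUSTOMER", "recommender_system.customer_features",
                     ["purchases_30d"], ["user_info.clicks"], {"quality": "gold"}))  # True
print(matches_search("Gold", "t", [], [], {"quality": "gold"}))    # True: tag value
print(matches_search("silver", "t", [], [], {"quality": "gold"}))  # False
```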

Get feature table metadata

The API for getting feature table metadata depends on the Feature Store client version you are using. With v0.3.6 and above, use get_table. With v0.3.5 and below, use get_feature_table.

# This example works with v0.3.6 and above.
# For v0.3.5 and below, use `get_feature_table`.
from databricks.feature_store import FeatureStoreClient
fs = FeatureStoreClient()
fs.get_table("feature_store_example.user_feature_table")
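If the same notebook must run against both client generations, one option is to choose the method at runtime. This minimal sketch assumes that only v0.3.6 and above expose `get_table`. `get_table_metadata` is a hypothetical wrapper, and the two stand-in classes exist only so that the example runs without Databricks.

```python
def get_table_metadata(fs, name):
    """Hypothetical wrapper: call get_table when the client has it (v0.3.6+),
    otherwise fall back to get_feature_table (v0.3.5 and below)."""
    getter = getattr(fs, "get_table", None) or fs.get_feature_table
    return getter(name)

# Stand-ins for the two client generations, for illustration only.
class NewClient:
    def get_table(self, name):
        return ("get_table", name)

class OldClient:
    def get_feature_table(self, name):
        return ("get_feature_table", name)

print(get_table_metadata(NewClient(), "feature_store_example.user_feature_table"))
# ('get_table', 'feature_store_example.user_feature_table')
print(get_table_metadata(OldClient(), "feature_store_example.user_feature_table"))
# ('get_feature_table', 'feature_store_example.user_feature_table')
```

In a real notebook, you would pass the `FeatureStoreClient()` instance instead of a stand-in.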

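Work with feature table tags

Tags are key-value pairs that you can create and use to search for feature tables. You can create, edit, and delete tags using the Feature Store UI or the Feature Store Python API.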

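Work with feature table tags in the UI

Use the Feature Store UI to search for or browse feature tables. To access the UI, in the sidebar, select Machine Learning > Feature Store.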

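Add a tag using the Feature Store UI

Click the tag icon if the tags table is not already open. The tags table appears.
Click in the Name and Value fields and enter the key and value for your tag.
Click Add.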

Edit or delete a tag using the Feature Store UI

To edit or delete an existing tag, use the icons in the Actions column.


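Work with feature table tags using the Feature Store Python API

On clusters running v0.4.1 and above, you can create, edit, and delete tags using the Feature Store Python API.

Requirements

Feature Store client v0.4.1 and above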

Create a feature table with tags using the Feature Store Python API

from databricks.feature_store import FeatureStoreClient
fs = FeatureStoreClient()

customer_feature_table = fs.create_table(
  ...
  tags={"tag_key_1": "tag_value_1", "tag_key_2": "tag_value_2", ...},
  ...
)

Add, update, and delete tags using the Feature Store Python API

from databricks.feature_store import FeatureStoreClient
fs = FeatureStoreClient()

# Upsert a tag
fs.set_feature_table_tag(table_name="my_table", key="quality", value="gold")

# Delete a tag
fs.delete_feature_table_tag(table_name="my_table", key="quality")

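Update data sources for a feature table

Feature Store automatically tracks the data sources used to compute features. You can also manually update data sources using the Feature Store Python API.

Requirements

Feature Store client v0.5.0 and above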

Add data sources using the Feature Store Python API

The following are some example commands. For details, see the API documentation.

from databricks.feature_store import FeatureStoreClient
fs = FeatureStoreClient()

# Use `source_type="table"` to add a table in the metastore as data source.
fs.add_data_sources(feature_table_name="clicks", data_sources="user_info.clicks", source_type="table")

# Use `source_type="path"` to add a data source in path format.
fs.add_data_sources(feature_table_name="user_metrics", data_sources="dbfs:/FileStore/user_metrics.json", source_type="path")

# Use `source_type="custom"` if the source is not a table or a path.
fs.add_data_sources(feature_table_name="user_metrics", data_sources="user_metrics.txt", source_type="custom")

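Delete data sources using the Feature Store Python API

For details, see the API documentation.

Note

The following command deletes data sources of all types ("table", "path", and "custom") that match the source name.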

from databricks.feature_store import FeatureStoreClient
fs = FeatureStoreClient()
fs.delete_data_sources(feature_table_name="clicks", sources_names="user_info.clicks")
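The hypothetical model below makes the note about deletion concrete. It treats a table's data sources as `(name, source_type)` pairs and removes every pair whose name matches, regardless of type. It illustrates the described behavior only and is not the client's implementation. For convenience, it accepts either one name or a list of names.

```python
def delete_data_sources(sources, names):
    """Hypothetical model: drop every (name, source_type) entry whose name is in
    `names`, whatever its type ("table", "path", or "custom")."""
    to_delete = {names} if isinstance(names, str) else set(names)
    return [(name, kind) for name, kind in sources if name not in to_delete]

sources = [
    ("user_info.clicks", "table"),
    ("user_info.clicks", "custom"),
    ("dbfs:/FileStore/user_metrics.json", "path"),
]
# Both the "table" and the "custom" entry named user_info.clicks are removed.
print(delete_data_sources(sources, "user_info.clicks"))
# [('dbfs:/FileStore/user_metrics.json', 'path')]
```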

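Delete a feature table

You can delete a feature table using the Feature Store UI or the Feature Store Python API.

Note

Deleting a feature table can lead to unexpected failures in upstream producers and downstream consumers (models, endpoints, and scheduled jobs). You must delete published online stores with your cloud provider.
When you delete a feature table using the API, the underlying Delta table is also dropped. When you delete a feature table from the UI, you must drop the underlying Delta table separately.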

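Delete a feature table using the UI

On the feature table page, click the menu icon to the right of the feature table name and select Delete. If you do not have CAN MANAGE permission on the feature table, you will not see this option.
In the Delete Feature Table dialog, click Delete to confirm.
If you also want to drop the underlying Delta table, run the following command in a notebook.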
```
%sql DROP TABLE IF EXISTS <feature-table-name>;
```

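Delete a feature table using the Feature Store Python API

With Feature Store client v0.4.1 and above, you can use drop_table to delete a feature table. When you delete a table with drop_table, the underlying Delta table is also dropped.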

fs.drop_table(
  name='recommender_system.customer_features'
)
