ModelDataCollector Class
Defines a model data collector that can be used to collect data from an Azure Machine Learning AKS WebService deployment into blob storage.
The ModelDataCollector class enables you to define a data collector for your models in Azure Machine Learning AKS deployments. The data collector object can be used to collect model data, such as inputs and predictions, to the blob storage of the workspace. When model data collection is enabled in your deployment, collected data will show up in the following container path as csv files: /modeldata/{workspace_name}/{webservice_name}/{model_name}/{model_version}/{designation}/{year}/{month}/{day}/{collection_name}.csv
ModelDataCollector constructor.
When model data collection is enabled, data is sent to the following container path: /modeldata/{workspace_name}/{webservice_name}/{model_name}/{model_version}/{designation}/{year}/{month}/{day}/{collection_name}.csv
Inheritance: builtins.object -> ModelDataCollector
Constructor
ModelDataCollector(model_name, designation='default', feature_names=None, workspace='default/default/default', webservice_name='default', model_version='default', collection_name='default')
Parameters
Name | Description |
---|---|
model_name (required) | The name of the model that data is being collected for. |
designation | A unique designation for the data collection location. Supported designations are 'inputs', 'predictions', 'labels', 'signals', and 'general'. Default value: 'default' |
feature_names | A list of feature names that become the csv header when supplied. Default value: None |
workspace | The identifier for the Azure Machine Learning workspace, in the form {subscription_id}/{resource_group}/{workspace_name}. This is populated automatically when models are operationalized through Azure Machine Learning. Default value: 'default/default/default' |
webservice_name | The name of the webservice to which this model is currently deployed. This is populated automatically when models are operationalized through Azure Machine Learning. Default value: 'default' |
model_version | The version of the model. This is populated automatically when models are operationalized through Azure Machine Learning. Default value: 'default' |
collection_name | The name of the file that ModelDataCollector collects data into. This parameter is only used for the 'signals' and 'general' designations; for all other designations, the designation name is used as the file name. Default value: 'default' |
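The workspace identifier format described in the table can be illustrated with a small parser. `parse_workspace_id` and `WorkspaceId` are hypothetical helpers written for this sketch; they are not part of azureml.monitoring, and the library may process this string differently.

```python
from typing import NamedTuple


class WorkspaceId(NamedTuple):
    """Hypothetical container for the three parts of a workspace identifier."""
    subscription_id: str
    resource_group: str
    workspace_name: str


def parse_workspace_id(workspace: str) -> WorkspaceId:
    """Split '{subscription_id}/{resource_group}/{workspace_name}' into its parts."""
    parts = workspace.split("/")
    # Require exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected 'subscription_id/resource_group/workspace_name', got {workspace!r}"
        )
    return WorkspaceId(*parts)


# The constructor default splits into three 'default' segments.
print(parse_workspace_id("default/default/default"))
# → WorkspaceId(subscription_id='default', resource_group='default', workspace_name='default')
print(parse_workspace_id("my-sub/my-rg/my-ws").workspace_name)  # → my-ws
```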
Remarks
Currently, ModelDataCollector only works in Azure Machine Learning AKS deployments. To collect model data within a deployment, you need to perform the following steps:
1. Update your image entry_script to add ModelDataCollector object(s) and collect statement(s). You can define multiple ModelDataCollector objects within a script, e.g. one for inputs and one for predictions for the same model. See the following class for more details on how to define and use an entry_script: InferenceConfig
2. Set the collect_model_data flag in your AKS model deployment step. Once a model is deployed, this flag can be used to turn model data collection on or off without modifying your entry_script. See the following class for more details on how to configure your model deployment: AksWebservice
The following code snippet shows what an entry_script that uses ModelDataCollector looks like:
```python
import json

import numpy as np
from azureml.monitoring import ModelDataCollector


def init():
    global inputs_dc
    # Define your models and other scoring-related objects
    # ...
    # Define an input data collector for model "bestmodel". You need to define one object per model and
    # designation. For the sake of simplicity, only one object is defined here.
    inputs_dc = ModelDataCollector(model_name="bestmodel", designation="inputs", feature_names=["f1", "f2"])


def run(raw_data):
    global inputs_dc
    # Convert raw_data to the proper format (this assumes a JSON body of the form
    # {"data": [[f1, f2], ...]}) and run prediction
    input_data = np.array(json.loads(raw_data)["data"])
    # ...
    # Use inputs_dc to collect data. For any data that you want to collect, call the collect method
    # on the respective ModelDataCollector object. For the sake of simplicity, only a single
    # object is used here.
    inputs_dc.collect(input_data)
```
The example above illustrates two things about ModelDataCollector. First, an object is defined per model and per designation, in this case "bestmodel" and "inputs". Second, ModelDataCollector expects tabular data as input and maintains the data as csv files. Optional feature names can be provided to set the header of these csv files.
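To make the "tabular data in, csv out" idea concrete, here is a minimal standard-library sketch of how rows and optional feature names map to csv text. It illustrates the file layout only and is not how ModelDataCollector is implemented; `rows_to_csv` is a hypothetical name, and the files the service actually writes may contain additional columns.

```python
import csv
import io


def rows_to_csv(rows, feature_names=None):
    """Hypothetical helper: serialize tabular rows as csv text.

    If feature_names is given, it is written first as the header row,
    mirroring the feature_names constructor parameter.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if feature_names is not None:
        writer.writerow(feature_names)
    writer.writerows(rows)
    return buffer.getvalue()


print(rows_to_csv([[1.0, 2.0], [3.5, 4.5]], feature_names=["f1", "f2"]))
# f1,f2
# 1.0,2.0
# 3.5,4.5
```

Without feature_names, the output simply starts with the first data row, which is why supplying them makes the collected files easier to analyze.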
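The following code snippet shows how ModelDataCollector can be enabled during model deployment; ws, model, inference_config, and aks_cluster are assumed to be defined earlier in your deployment script:
```python
from azureml.core.model import Model
from azureml.core.webservice import AksWebservice

webservice_config = AksWebservice.deploy_configuration(collect_model_data=True)
Model.deploy(ws, "myservice", [model], inference_config, webservice_config, aks_cluster)
```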
Once the Azure Machine Learning AKS WebService is deployed and scoring is run on the service, collected data shows up in the workspace's storage account. ModelDataCollector partitions the data for ease of access and use. All data is collected under the "modeldata" storage container, using the following partition format:
/modeldata/{workspace_name}/{webservice_name}/{model_name}/{model_version}/{designation}/{year}/{month}/{day}/{collection_name}.csv
Note that collection_name is only used in the file name for the "signals" and "general" designations. For "inputs", "predictions", and "labels", the file name is set to {designation}.csv.
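The partition format and the file-name rule can be sketched as a path builder. `collection_blob_path` is a hypothetical helper for illustration only. It assumes that month and day are written without zero padding; the format above does not say whether the service pads them.

```python
from datetime import date

# Designations whose file name comes from collection_name.
NAMED_COLLECTION_DESIGNATIONS = {"signals", "general"}


def collection_blob_path(workspace_name, webservice_name, model_name, model_version,
                         designation, day, collection_name="default"):
    """Hypothetical helper: build the /modeldata/... partition path for one day."""
    if designation in NAMED_COLLECTION_DESIGNATIONS:
        file_name = f"{collection_name}.csv"
    else:
        # Other designations ("inputs", "predictions", "labels") use the designation as the file name.
        file_name = f"{designation}.csv"
    # Assumption: year/month/day are rendered with str(), i.e. without zero padding.
    return "/".join([
        "/modeldata", workspace_name, webservice_name, model_name, str(model_version),
        designation, str(day.year), str(day.month), str(day.day), file_name,
    ])


print(collection_blob_path("myws", "myservice", "bestmodel", 1, "inputs", date(2024, 3, 5)))
# → /modeldata/myws/myservice/bestmodel/1/inputs/2024/3/5/inputs.csv
print(collection_blob_path("myws", "myservice", "bestmodel", 1, "signals", date(2024, 3, 5),
                           collection_name="drift"))
# → /modeldata/myws/myservice/bestmodel/1/signals/2024/3/5/drift.csv
```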
Methods
Name | Description |
---|---|
add_correlations | Helper function to add correlation headers and values to the given input data. |
collect | Collect data to storage. |
add_correlations
Helper function to add correlation headers and values to the given input data.
add_correlations(input_data, correlations)
Parameters
Name | Description |
---|---|
input_data (required) | The data to add correlation headers and values to. |
correlations (required) | Correlation headers and values that are returned by the collect() function. |
Returns
Type | Description |
---|---|
 | input_data with the correlation headers and values added. |
Remarks
Once collect is called, it returns a set of correlation headers and values. These include metadata such as the request ID, the timestamp, and a unique correlation ID that is either generated by ModelDataCollector or provided as a parameter. These values can be used later to analyze and correlate different types of data.
The following example shows how to add correlations to both input data and prediction data. Note that the "inputs" designation includes the correlation data by default.
```python
# Define inputs_dc and predictions_dc for the same model, with the "inputs" and "predictions"
# designations respectively
# ...
correlations = inputs_dc.collect(input_data)
predictions_data = predictions_dc.add_correlations(predictions_data, correlations)
predictions_dc.collect(predictions_data)
```
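Conceptually, adding correlations attaches the same metadata values to every row of the prediction data so that it can later be joined with the input data. The sketch below emulates that idea on plain row dictionaries. It is not the library's implementation: `add_correlation_columns` is a hypothetical name, the header names are borrowed from the Attributes list, and the assumption that correlations is a flat header-to-value dictionary is labeled in the code.

```python
import uuid
from datetime import datetime, timezone


def add_correlation_columns(rows, correlations):
    """Hypothetical stand-in for add_correlations on row dictionaries.

    Returns new rows whose correlation header/value pairs come first, followed
    by the row's own fields; if a row already has a field with the same name,
    the row's value wins. The input rows are not modified.
    """
    return [{**correlations, **row} for row in rows]


# Illustrative values; assumes correlations is a flat {header: value} dictionary.
correlations = {
    "$aml_dc_correlation_id": str(uuid.uuid4()),
    "$aml_request_id": "req-123",
    "$aml_dc_scoring_timestamp": datetime.now(timezone.utc).isoformat(),
}
predictions = [{"prediction": 0.87}, {"prediction": 0.12}]
joined = add_correlation_columns(predictions, correlations)
print(list(joined[0]))
# → ['$aml_dc_correlation_id', '$aml_request_id', '$aml_dc_scoring_timestamp', 'prediction']
```

Because every row shares the same correlation ID, input and prediction records from one scoring request can be matched on that column during analysis.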
collect
Collect data to storage.
collect(input_data, user_correlation_id='')
Parameters
Name | Description |
---|---|
input_data (required) | The data to be collected. For dataframe types, if a header with feature names exists, this information is included in the data destination without needing to explicitly pass feature names to the ModelDataCollector constructor. |
user_correlation_id | An optional correlation ID used to correlate this data later. Default value: '' |
Returns
Type | Description |
---|---|
dict | A dictionary that contains correlation headers and values. |
Attributes
Name | Value |
---|---|
AML_DC_BOUNDARY_HEADER | '$aml_dc_boundary' |
AML_DC_CORRELATION_HEADER | '$aml_dc_correlation_id' |
AML_DC_SCORING_TIMESTAMP_HEADER | '$aml_dc_scoring_timestamp' |
AML_MODEL_NAME_HEADER | '$aml_model_name' |
AML_MODEL_VERSION_HEADER | '$aml_model_version' |
AML_REQUEST_ID_HEADER | '$aml_request_id' |
AML_SERVICE_NAME_HEADER | '$aml_service_name' |
AML_WORKSPACE_HEADER | '$aml_workspace' |
dllpath | 'C:\\hostedtoolcache\\windows\\Python\\3.9.13\\x64\\lib\\site-packages\\azureml\\monitoring\\tools\\modeldatacollector\\lib\\native\\Windows' |
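When analyzing collected files, it can help to separate the metadata headers listed above from the model's own feature or prediction columns. This sketch assumes that a collected csv header may contain some of these header names alongside feature names; `split_columns` and the sample header are hypothetical.

```python
# Header values as listed in the Attributes table.
AML_METADATA_HEADERS = {
    "$aml_dc_boundary",
    "$aml_dc_correlation_id",
    "$aml_dc_scoring_timestamp",
    "$aml_model_name",
    "$aml_model_version",
    "$aml_request_id",
    "$aml_service_name",
    "$aml_workspace",
}


def split_columns(header):
    """Hypothetical helper: split a header into (metadata columns, data columns), keeping order."""
    metadata = [name for name in header if name in AML_METADATA_HEADERS]
    data = [name for name in header if name not in AML_METADATA_HEADERS]
    return metadata, data


header = ["$aml_dc_scoring_timestamp", "$aml_dc_correlation_id", "f1", "f2"]
print(split_columns(header))
# → (['$aml_dc_scoring_timestamp', '$aml_dc_correlation_id'], ['f1', 'f2'])
```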