ModelDataCollector Class

Reference

Defines a model data collector that can be used to collect data from an Azure Machine Learning AKS WebService deployment into blob storage.

The ModelDataCollector class enables you to define a data collector for your models in Azure Machine Learning AKS deployments. The data collector object can be used to collect model data, such as inputs and predictions, to the blob storage of the workspace. When model data collection is enabled in your deployment, collected data will show up in the following container path as csv files: /modeldata/{workspace_name}/{webservice_name}/{model_name}/{model_version}/{designation}/{year}/{month}/{day}/{collection_name}.csv

ModelDataCollector constructor.

When model data collection is enabled, data will be sent to the following container path: /modeldata/{workspace}/{webservice_name}/{model_name}/{model_version}/{designation}/{year}/{month}/{day}/{collection_name}.csv

Inheritance: builtins.object > ModelDataCollector

Constructor

ModelDataCollector(model_name, designation='default', feature_names=None, workspace='default/default/default', webservice_name='default', model_version='default', collection_name='default')

Parameters

Name	Description
model_name Required	str The name of the model that data is being collected for.
designation	str A unique designation for the data collection location. Supported designations are 'inputs', 'predictions', 'labels', 'signals', and 'general'. Default value: default
feature_names	list A list of feature names that become the csv header when supplied. Default value: None
workspace	str The identifier for the Azure Machine Learning workspace in the form of {subscription_id}/{resource_group}/{workspace_name}. This is populated automatically when models are operationalized through Azure Machine Learning. Default value: default/default/default
webservice_name	str The name of the webservice to which this model is currently deployed. This is populated automatically when models are operationalized through Azure Machine Learning. Default value: default
model_version	str The version of the model. This is populated automatically when models are operationalized through Azure Machine Learning. Default value: default
collection_name	str The name of the file that ModelDataCollector collects data into. This param is only considered for 'signals' and 'general' designations. For the other types of designations, designation name is used as the file name. Default value: default

Remarks

Currently, ModelDataCollector only works in Azure Machine Learning AKS deployments. To collect model data within a deployment, perform the following steps:

1. Update your image entry_script to add ModelDataCollector object(s) and collect statement(s). You can define multiple ModelDataCollector objects within a script, e.g. one for inputs and one for predictions for the same model. See the following class for more details on how to define and use an entry_script: InferenceConfig
2. Set the collect_model_data flag in your AKS model deployment step. Once a model is deployed, this flag can be used to turn model data collection on or off without modifying your entry_script. See the following class for more details on how to configure your model deployment: AksWebservice

The following code snippet shows what an entry_script looks like with ModelDataCollector:


   from azureml.monitoring import ModelDataCollector

   def init():
       global inputs_dc

       # Define your models and other scoring related objects
       # ...

       # Define input data collector to model "bestmodel". You need to define one object per model and
       # designation. For the sake of simplicity, we are only defining one object here.
       inputs_dc = ModelDataCollector(model_name="bestmodel", designation="inputs", feature_names=["f1", "f2"])

   def run(raw_data):
       global inputs_dc

       # Convert raw_data to a tabular format (referred to as input_data below) and run prediction
       # ...

       # Use inputs_dc to collect data. For any data that you want to collect, you need to call the collect
       # method on the respective ModelDataCollector object. For the sake of simplicity, we are only working
       # on a single object.
       inputs_dc.collect(input_data)

The above example illustrates two things about ModelDataCollector. First, an object is defined per model and per designation, in this case "bestmodel" and "inputs". Second, ModelDataCollector expects tabular data as input and stores the data as csv files. Optional feature names can be provided to set the header of these csv files.

The following code snippet shows how ModelDataCollector can be enabled during model deployment:


   from azureml.core.model import Model
   from azureml.core.webservice import AksWebservice
   webservice_config = AksWebservice.deploy_configuration(collect_model_data=True)
   Model.deploy(ws, "myservice", [model], inference_config, webservice_config, aks_cluster)

Once the Azure Machine Learning AKS WebService is deployed and scoring is run on the service, collected data will show up in the workspace's storage account. ModelDataCollector partitions the data for ease of access and use. All the data is collected under the "modeldata" storage container. Here is the partition format:

/modeldata/{workspace_name}/{webservice_name}/{model_name}/{model_version}/{designation}/{year}/{month}/{day}/{collection_name}.csv

Note that collection_name is used in the file name only for the "signals" and "general" designations. For "inputs", "predictions", and "labels", the file name is set to {designation}.csv.
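
The partition format and the file-name rule can be sketched as a small helper. This is only an illustration for locating collected files: build_collection_path is a hypothetical function, not part of azureml.monitoring, the sample values are made up, and zero-padding of the month and day is an assumption, since the format above only shows {month} and {day}.

```python
from datetime import date

# Designations for which collection_name is used as the file name.
NAMED_DESIGNATIONS = {"signals", "general"}


def build_collection_path(workspace_name, webservice_name, model_name,
                          model_version, designation, day, collection_name="default"):
    """Hypothetical helper: compose the documented partition path for one day of data."""
    # For "inputs", "predictions", and "labels" the designation doubles as the file name.
    file_stem = collection_name if designation in NAMED_DESIGNATIONS else designation
    # Assumption: month and day are zero-padded; the documented format does not say.
    date_part = f"{day.year}/{day.month:02d}/{day.day:02d}"
    return (f"/modeldata/{workspace_name}/{webservice_name}/{model_name}/"
            f"{model_version}/{designation}/{date_part}/{file_stem}.csv")


inputs_path = build_collection_path(
    "myws", "myservice", "bestmodel", "1", "inputs", date(2023, 4, 5))
signals_path = build_collection_path(
    "myws", "myservice", "bestmodel", "1", "signals", date(2023, 4, 5),
    collection_name="drift")
```

Under these assumptions, inputs_path ends in inputs.csv regardless of collection_name, while signals_path ends in drift.csv.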

Methods

add_correlations	Helper function to add correlation headers and values to given input data.
collect	Collect data to storage.

add_correlations

Helper function to add correlation headers and values to given input data.

add_correlations(input_data, correlations)

Parameters

Name	Description
input_data Required	list, numpy.array, pandas.DataFrame, pyspark.sql.DataFrame The data to add correlation headers and values to.
correlations Required	dict Correlation headers and values that are returned from the collect() function.

Returns

Type	Description
list, numpy.array, pandas.DataFrame, pyspark.sql.DataFrame	input_data with added correlation headers and values.

Remarks

When collect() is called, it returns a set of correlation headers and values. These include metadata such as request id, timestamp, and a unique correlation id that is either generated by ModelDataCollector or provided as a parameter. These values can be used to analyze and correlate different types of data later. The following example shows how to add correlations to both input data and prediction data. Note that the "inputs" designation type has the correlation data by default.


   # Define inputs_dc and predictions_dc for the same model and "inputs" and "predictions" designations
   # respectively
   # ...

   correlations = inputs_dc.collect(input_data)
   predictions_data = predictions_dc.add_correlations(predictions_data, correlations)
   predictions_dc.collect(predictions_data)
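
Because ModelDataCollector only works inside an AKS deployment, the correlation flow can be hard to exercise on a development machine. The sketch below uses a local stand-in to walk through the same three calls. StandInCollector is hypothetical and is not the azureml.monitoring implementation: it keeps rows in memory, only handles lists of rows, and appending the correlation values to each row is an assumption about what add_correlations does. The header names are taken from the class attributes of ModelDataCollector.

```python
import uuid
from datetime import datetime, timezone

# Header names copied from the documented ModelDataCollector class attributes.
CORRELATION_HEADER = "$aml_dc_correlation_id"
REQUEST_ID_HEADER = "$aml_request_id"
TIMESTAMP_HEADER = "$aml_dc_scoring_timestamp"


class StandInCollector:
    """Hypothetical local stand-in; NOT the azureml.monitoring implementation."""

    def __init__(self, model_name, designation):
        self.model_name = model_name
        self.designation = designation
        self.rows = []  # rows are kept in memory instead of blob storage

    def collect(self, input_data, user_correlation_id=""):
        # Reuse the caller's correlation id when given, otherwise generate one.
        correlations = {
            CORRELATION_HEADER: user_correlation_id or str(uuid.uuid4()),
            REQUEST_ID_HEADER: str(uuid.uuid4()),
            TIMESTAMP_HEADER: datetime.now(timezone.utc).isoformat(),
        }
        self.rows.extend(list(row) for row in input_data)
        return correlations

    def add_correlations(self, input_data, correlations):
        # Assumption: the correlation values are appended to every row.
        values = list(correlations.values())
        return [list(row) + values for row in input_data]


inputs_dc = StandInCollector("bestmodel", "inputs")
predictions_dc = StandInCollector("bestmodel", "predictions")

# Same call sequence as the example above, with a caller-supplied correlation id.
correlations = inputs_dc.collect([[1.0, 2.0]], user_correlation_id="req-42")
predictions_data = predictions_dc.add_correlations([[0.9]], correlations)
predictions_dc.collect(predictions_data)
```

After these calls, each prediction row carries the same correlation id as the input it was produced from, which is what makes the two data sets joinable later.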

collect

Collect data to storage.

collect(input_data, user_correlation_id='')

Parameters

Name	Description
input_data Required	list, numpy.array, pandas.DataFrame, pyspark.sql.DataFrame The data to be collected. For dataframe types, if a header exists with feature names, this information is included in the data destination without needing to explicitly pass feature names in the ModelDataCollector constructor.
user_correlation_id	str An optional correlation id used to correlate this data later. Default value: ''

Returns

Type	Description
dict	A dictionary that contains correlation headers and values.

Attributes

AML_DC_BOUNDARY_HEADER

AML_DC_BOUNDARY_HEADER = '$aml_dc_boundary'

AML_DC_CORRELATION_HEADER

AML_DC_CORRELATION_HEADER = '$aml_dc_correlation_id'

AML_DC_SCORING_TIMESTAMP_HEADER

AML_DC_SCORING_TIMESTAMP_HEADER = '$aml_dc_scoring_timestamp'

AML_MODEL_NAME_HEADER

AML_MODEL_NAME_HEADER = '$aml_model_name'

AML_MODEL_VERSION_HEADER

AML_MODEL_VERSION_HEADER = '$aml_model_version'

AML_REQUEST_ID_HEADER

AML_REQUEST_ID_HEADER = '$aml_request_id'

AML_SERVICE_NAME_HEADER

AML_SERVICE_NAME_HEADER = '$aml_service_name'

AML_WORKSPACE_HEADER

AML_WORKSPACE_HEADER = '$aml_workspace'

dllpath

dllpath = 'C:\\hostedtoolcache\\windows\\Python\\3.9.13\\x64\\lib\\site-packages\\azureml\\monitoring\\tools\\modeldatacollector\\lib\\native\\Windows'
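
When post-processing collected files, the header constants above can help separate metadata columns from feature columns. The following is a minimal sketch under the assumption that these header names appear as column names in the collected csv files; split_columns is a hypothetical helper, and the sample header row is illustrative only.

```python
# Header names copied from the attribute values listed above.
CORRELATION_HEADERS = {
    "$aml_dc_boundary",
    "$aml_dc_correlation_id",
    "$aml_dc_scoring_timestamp",
    "$aml_model_name",
    "$aml_model_version",
    "$aml_request_id",
    "$aml_service_name",
    "$aml_workspace",
}


def split_columns(header_row):
    """Hypothetical helper: separate metadata columns from feature columns."""
    metadata = [name for name in header_row if name in CORRELATION_HEADERS]
    features = [name for name in header_row if name not in CORRELATION_HEADERS]
    return metadata, features


# Illustrative header row; the actual column order in collected files may differ.
metadata, features = split_columns(
    ["$aml_dc_correlation_id", "$aml_request_id", "f1", "f2"])
```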
