Microsoft.MachineLearningServices workspaces/onlineEndpoints/deployments 2024-10-01
Bicep resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups - See resource group deployment commands
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Bicep to your template.
resource symbolicname 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-10-01' = {
  parent: resourceSymbolicName
  identity: {
    type: 'string'
    userAssignedIdentities: {
      {customized property}: {}
    }
  }
  kind: 'string'
  location: 'string'
  name: 'string'
  properties: {
    appInsightsEnabled: bool
    codeConfiguration: {
      codeId: 'string'
      scoringScript: 'string'
    }
    dataCollector: {
      collections: {
        {customized property}: {
          clientId: 'string'
          dataCollectionMode: 'string'
          dataId: 'string'
          samplingRate: int
        }
      }
      requestLogging: {
        captureHeaders: [
          'string'
        ]
      }
      rollingRate: 'string'
    }
    description: 'string'
    egressPublicNetworkAccess: 'string'
    environmentId: 'string'
    environmentVariables: {
      {customized property}: 'string'
    }
    instanceType: 'string'
    livenessProbe: {
      failureThreshold: int
      initialDelay: 'string'
      period: 'string'
      successThreshold: int
      timeout: 'string'
    }
    model: 'string'
    modelMountPath: 'string'
    properties: {
      {customized property}: 'string'
    }
    readinessProbe: {
      failureThreshold: int
      initialDelay: 'string'
      period: 'string'
      successThreshold: int
      timeout: 'string'
    }
    requestSettings: {
      maxConcurrentRequestsPerInstance: int
      maxQueueWait: 'string'
      requestTimeout: 'string'
    }
    scaleSettings: {
      scaleType: 'string'
      // For remaining properties, see OnlineScaleSettings objects
    }
    endpointComputeType: 'string'
    // For remaining properties, see OnlineDeploymentProperties objects
  }
  sku: {
    capacity: int
    family: 'string'
    name: 'string'
    size: 'string'
    tier: 'string'
  }
  tags: {
    {customized property}: 'string'
  }
}
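As an illustration of the format above, the following is a minimal sketch of a managed deployment declared outside its parent. The workspace, endpoint, and deployment names, the `modelUri` and `environmentId` parameters, the API version used for the existing parent resources, and the SKU values are all placeholders assumed for this example; they are not prescribed by this reference.

```bicep
// Hypothetical parameters: location of an existing model and environment.
param modelUri string
param environmentId string
param location string = resourceGroup().location

// Parent resources are assumed to exist already; the names are placeholders.
resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-10-01' existing = {
  name: 'my-workspace'
}

resource endpoint 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints@2024-10-01' existing = {
  parent: workspace
  name: 'my-endpoint'
}

resource deployment 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-10-01' = {
  parent: endpoint
  name: 'blue' // must satisfy the name pattern listed under Property values
  location: location
  sku: {
    name: 'Default' // assumed SKU name; check what your workspace accepts
    capacity: 1
  }
  properties: {
    endpointComputeType: 'Managed'
    instanceType: 'Standard_F4s_v2' // the default listed for instanceType
    model: modelUri
    environmentId: environmentId
    appInsightsEnabled: true
    scaleSettings: {
      scaleType: 'Default'
    }
  }
}
```

Only `location`, `name`, `properties`, and `endpointComputeType` are marked required in the tables below; the other properties are shown to make the sketch more realistic.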
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
{
  containerResourceRequirements: {
    containerResourceLimits: {
      cpu: 'string'
      gpu: 'string'
      memory: 'string'
    }
    containerResourceRequests: {
      cpu: 'string'
      gpu: 'string'
      memory: 'string'
    }
  }
  endpointComputeType: 'Kubernetes'
}
For Managed, use:
{
  endpointComputeType: 'Managed'
}
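For a Kubernetes deployment, the type-specific part of `properties` might look like the sketch below. The variable name and the quantities are assumptions chosen for illustration; the strings use Kubernetes resource notation, as described at the link in the ContainerResourceSettings table.

```bicep
// Hypothetical values; merge these keys into the deployment's properties object.
var kubernetesDeploymentProperties = {
  endpointComputeType: 'Kubernetes'
  containerResourceRequirements: {
    // What the container is guaranteed.
    containerResourceRequests: {
      cpu: '0.5'
      memory: '1Gi'
    }
    // The ceiling the container may not exceed.
    containerResourceLimits: {
      cpu: '1'
      memory: '2Gi'
    }
  }
}
```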
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
{
  scaleType: 'Default'
}
For TargetUtilization, use:
{
  maxInstances: int
  minInstances: int
  pollingInterval: 'string'
  scaleType: 'TargetUtilization'
  targetUtilizationPercentage: int
}
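A filled-in TargetUtilization object could look like the following sketch. The variable name and every number are illustrative assumptions, not recommended values.

```bicep
// Hypothetical autoscale settings; assign to properties.scaleSettings.
var autoscaleSettings = {
  scaleType: 'TargetUtilization'
  minInstances: 1
  maxInstances: 5 // quota is reserved for this many instances
  pollingInterval: 'PT10S' // ISO 8601 duration; seconds is the finest precision supported
  targetUtilizationPercentage: 70 // target CPU usage
}
```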
Property values
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup, for example "score.py". | string Constraints: Min length = 1 Pattern = [a-zA-Z0-9_] (required) |
Collection
Name | Description | Value |
---|---|---|
clientId | The MSI client ID used to write collected logs to blob storage. If null, the backend picks a registered endpoint identity to authenticate. | string |
dataCollectionMode | Enable or disable data collection. | 'Disabled' 'Enabled' |
dataId | The ARM resource ID of the data asset. The client ensures that the data asset points to blob storage, and the backend collects data to that blob storage. | string |
samplingRate | The sampling rate for collection. A sampling rate of 1.0 collects 100% of the data, which is the default. | int |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
DataCollector
Name | Description | Value |
---|---|---|
collections | [Required] The collection configuration. Each collection has its own configuration for collecting model data, and the collection name can be an arbitrary string. The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all others are for custom logging. | DataCollectorCollections (required) |
requestLogging | Optional. The request logging configuration for the model data collector (MDC). It includes advanced logging settings for all collections. | RequestLogging |
rollingRate | When model data is collected to blob storage, the data is rolled to different paths to avoid logging everything in a single blob file. If the rolling rate is Hour, all data is collected in the blob path /yyyy/MM/dd/HH/. If it is Day, all data is collected in the blob path /yyyy/MM/dd/. Rolling paths also let the model monitoring UI select a time range of data quickly. | 'Day' 'Hour' 'Minute' 'Month' 'Year' |
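To show how these properties fit together, here is a sketch of a `dataCollector` object that enables payload logging through the reserved `request` and `response` collections. The two data asset ID parameters and the captured header name are hypothetical and assumed only for this example.

```bicep
// Hypothetical parameters: ARM resource IDs of data assets that point to blob storage.
param requestDataAssetId string
param responseDataAssetId string

// Assign to properties.dataCollector.
var dataCollector = {
  collections: {
    // 'request' and 'response' are the collection names reserved for payload logging.
    request: {
      dataCollectionMode: 'Enabled'
      dataId: requestDataAssetId
      samplingRate: 1
    }
    response: {
      dataCollectionMode: 'Enabled'
      dataId: responseDataAssetId
      samplingRate: 1
    }
  }
  requestLogging: {
    captureHeaders: [
      'x-request-id' // assumed header name
    ]
  }
  rollingRate: 'Hour' // data lands under /yyyy/MM/dd/HH/
}
```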
DataCollectorCollections
Name | Description | Value |
---|
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
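A sketch of an `identity` object that uses a user-assigned identity follows. The `identityName` parameter and the API version used to reference the existing identity are assumptions made for illustration.

```bicep
// Hypothetical parameter: name of an existing user-assigned identity in this resource group.
param identityName string

resource userIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: identityName
}

// Assign to the deployment's identity property.
var deploymentIdentity = {
  type: 'UserAssigned'
  userAssignedIdentities: {
    // The key is the identity's ARM resource ID; the value can be an empty object.
    '${userIdentity.id}': {}
  }
}
```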
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
location | The geo-location where the resource lives | string (required) |
name | The resource name | string Constraints: Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required) |
parent | In Bicep, you can specify the parent resource for a child resource. You only need to add this property when the child resource is declared outside of the parent resource. For more information, see Child resource outside parent resource. | Symbolic name for resource of type: workspaces/onlineEndpoints |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
tags | Resource tags | Dictionary of tag names and values. See Tags in templates |
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
dataCollector | The model data collector (MDC) configuration. MDC is disabled when this is null. | DataCollector |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
endpointComputeType | Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. | 'Kubernetes' 'Managed' (required) |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. Default: Standard_F4s_v2. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. | OnlineScaleSettings |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The number of maximum concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | (Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. (Now increase request_timeout_ms to account for any networking/queue delays) | string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. | string |
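Because the timeouts are ISO 8601 durations rather than millisecond integers, a `requestSettings` object might be written as in this sketch. The variable name and values are assumptions; `PT5S` is simply the ISO 8601 spelling of the 5000 ms default mentioned above.

```bicep
// Hypothetical request settings; assign to properties.requestSettings.
var requestSettings = {
  maxConcurrentRequestsPerInstance: 1
  requestTimeout: 'PT5S' // 5 seconds, written as an ISO 8601 duration
}
```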
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. | 'Default' 'TargetUtilization' (required) |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
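The three duration properties take ISO 8601 strings, while the thresholds are integers. The sketch below shows one possible probe definition; the variable name and all values are illustrative assumptions, not documented defaults.

```bicep
// Hypothetical probe settings; usable for properties.livenessProbe or properties.readinessProbe.
var probeSettings = {
  initialDelay: 'PT10S' // wait 10 seconds before the first probe
  period: 'PT10S' // probe every 10 seconds
  timeout: 'PT2S' // each probe times out after 2 seconds
  failureThreshold: 30 // failures allowed before reporting unhealthy
  successThreshold: 1 // successes needed before reporting healthy
}
```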
RequestLogging
Name | Description | Value |
---|---|---|
captureHeaders | For payload logging, we only collect payload by default. If customers also want to collect the specified headers, they can set them in captureHeaders so that backend will collect those headers along with payload. | string[] |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU. Ex - P3. It is typically a letter+number code | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8601 format. Only supports duration with precision as low as Seconds. | string |
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
TrackedResourceTags
Name | Description | Value |
---|
UserAssignedIdentities
Name | Description | Value |
---|
UserAssignedIdentity
Name | Description | Value |
---|
ARM template resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups - See resource group deployment commands
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following JSON to your template.
{
  "type": "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments",
  "apiVersion": "2024-10-01",
  "name": "string",
  "identity": {
    "type": "string",
    "userAssignedIdentities": {
      "{customized property}": {
      }
    }
  },
  "kind": "string",
  "location": "string",
  "properties": {
    "appInsightsEnabled": "bool",
    "codeConfiguration": {
      "codeId": "string",
      "scoringScript": "string"
    },
    "dataCollector": {
      "collections": {
        "{customized property}": {
          "clientId": "string",
          "dataCollectionMode": "string",
          "dataId": "string",
          "samplingRate": "int"
        }
      },
      "requestLogging": {
        "captureHeaders": [ "string" ]
      },
      "rollingRate": "string"
    },
    "description": "string",
    "egressPublicNetworkAccess": "string",
    "environmentId": "string",
    "environmentVariables": {
      "{customized property}": "string"
    },
    "instanceType": "string",
    "livenessProbe": {
      "failureThreshold": "int",
      "initialDelay": "string",
      "period": "string",
      "successThreshold": "int",
      "timeout": "string"
    },
    "model": "string",
    "modelMountPath": "string",
    "properties": {
      "{customized property}": "string"
    },
    "readinessProbe": {
      "failureThreshold": "int",
      "initialDelay": "string",
      "period": "string",
      "successThreshold": "int",
      "timeout": "string"
    },
    "requestSettings": {
      "maxConcurrentRequestsPerInstance": "int",
      "maxQueueWait": "string",
      "requestTimeout": "string"
    },
    "scaleSettings": {
      "scaleType": "string"
      // For remaining properties, see OnlineScaleSettings objects
    },
    "endpointComputeType": "string"
    // For remaining properties, see OnlineDeploymentProperties objects
  },
  "sku": {
    "capacity": "int",
    "family": "string",
    "name": "string",
    "size": "string",
    "tier": "string"
  },
  "tags": {
    "{customized property}": "string"
  }
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
{
  "containerResourceRequirements": {
    "containerResourceLimits": {
      "cpu": "string",
      "gpu": "string",
      "memory": "string"
    },
    "containerResourceRequests": {
      "cpu": "string",
      "gpu": "string",
      "memory": "string"
    }
  },
  "endpointComputeType": "Kubernetes"
}
For Managed, use:
{
  "endpointComputeType": "Managed"
}
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
{
  "scaleType": "Default"
}
For TargetUtilization, use:
{
  "maxInstances": "int",
  "minInstances": "int",
  "pollingInterval": "string",
  "scaleType": "TargetUtilization",
  "targetUtilizationPercentage": "int"
}
Property values
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup, for example "score.py". | string Constraints: Min length = 1 Pattern = [a-zA-Z0-9_] (required) |
Collection
Name | Description | Value |
---|---|---|
clientId | The MSI client ID used to write collected logs to blob storage. If null, the backend picks a registered endpoint identity to authenticate. | string |
dataCollectionMode | Enable or disable data collection. | 'Disabled' 'Enabled' |
dataId | The ARM resource ID of the data asset. The client ensures that the data asset points to blob storage, and the backend collects data to that blob storage. | string |
samplingRate | The sampling rate for collection. A sampling rate of 1.0 collects 100% of the data, which is the default. | int |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
DataCollector
Name | Description | Value |
---|---|---|
collections | [Required] The collection configuration. Each collection has its own configuration for collecting model data, and the collection name can be an arbitrary string. The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all others are for custom logging. | DataCollectorCollections (required) |
requestLogging | Optional. The request logging configuration for the model data collector (MDC). It includes advanced logging settings for all collections. | RequestLogging |
rollingRate | When model data is collected to blob storage, the data is rolled to different paths to avoid logging everything in a single blob file. If the rolling rate is Hour, all data is collected in the blob path /yyyy/MM/dd/HH/. If it is Day, all data is collected in the blob path /yyyy/MM/dd/. Rolling paths also let the model monitoring UI select a time range of data quickly. | 'Day' 'Hour' 'Minute' 'Month' 'Year' |
DataCollectorCollections
Name | Description | Value |
---|
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
apiVersion | The api version | '2024-10-01' |
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
location | The geo-location where the resource lives | string (required) |
name | The resource name | string Constraints: Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required) |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
tags | Resource tags | Dictionary of tag names and values. See Tags in templates |
type | The resource type | 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments' |
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
dataCollector | The model data collector (MDC) configuration. MDC is disabled when this is null. | DataCollector |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
endpointComputeType | Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. | 'Kubernetes' 'Managed' (required) |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. Default: Standard_F4s_v2. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. | OnlineScaleSettings |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The number of maximum concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | (Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. (Now increase request_timeout_ms to account for any networking/queue delays) | string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. | string |
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. | 'Default' 'TargetUtilization' (required) |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
RequestLogging
Name | Description | Value |
---|---|---|
captureHeaders | For payload logging, we only collect payload by default. If customers also want to collect the specified headers, they can set them in captureHeaders so that backend will collect those headers along with payload. | string[] |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU. Ex - P3. It is typically a letter+number code | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8601 format. Only supports duration with precision as low as Seconds. | string |
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
TrackedResourceTags
Name | Description | Value |
---|
UserAssignedIdentities
Name | Description | Value |
---|
UserAssignedIdentity
Name | Description | Value |
---|
Terraform (AzAPI provider) resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Terraform to your template.
resource "azapi_resource" "symbolicname" {
  type = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-10-01"
  name = "string"
  identity = {
    type = "string"
    userAssignedIdentities = {
      {customized property} = {
      }
    }
  }
  kind = "string"
  location = "string"
  sku = {
    capacity = int
    family = "string"
    name = "string"
    size = "string"
    tier = "string"
  }
  tags = {
    {customized property} = "string"
  }
  body = jsonencode({
    properties = {
      appInsightsEnabled = bool
      codeConfiguration = {
        codeId = "string"
        scoringScript = "string"
      }
      dataCollector = {
        collections = {
          {customized property} = {
            clientId = "string"
            dataCollectionMode = "string"
            dataId = "string"
            samplingRate = int
          }
        }
        requestLogging = {
          captureHeaders = [
            "string"
          ]
        }
        rollingRate = "string"
      }
      description = "string"
      egressPublicNetworkAccess = "string"
      environmentId = "string"
      environmentVariables = {
        {customized property} = "string"
      }
      instanceType = "string"
      livenessProbe = {
        failureThreshold = int
        initialDelay = "string"
        period = "string"
        successThreshold = int
        timeout = "string"
      }
      model = "string"
      modelMountPath = "string"
      properties = {
        {customized property} = "string"
      }
      readinessProbe = {
        failureThreshold = int
        initialDelay = "string"
        period = "string"
        successThreshold = int
        timeout = "string"
      }
      requestSettings = {
        maxConcurrentRequestsPerInstance = int
        maxQueueWait = "string"
        requestTimeout = "string"
      }
      scaleSettings = {
        scaleType = "string"
        // For remaining properties, see OnlineScaleSettings objects
      }
      endpointComputeType = "string"
      // For remaining properties, see OnlineDeploymentProperties objects
    }
  })
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
{
  containerResourceRequirements = {
    containerResourceLimits = {
      cpu = "string"
      gpu = "string"
      memory = "string"
    }
    containerResourceRequests = {
      cpu = "string"
      gpu = "string"
      memory = "string"
    }
  }
  endpointComputeType = "Kubernetes"
}
For Managed, use:
{
  endpointComputeType = "Managed"
}
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
{
  scaleType = "Default"
}
For TargetUtilization, use:
{
  maxInstances = int
  minInstances = int
  pollingInterval = "string"
  scaleType = "TargetUtilization"
  targetUtilizationPercentage = int
}
Property values
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup, for example "score.py". | string Constraints: Min length = 1 Pattern = [a-zA-Z0-9_] (required) |
Collection
Name | Description | Value |
---|---|---|
clientId | The MSI client ID used to write collected logs to blob storage. If null, the backend picks a registered endpoint identity to authenticate. | string |
dataCollectionMode | Enable or disable data collection. | 'Disabled' 'Enabled' |
dataId | The ARM resource ID of the data asset. The client ensures that the data asset points to blob storage, and the backend collects data to that blob storage. | string |
samplingRate | The sampling rate for collection. A sampling rate of 1.0 collects 100% of the data, which is the default. | int |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
DataCollector
Name | Description | Value |
---|---|---|
collections | [Required] The collection configuration. Each collection has its own configuration for collecting model data, and the collection name can be an arbitrary string. The model data collector can be used for payload logging, custom logging, or both. The collections named request and response are reserved for payload logging; all others are for custom logging. | DataCollectorCollections (required) |
requestLogging | Optional. The request logging configuration for the model data collector (MDC). It includes advanced logging settings for all collections. | RequestLogging |
rollingRate | When model data is collected to blob storage, the data is rolled to different paths to avoid logging everything in a single blob file. If the rolling rate is Hour, all data is collected in the blob path /yyyy/MM/dd/HH/. If it is Day, all data is collected in the blob path /yyyy/MM/dd/. Rolling paths also let the model monitoring UI select a time range of data quickly. | 'Day' 'Hour' 'Minute' 'Month' 'Year' |
DataCollectorCollections
Name | Description | Value |
---|
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
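
The `endpointComputeType` value selects which of the two deployment shapes applies. A minimal sketch of both follows. The `containerResourceRequests` and `containerResourceLimits` property names are assumed from the ContainerResourceRequirements type defined elsewhere on this page, and the CPU and memory quantities are example Kubernetes quantities, not recommendations.

```bicep
// Managed deployment: only the discriminator is specific to this type.
var managedProperties = {
  endpointComputeType: 'Managed'
  instanceType: 'Standard_F4s_v2'
}

// Kubernetes deployment: adds container resource requirements.
var kubernetesProperties = {
  endpointComputeType: 'Kubernetes'
  containerResourceRequirements: {
    // Property names assumed from the ContainerResourceRequirements type.
    containerResourceRequests: {
      cpu: '1'      // values are strings, per the cpu/gpu/memory table
      memory: '2Gi'
    }
    containerResourceLimits: {
      cpu: '2'
      memory: '4Gi'
    }
  }
}
```
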
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource IDs in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
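
As a sketch of the dictionary shape described above, the following Bicep fragment builds an `identity` value whose key is the resource ID of a user-assigned identity and whose value is an empty object. The parameter name `userAssignedIdentityId` is hypothetical.

```bicep
@description('ARM resource ID of an existing user-assigned identity (hypothetical parameter).')
param userAssignedIdentityId string

// Illustrative value for the top-level identity property.
var identity = {
  type: 'UserAssigned'
  userAssignedIdentities: {
    // The key is the identity's resource ID; the value can be an empty object.
    '${userAssignedIdentityId}': {}
  }
}
```
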
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
location | The geo-location where the resource lives | string (required) |
name | The resource name | string Constraints: Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required) |
parent_id | The ID of the resource that is the parent for this resource. | ID for resource of type: workspaces/onlineEndpoints |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
tags | Resource tags | Dictionary of tag names and values. |
type | The resource type | "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-10-01" |
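
To show how the top-level properties fit together, here is a minimal Bicep sketch of a managed deployment. In Bicep the parent endpoint is referenced through the `parent` property, as in the resource format at the top of this page. The parameter names, the deployment name `blue`, and the `sku` values are assumptions for illustration only.

```bicep
// Hypothetical parameters; supply values that match your environment.
param workspaceName string
param endpointName string
param location string = resourceGroup().location
param modelId string
param environmentId string

resource workspace 'Microsoft.MachineLearningServices/workspaces@2024-10-01' existing = {
  name: workspaceName
}

resource endpoint 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints@2024-10-01' existing = {
  parent: workspace
  name: endpointName
}

resource deployment 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2024-10-01' = {
  parent: endpoint
  name: 'blue' // must match ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$
  location: location
  sku: {
    name: 'Default' // assumed SKU name
    capacity: 1
  }
  properties: {
    endpointComputeType: 'Managed'
    model: modelId
    environmentId: environmentId
    instanceType: 'Standard_F4s_v2'
  }
}
```
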
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
dataCollector | The model data collector (MDC) configuration. MDC is disabled when this is null. | DataCollector |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
endpointComputeType | Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. | 'Kubernetes' 'Managed' (required) |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. Default: Standard_F4s_v2. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. | OnlineScaleSettings |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The number of maximum concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | (Deprecated for Managed Online Endpoints) The maximum amount of time a request will stay in the queue, in ISO 8601 format. Defaults to 500ms. (Instead, increase request_timeout_ms to account for any networking or queue delays.) | string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. | string |
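
The defaults above are quoted in milliseconds, but the properties take ISO 8601 duration strings. A short sketch of a `properties.requestSettings` value follows, with 5000ms written as `PT5S` and 500ms as `PT0.5S`. The concurrency value is an arbitrary example.

```bicep
// Illustrative value for properties.requestSettings.
var requestSettings = {
  maxConcurrentRequestsPerInstance: 2 // example value
  requestTimeout: 'PT5S'              // 5000ms as an ISO 8601 duration
  maxQueueWait: 'PT0.5S'              // 500ms; deprecated for managed online endpoints
}
```
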
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. | 'Default' 'TargetUtilization' (required) |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
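
A sketch of a ProbeSettings value, usable for either `livenessProbe` or `readinessProbe`, is shown below. The durations and thresholds are example values chosen for illustration, not documented defaults.

```bicep
// Illustrative probe settings (example values, not defaults).
var probeSettings = {
  initialDelay: 'PT10S' // wait 10 seconds before the first probe
  period: 'PT10S'       // probe every 10 seconds
  timeout: 'PT2S'       // each probe times out after 2 seconds
  failureThreshold: 3   // failures allowed before reporting unhealthy
  successThreshold: 1   // successes needed before reporting healthy
}
```
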
RequestLogging
Name | Description | Value |
---|---|---|
captureHeaders | For payload logging, we only collect payload by default. If customers also want to collect the specified headers, they can set them in captureHeaders so that backend will collect those headers along with payload. | string[] |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU, for example P3. It is typically a letter+number code. | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8601 format. Only supports durations with precision as low as seconds. | string |
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
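
The two `scaleType` variants can be sketched as alternative values for `properties.scaleSettings`. The instance counts, polling interval, and utilization target are example values only.

```bicep
// Variant 1: DefaultScaleSettings carries only the discriminator.
var defaultScaleSettings = {
  scaleType: 'Default'
}

// Variant 2: TargetUtilizationScaleSettings adds autoscaler bounds.
var targetUtilizationScaleSettings = {
  scaleType: 'TargetUtilization'
  minInstances: 1                 // instances that are always present
  maxInstances: 5                 // quota is reserved for this many instances
  pollingInterval: 'PT10S'        // ISO 8601 duration, whole seconds
  targetUtilizationPercentage: 70 // target CPU usage for the autoscaler
}
```
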
TrackedResourceTags
Name | Description | Value |
---|---|---|
UserAssignedIdentities
Name | Description | Value |
---|---|---|
UserAssignedIdentity
Name | Description | Value |
---|---|---|