Microsoft.MachineLearningServices workspaces/onlineEndpoints/deployments 2023-04-01
Bicep resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups - See resource group deployment commands
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Bicep to your template.
resource symbolicname 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-04-01' = {
  identity: {
    type: 'string'
    userAssignedIdentities: {
      {customized property}: {}
    }
  }
  kind: 'string'
  location: 'string'
  name: 'string'
  properties: {
    appInsightsEnabled: bool
    codeConfiguration: {
      codeId: 'string'
      scoringScript: 'string'
    }
    description: 'string'
    egressPublicNetworkAccess: 'string'
    environmentId: 'string'
    environmentVariables: {
      {customized property}: 'string'
    }
    instanceType: 'string'
    livenessProbe: {
      failureThreshold: int
      initialDelay: 'string'
      period: 'string'
      successThreshold: int
      timeout: 'string'
    }
    model: 'string'
    modelMountPath: 'string'
    properties: {
      {customized property}: 'string'
    }
    readinessProbe: {
      failureThreshold: int
      initialDelay: 'string'
      period: 'string'
      successThreshold: int
      timeout: 'string'
    }
    requestSettings: {
      maxConcurrentRequestsPerInstance: int
      maxQueueWait: 'string'
      requestTimeout: 'string'
    }
    scaleSettings: {
      scaleType: 'string'
      // For remaining properties, see OnlineScaleSettings objects
    }
    endpointComputeType: 'string'
    // For remaining properties, see OnlineDeploymentProperties objects
  }
  sku: {
    capacity: int
    family: 'string'
    name: 'string'
    size: 'string'
    tier: 'string'
  }
  tags: {
    {customized property}: 'string'
  }
}
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
{
  scaleType: 'Default'
}
For TargetUtilization, use:
{
  maxInstances: int
  minInstances: int
  pollingInterval: 'string'
  scaleType: 'TargetUtilization'
  targetUtilizationPercentage: int
}
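As a hedged illustration of the TargetUtilization variant, the sketch below replaces the placeholders with concrete values. Every number is a hypothetical choice rather than a recommended setting, and `PT10S` is simply the ISO 8601 duration for ten seconds.

```bicep
// Hypothetical autoscale settings; every value here is an illustrative assumption.
var scaleSettings = {
  scaleType: 'TargetUtilization'
  minInstances: 1                  // minimum number of instances always present
  maxInstances: 3                  // quota is reserved for this many instances
  pollingInterval: 'PT10S'         // ISO 8601 duration: 10 seconds
  targetUtilizationPercentage: 70  // target CPU usage for the autoscaler
}
```

A variable like this could be assigned to the scaleSettings property of the deployment's properties object.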
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
{
  containerResourceRequirements: {
    containerResourceLimits: {
      cpu: 'string'
      gpu: 'string'
      memory: 'string'
    }
    containerResourceRequests: {
      cpu: 'string'
      gpu: 'string'
      memory: 'string'
    }
  }
  endpointComputeType: 'Kubernetes'
}
For Managed, use:
{
  endpointComputeType: 'Managed'
}
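To make the Kubernetes variant more concrete, here is a minimal sketch with filled-in resource requests and limits. The quantities are hypothetical and assume the Kubernetes resource notation described at the URL cited in the ContainerResourceSettings table; the gpu property is omitted because this example assumes a CPU-only container.

```bicep
// Hypothetical Kubernetes deployment properties; all quantities are illustrative assumptions.
var kubernetesDeploymentProperties = {
  endpointComputeType: 'Kubernetes'
  containerResourceRequirements: {
    containerResourceRequests: {
      cpu: '1'       // vCPUs requested for the container
      memory: '1Gi'  // memory requested for the container
    }
    containerResourceLimits: {
      cpu: '2'       // upper bound on vCPUs
      memory: '2Gi'  // upper bound on memory
    }
  }
}
```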
Property values
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup, e.g. "score.py". | string. Constraints: Min length = 1, Pattern = [a-zA-Z0-9_] (required) |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|---|---|
{customized property} | | string |
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|---|---|
{customized property} | | string |
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
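The dictionary shape described above can be sketched as follows, assuming a user-assigned identity already exists in the same resource group. The identity name `my-identity` and the `2023-01-31` API version of Microsoft.ManagedIdentity/userAssignedIdentities are assumptions made for illustration.

```bicep
// Hypothetical existing identity; replace the name with your own.
resource uami 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' existing = {
  name: 'my-identity'
}

// The dictionary key is the identity's ARM resource ID; the value is an empty object.
var deploymentIdentity = {
  type: 'UserAssigned'
  userAssignedIdentities: {
    '${uami.id}': {}
  }
}
```

A variable like this could be assigned to the identity property of the deployment resource.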
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
location | The geo-location where the resource lives | string (required) |
name | The resource name | string. Constraints: Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required) |
parent | In Bicep, you can specify the parent resource for a child resource. You only need to add this property when the child resource is declared outside of the parent resource. For more information, see Child resource outside parent resource. | Symbolic name for resource of type: workspaces/onlineEndpoints |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
tags | Resource tags | Dictionary of tag names and values. See Tags in templates |
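The following is a minimal sketch of a managed deployment declared outside its parent, which is the case where the parent property is needed. The workspace, endpoint, and deployment names, the instance type, and the `Default` SKU name are all hypothetical values chosen for illustration, and the model and environment references are left as parameters.

```bicep
// All names and values below are hypothetical placeholders.
param location string = resourceGroup().location
param modelId string        // value for the "model" property
param environmentId string  // ARM resource ID or AssetId of the environment

resource workspace 'Microsoft.MachineLearningServices/workspaces@2023-04-01' existing = {
  name: 'my-workspace'
}

resource endpoint 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints@2023-04-01' existing = {
  parent: workspace
  name: 'my-endpoint'
}

resource deployment 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-04-01' = {
  parent: endpoint  // symbolic name of the workspaces/onlineEndpoints resource
  name: 'blue'      // must match ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$
  location: location
  sku: {
    name: 'Default' // assumed SKU name; sku.name is the only required SKU field
    capacity: 1
  }
  properties: {
    endpointComputeType: 'Managed'
    model: modelId
    environmentId: environmentId
    instanceType: 'Standard_DS3_v2' // assumed instance type
    appInsightsEnabled: true
  }
}
```

Because the endpoint is referenced through parent, the deployment name is only the last segment ('blue') rather than the full workspace/endpoint/deployment path.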
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
endpointComputeType | Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. | 'Kubernetes' 'Managed' (required) |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. | OnlineScaleSettings |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The maximum number of concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. | string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. | string |
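The table gives the defaults in milliseconds, while the properties take ISO 8601 duration strings. As a hedged sketch, the stated defaults could be written explicitly as shown below; whether the service accepts fractional seconds such as `PT0.5S` is an assumption to verify.

```bicep
// Hypothetical request settings that spell out the documented defaults as ISO 8601 durations.
var requestSettings = {
  maxConcurrentRequestsPerInstance: 1
  maxQueueWait: 'PT0.5S'  // 500 ms in the queue at most
  requestTimeout: 'PT5S'  // 5000 ms scoring timeout
}
```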
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. | 'Default' 'TargetUtilization' (required) |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
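A probe configuration built from these properties might look like the sketch below. The thresholds and durations are hypothetical values, not documented defaults; the same object shape applies to both livenessProbe and readinessProbe.

```bicep
// Hypothetical probe settings; all values are illustrative assumptions.
var livenessProbe = {
  initialDelay: 'PT10S'  // wait 10 seconds before the first probe
  period: 'PT10S'        // probe every 10 seconds
  timeout: 'PT2S'        // each probe times out after 2 seconds
  successThreshold: 1    // one successful probe marks the container healthy
  failureThreshold: 30   // failures allowed before reporting unhealthy
}
```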
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU, for example P3. It is typically a letter+number code. | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8601 format. Only supports duration with precision as low as Seconds. | string |
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
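TrackedResourceTags
Name | Description | Value |
---|---|---|
{customized property} | | string |
UserAssignedIdentities
Name | Description | Value |
---|---|---|
{customized property} | | UserAssignedIdentity |
UserAssignedIdentity
Name | Description | Value |
---|---|---|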
ARM template resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups - See resource group deployment commands
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following JSON to your template.
{
  "type": "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments",
  "apiVersion": "2023-04-01",
  "name": "string",
  "identity": {
    "type": "string",
    "userAssignedIdentities": {
      "{customized property}": {
      }
    }
  },
  "kind": "string",
  "location": "string",
  "properties": {
    "appInsightsEnabled": "bool",
    "codeConfiguration": {
      "codeId": "string",
      "scoringScript": "string"
    },
    "description": "string",
    "egressPublicNetworkAccess": "string",
    "environmentId": "string",
    "environmentVariables": {
      "{customized property}": "string"
    },
    "instanceType": "string",
    "livenessProbe": {
      "failureThreshold": "int",
      "initialDelay": "string",
      "period": "string",
      "successThreshold": "int",
      "timeout": "string"
    },
    "model": "string",
    "modelMountPath": "string",
    "properties": {
      "{customized property}": "string"
    },
    "readinessProbe": {
      "failureThreshold": "int",
      "initialDelay": "string",
      "period": "string",
      "successThreshold": "int",
      "timeout": "string"
    },
    "requestSettings": {
      "maxConcurrentRequestsPerInstance": "int",
      "maxQueueWait": "string",
      "requestTimeout": "string"
    },
    "scaleSettings": {
      "scaleType": "string"
      // For remaining properties, see OnlineScaleSettings objects
    },
    "endpointComputeType": "string"
    // For remaining properties, see OnlineDeploymentProperties objects
  },
  "sku": {
    "capacity": "int",
    "family": "string",
    "name": "string",
    "size": "string",
    "tier": "string"
  },
  "tags": {
    "{customized property}": "string"
  }
}
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
{
  "scaleType": "Default"
}
For TargetUtilization, use:
{
  "maxInstances": "int",
  "minInstances": "int",
  "pollingInterval": "string",
  "scaleType": "TargetUtilization",
  "targetUtilizationPercentage": "int"
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
{
  "containerResourceRequirements": {
    "containerResourceLimits": {
      "cpu": "string",
      "gpu": "string",
      "memory": "string"
    },
    "containerResourceRequests": {
      "cpu": "string",
      "gpu": "string",
      "memory": "string"
    }
  },
  "endpointComputeType": "Kubernetes"
}
For Managed, use:
{
  "endpointComputeType": "Managed"
}
Property values
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup, e.g. "score.py". | string. Constraints: Min length = 1, Pattern = [a-zA-Z0-9_] (required) |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|---|---|
{customized property} | | string |
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|---|---|
{customized property} | | string |
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
apiVersion | The api version | '2023-04-01' |
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
location | The geo-location where the resource lives | string (required) |
name | The resource name | string. Constraints: Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required) |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
tags | Resource tags | Dictionary of tag names and values. See Tags in templates |
type | The resource type | 'Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments' |
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
endpointComputeType | Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. | 'Kubernetes' 'Managed' (required) |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. | OnlineScaleSettings |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The maximum number of concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. | string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. | string |
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. | 'Default' 'TargetUtilization' (required) |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU, for example P3. It is typically a letter+number code. | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8601 format. Only supports duration with precision as low as Seconds. | string |
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
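TrackedResourceTags
Name | Description | Value |
---|---|---|
{customized property} | | string |
UserAssignedIdentities
Name | Description | Value |
---|---|---|
{customized property} | | UserAssignedIdentity |
UserAssignedIdentity
Name | Description | Value |
---|---|---|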
Terraform (AzAPI provider) resource definition
The workspaces/onlineEndpoints/deployments resource type can be deployed with operations that target:
- Resource groups
For a list of changed properties in each API version, see change log.
Resource format
To create a Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments resource, add the following Terraform to your template.
resource "azapi_resource" "symbolicname" {
  type = "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-04-01"
  name = "string"
  identity = {
    type = "string"
    userAssignedIdentities = {
      {customized property} = {
      }
    }
  }
  kind = "string"
  location = "string"
  body = jsonencode({
    properties = {
      appInsightsEnabled = bool
      codeConfiguration = {
        codeId = "string"
        scoringScript = "string"
      }
      description = "string"
      egressPublicNetworkAccess = "string"
      environmentId = "string"
      environmentVariables = {
        {customized property} = "string"
      }
      instanceType = "string"
      livenessProbe = {
        failureThreshold = int
        initialDelay = "string"
        period = "string"
        successThreshold = int
        timeout = "string"
      }
      model = "string"
      modelMountPath = "string"
      properties = {
        {customized property} = "string"
      }
      readinessProbe = {
        failureThreshold = int
        initialDelay = "string"
        period = "string"
        successThreshold = int
        timeout = "string"
      }
      requestSettings = {
        maxConcurrentRequestsPerInstance = int
        maxQueueWait = "string"
        requestTimeout = "string"
      }
      scaleSettings = {
        scaleType = "string"
        // For remaining properties, see OnlineScaleSettings objects
      }
      endpointComputeType = "string"
      // For remaining properties, see OnlineDeploymentProperties objects
    }
  })
  sku = {
    capacity = int
    family = "string"
    name = "string"
    size = "string"
    tier = "string"
  }
  tags = {
    {customized property} = "string"
  }
}
OnlineScaleSettings objects
Set the scaleType property to specify the type of object.
For Default, use:
{
  scaleType = "Default"
}
For TargetUtilization, use:
{
  maxInstances = int
  minInstances = int
  pollingInterval = "string"
  scaleType = "TargetUtilization"
  targetUtilizationPercentage = int
}
OnlineDeploymentProperties objects
Set the endpointComputeType property to specify the type of object.
For Kubernetes, use:
{
  containerResourceRequirements = {
    containerResourceLimits = {
      cpu = "string"
      gpu = "string"
      memory = "string"
    }
    containerResourceRequests = {
      cpu = "string"
      gpu = "string"
      memory = "string"
    }
  }
  endpointComputeType = "Kubernetes"
}
For Managed, use:
{
  endpointComputeType = "Managed"
}
Property values
CodeConfiguration
Name | Description | Value |
---|---|---|
codeId | ARM resource ID of the code asset. | string |
scoringScript | [Required] The script to execute on startup, e.g. "score.py". | string. Constraints: Min length = 1, Pattern = [a-zA-Z0-9_] (required) |
ContainerResourceRequirements
Name | Description | Value |
---|---|---|
containerResourceLimits | Container resource limit info: | ContainerResourceSettings |
containerResourceRequests | Container resource request info: | ContainerResourceSettings |
ContainerResourceSettings
Name | Description | Value |
---|---|---|
cpu | Number of vCPUs request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
gpu | Number of Nvidia GPU cards request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
memory | Memory size request/limit for container. More info: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/ | string |
DefaultScaleSettings
Name | Description | Value |
---|---|---|
scaleType | [Required] Type of deployment scaling algorithm | 'Default' (required) |
EndpointDeploymentPropertiesBaseEnvironmentVariables
Name | Description | Value |
---|---|---|
{customized property} | | string |
EndpointDeploymentPropertiesBaseProperties
Name | Description | Value |
---|---|---|
{customized property} | | string |
KubernetesOnlineDeployment
Name | Description | Value |
---|---|---|
containerResourceRequirements | The resource requirements for the container (cpu and memory). | ContainerResourceRequirements |
endpointComputeType | [Required] The compute type of the endpoint. | 'Kubernetes' (required) |
ManagedOnlineDeployment
Name | Description | Value |
---|---|---|
endpointComputeType | [Required] The compute type of the endpoint. | 'Managed' (required) |
ManagedServiceIdentity
Name | Description | Value |
---|---|---|
type | Type of managed service identity (where both SystemAssigned and UserAssigned types are allowed). | 'None' 'SystemAssigned' 'SystemAssigned,UserAssigned' 'UserAssigned' (required) |
userAssignedIdentities | The set of user assigned identities associated with the resource. The userAssignedIdentities dictionary keys will be ARM resource ids in the form: '/subscriptions/{subscriptionId}/resourceGroups/{resourceGroupName}/providers/Microsoft.ManagedIdentity/userAssignedIdentities/{identityName}'. The dictionary values can be empty objects ({}) in requests. | UserAssignedIdentities |
Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments
Name | Description | Value |
---|---|---|
identity | Managed service identity (system assigned and/or user assigned identities) | ManagedServiceIdentity |
kind | Metadata used by portal/tooling/etc to render different UX experiences for resources of the same type. | string |
location | The geo-location where the resource lives | string (required) |
name | The resource name | string. Constraints: Pattern = ^[a-zA-Z0-9][a-zA-Z0-9\-_]{0,254}$ (required) |
parent_id | The ID of the resource that is the parent for this resource. | ID for resource of type: workspaces/onlineEndpoints |
properties | [Required] Additional attributes of the entity. | OnlineDeploymentProperties (required) |
sku | Sku details required for ARM contract for Autoscaling. | Sku |
tags | Resource tags | Dictionary of tag names and values. |
type | The resource type | "Microsoft.MachineLearningServices/workspaces/onlineEndpoints/deployments@2023-04-01" |
OnlineDeploymentProperties
Name | Description | Value |
---|---|---|
appInsightsEnabled | If true, enables Application Insights logging. | bool |
codeConfiguration | Code configuration for the endpoint deployment. | CodeConfiguration |
description | Description of the endpoint deployment. | string |
egressPublicNetworkAccess | If Enabled, allow egress public network access. If Disabled, this will create secure egress. Default: Enabled. | 'Disabled' 'Enabled' |
endpointComputeType | Set to 'Kubernetes' for type KubernetesOnlineDeployment. Set to 'Managed' for type ManagedOnlineDeployment. | 'Kubernetes' 'Managed' (required) |
environmentId | ARM resource ID or AssetId of the environment specification for the endpoint deployment. | string |
environmentVariables | Environment variables configuration for the deployment. | EndpointDeploymentPropertiesBaseEnvironmentVariables |
instanceType | Compute instance type. | string |
livenessProbe | Liveness probe monitors the health of the container regularly. | ProbeSettings |
model | The URI path to the model. | string |
modelMountPath | The path to mount the model in custom container. | string |
properties | Property dictionary. Properties can be added, but not removed or altered. | EndpointDeploymentPropertiesBaseProperties |
readinessProbe | Readiness probe validates if the container is ready to serve traffic. The properties and defaults are the same as liveness probe. | ProbeSettings |
requestSettings | Request settings for the deployment. | OnlineRequestSettings |
scaleSettings | Scale settings for the deployment. If it is null or not provided, it defaults to TargetUtilizationScaleSettings for KubernetesOnlineDeployment and to DefaultScaleSettings for ManagedOnlineDeployment. | OnlineScaleSettings |
OnlineRequestSettings
Name | Description | Value |
---|---|---|
maxConcurrentRequestsPerInstance | The maximum number of concurrent requests per node allowed per deployment. Defaults to 1. | int |
maxQueueWait | The maximum amount of time a request will stay in the queue in ISO 8601 format. Defaults to 500ms. | string |
requestTimeout | The scoring timeout in ISO 8601 format. Defaults to 5000ms. | string |
OnlineScaleSettings
Name | Description | Value |
---|---|---|
scaleType | Set to 'Default' for type DefaultScaleSettings. Set to 'TargetUtilization' for type TargetUtilizationScaleSettings. | 'Default' 'TargetUtilization' (required) |
ProbeSettings
Name | Description | Value |
---|---|---|
failureThreshold | The number of failures to allow before returning an unhealthy status. | int |
initialDelay | The delay before the first probe in ISO 8601 format. | string |
period | The length of time between probes in ISO 8601 format. | string |
successThreshold | The number of successful probes before returning a healthy status. | int |
timeout | The probe timeout in ISO 8601 format. | string |
Sku
Name | Description | Value |
---|---|---|
capacity | If the SKU supports scale out/in then the capacity integer should be included. If scale out/in is not possible for the resource this may be omitted. | int |
family | If the service has different generations of hardware, for the same SKU, then that can be captured here. | string |
name | The name of the SKU, for example P3. It is typically a letter+number code. | string (required) |
size | The SKU size. When the name field is the combination of tier and some other value, this would be the standalone code. | string |
tier | This field is required to be implemented by the Resource Provider if the service has more than one tier, but is not required on a PUT. | 'Basic' 'Free' 'Premium' 'Standard' |
TargetUtilizationScaleSettings
Name | Description | Value |
---|---|---|
maxInstances | The maximum number of instances that the deployment can scale to. The quota will be reserved for max_instances. | int |
minInstances | The minimum number of instances to always be present. | int |
pollingInterval | The polling interval in ISO 8601 format. Only supports duration with precision as low as Seconds. | string |
scaleType | [Required] Type of deployment scaling algorithm | 'TargetUtilization' (required) |
targetUtilizationPercentage | Target CPU usage for the autoscaler. | int |
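TrackedResourceTags
Name | Description | Value |
---|---|---|
{customized property} | | string |
UserAssignedIdentities
Name | Description | Value |
---|---|---|
{customized property} | | UserAssignedIdentity |
UserAssignedIdentity
Name | Description | Value |
---|---|---|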