Setting up more than 18 GPU Instances on Azure using VMs or Containers
I have been getting a number of questions about the availability of Azure N-Series GPU instances.
At present we have two SKUs: NC (GPU Compute) and NV (GPU Visualisation).
This blog post explains the differences between the SKUs and where NC vs NV hardware instances should be used: https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/01/10/azure-cloud-gpu-for-datascience-and-academic-activities-such-as-cloud-rendering/
DataCenter & OS Availability
For NV machines (visualization):
OS Availability – Windows Server 2016 or Windows Server 2012 R2
Region Availability – South Central US, West EU, South East Asia
For NC machines (compute):
OS Availability – Windows Server 2016, Windows Server 2012 R2 or Ubuntu 16.04 LTS
Region Availability – South Central US, East US
Getting Started
– When creating the VM, choose HDD rather than SSD as the disk type
– Install the necessary GPU drivers. Documentation is available here: /en-us/azure/virtual-machines/virtual-machines-windows-n-series-driver-setup (Windows) and /en-us/azure/virtual-machines/virtual-machines-linux-n-series-driver-setup (Linux)
Capacity Restriction
At present, provisioning Azure GPU instances via portal.azure.com is limited to a maximum of 18 instances. If you require more instances, the following is a quick overview of the steps you should take.
Provisioning Process.
Step 1. Make a provisioning request via the Azure Portal https://portal.azure.com by simply trying to create the VM.
Step 2. You will initially get a failure, as the maximum number of NC24 is 20 instances. This places your request on hold, so please don’t allocate anything over 18 instances without letting me know.
The error when trying to allocate the VMs in South Central US is "Operation results in exceeding quota limits of Core. Maximum allowed: 20, Current in use: 18,"
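As a rough illustration of what the error message is telling you, the sketch below parses the two numbers out of the message text and works out how far over the limit a new deployment would be. It assumes the quota is counted in cores (as the wording "quota limits of Core" suggests) and uses the published core counts for the NC sizes (NC6 = 6, NC12 = 12, NC24 = 24). The function names are hypothetical and only for illustration.

```python
import re

# Core (vCPU) counts per NC size; assumption: the quota is counted in cores.
NC_CORES = {"NC6": 6, "NC12": 12, "NC24": 24}


def parse_quota_error(message):
    """Return (maximum_allowed, current_in_use) parsed from the error text."""
    match = re.search(r"Maximum allowed: (\d+), Current in use: (\d+)", message)
    if match is None:
        raise ValueError("message does not look like a core quota error")
    return int(match.group(1)), int(match.group(2))


def extra_cores_needed(message, size, instances):
    """Cores required beyond the current limit (0 if the deployment fits)."""
    maximum, in_use = parse_quota_error(message)
    requested = NC_CORES[size] * instances
    return max(0, in_use + requested - maximum)


error = ("Operation results in exceeding quota limits of Core. "
         "Maximum allowed: 20, Current in use: 18,")

print(parse_quota_error(error))              # (20, 18)
print(extra_cores_needed(error, "NC6", 1))   # 4
```

With 18 of 20 cores already in use, even a single additional NC6 does not fit, which is why the request fails and has to go through the quota increase process.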
Step 3. You will receive an email within 24 hours from the provisioning team, who look at the request and generally get in touch to confirm this wasn’t a mistake, as NC series instances are costly.
To request a quota increase, you must open a Support Request with Microsoft. Load the Microsoft Azure Portal and click the question mark icon in the upper-right corner to get started.
The email you receive may come from an alias or from one of the provisioning team directly, but it will be titled like this:
RE: [REG:XXXXXXXX] Quota request - Cores Initial Response (where XXXXXXXX is a number)
Information you need to submit
If you require over 18 instances for your courses, it's always good to be clear about what they are being used for
The Azure Region you wish to deploy this to
Course Name/Title.
Requirement: number of instances x type of GPU instance, e.g. 20 x NC6 server instances (specifications of the available services and costs are below)
What the GPU will be used for, e.g. GPU cores for use within a Deep Learning curriculum
Course Codes:
Course Title: e.g. Data Analysis and Probabilistic Inference
Course begin date: e.g. 16/1/17
Course end date: e.g. 1/4/17
Number of participating students: 120
Profile of students: 4th-year machine learning students (undergraduate)
Proposed MS Azure utilisation in support of course teaching: estimated 880 hours of teaching utilisation. Be mindful of the costs, which will be charged to your Azure Subscription.
So in this example you have 20 x NC6 at 880 hours per instance, i.e. 17,600 hours (about 733 days) of compute; at $0.90 per hour that is a $15,840 charge to your Azure Subscription
Number of students, e.g. 125
What students will be doing, e.g. SSH access for students, approx. 3 students per NC6
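The cost figure in the example above is simple arithmetic, and it is worth redoing for your own numbers before you submit a request. A minimal sketch follows, using the figures from the example (20 instances, 880 hours each, $0.90 per hour). The function name is hypothetical, and the hourly rate is the one quoted in this post, so check current pricing for your region.

```python
from decimal import Decimal


def estimate_cost(instances, hours_per_instance, rate_per_hour):
    """Return (total_hours, total_cost) for a block of identical instances."""
    total_hours = instances * hours_per_instance
    # Decimal keeps the currency arithmetic exact.
    total_cost = total_hours * Decimal(rate_per_hour)
    return total_hours, total_cost


total_hours, total_cost = estimate_cost(20, 880, "0.90")

print(total_hours)              # 17600
print(round(total_hours / 24))  # 733 (days of compute)
print(total_cost)               # 15840.00
```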
The provisioning team then checks with the capacity team to see whether a pre-authorisation has been given for you and for the details above. They then pass or fail the request, and if it passes your machines can be provisioned.
This typically takes 2 days.
Step 4. You will receive a confirmation email. Simply go back to portal.azure.com and create the VMs again in the same region with the same instance size; they get deployed within 15 minutes.
Resources
Azure Batch Shipyard Data Science Containers https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/02/13/deep-learning-using-cntk-caffe-keras-theanotorch-tensorflow-on-docker-with-microsoft-azure-batch-shipyard/
Jupyter notebooks: TensorFlow Deep Learning https://notebooks.azure.com/library/OEdO6ybBxM4/dashboard?page=1
DataScience VM https://blogs.msdn.microsoft.com/uk_faculty_connection/2017/01/28/data-science-virtual-machines-windows-vm-update-jan-2017/
Comments
- Anonymous
August 11, 2017
If you wish to check the current availability of Azure services on your account, you can run the following command using the Azure cloud console or Azure CLI 2.0:

    az vm list-usage --location eastus -o table

See all az vm commands at https://docs.microsoft.com/en-us/cli/azure/vm

The output from the list-usage command shows the current limits you have on Azure resources; see the output below:

    Name                              CurrentValue    Limit
    --------------------------------  --------------  -------
    Availability Sets                 0               2000
    Total Regional Cores              0               100
    Virtual Machines                  0               10000
    Virtual Machine Scale Sets        0               2000
    Basic A Family Cores              0               100
    Standard A0-A7 Family Cores       0               100
    Standard A8-A11 Family Cores      0               100
    Standard D Family Cores           0               100
    Standard Dv2 Family Cores         0               100
    Standard G Family Cores           0               100
    Standard DS Family Cores          0               100
    Standard DSv2 Family Cores        0               100
    Standard GS Family Cores          0               100
    Standard F Family Cores           0               100
    Standard FS Family Cores          0               100
    Standard NV Family Cores          0               24
    Standard NC Family Cores          0               48
    Standard H Family Cores           0               8
    Standard Av2 Family Cores         0               100
    Standard LS Family Cores          0               100
    Standard Dv2 Promo Family Cores   0               100
    Standard DSv2 Promo Family Cores  0               100
    Standard MS Family Cores          0               0
    Standard Dv3 Family Cores         0               100
    Standard DSv3 Family Cores        0               100
    Standard Ev3 Family Cores         0               100
    Standard ESv3 Family Cores        0               100
    Standard Storage Managed Disks    0               10000
    Premium Storage Managed Disks     0               10000

To request a quota increase, you must open a Support Request with Microsoft: https://docs.microsoft.com/en-us/azure/azure-supportability/resource-manager-core-quotas-request
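If you want to turn the list-usage output above into something you can act on, a small script can pull out the GPU family rows and work out how many instances still fit. The sketch below is only illustrative: it parses a few rows of the table text shown above (pasted in as a sample string rather than calling the CLI), the function names are hypothetical, and it assumes 6 cores per NC6 or NV6 instance.

```python
# A few rows copied from the list-usage output shown above.
SAMPLE = """\
Name                              CurrentValue    Limit
--------------------------------  --------------  -------
Total Regional Cores              0               100
Standard NV Family Cores          0               24
Standard NC Family Cores          0               48
"""


def parse_usage_table(text):
    """Map each resource name to a (current_value, limit) tuple."""
    usage = {}
    for line in text.splitlines():
        parts = line.rsplit(None, 2)  # name, current value, limit
        # Skip the header, the dashed separator and any blank lines.
        if len(parts) != 3 or not (parts[1].isdigit() and parts[2].isdigit()):
            continue
        usage[parts[0].strip()] = (int(parts[1]), int(parts[2]))
    return usage


def instances_that_fit(usage, family, cores_per_instance):
    """Whole instances that fit in the family's remaining core quota."""
    current, limit = usage[family]
    return (limit - current) // cores_per_instance


usage = parse_usage_table(SAMPLE)
print(usage["Standard NC Family Cores"])                          # (0, 48)
print(instances_that_fit(usage, "Standard NC Family Cores", 6))   # 8
print(instances_that_fit(usage, "Standard NV Family Cores", 6))   # 4
```

Note that the regional total also applies: in this sample the "Total Regional Cores" limit of 100 is above both family limits, so the family limit is the binding one.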