Machine learning and data science tools on Azure Data Science Virtual Machines
Azure Data Science Virtual Machines (DSVMs) have a rich set of tools and libraries for machine learning. These resources are available in popular languages, such as Python, R, and Julia.
The DSVM supports these machine-learning tools and libraries:
Azure Machine Learning
Category
Value
What is it?
The Azure Machine Learning cloud service lets you develop and deploy machine-learning models. With the Python SDK, you can track your models as you build, train, scale, and manage them. You can deploy models as containers and run them in the cloud, on-premises, or on Azure IoT Edge.
Supported editions
Windows (conda environment: AzureML), Linux (conda environment: py36)
Typical uses
General machine-learning platform
How is it configured or installed?
Installed with GPU support
How to use or run it
As a Python SDK and in the Azure CLI. Activate the AzureML conda environment on the Windows edition, or the py36 conda environment on the Linux edition.
Link to samples
Find sample Jupyter notebooks in the AzureML directory, under notebooks.
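Before working with the SDK, it can help to confirm that the right conda environment is active. The following is a minimal sketch, assuming that conda exposes the active environment name through the CONDA_DEFAULT_ENV variable; the helper names are hypothetical, and the environment names come from the "Supported editions" entry above.

```python
import os
import platform

# Environment names taken from the "Supported editions" entry above.
EXPECTED_ENV = {"Windows": "AzureML", "Linux": "py36"}


def expected_conda_env(system=None):
    """Return the conda environment the DSVM uses on this OS, or None if unknown."""
    return EXPECTED_ENV.get(system or platform.system())


def active_conda_env(environ=None):
    """Return the name of the currently activated conda environment, if any."""
    env = os.environ if environ is None else environ
    return env.get("CONDA_DEFAULT_ENV")


def in_expected_env(system=None, environ=None):
    """True when the active conda environment matches the expected one."""
    expected = expected_conda_env(system)
    return expected is not None and active_conda_env(environ) == expected


if __name__ == "__main__":
    print("expected:", expected_conda_env(), "active:", active_conda_env())
```

If in_expected_env() returns False, activate the environment named above and start Python again.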
H2O
Category
Value
What is it?
An open-source AI platform that supports distributed, fast, in-memory, scalable machine learning.
How to use or run it
Connect to the VM with X2Go. Start a new terminal, and run java -jar /dsvm/tools/h2o/current/h2o.jar. Then, start a web browser and connect to http://localhost:54321.
Link to samples
Find samples on the VM in Jupyter, under the h2o directory.
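The launch step for H2O can also be scripted. This is a rough sketch that uses only the Python standard library: it builds the java command quoted above and checks whether anything is listening on port 54321. The function names are hypothetical, and the check only confirms that a TCP port is open, not that the listener is H2O.

```python
import socket

H2O_JAR = "/dsvm/tools/h2o/current/h2o.jar"  # path quoted in the text above
H2O_PORT = 54321                             # port quoted in the text above


def h2o_command(jar=H2O_JAR):
    """Build the argument list for launching H2O, suitable for subprocess.Popen."""
    return ["java", "-jar", jar]


def port_is_open(host="localhost", port=H2O_PORT, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "reachable" if port_is_open() else "not reachable"
    print(f"H2O web UI at http://localhost:{H2O_PORT} is {state}")
```

Passing h2o_command() to subprocess.Popen and then polling port_is_open() is one way to wait until the web UI is ready before opening the browser.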
DSVMs include several other machine-learning libraries, such as the popular scikit-learn package, which is part of the Anaconda Python distribution for DSVMs. For a list of the packages available in Python, R, and Julia, run the respective package managers.
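For Python, the installed packages can also be listed from inside the interpreter rather than through a package manager. The sketch below assumes Python 3.8 or later, where importlib.metadata is part of the standard library; it reports only the packages visible to the environment that runs it, so activate the conda environment of interest first. The function names are hypothetical.

```python
from importlib import metadata  # standard library in Python 3.8+


def installed_packages():
    """Return a sorted list of (name, version) pairs visible to this interpreter."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions whose metadata is incomplete
            found[name.lower()] = (name, dist.version)
    return [found[key] for key in sorted(found)]


def is_installed(package):
    """True if the named distribution is installed in the current environment."""
    try:
        metadata.version(package)
        return True
    except metadata.PackageNotFoundError:
        return False


if __name__ == "__main__":
    for name, version in installed_packages():
        print(f"{name}=={version}")
    print("scikit-learn installed:", is_installed("scikit-learn"))
```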
LightGBM
Category
Value
What is it?
A fast, distributed, high-performance gradient-boosting (GBDT, GBRT, GBM, or MART) framework based on decision tree algorithms. It is used for ranking, classification, and many other machine-learning tasks.
Supported versions
Windows, Linux
Typical uses
General-purpose gradient-boosting framework
How is it configured or installed?
On Windows, LightGBM is installed as a Python package. On Linux, the command-line executable is located at /opt/LightGBM/lightgbm, and both the R package and the Python packages are installed.
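On Linux, the command-line executable is typically driven by a configuration file of key = value lines that is passed as config=<file>. The following sketch writes such a file and builds the matching command without running it. The parameter values and file names are placeholders chosen for illustration, not settings shipped with the DSVM, and the helper names are hypothetical.

```python
LIGHTGBM_CLI = "/opt/LightGBM/lightgbm"  # Linux path quoted in the text above


def render_config(params):
    """Render a dict as LightGBM config-file text, one 'key = value' per line."""
    return "".join(f"{key} = {value}\n" for key, value in params.items())


def train_command(config_path, cli=LIGHTGBM_CLI):
    """Build the argument list that runs the CLI against a config file."""
    return [cli, f"config={config_path}"]


# Hypothetical training setup; the data and model file names are placeholders.
params = {
    "task": "train",
    "objective": "binary",
    "data": "train.txt",
    "num_trees": 100,
    "learning_rate": 0.1,
    "output_model": "model.txt",
}

if __name__ == "__main__":
    print(render_config(params), end="")
    print(" ".join(train_command("train.conf")))
```

Writing the output of render_config(params) to train.conf and passing train_command("train.conf") to subprocess.run would start a training run, provided the data file exists.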
Rattle
Category
Value
What is it?
A graphical user interface for data mining that uses R.
Supported editions
Windows, Linux
Typical uses
General UI data-mining tool for R
How to use or run it
As a UI tool. On Windows, start a command prompt, run R, and then inside R, run rattle(). On Linux, connect with X2Go, start a terminal, run R, and then inside R, run rattle().
Weka
Category
Value
What is it?
A collection of machine-learning algorithms for data-mining tasks. You can either apply the algorithms directly, or call them from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
Supported editions
Windows, Linux
Typical uses
General machine-learning tool
How to use or run it
On Windows, search for Weka on the Start menu. On Linux, sign in with X2Go, and then go to Applications > Development > Weka.
XGBoost
Category
Value
What is it?
A fast, portable, and distributed gradient-boosting (GBDT, GBRT, or GBM) library for Python, R, Java, Scala, C++, and more. It runs on a single machine, and on Apache Hadoop and Spark.
Supported editions
Windows, Linux
Typical uses
General machine-learning library
How is it configured or installed?
Installed with GPU support
How to use or run it
As a Python library (2.7 and 3.6+), as an R package, and as an on-path command-line tool (C:\dsvm\tools\xgboost\bin\xgboost.exe on Windows and /dsvm/tools/xgboost/xgboost on Linux).
Links to samples
Samples are included on the VM, in /dsvm/tools/xgboost/demo on Linux, and C:\dsvm\tools\xgboost\demo on Windows.
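Scripts that need to work on both editions can look up the XGBoost locations by operating system. This is a small sketch, assuming the paths quoted above are unchanged on your image; the dictionary and function names are hypothetical.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath

# Locations quoted in the text above; verify them on your own VM image.
XGBOOST_LAYOUT = {
    "Windows": {
        "cli": PureWindowsPath(r"C:\dsvm\tools\xgboost\bin\xgboost.exe"),
        "demo": PureWindowsPath(r"C:\dsvm\tools\xgboost\demo"),
    },
    "Linux": {
        "cli": PurePosixPath("/dsvm/tools/xgboost/xgboost"),
        "demo": PurePosixPath("/dsvm/tools/xgboost/demo"),
    },
}


def xgboost_layout(system=None):
    """Return the command-line tool and demo paths for the given OS name."""
    system = system or platform.system()
    try:
        return XGBOOST_LAYOUT[system]
    except KeyError:
        raise ValueError(f"no DSVM layout known for {system!r}") from None
```

For example, xgboost_layout("Linux")["demo"] gives the directory that holds the bundled samples on the Linux edition.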