Fundamentals of Machine Learning

Let's face it - computing was created to analyze data. You rarely if ever have a program looking for data. Rather, you have data looking for code to analyze it. Machine learning represents the state-of-the-art in making sense of data. Unfortunately, for many years it has been out of reach for the common developer – until now.

This is perhaps one of the highest-paid and most sought-after skills today. No question about it: this is the place to really make it big as a developer.

image001

Figure 1: The world of machine learning

Machine learning represents the logical extension of simple data retrieval and storage. It is about developing building blocks that make computers learn and behave more intelligently.

Machine learning makes it possible to mine historical data and make predictions about future trends. Without realizing it, you are probably already using the benefits of machine learning. Search engine results, online recommendations, ad targeting, fraud detection, and spam filtering are all examples of what is possible with machine learning.

Machine learning is about making data-driven decisions. While instinct might be important, it is difficult to beat empirical data.

The many facets of machine learning

Once you dive deep into the subject, you start addressing topics such as:

  1. Supervised and unsupervised learning

  2. Classification

  3. Markov models and Bayesian networks and much more

Mahout and Hadoop

The Apache Mahout project's goal is to build a scalable machine learning library.

There is some degree of overlap with big data analytics within Hadoop.

There is an entire machine learning open-source project that you can get for free with Hadoop. You can learn more here:

  1. https://mahout.apache.org/

Mahout includes algorithms for clustering, classification, and collaborative filtering. You can also find:

  1. Matrix factorization based recommenders

  2. K-Means, Fuzzy K-Means clustering

  3. Latent Dirichlet Allocation

  4. Singular Value Decomposition

  5. Logistic regression classifier

  6. (Complementary) Naive Bayes classifier

  7. Random forest classifier
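To give a feel for what one of the algorithms in the list above does, here is a minimal sketch of K-Means in plain Python. It is not Mahout code and does not use the Mahout API. The function name `kmeans`, the sample points, and the naive "first k points" initialization are my own assumptions for illustration; real libraries use smarter starting points and run at much larger scale.

```python
def kmeans(points, k, iterations=20):
    """Cluster 2-D points into k groups (illustrative sketch only)."""
    # Assumption: start from the first k points. Real implementations
    # usually pick starting centroids more carefully.
    centroids = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2
                         for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                new_centroids.append(old)  # leave an empty cluster in place
        if new_centroids == centroids:
            break  # nothing moved, so the clustering has converged
        centroids = new_centroids
    return centroids, clusters


# Hypothetical data: two obvious groups, one near (1, 1) and one near (8, 8).
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8),
          (8.2, 7.9), (0.9, 1.1), (7.9, 8.1)]
centroids, clusters = kmeans(points, k=2)
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Notice that no labels are supplied anywhere. The algorithm discovers the two groups on its own, which is what makes clustering an unsupervised technique.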

My alma mater was UC Berkeley, and they offer many awesome courses there.

I wish I had more time. I would seriously consider taking this free MIT online class, which you can find here:

  1. https://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-867-machine-learning-fall-2006/index.htm

Azure is democratizing machine learning

Historically, machine learning has required complex software and high-end computers. This field of computing also required a seasoned data scientist. What's been needed is a fully managed cloud service for this form of machine learning, also known as predictive analytics.

Welcome To ML Studio

Microsoft Azure Machine Learning (MAML) is an Azure service. It is a web application with a studio called ML Studio. You use this web application to create experiments that represent your machine learning activities.

A visual composition surface is used to create a machine learning workflow. The design surface of the web app allows you to add modules. Additional modules can be authored in R.

The point of a visual design surface is to remove the complexity of creating algorithms, cleaning data, and finding features.

There are two phases to using MAML. The first phase is the experiment. That is where you start with the data and begin to clean it up, which typically takes 60% to 70% of the total time. In this phase you combine data, remove rows, and eliminate columns. You also take your model and train it. From there, the output is scored and evaluated.
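The experiment phase can be sketched in a few lines of plain Python. To be clear, this is not how ML Studio works internally. It is only my own illustration of the clean, train, score, and evaluate steps. The churn data, the column layout, and the nearest-centroid "model" are all assumptions made up for the example.

```python
# Hypothetical raw data: (customer_id, monthly_spend, support_calls, churned)
raw_rows = [
    ("c1", 20.0, 5, 1),
    ("c2", 80.0, 0, 0),
    ("c3", None, 2, 1),   # missing value, dropped during cleaning
    ("c4", 75.0, 1, 0),
    ("c5", 25.0, 4, 1),
    ("c6", 90.0, 0, 0),
    ("c7", 30.0, 6, 1),
    ("c8", 70.0, 1, 0),
]


def clean(rows):
    """Remove rows with missing values and eliminate the ID column."""
    return [row[1:] for row in rows if None not in row]


def train(rows):
    """Learn one centroid (mean feature vector) per label."""
    model = {}
    for label in {row[-1] for row in rows}:
        members = [row[:-1] for row in rows if row[-1] == label]
        model[label] = tuple(sum(col) / len(members) for col in zip(*members))
    return model


def score(model, features):
    """Predict the label whose centroid is closest to the features."""
    def squared_distance(centroid):
        return sum((f - c) ** 2 for f, c in zip(features, centroid))
    return min(model, key=lambda label: squared_distance(model[label]))


def evaluate(model, rows):
    """Return accuracy: the fraction of rows that were scored correctly."""
    hits = sum(score(model, row[:-1]) == row[-1] for row in rows)
    return hits / len(rows)


data = clean(raw_rows)
train_rows, test_rows = data[:5], data[5:]   # simple holdout split
model = train(train_rows)
accuracy = evaluate(model, test_rows)
print(f"rows kept: {len(data)}, accuracy on held-out rows: {accuracy:.2f}")
```

Even in this toy version, most of the thinking goes into deciding which rows and columns to keep. The training step itself is only a few lines.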

In phase 2 you operationalize the model, which means putting it behind a web service. This allows you to connect your machine learning model to other business processes. This is the real magic of the Azure Machine Learning offering. Operationalizing your models and exposing them to your business is a key step and is often extremely difficult with other approaches. Operationalizing Azure Machine Learning is extremely simple.

Using simple drag-and-drop gestures along with some data flow graphs, you can set up experiments and take advantage of sophisticated algorithms without writing code.

There is a pool of VMs running machine learning algorithms under an orchestration engine, freeing the data scientist from moving data around and switching between services.

ML Studio targets emerging data scientists. You can train 10 models in minutes, not days. You can put a predictive model into production in minutes, not weeks or months. Some customers are reporting a 10X-100X reduction in cost relative to the competition. I invite readers to go get some pricing for SAS. See https://www.sas.com/en_us/software/analytics/rapid-predictive-modeler.html.

These models can also be shared with other parts of a company. Employees can create their own workspaces, which enables reuse and cross-team collaboration. The models can be locked as well, allowing them to be reused but not modified. In other words, these can be immutable models, allowing sharing and innovation but not breaking what is considered ‘golden.’

The predictive models can be shared as a service across an enterprise, leveraging Azure as the public cloud back end. Average latency from one service in Azure to another is between 50 ms and 100 ms. This is very fast and allows companies to leverage machine learning back ends running predictive models from other services in Azure. For example, you can write JSON-based back ends that leverage your predictive models, allowing you to build decision-making dashboards for your business.
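To make the idea of a JSON-based back end a bit more concrete, here is a small sketch of the request and response handling that sits in front of a model. This is not the Azure Machine Learning API. The field names (`features`, `monthly_spend`, `support_calls`, `prediction`) and the hand-written rule standing in for a trained model are purely my own assumptions.

```python
import json


def predict_churn(monthly_spend, support_calls):
    """Stand-in for a trained model: a hand-written rule, not a real one."""
    return 1 if support_calls >= 3 and monthly_spend < 50 else 0


def handle_request(body):
    """Turn a JSON request body into a JSON response body."""
    try:
        payload = json.loads(body)
        features = payload["features"]
        label = predict_churn(float(features["monthly_spend"]),
                              int(features["support_calls"]))
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing fields, or wrong types all end up here.
        return json.dumps({"error": f"bad request: {exc}"})
    return json.dumps({"prediction": label})


request_body = json.dumps(
    {"features": {"monthly_spend": 20, "support_calls": 5}})
print(handle_request(request_body))
```

A dashboard or another business service would only ever see the JSON going in and coming out, so the model behind it can be retrained or replaced without the callers changing.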

Machine learning algorithms are built to continually improve over time by leveraging training sets. Training sets make it possible to continually improve the robustness of your predictive model.

Data Scientists Code in R

R is a popular open source programming environment for statistics and data mining. The good news is that it is easily integrated into ML Studio. I have a lot of friends using functional languages for machine learning, such as F#. It's pretty clear, however, that R is dominant in this space.

Polls and surveys of data miners show that R's popularity has increased substantially in recent years. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team, of which John Chambers, the creator of the S language, is a member. R is named partly after the first names of the first two R authors. R is a GNU project and is written primarily in C, Fortran, and R.

Data Analytics

Below is a framework that provides a way for you to think about the predictive nature of machine learning. It's all about providing insight for business decisions where limited resources are applied to grow revenue or limit expenses. This might include insights into consumer spending patterns or into optimizing the supply chain.

How to think about the analytics spectrum

One great way to think about machine learning is to break down analytics into 3 questions:

  1. What happened?

    • Historical
  2. What will happen?

    • Predictive
  3. What should I do next?

    • Prescriptive
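The three questions above can be lined up in one tiny Python sketch. The sales figures, the stock level, and the reorder rule are hypothetical numbers I made up, and the straight-line fit is just the simplest forecasting method I could pick for illustration.

```python
monthly_sales = [100.0, 110.0, 120.0, 130.0]   # hypothetical units sold
stock_on_hand = 125                            # hypothetical inventory

# 1. What happened? (historical) Summarize the past.
n = len(monthly_sales)
average = sum(monthly_sales) / n

# 2. What will happen? (predictive) Fit a least-squares line through the
#    history and extend it one month into the future.
xs = range(n)
x_mean = sum(xs) / n
slope = (sum((x - x_mean) * (y - average) for x, y in zip(xs, monthly_sales))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = average - slope * x_mean
forecast = intercept + slope * n

# 3. What should I do next? (prescriptive) Turn the forecast into an action.
action = "reorder" if forecast > stock_on_hand else "hold"

print(f"average so far: {average}, forecast: {forecast}, action: {action}")
```

Each step builds on the one before it: the forecast needs the history, and the recommendation needs the forecast.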

How to think of the personas doing analytics

  1. The information worker

    • Typically using a self-service approach using Power BI.

      • Power BI for Office 365 is a self-service business intelligence (BI) solution delivered through Excel and Office 365 that provides information workers with data analysis and visualization capabilities to identify deeper business insights about their data
  2. IT professionals

    • Involved in data transformation, data warehousing, creating data marts and cubes for analytics, and data modeling
    • Work for GMs or directors
  3. Data scientists

    • Deeply technical and skilled not just with code, but with mathematics, statistics, and probability

    • Can use a variety of techniques to apply probability to predictions (e.g., there is a 42% chance that prices will go up in the next 18 hours)

    • Favor techniques like Monte Carlo simulations and parameterizing the model

    • What to look for in a data scientist

      • Domain Knowledge

      • Clear Understanding Of The Scientific Method

        • Objectivity, Hypothesis, Validation, Transparency
      • Strong in Math and Statistics

      • Intellectual Curiosity and Critical Thinking

      • Visualization and Communication

      • Advanced Computing And Data Management
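Here is a rough sketch of how a Monte Carlo simulation can attach a probability to a prediction like the price example above. The model is a deliberately simple random walk, and the `drift` and `volatility` values are hypothetical parameters I picked for illustration. They are not taken from any real market data.

```python
import random


def probability_price_rises(drift, volatility, hours=18,
                            trials=10_000, seed=42):
    """Estimate the chance that a price ends higher after `hours` steps.

    Each hourly change is drawn from a normal distribution with mean
    `drift` and standard deviation `volatility` (assumed parameters).
    """
    rng = random.Random(seed)   # fixed seed so the run is repeatable
    rises = 0
    for _ in range(trials):
        total_change = sum(rng.gauss(drift, volatility) for _ in range(hours))
        if total_change > 0:
            rises += 1
    return rises / trials


estimate = probability_price_rises(drift=-0.01, volatility=0.2)
print(f"Estimated chance the price is higher in 18 hours: {estimate:.0%}")
```

Parameterizing the model means choosing values such as `drift` and `volatility` from historical data. The simulation then does the rest by replaying thousands of possible futures and counting how often the event happens.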

Academic backgrounds

If you were to go to school to study to become a data scientist, what courses would you take?

  1. Applied Mathematics

  2. Computer Science

  3. Econometrics

  4. Statistics

  5. Engineering

Industries that really benefit from data science

  1. Financial Services

  2. Telecommunications

  3. Information Technology

  4. Manufacturing

  5. Utilities

  6. Healthcare

  7. Marketing

Some video help

Video: Getting Started with Azure Machine Learning - Step3 https://azure.microsoft.com/en-us/documentation/videos/getting-started-machine-learning-step-3#
Video: Getting Started with Azure Machine Learning - Step2 https://azure.microsoft.com/en-us/documentation/videos/getting-started-machine-learning-step-2#
Video: Getting Started with Azure Machine Learning - Step1 https://azure.microsoft.com/en-us/documentation/videos/getting-started-machine-learning-step-1#
Video: Overview of Azure ML https://azure.microsoft.com/en-us/documentation/videos/overview-of-ml#
Video: Getting Started with Azure ML Studio https://azure.microsoft.com/en-us/documentation/videos/getting-started-with-ml-studio#
Video: Provisioning Azure ML workspaces from Azure Portal https://azure.microsoft.com/en-us/documentation/videos/provisioning-ml-workspaces-from-portal#
Video: Predictive Modeling with Azure ML studio https://azure.microsoft.com/en-us/documentation/videos/predictive-modeling-with-ml-studio#
Video: Introduction to Azure ML API Service https://azure.microsoft.com/en-us/documentation/videos/introduction-to-ml-api-service#
Video: Preprocessing Data in Azure ML Studio https://azure.microsoft.com/en-us/documentation/videos/preprocessing-data-in-ml-studio#
Video: R in Azure ML Studio https://azure.microsoft.com/en-us/documentation/videos/r-in-ml-studio#
Video: Deploying a predictive model as a service (part-I) https://azure.microsoft.com/en-us/documentation/videos/deploying-a-predictive-model-as-a-service-part-1#
Video: Deploying a predictive model as a service (part-II) https://azure.microsoft.com/en-us/documentation/videos/deploying-a-predictive-model-as-a-service-part-2#
Video center https://azure.microsoft.com/en-us/documentation/videos/index/?services=machine-learning#
Tutorial: Create your first predictive analytics experiment https://azure.microsoft.com/en-us/documentation/articles/machine-learning-create-experiment/#
Walkthrough: Develop a predictive solution https://azure.microsoft.com/en-us/documentation/articles/machine-learning-walkthrough-develop-predictive-solution/#
Tutorial: Use the sample datasets https://azure.microsoft.com/en-us/documentation/articles/machine-learning-use-sample-datasets/#
Tutorial: Create an Azure Machine Learning workspace https://azure.microsoft.com/en-us/documentation/articles/machine-learning-create-workspace/#
Featured Models: CRM task https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/crm-task/#
Featured Models: Flight delay prediction https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/flight-delay/#
Featured Models: Network intrusion detection https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/net-intrusion/#
Featured Models: Sentiment analysis https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/sentiment-analysis/#
Featured Models: Finding similar companies https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/similar-companies/#
Featured Models: Student performance https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/student-perf/#
Featured Models: Time series prediction model https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/time-series/#
Guide to the Net# Neural Networks Specification Language for Azure ML https://azure.microsoft.com/en-us/documentation/articles/machine-learning-azure-ml-netsharp-reference-guide/#
Whitepaper: Analyze customer churn using Microsoft Azure Machine Learning https://azure.microsoft.com/en-us/documentation/articles/machine-learning-azure-ml-customer-churn-scenario/#
Model gallery https://azure.microsoft.com/en-us/documentation/services/machine-learning/models/#

Wrapping up

This post provided a high-level view of some of the characteristics and concepts of machine learning. In the next post we will start playing around with the Azure portal.

image002

Figure 2: The Azure Portal
