

Machine Learning DotNet for Clustering Model: Getting Started

Introduction

At Build 2018, Microsoft introduced the preview of ML.NET (Machine Learning .NET), a cross-platform, open-source machine learning framework. With it, it's now easy to develop our own machine learning applications or custom modules. ML.NET was built mainly for .NET developers, and we can use C# or F# to develop ML.NET applications. ML.NET is open source and runs on Windows, Linux and macOS. It is still in development, and for now we can use the preview version to work and play with ML.NET.

In this article, we will see how to develop our first ML.NET application using a Clustering model.

Machine Learning 

Machine learning is a set of programs and techniques used to train a computer to make predictions from data and display the output to us. Examples of live applications that use machine learning are Windows Cortana, the Facebook News Feed, self-driving cars, future stock prediction, Gmail spam detection, PayPal fraud detection, etc.

There are three main types of machine learning:

  1. Supervised learning: the machine gets labelled inputs together with their desired outputs; an example is taxi fare prediction.
  2. Unsupervised learning: the machine gets inputs without desired outputs; an example is customer segmentation.
  3. Reinforcement learning: the algorithm learns by interacting with a dynamic environment and receiving feedback on its actions; an example is self-driving cars.

For each type, we use algorithms to train the machine to produce the result. The algorithms for each machine learning type are:

  1. Supervised learning has Regression and Classification algorithms
  2. Unsupervised learning has Clustering and Association algorithms
  3. Reinforcement learning has Classification and Control algorithms

In our previous article, we explained how to predict future stock for an item using the ML.NET Regression model, which is a supervised learning model.

In this article, we will see how to use the Clustering model with ML.NET to segment mobile phone sales by phone model, by sex, and by purchase before or after 2010.

In this sample program, we use the ML.NET Clustering model to predict customer segmentation by sex and by mobile phone model purchased before 2010 and after 2010. Some members use Windows Mobile, some use Samsung or iPhone, and some use both Samsung and Windows or iPhone. The sample program segments the member counts by sex and by phones purchased before and after 2010, and finds the cluster ID and score for a given input. We use a simple dataset with random member counts by sex, Before2010 and After2010.

Things to know before starting ML.NET

Initialize the Model

To work with machine learning, we first need to pick the algorithm that best fits our problem. Machine learning has clustering, regression, classification and anomaly detection models. In this article, we use the Clustering model to predict the customer segmentation of mobile phone usage.

Train

We need to train the machine learning model. Training is the process in which the model analyzes the input data, learns its patterns, and saves them as a trained model. In our example, we create a CSV file in our application and give the customer details (Male, Female, Before2010, After2010 and MobilePhone type) as the input. We add more than 100 sample records with all the necessary details to the CSV file and give this file as input to our model. The model is trained on this data, and the trained model is then used to predict the result. The predicted result is displayed in our console application as a Cluster ID and a Score (the distances).

Score

Score here is not the same as in our regression model. In regression, we have labelled inputs as well as labelled outputs, but in the Clustering model there is no desired output. Instead, Score contains an array of squared Euclidean distances from the input to each cluster centroid.
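To make the Score column concrete, here is a minimal plain C# sketch, not ML.NET code, of how squared Euclidean distances to a set of centroids are computed and how the smallest distance picks the cluster. The ClusterMath class and its method names are hypothetical; clusters are numbered from 0 in this sketch, which may not match how ML.NET numbers PredictedLabel.

```csharp
using System;

// Hypothetical helper for illustration only; not part of ML.NET.
public static class ClusterMath
{
    // Squared Euclidean distance: sum over all features of (a[i] - b[i])^2.
    public static float SquaredDistance(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        float sum = 0f;
        for (int i = 0; i < a.Length; i++)
        {
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // One squared distance per centroid, so the array length equals the number of clusters (K).
    public static float[] Distances(float[] point, float[][] centroids)
    {
        var result = new float[centroids.Length];
        for (int k = 0; k < centroids.Length; k++)
            result[k] = SquaredDistance(point, centroids[k]);
        return result;
    }

    // The assigned cluster is the (0-based) index of the smallest distance.
    public static int NearestCluster(float[] distances)
    {
        int best = 0;
        for (int k = 1; k < distances.Length; k++)
            if (distances[k] < distances[best])
                best = k;
        return best;
    }
}
```

For example, with centroids (0, 0) and (3, 4), the point (3, 3) gets squared distances 18 and 1, so NearestCluster returns 1.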

Prerequisites:

Make sure you have installed all the prerequisites on your computer. If not, download and install Visual Studio 2017 15.6 or later with the ".NET Core cross-platform development" workload.

Code part

Step 1 - Create C# Console Application

After installing the prerequisites, click Start >> Programs >> Visual Studio 2017 >> Visual Studio 2017 on your desktop. Click File >> New >> Project. Select Visual C# >> Windows Desktop >> Console App (.NET Framework). Enter your project name and click OK.

Step 2 – Add Microsoft ML package

Right click on your project and click on Manage NuGet Packages.
 

Select the Browse tab and search for Microsoft.ML.

Click Install, then I Accept, and wait until the installation completes.

Once the installation completes, the Microsoft.ML package is installed and all its references are added to our project references.

Step 3 – Creating Train and Evaluate Data

Now we need to create the datasets for training and evaluating the model. For this, we add two CSV files, one for training and one for evaluation, in a new folder called Data in our project.

Add Data Folder:

Right-click the project, click Add >> New Folder and name the folder “Data”.

Creating Train CSV file

Right-click the Data folder, click Add >> New Item, select Text File and name it “custTrain.csv”.

Open the properties of “custTrain.csv” and change Copy to Output Directory to “Copy always”.


Add your csv file data like below. 

Here we have added the data with the following fields.

Male – Total number of phones used by male customers (feature)

Female – Total number of phones used by female customers (feature)

Before2010 – Total number of phones purchased before 2010 (feature)

After2010 – Total number of phones purchased after 2010 (feature)

MobilePhone – Mobile phone type (not used as a feature)

Note: add at least 100 records of data to train our model.

Step 4 – Creating Class for Input Data and Prediction

Now we need to create classes for the input data and the prediction. To do this, right-click the project, add a new class and name it “CustData.cs”.

In this class, we first import Microsoft.ML.Runtime.Api, which provides the attributes used to define the columns of the input class and the ClusterPrediction class.

using Microsoft.ML.Runtime.Api;

Next, we add all our feature columns to the class in the same order as in the CSV file and map them to columns 0 to 3.

public class CustData
{
    // Feature columns, mapped by position to the CSV columns 0 to 3
    [Column("0")]
    public float Male;

    [Column("1")]
    public float Female;

    [Column("2")]
    public float Before2010;

    [Column("3")]
    public float After2010;
}

Next, we create the prediction class and add our prediction columns to it. Here we map the PredictedLabel and Score columns to the PredictedCustId and Distances fields. PredictedLabel contains the ID of the predicted cluster. Score contains an array with the squared Euclidean distances to the cluster centroids; the array length is equal to the number of clusters. For more details, refer to the ML.NET documentation on clustering.

Note: In the prediction class, the distances field must use the column name “Score” with the data type float[], and the cluster ID field must use the column name “PredictedLabel” with the data type uint.

public class ClusterPrediction
{
    // ID of the predicted cluster
    [ColumnName("PredictedLabel")]
    public uint PredictedCustId;

    // Squared Euclidean distances to each cluster centroid
    [ColumnName("Score")]
    public float[] Distances;
}

Step 5 – Program.cs

To work with ML.NET, we open our “Program.cs” file and first import all the needed references.

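using System;                  // Console, Environment
using System.IO;               // Path
using System.Threading.Tasks;  // Task for async Main
using Microsoft.ML;
using Microsoft.ML.Data;
using Microsoft.ML.Models;
using Microsoft.ML.Trainers;
using Microsoft.ML.Transforms;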

Dataset Path

We set the paths for the training data and for the model. For the training data, we give the “custTrain.csv” path.

The final trained model needs to be saved to produce results. For this, we set the model path to the “custClusteringModel.zip” file. The trained model is saved to this zip file when the program runs, in the Data folder of our output (bin) directory.

static readonly  string _dataPath = Path.Combine(Environment.CurrentDirectory, "Data",  "custTrain.csv"); 
static readonly  string _modelPath = Path.Combine(Environment.CurrentDirectory, "Data",  "custClusteringModel.zip");

Change the Main method to an async Task Main method, as in the code below.

static async Task Main(string[] args) 
        { 
               
        }

Before running it, we need to complete two important tasks for our program to run successfully.

First, set the Platform target to x64, because the ML.NET preview runs only in x64. To do this, right-click the project, select Properties >> Build and change the Platform target to x64.

Second, to run with our async Task Main method, we need to change the language version to C# 7.1.

In Project Properties >> Build, click the Advanced button at the bottom and change the Language Version to C# 7.1.
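Because the ML.NET preview runs only in a 64-bit process, a small guard at the start of Main can fail fast with a readable message if the project was built for x86 or runs as a 32-bit process. This is a minimal sketch under our own assumptions: Environment.Is64BitProcess is a standard .NET API, while the PlatformCheck class, its methods and the message text are hypothetical.

```csharp
using System;

// Hypothetical startup guard; call PlatformCheck.EnsureX64() as the first line of Main.
public static class PlatformCheck
{
    // Returns an error message for a 32-bit process, or null for a 64-bit process.
    public static string GetProblem(bool is64BitProcess)
    {
        return is64BitProcess
            ? null
            : "This program must run as a 64-bit process. Set Platform target to x64 in Project Properties >> Build.";
    }

    // Throws if the current process is not 64-bit.
    public static void EnsureX64()
    {
        string problem = GetProblem(Environment.Is64BitProcess);
        if (problem != null)
            throw new PlatformNotSupportedException(problem);
    }
}
```

The message logic is kept in GetProblem so that it can be checked without building a 32-bit version of the program.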

Working with Training Model

First, we need to train the model and save it to the zip file. For this, our Main method calls the Train method, which returns a PredictionModel<CustData, ClusterPrediction> to the Main method.

static async Task Main(string[] args) 
        { 
            PredictionModel<CustData, ClusterPrediction> model = await Train(); 
        } 
  
  public static  async Task<PredictionModel<CustData, ClusterPrediction>> Train() 
        { 
        }

Train and Save Model

In the Train method, we add the code to train the model and save it to the zip file.

LearningPipeline

The first step in training is to create a LearningPipeline.

The LearningPipeline holds the steps that load the training data, transform it and train the model.

TextLoader

TextLoader reads all the data from the training CSV file. Here we set useHeader: true so that the first row of the CSV file (the header) is not read as data.
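As an illustration of what loading with useHeader: true and separator ',' amounts to for our file, here is a plain C# sketch, not ML.NET's TextLoader: the header row is skipped and the first four comma-separated columns are parsed as floats, in the same order as Column("0") to Column("3") in CustData. The CsvSketch class is hypothetical, and any further column (such as MobilePhone) is ignored, just as it is not mapped in CustData.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.IO;

// Hypothetical helper for illustration; ML.NET's TextLoader does this work for you.
public static class CsvSketch
{
    // Reads CSV rows and returns the first four columns of each row as floats.
    public static List<float[]> LoadFeatures(TextReader reader, bool hasHeader = true, char separator = ',')
    {
        if (hasHeader)
            reader.ReadLine(); // skip the header row, like useHeader: true

        var rows = new List<float[]>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (string.IsNullOrWhiteSpace(line))
                continue; // ignore blank lines

            string[] parts = line.Split(separator);
            if (parts.Length < 4)
                throw new FormatException("Expected at least 4 columns: " + line);

            var values = new float[4];
            for (int i = 0; i < 4; i++)
                values[i] = float.Parse(parts[i].Trim(), CultureInfo.InvariantCulture);
            rows.Add(values); // columns after the fourth (e.g. MobilePhone) are ignored
        }
        return rows;
    }
}
```

To read the real file, pass a StreamReader opened on the CSV path, since StreamReader is a TextReader.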

ColumnConcatenator

Next, we use ColumnConcatenator to combine all our feature columns into a single “Features” column for training and evaluation.
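Conceptually, the concatenation turns the four named numeric fields of each row into one float vector in a fixed order. The sketch below is plain C#, not ML.NET's ColumnConcatenator; the FeatureRow type is a hypothetical stand-in for CustData so that the example compiles without the Microsoft.ML package.

```csharp
// Hypothetical stand-in for CustData (same four feature fields, no ML.NET attributes).
public class FeatureRow
{
    public float Male;
    public float Female;
    public float Before2010;
    public float After2010;
}

public static class FeatureSketch
{
    // Builds the "Features" vector in the same order as the ColumnConcatenator arguments.
    public static float[] ToFeatures(FeatureRow row)
    {
        return new[] { row.Male, row.Female, row.Before2010, row.After2010 };
    }
}
```

The order matters: every row must place its values in the same positions, because the clustering learner compares feature vectors position by position.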

Adding Learning Algorithm

KMeansPlusPlusClusterer

The learner trains the model. We have selected the Clustering model for our sample, and we use the KMeansPlusPlusClusterer learner, one of the clustering learners provided by ML.NET. Here we add KMeansPlusPlusClusterer to our pipeline.

We also need to set the K value, which is the number of clusters in our model. Here we have 3 segments (Windows Mobile, Samsung and Apple), so we set K = 3 in our program.
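The name KMeansPlusPlusClusterer refers to k-means++, an initialization scheme that spreads the K starting centroids apart: the first centroid is a random data point, and each further one is drawn with probability proportional to its squared distance from the nearest centroid chosen so far. The plain C# sketch below shows that classic seeding step under our own assumptions (a caller-supplied System.Random for reproducibility; the KMeansPlusPlusSketch class is hypothetical). ML.NET's trainer may use a different or parallel variant of this initialization internally, so read it as an illustration of the idea, not as ML.NET's code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating classic k-means++ seeding; not ML.NET's internal code.
public static class KMeansPlusPlusSketch
{
    static double SquaredDistance(float[] a, float[] b)
    {
        double sum = 0;
        for (int i = 0; i < a.Length; i++)
        {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Chooses k starting centroids from the data points.
    // The returned arrays are the data points themselves; copy them before modifying.
    public static List<float[]> ChooseInitialCentroids(IReadOnlyList<float[]> points, int k, Random rng)
    {
        if (k < 1 || k > points.Count)
            throw new ArgumentOutOfRangeException(nameof(k));

        // 1. The first centroid is a uniformly random data point.
        var centroids = new List<float[]> { points[rng.Next(points.Count)] };

        // minDist[i] = squared distance from point i to its nearest chosen centroid.
        var minDist = new double[points.Count];
        for (int i = 0; i < points.Count; i++)
            minDist[i] = SquaredDistance(points[i], centroids[0]);

        while (centroids.Count < k)
        {
            double total = 0;
            foreach (double d in minDist) total += d;

            int next;
            if (total == 0)
            {
                // Every point coincides with a chosen centroid; pick uniformly (duplicates possible).
                next = rng.Next(points.Count);
            }
            else
            {
                // 2. Pick point i with probability minDist[i] / total.
                double r = rng.NextDouble() * total;
                next = -1;
                for (int i = 0; i < points.Count; i++)
                {
                    if (minDist[i] <= 0) continue; // points already at a centroid are never picked
                    next = i;
                    r -= minDist[i];
                    if (r < 0) break;
                }
            }

            centroids.Add(points[next]);

            // 3. Update each point's distance to its nearest centroid.
            for (int i = 0; i < points.Count; i++)
                minDist[i] = Math.Min(minDist[i], SquaredDistance(points[i], points[next]));
        }
        return centroids;
    }
}
```

Because far-away points are much more likely to be chosen, the K starting centroids tend to land in different groups of the data, which usually helps the later k-means iterations converge to a better clustering.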

Train and Save Model

Finally, we train and save the model in this method.

public static async Task<PredictionModel<CustData, ClusterPrediction>> Train()
{
    // Start learning: the pipeline holds the loading, transform and training steps
    var pipeline = new LearningPipeline();

    // Load the training data; useHeader: true skips the CSV header row
    pipeline.Add(new TextLoader(_dataPath).CreateFrom<CustData>(useHeader: true, separator: ','));

    // Concatenate the feature columns into a single "Features" column
    pipeline.Add(new ColumnConcatenator(
            "Features",
            "Male",
            "Female",
            "Before2010",
            "After2010"));

    // Add the KMeans++ clustering learner with K = 3 (we have 3 clusters)
    pipeline.Add(new KMeansPlusPlusClusterer() { K = 3 });

    // Train the model, save it to the zip file and return it
    var model = pipeline.Train<CustData, ClusterPrediction>();
    await model.WriteAsync(_modelPath);
    return model;
}
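pipeline.Train runs the k-means learner for us. As a rough picture of what such training does after seeding, the plain C# sketch below runs the classic Lloyd iteration: assign every point to its nearest centroid, then move each centroid to the mean of its assigned points, until no assignment changes or an iteration limit is reached. The LloydSketch class and its maxIterations parameter are hypothetical; ML.NET's trainer uses its own optimized variant, so treat this only as an illustration of the technique.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper illustrating Lloyd's k-means iterations; not ML.NET's trainer.
public static class LloydSketch
{
    static float SquaredDistance(float[] a, float[] b)
    {
        float sum = 0f;
        for (int i = 0; i < a.Length; i++) { float d = a[i] - b[i]; sum += d * d; }
        return sum;
    }

    // Updates the centroids IN PLACE (pass copies, not references to data points)
    // and returns each point's cluster index (0-based).
    public static int[] Fit(IReadOnlyList<float[]> points, float[][] centroids, int maxIterations = 100)
    {
        if (centroids.Length == 0)
            throw new ArgumentException("At least one centroid is required.");

        var assignment = new int[points.Count];
        for (int i = 0; i < assignment.Length; i++) assignment[i] = -1;
        int dim = centroids[0].Length;

        for (int iter = 0; iter < maxIterations; iter++)
        {
            // Assignment step: nearest centroid by squared Euclidean distance.
            bool changed = false;
            for (int i = 0; i < points.Count; i++)
            {
                int best = 0;
                for (int k = 1; k < centroids.Length; k++)
                    if (SquaredDistance(points[i], centroids[k]) < SquaredDistance(points[i], centroids[best]))
                        best = k;
                if (best != assignment[i]) { assignment[i] = best; changed = true; }
            }
            if (!changed) break; // converged: no point switched cluster

            // Update step: each centroid moves to the mean of its points
            // (a cluster with no points keeps its old centroid).
            var sums = new float[centroids.Length][];
            var counts = new int[centroids.Length];
            for (int k = 0; k < centroids.Length; k++) sums[k] = new float[dim];
            for (int i = 0; i < points.Count; i++)
            {
                counts[assignment[i]]++;
                for (int j = 0; j < dim; j++) sums[assignment[i]][j] += points[i][j];
            }
            for (int k = 0; k < centroids.Length; k++)
                if (counts[k] > 0)
                    for (int j = 0; j < dim; j++) centroids[k][j] = sums[k][j] / counts[k];
        }
        return assignment;
    }
}
```

For example, with one-dimensional points 0, 1, 10 and 11 and starting centroids 0 and 1, the loop settles after three passes on centroids 0.5 and 10.5, assigning the first two points to cluster 0 and the last two to cluster 1.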

Prediction Results

Now it's time to produce results from the trained model. For this, we add one more class in which we give the inputs.

Create a new class named “TestCustData.cs”.

In this class, we set input values for the CustData class, which we already created with the columns used for model training.

static class  TestCustData 
    { 
        internal static  readonly CustData PredictionObj = new CustData 
        { 
            Male = 300f, 
            Female = 100f, 
            Before2010 = 400f, 
            After2010 = 1400f 
        }; 
    }

Our custTrain.csv file contains the same data as these inputs.

Produce the Model Predicted results

In the Main method, add the code below after the call to the Train method. It predicts the cluster ID and distances and displays the results in the command window.

var prediction = model.Predict(TestCustData.PredictionObj); 
            Console.WriteLine($"Cluster: {prediction.PredictedCustId}"); 
            Console.WriteLine($"Distances: {string.Join(" ", prediction.Distances)}"); 
            Console.ReadLine();

Build and Run

When we run the program, we can see the result in the command window like below.

See Also

Getting Started with Machine Learning DotNet (ML.NET)

Conclusion

ML.NET (Machine Learning DotNet) is a great framework for .NET developers who want to work with machine learning. Only a preview version of ML.NET is available now, and we can't wait for the public release. In this article, I have used clustering, which is an unsupervised learning type. If you are a .NET developer who is not yet familiar with machine learning and wants to get started, ML.NET is for you and a great framework to begin with. I hope you enjoyed reading this article; see you soon with another post.


Download

Getting Started with Machine Learning DotNet for Clustering Model