PowerShell Script To Invoke ML Scoring Part I
By Earle Sinnatamby, Consultant
Objective
The purpose of this blog post is to provide a PowerShell alternative to Azure Data Factory (ADF) for performing Machine Learning (ML) scoring.
The pilot engagement required on-premises data to be uploaded to Azure Blob Storage daily. Each uploaded data file had to be rescored daily with an ML script provided by the client, and the results were used to raise alerts about possible maintenance required for factory equipment.
The initial plan was to use ADF for this task. However, it was not possible to identify the daily uploaded files using parameter settings in ADF, primarily because every file name contained a date and timestamp in yyyyMMdd_hhmmss.ddd format. The hhmmss.ddd portion varied widely from file to file, so it could not be hardcoded.
The resulting PowerShell script was deployed on a previously provisioned Virtual Machine and scheduled to run daily using Windows Task Scheduler.
PowerShell Solution
The following PowerShell code snippets were used:
The Import-AzurePublishSettingsFile cmdlet imports a publish settings file that lets you manage your Azure accounts in Windows PowerShell. The file itself is an XML file with a .publishsettings file name extension. It contains an encoded certificate that provides management credentials for your Azure subscriptions.
#Import the publish settings file to setup Azure credentials
Import-AzurePublishSettingsFile C:\file-containing-Azure-credentials.publishsettings
Set-AzureSubscription creates or changes an Azure subscription (here it sets the subscription's current storage account), and the Select-AzureSubscription cmdlet then selects it as the current subscription
$subscriptionName = "mySubscription" #Azure Subscription
$storageAccountName = "myStorageAccountName" #Blob storage account name
$containerName = "myContainerName" #Container where files to be processed are stored
Set-AzureSubscription -SubscriptionName $subscriptionName -CurrentStorageAccountName $storageAccountName
Select-AzureSubscription -SubscriptionName $subscriptionName
Get-Date gets the current date and time. AddDays(-1) moves it back one day to give the prior day's date, and ToString formats the result as yyyyMMdd
$fileDate = (Get-Date).AddDays(-1).ToString('yyyyMMdd') #Date of previous day
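As a quick check of the date logic, the following sketch applies the same expression to a fixed reference date instead of the current date. The reference date is an assumption chosen for illustration, and passing the invariant culture is an addition that is not in the original script; it is meant to keep the output in Gregorian digits on machines whose regional settings use a different calendar.

```powershell
# Fixed reference date (illustrative assumption; the script itself uses Get-Date)
$referenceDate = [datetime]'2024-03-01'

# Culture-independent formatting (an addition, not part of the original script)
$invariant = [System.Globalization.CultureInfo]::InvariantCulture

# Same steps as the script: go back one day, then format as yyyyMMdd.
# Format specifiers are case-sensitive: MM is the month, mm would be minutes.
$fileDate = $referenceDate.AddDays(-1).ToString('yyyyMMdd', $invariant)

$fileDate   # 20240229, because 2024 is a leap year
```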
Assign to the variable $prefixFileName the blob name up to and including the date portion; this is the prefix to search for
$prefixFileName = "AzureBlobDirectoryContainingFilesToBeProcessed/fileName_${fileDate}"
Get-AzureStorageBlob returns the list of blobs in the specified container whose names begin with the given prefix
$files = Azure.Storage\Get-AzureStorageBlob -Container $containerName -Prefix $prefixFileName
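To illustrate why a date-only prefix is enough, the sketch below mimics the prefix match locally on a list of made-up blob names. The directory, file names, and timestamps are hypothetical, and the local StartsWith filter only approximates what the -Prefix parameter does on the service side.

```powershell
# Hypothetical blob names following the yyyyMMdd_hhmmss.ddd pattern
$blobNames = @(
    'incoming/sensor_20240229_061502.137.csv',
    'incoming/sensor_20240229_181744.902.csv',
    'incoming/sensor_20240301_060311.455.csv'
)

$fileDate       = '20240229'                      # previous day's date in yyyyMMdd form
$prefixFileName = "incoming/sensor_${fileDate}"   # everything up to the date portion

# Keep only the names that begin with the prefix; the time portion is never inspected
$matching = @($blobNames | Where-Object {
    $_.StartsWith($prefixFileName, [System.StringComparison]::Ordinal)
})

$matching.Count   # 2
```

Because the match stops at the date, the widely varying hhmmss.ddd portion never has to be known in advance.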
Iterate over $files, the list of blobs whose names start with the prefix. Note that $files.Count returns the number of blobs in the list
for ($i=0; $i -lt $files.Count; $i++) {
Skip the file if it is empty, i.e. zero bytes in size, since no ML scoring is required; continue moves on to the next file in the list
if ($files[$i].Length -le 0) { continue }
Obtain the name of the file with the following cmdlet; the name is then passed on to the ML scoring function
$filename = Split-Path $files[$i].Name -Leaf
Invoke the ML scoring function, which will be covered in a future blog post
# call ML Scoring function – to be covered by my colleague in a future blog
}
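Put together, the loop can be sketched as follows. This is a minimal sketch that runs without Azure: the blob list is mocked with objects exposing only Name and Length, and Invoke-MLScoring is a hypothetical stand-in for the scoring function, which this post does not cover. The sketch uses continue for empty files so that one zero-byte file does not prevent later files from being scored.

```powershell
# Hypothetical stand-in for the ML scoring function (not covered in this post)
function Invoke-MLScoring {
    param([string]$FileName)
    "scored $FileName"
}

# Mock blob listing; in the real script this comes from Get-AzureStorageBlob
$files = @(
    [pscustomobject]@{ Name = 'incoming/sensor_20240229_061502.137.csv'; Length = 2048 },
    [pscustomobject]@{ Name = 'incoming/sensor_20240229_120000.000.csv'; Length = 0 },
    [pscustomobject]@{ Name = 'incoming/sensor_20240229_181744.902.csv'; Length = 512 }
)

$results = @()
for ($i = 0; $i -lt $files.Count; $i++) {
    # Skip zero-byte files but keep processing the rest
    if ($files[$i].Length -le 0) { continue }

    # Strip the directory portion of the blob name
    $filename = Split-Path $files[$i].Name -Leaf

    $results += Invoke-MLScoring -FileName $filename
}

$results
```

With the mock data above, only the two non-empty files reach the scoring stub.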
Scheduling the PowerShell Script
Use Windows Task Scheduler to schedule the script for execution.
Ensure the following are set correctly:
- The setting for Program/script in Action is cmd.exe
- The setting for Add arguments is: /c powershell.exe -File "C:\PowerShell directory\PowerShell script.ps1"
Alternatively, to capture console output in a log file, use the Write-Host [-ForegroundColor <color>] cmdlet in the script with the following settings:
- The setting for Program/script in Action is cmd.exe
- The setting for Add arguments is: /c powershell.exe -File "C:\PowerShell directory\PowerShell script.ps1" >> C:\DailyJobLogs\DailyMLScoring.log 2>&1
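The same schedule can also be created from PowerShell rather than through the Task Scheduler UI. The sketch below assumes the ScheduledTasks module that ships with Windows 8 / Windows Server 2012 and later. The task name, start time, and both helper function names are hypothetical, and the paths are the placeholders used in this post. It launches powershell.exe explicitly through cmd.exe so that the >> redirection is interpreted by cmd.exe.

```powershell
# Build the cmd.exe argument string (hypothetical helper)
function Get-ScoringTaskArgument {
    param(
        [string]$ScriptPath,
        [string]$LogPath
    )
    # cmd.exe handles the >> redirection; 2>&1 sends errors to the same log
    "/c powershell.exe -File `"$ScriptPath`" >> `"$LogPath`" 2>&1"
}

# Register a daily task (hypothetical helper; requires Windows and sufficient rights)
function Register-DailyScoringTask {
    param(
        [string]$ScriptPath,
        [string]$LogPath,
        [string]$TaskName = 'DailyMLScoring',   # assumed task name
        [string]$At = '6:00AM'                  # assumed start time
    )
    $argument = Get-ScoringTaskArgument -ScriptPath $ScriptPath -LogPath $LogPath
    $action   = New-ScheduledTaskAction -Execute 'cmd.exe' -Argument $argument
    $trigger  = New-ScheduledTaskTrigger -Daily -At $At
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger
}

# Inspect the argument string without registering anything
Get-ScoringTaskArgument -ScriptPath 'C:\PowerShell directory\PowerShell script.ps1' `
                        -LogPath 'C:\DailyJobLogs\DailyMLScoring.log'
```

Calling Register-DailyScoringTask with the same two paths would create the task; it is left uncalled here so the sketch has no side effects.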