在 Microsoft Fabric 中使用 PyTorch 训练模型

项目
07/19/2024

本文介绍如何训练和跟踪 PyTorch 模型的迭代。 PyTorch 机器学习框架基于 Torch 库。 PyTorch 通常用于计算机视觉和自然语言处理应用程序。

先决条件

在笔记本中安装 PyTorch 和 torchvision。可以使用以下命令在环境中安装或升级这些库的版本：

pip install torch torchvision

设置机器学习实验

可以使用 MLFLow API 创建机器学习试验。如果尚不存在，MLflow set_experiment() 函数将创建名为 sample-pytorch 的新机器学习试验。

在笔记本中运行以下代码，创建试验：

import mlflow

mlflow.set_experiment("sample-pytorch")

训练和评估 Pytorch 模型

设置试验后，加载修改的国家标准与技术研究所 (MNIST) 数据集。生成测试和训练数据库，然后创建训练函数。

在笔记本中运行以下代码，训练 Pytorch 模型：

import os
import torch
import torch.nn as nn
from torch.autograd import Variable
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torch.nn.functional as F
import torch.optim as optim

# Load the MNIST dataset
root = "/tmp/mnist"
if not os.path.exists(root):
    os.mkdir(root)

trans = transforms.Compose(
    [transforms.ToTensor(), transforms.Normalize((0.5,), (1.0,))]
)

# If the data doesn't exist, download the MNIST dataset
train_set = dset.MNIST(root=root, train=True, transform=trans, download=True)
test_set = dset.MNIST(root=root, train=False, transform=trans, download=True)

batch_size = 100

train_loader = torch.utils.data.DataLoader(
    dataset=train_set, batch_size=batch_size, shuffle=True
)
test_loader = torch.utils.data.DataLoader(
    dataset=test_set, batch_size=batch_size, shuffle=False
) 

print("==>>> total trainning batch number: {}".format(len(train_loader)))
print("==>>> total testing batch number: {}".format(len(test_loader)))

# Define the network
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x): 
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

    def name(self):
        return "LeNet"

# Train the model
model = LeNet()

optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

criterion = nn.CrossEntropyLoss()

for epoch in range(1):
    # Model training
    ave_loss = 0
    for batch_idx, (x, target) in enumerate(train_loader):
        optimizer.zero_grad()
        x, target = Variable(x), Variable(target)
        out = model(x)
        loss = criterion(out, target)
        ave_loss = (ave_loss * batch_idx + loss.item()) / (batch_idx + 1)
        loss.backward()
        optimizer.step()
        if (batch_idx + 1) % 100 == 0 or (batch_idx + 1) == len(train_loader):
            print(
                "==>>> epoch: {}, batch index: {}, train loss: {:.6f}".format(
                    epoch, batch_idx + 1, ave_loss
                )
            )
    # Model testing
    correct_cnt, total_cnt, ave_loss = 0, 0, 0
    for batch_idx, (x, target) in enumerate(test_loader):
        x, target = Variable(x, volatile=True), Variable(target, volatile=True)
        out = model(x)
        loss = criterion(out, target)
        _, pred_label = torch.max(out.data, 1)
        total_cnt += x.data.size()[0]
        correct_cnt += (pred_label == target.data).sum()
        ave_loss = (ave_loss * batch_idx + loss.item()) / (batch_idx + 1)

        if (batch_idx + 1) % 100 == 0 or (batch_idx + 1) == len(test_loader):
            print(
                "==>>> epoch: {}, batch index: {}, test loss: {:.6f}, acc: {:.3f}".format(
                    epoch, batch_idx + 1, ave_loss, correct_cnt * 1.0 / total_cnt
                )
            )

torch.save(model.state_dict(), model.name())

使用 MLflow 记录模型

下一个任务启动 MLflow 运行，并跟踪机器学习试验里的结果。示例代码创建一个名为“sample-pytorch”的新模型。此代码将使用指定的参数创建运行，并记录 sample-pytorch 试验中的运行。

在笔记本中运行以下代码并用日志记录模型：

with mlflow.start_run() as run:
    print("log pytorch model:")
    mlflow.pytorch.log_model(
        model, "pytorch-model", registered_model_name="sample-pytorch"
    )

    model_uri = "runs:/{}/pytorch-model".format(run.info.run_id)
    print("Model saved in run %s" % run.info.run_id)
    print(f"Model URI: {model_uri}")

加载和评估模型

保存模型后，就可以加载它进行推理。

在笔记本中运行以下代码并加载模型进行推理：

# Inference with loading the logged model
loaded_model = mlflow.pytorch.load_model(model_uri)
print(type(loaded_model))

correct_cnt, total_cnt, ave_loss = 0, 0, 0
for batch_idx, (x, target) in enumerate(test_loader):
    x, target = Variable(x, volatile=True), Variable(target, volatile=True)
    out = loaded_model(x)
    loss = criterion(out, target)
    _, pred_label = torch.max(out.data, 1)
    total_cnt += x.data.size()[0]
    correct_cnt += (pred_label == target.data).sum()
    ave_loss = (ave_loss * batch_idx + loss.item()) / (batch_idx + 1)

    if (batch_idx + 1) % 100 == 0 or (batch_idx + 1) == len(test_loader):
        print(
            "==>>> epoch: {}, batch index: {}, test loss: {:.6f}, acc: {:.3f}".format(
                epoch, batch_idx + 1, ave_loss, correct_cnt * 1.0 / total_cnt
            )
        )

探索机器学习模型
创建机器学习试验

通过

在 Microsoft Fabric 中使用 PyTorch 训练模型

先决条件

设置机器学习实验

训练和评估 Pytorch 模型

使用 MLflow 记录模型

加载和评估模型

反馈

其他资源

通过

在 Microsoft Fabric 中使用 PyTorch 训练模型

先决条件

设置机器学习实验

训练和评估 Pytorch 模型

使用 MLflow 记录模型

加载和评估模型

相关内容

反馈

其他资源