
Hands-On Lab: Image Recognition

Article
01/14/2017

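Note that this tutorial requires the latest master version, or the upcoming CNTK 1.7 release. An intermediate binary download can be found in the instructions for the KDD CNTK Hands-On Tutorial for which this tutorial was originally designed.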

Hands-On Lab: Image Recognition with Convolutional Networks, Batch Normalization, and Residual Nets

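This hands-on lab shows how to implement convolution-based image recognition with CNTK. We will start with a common convolutional image-recognition architecture, add Batch Normalization, and then extend it into a Residual Network (ResNet-20).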

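The techniques you will practice include:

modifying a CNTK network definition to add a predefined operation (dropout)
creating user-defined functions to extract repetitive parts of a network into a reusable module
implementing custom network structures (ResNet skip connection)
creating many layers at once using recursive loops
parallel training
convolutional networks
batch normalization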

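Prerequisites

We assume that you have already installed CNTK and can run the CNTK command. This tutorial was held at KDD 2016 and requires a recent build; please see here for setup instructions. You can simply follow the instructions on that page for downloading a binary install package. For image-related tasks, you should do this on a machine with a capable CUDA-compatible GPU.

Next, please download a ZIP archive (about 12 MB): click on this link, and then on the Download button. The archive contains the files for this tutorial. Please unpack the archive and set your working directory to ImageHandsOn. You will be working with the following files:

ImageHandsOn.cntk: The CNTK configuration file that we will introduce below and work with.
cifar10.pretrained.cmf: The resulting model of the configuration we will start with.
cifar10.ResNet.cmf: The resulting model of the ResNet version that we will create below.

Lastly, we must download and convert the CIFAR-10 data set. The conversion step will take about 10 minutes. Please run the following commands, which download and unpack the data and then run the Python script CifarConverter.py that you will also find in the working directory: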

wget -rc http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
tar xvf www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
python CifarConverter.py cifar-10-batches-py

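This will convert the images to PNG files, 50000 for training and 10000 for testing, which will be placed in the following two directories, respectively: cifar-10-batches-py/data/train and cifar-10-batches-py/data/test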

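Model Structure

We will start this tutorial with a simple convolutional model. It consists of 3 layers of 5x5 convolution + nonlinearity + 2x dimension reduction by 3x3 max-pooling, followed by a dense hidden layer and a dense transform that forms the input to a 10-way softmax classifier.

Or, as a CNTK network description. Please have a quick look and match it with the description above: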

featNorm = features - Constant (128)
l1 = ConvolutionalLayer {32, (5:5), pad = true, activation = ReLU,
                         init = "gaussian", initValueScale = 0.0043} (featNorm)
p1 = MaxPoolingLayer {(3:3), stride = (2:2)} (l1)
l2 = ConvolutionalLayer {32, (5:5), pad = true, activation = ReLU,
                         init = "gaussian", initValueScale = 1.414} (p1)
p2 = MaxPoolingLayer {(3:3), stride = (2:2)} (l2)
l3 = ConvolutionalLayer {64, (5:5), pad = true, activation = ReLU,
                         init = "gaussian", initValueScale = 1.414} (p2)
p3 = MaxPoolingLayer {(3:3), stride = (2:2)} (l3)
d1 = DenseLayer {64, activation = ReLU, init = "gaussian", initValueScale = 12} (p3)
z  = LinearLayer {10, init = "gaussian", initValueScale = 1.5} (d1)

Details on these operators can be found here: ConvolutionalLayer{}, MaxPoolingLayer{}, DenseLayer{}, LinearLayer{}.
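To match the network description against the prose, it can help to trace tensor shapes and count learnable parameters by hand. The following Python sketch does that under two assumptions that the text does not state: padded convolutions preserve width and height, and the 3x3 max-pooling layers are unpadded, so each maps a side of n to floor((n - 3) / 2) + 1. The helper names are hypothetical. Under these assumptions the total comes to 116906, which agrees with the parameter count that CNTK prints in the training log shown later in this tutorial.

```python
# Hand-trace of tensor shapes and learnable-parameter counts for the
# convolutional model above. Assumptions (not stated in the tutorial):
# padded convolutions keep width/height, pooling layers are unpadded.

def pool_side(n, window=3, stride=2):
    """Output side length of an unpadded pooling window."""
    return (n - window) // stride + 1

def conv_params(in_depth, out_depth, kernel=5):
    """Kernel weights plus one bias per output channel."""
    return kernel * kernel * in_depth * out_depth + out_depth

def dense_params(in_dim, out_dim):
    """Weight matrix plus bias vector."""
    return in_dim * out_dim + out_dim

side, depth = 32, 3              # CIFAR-10 input: 32 x 32 x 3
total = 0
for out_depth in (32, 32, 64):   # l1/p1, l2/p2, l3/p3
    total += conv_params(depth, out_depth)
    depth = out_depth
    side = pool_side(side)       # 32 -> 15 -> 7 -> 3

flat = side * side * depth       # 3 * 3 * 64 = 576 inputs to d1
total += dense_params(flat, 64)  # d1
total += dense_params(64, 10)    # z

print(side, flat, total)         # prints: 3 576 116906
```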

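CNTK Configuration

Configuration File

To train and test a model in CNTK, we need to provide a configuration file that tells CNTK which operations to run (the command variable), along with a parameter section for every command.

For the training command, CNTK needs to be told:

how to read the data (reader section)
the model function and its inputs and outputs in the computation graph (BrainScriptNetworkBuilder section)
the hyper-parameters for the learner (SGD section)

For the evaluation command, CNTK needs to know:

how to read the test data (reader section)
which metrics to evaluate (evalNodeNames parameter)

The following is the configuration file we will start with. As you can see, a CNTK configuration file is a text file that consists of definitions of parameters, which are organized in a hierarchy of records. You can also see how CNTK supports basic parameter substitution using the $parameterName$ syntax. The actual file contains only a few more parameters than those mentioned above, but please scan it and locate the configuration items just mentioned: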

# CNTK Configuration File for training a simple CIFAR-10 convnet.
# During the hands-on tutorial, this will be fleshed out into a ResNet-20 model.

command = TrainConvNet:Eval

makeMode = false ; traceLevel = 0 ; deviceId = "auto"

rootDir = "." ; dataDir  = "$rootDir$" ; modelDir = "$rootDir$/Models"

modelPath = "$modelDir$/cifar10.cmf"

# Training action for a convolutional network
TrainConvNet = {
    action = "train"

    BrainScriptNetworkBuilder = {
        imageShape = 32:32:3
        labelDim = 10

        model (features) = {
            featNorm = features - Constant (128)
            l1 = ConvolutionalLayer {32, (5:5), pad=true, activation=ReLU,
                                     init="gaussian", initValueScale=0.0043} (featNorm)
            p1 = MaxPoolingLayer {(3:3), stride=(2:2)} (l1)
            l2 = ConvolutionalLayer {32, (5:5), pad=true, activation=ReLU,
                                     init="gaussian", initValueScale=1.414} (p1)
            p2 = MaxPoolingLayer {(3:3), stride=(2:2)} (l2)
            l3 = ConvolutionalLayer {64, (5:5), pad=true, activation=ReLU,
                                     init="gaussian", initValueScale=1.414} (p2)
            p3 = MaxPoolingLayer {(3:3), stride=(2:2)} (l3)
            d1 = DenseLayer {64, activation=ReLU, init="gaussian", initValueScale=12} (p3)
            z  = LinearLayer {10, init="gaussian", initValueScale=1.5} (d1)
        }.z

        # inputs
        features = Input {imageShape}
        labels   = Input {labelDim}

        # apply model to features
        z = model (features)

        # connect to system
        ce       = CrossEntropyWithSoftmax (labels, z)
        errs     = ErrorPrediction         (labels, z)

        featureNodes    = (features)
        labelNodes      = (labels)
        criterionNodes  = (ce)
        evaluationNodes = (errs)
        outputNodes     = (z)
    }

    SGD = {
        epochSize = 50000

        maxEpochs = 30 ; minibatchSize = 64
        learningRatesPerSample = 0.00015625*10:0.000046875*10:0.000015625
        momentumAsTimeConstant = 600*20:6400
        L2RegWeight = 0.03

        firstMBsToShowResult = 10 ; numMBsToShowResult = 100
    }

    reader = {
        verbosity = 0 ; randomize = true
        deserializers = ({
            type = "ImageDeserializer" ; module = "ImageReader"
            file = "$dataDir$/cifar-10-batches-py/train_map.txt"
            input = {
                features = { transforms = (
                    { type = "Crop" ; cropType = "RandomSide" ; sideRatio = 0.8 ; jitterType = "UniRatio" } :
                    { type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
                    { type = "Transpose" }
                )}
                labels = { labelDim = 10 }
            }
        })
    }
}

# Eval action
Eval = {
    action = "eval"
    minibatchSize = 16
    evalNodeNames = errs
    reader = {
        verbosity = 0 ; randomize = true
        deserializers = ({
            type = "ImageDeserializer" ; module = "ImageReader"
            file = "$dataDir$/cifar-10-batches-py/test_map.txt"
            input = {
                features = { transforms = (
                   { type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
                   { type = "Transpose" }
                )}
                labels = { labelDim = 10 }
            }
        })
    }
}
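The SGD section above uses CNTK's compact schedule notation, in which value*count entries are separated by colons and the last value applies to all remaining epochs. The Python sketch below expands such a string into one value per epoch; the function name expand_schedule is hypothetical and is not part of CNTK. It also multiplies the initial per-sample learning rate by the minibatch size of 64, which gives the equivalent per-minibatch rate.

```python
def expand_schedule(spec, num_epochs):
    """Expand a schedule string such as '0.1*2:0.01' into per-epoch values.

    Each colon-separated entry is 'value' or 'value*count'; the final
    value is repeated for any epochs left over.
    """
    values = []
    for entry in spec.split(":"):
        value, _, count = entry.partition("*")
        values.extend([float(value)] * (int(count) if count else 1))
    # Pad with the last value, then cut to the requested length.
    values.extend([values[-1]] * max(0, num_epochs - len(values)))
    return values[:num_epochs]

rates = expand_schedule("0.00015625*10:0.000046875*10:0.000015625", 30)
print(rates[0], rates[10], rates[29])
print(rates[0] * 64)  # per-minibatch rate for minibatchSize = 64
```

The same notation is used for momentumAsTimeConstant = 600*20:6400, which reads as 600 for the first 20 epochs and 6400 afterwards.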

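Data and Data Reading

Once you have downloaded the CIFAR-10 data and run the CifarConverter.py script as requested at the beginning of this tutorial, you will find a directory named cifar-10-batches-py/data, which contains two subdirectories, train and test, full of PNG files. The CNTK ImageDeserializer consumes standard image formats.

You will also find two files, train_map.txt and test_map.txt. Looking at the latter,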

% more cifar-10-batches-py/test_map.txt
cifar-10-batches-py/data/test/00000.png 3
cifar-10-batches-py/data/test/00001.png 8
cifar-10-batches-py/data/test/00002.png 8
...

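Both files consist of two columns: the first contains the path to the image file, and the second the class label as a numeric index. These columns correspond to the reader inputs features and labels, which were defined as: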

 features = { transforms = (
     { type = "Crop" ; cropType = "RandomSide" ; sideRatio = 0.8 ; jitterType = "UniRatio" } :
     { type = "Scale" ; width = 32 ; height = 32 ; channels = 3 ; interpolations = "linear" } :
     { type = "Transpose" }
 )}
 labels = { labelDim = 10 }

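The additional transforms section tells the ImageDeserializer to apply a sequence of (common) transforms to the images as they are being read. For more information, please see here.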
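As a quick sanity check of the map files described above, the following Python sketch parses lines of the form path-then-label and tallies how many images each class has. The function name read_map_lines is hypothetical, and splitting on the last run of whitespace is an assumption about the separator.

```python
from collections import Counter

def read_map_lines(lines):
    """Parse 'image_path label' lines into (path, int_label) pairs."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        path, label = line.rsplit(None, 1)  # split on the last whitespace run
        pairs.append((path, int(label)))
    return pairs

sample = [
    "cifar-10-batches-py/data/test/00000.png 3",
    "cifar-10-batches-py/data/test/00001.png 8",
    "cifar-10-batches-py/data/test/00002.png 8",
]
pairs = read_map_lines(sample)
counts = Counter(label for _, label in pairs)
print(pairs[0])
print(counts[8])
```

To check a real map file, pass an open file object, for example open("cifar-10-batches-py/test_map.txt"), in place of sample.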

Running it

You can find the above configuration file under the name ImageHandsOn.cntk in the working folder. To run it, execute the following command:

cntk  configFile=ImageHandsOn.cntk

Your screen will come alive with a flurry of log messages (CNTK can be talkative at times), but if everything went right, you will soon see this:

Training 116906 parameters in 10 out of 10 parameter tensors and 28 nodes with gradient

followed by output like this:

Finished Epoch[ 1 of 10]: [Training] ce = 1.66950797 * 50000; errs = 61.228% * 50000
Finished Epoch[ 2 of 10]: [Training] ce = 1.32699016 * 50000; errs = 47.394% * 50000
Finished Epoch[ 3 of 10]: [Training] ce = 1.17140398 * 50000; errs = 41.168% * 50000

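This tells you that it is learning. Each epoch represents one pass through the 50000 training images. It also tells you that after the second epoch, the training criterion, which the configuration named ce, has reached 1.33 as measured on the 50000 samples of this epoch, and that the error rate on those same 50000 training samples is 47%.

Note that CPU-only machines are about 20 times slower. It will take many minutes before you even see the first log output. To make sure the system is progressing, you can enable tracing to see partial results, which should appear reasonably quickly: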

cntk  configFile=ImageHandsOn.cntk  traceLevel=1

Epoch[ 1 of 10]-Minibatch[-498-   1, 0.13%]: ce = 2.30260658 * 64; errs = 90.625% * 64
...
Epoch[ 1 of 10]-Minibatch[   1- 100, 12.80%]: ce = 2.10434176 * 5760; errs = 78.472% * 5760
Epoch[ 1 of 10]-Minibatch[ 101- 200, 25.60%]: ce = 1.82372971 * 6400; errs = 68.172% * 6400
Epoch[ 1 of 10]-Minibatch[ 201- 300, 38.40%]: ce = 1.69708496 * 6400; errs = 62.469% * 6400
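The percentages and sample counts in these trace lines follow from the configuration: with minibatchSize = 64 and epochSize = 50000, each minibatch covers 0.128% of an epoch. The Python sketch below reproduces the numbers. That the first summary line covers only 90 minibatches because the first 10 are reported individually (firstMBsToShowResult = 10) is an inference from the log, not something the text states.

```python
epoch_size = 50000      # epochSize in the SGD section
minibatch_size = 64     # minibatchSize in the SGD section

def progress_percent(minibatches_done):
    """Share of the epoch processed after the given number of minibatches."""
    return 100.0 * minibatches_done * minibatch_size / epoch_size

for done in (1, 100, 200, 300):
    print(f"{done:4d} minibatches -> {progress_percent(done):.2f}%")

# Samples aggregated in one log line that spans a range of minibatches.
samples_first_block = (100 - 10) * minibatch_size   # minibatches 11..100
samples_full_block = 100 * minibatch_size           # e.g. minibatches 101..200
print(samples_first_block, samples_full_block)
```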

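When training has completed (which takes about 3 minutes on a Surface Book and on a desktop machine with a Titan-X GPU), the final message will be similar to this: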

Finished Epoch[10 of 10]: [Training] ce = 0.74679766 * 50000; errs = 25.486% * 50000

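This shows that the network has successfully reduced the ce loss and reached a classification error of 25.5% on the training set. Since the command variable specifies a second command, Eval, CNTK will then proceed with that action. It measures the classification error rate on the 10000 images of the test set.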

Final Results: Minibatch[1-625]: errs = 24.180% * 10000

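The test error rate is close to the training error rate. Since CIFAR-10 is a rather small data set, this indicates that our model has not yet fully converged (and in fact, running it for 30 epochs will get you to about 20%).

If you do not want to wait until this finishes, you can evaluate an intermediate model, e.g.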

cntk  configFile=ImageHandsOn.cntk  command=Eval  modelPath=Models/cifar10.cmf.5
Final Results: Minibatch[1-625]: errs = 31.710% * 10000

or run our pre-trained model:

cntk  configFile=ImageHandsOn.cntk  command=Eval  modelPath=cifar10.pretrained.cmf
Final Results: Minibatch[1-625]: errs = 24.180% * 10000

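Modifying the Model

In the following, you will be given tasks to practice modifying CNTK configurations. The solutions are given at the end of this document... but please try without them first!

Task 1: Adding Dropout

A common technique to improve the generalizability of models is dropout. To add dropout to a CNTK model, you need to

add a call to the CNTK function Dropout() where you want to insert the dropout operation
add a parameter dropoutRate to the section called SGD to define the dropout probability

In this specific task, please specify no dropout for the first 1 epoch, followed by a dropout rate of 50%. Please have a look at the Dropout() documentation to see how to do that.

If everything went well, you will observe no change for the first 1 epoch, but much less improvement of ce once dropout kicks in with the second epoch. This is expected. (For this specific configuration, the recognition accuracy does not actually improve.) The final result when training for only 10 epochs is about 32%. 10 epochs are not enough for dropout.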
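To make the effect of the dropout rate concrete, here is a minimal pure-Python sketch of one common formulation, inverted dropout, in which each value is zeroed with probability rate and the survivors are scaled by 1/(1 - rate) so that the expected sum is unchanged. It illustrates the general technique only; whether CNTK's Dropout() matches it in every detail is an assumption, and the function name is hypothetical.

```python
import random

def inverted_dropout(values, rate, rng):
    """Zero each value with probability `rate`; rescale the survivors."""
    if rate == 0.0:
        return list(values)          # a rate of 0 leaves activations untouched
    keep_scale = 1.0 / (1.0 - rate)
    return [v * keep_scale if rng.random() >= rate else 0.0 for v in values]

rng = random.Random(0)               # fixed seed for repeatability
activations = [1.0] * 10000

unchanged = inverted_dropout(activations, 0.0, rng)
dropped = inverted_dropout(activations, 0.5, rng)

kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
mean_after = sum(dropped) / len(dropped)
print(kept_fraction, mean_after)
```

With a rate of 0.5, roughly half of the values survive and each survivor is doubled, so the mean stays close to the original value of 1.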

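Please see the solution here.

Task 2: Simplify Model Definition by Extracting Repetitive Parts into a Function

In the example, the sequence (Convolution >> ReLU >> Pooling) is repeated three times. Your task is to write a BrainScript function that groups these three operations into a reusable module. Note that all three uses of this sequence use different parameters (output dimension, initialization weight). Thus, the function you write should take these two as parameters, in addition to the input data. For example, it could be defined as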

MyLayer (x, depth, initValueScale)

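When you run this, you would expect the resulting ce and errs values to be the same. However, if you run on the GPU, non-determinism in cuDNN's back-propagation implementation will cause minor variations.

Please see the solution here.

Task 3: Adding BatchNormalization

(This task requires a GPU, since CNTK's batch normalization implementation is based on cuDNN.)

Batch normalization is a popular technique to speed up and improve convergence. In CNTK, batch normalization is implemented as BatchNormalizationLayer{}.

The spatial form (where all pixel positions are normalized with shared parameters) is invoked by an optional parameter: BatchNormalizationLayer{spatialRank=2}.

Please add batch normalization to all three convolution layers and between the two dense layers. Note that batch normalization should be inserted right before the nonlinearity. Hence, you must remove the activation parameter and instead insert explicit calls to the CNTK function ReLU().

In addition, batch normalization changes the convergence speed. So let us increase the learning rates for the first 7 epochs 3-fold, and disable momentum and L2 regularization by setting their parameters to 0.

When running, you will see additional learnable parameters listed in the training log. The final result will be around 28%, which is 4 points better than without batch normalization after the same number of iterations. Convergence does indeed speed up.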
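As an illustration of what the spatial form means, the following pure-Python sketch normalizes a tiny batch per channel, pooling statistics over both the batch and all pixel positions. It shows only the normalization step at training time; the learnable scale and bias and the running statistics used at test time are omitted, and the data layout and function name are assumptions made for this example.

```python
import math

def spatial_batch_norm(batch, eps=1e-5):
    """Normalize batch[n][c][pixel] so each channel c has zero mean and
    unit variance across all samples n and all pixel positions."""
    num_channels = len(batch[0])
    out = [[None] * num_channels for _ in batch]
    for c in range(num_channels):
        # Gather every value of channel c across the whole minibatch.
        vals = [v for sample in batch for v in sample[c]]
        mean = sum(vals) / len(vals)
        var = sum((v - mean) ** 2 for v in vals) / len(vals)
        inv_std = 1.0 / math.sqrt(var + eps)
        for n, sample in enumerate(batch):
            out[n][c] = [(v - mean) * inv_std for v in sample[c]]
    return out

# Two samples, two channels, four pixels per channel (a flattened 2x2 image).
batch = [
    [[1.0, 2.0, 3.0, 4.0], [10.0, 10.0, 20.0, 20.0]],
    [[5.0, 6.0, 7.0, 8.0], [30.0, 30.0, 40.0, 40.0]],
]
normed = spatial_batch_norm(batch)
channel0 = [v for sample in normed for v in sample[0]]
print(sum(channel0) / len(channel0))
```

Each channel gets a single mean and variance, regardless of how many pixel positions it has; this is the sharing of parameters across pixel positions that the spatial form refers to.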

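Please see the solution here.

Task 4: Convert to Residual Net

The above configuration is a "toy" example to get your hands dirty with running and modifying CNTK configurations, and we intentionally did not run it to full convergence in order to keep turnaround times low. So let us now advance to a more realistic configuration: a Residual Net. The Residual Net (https://arxiv.org/pdf/1512.03385v1.pdf) is a modified deep network structure in which layers, instead of learning a mapping from input to output, learn a correction term.

(This task also requires a GPU for the batch normalization operation, although if you have a lot of time, you can try running it on a CPU by editing out the batch normalization calls, at some loss of accuracy.)

To get started, please modify the previous configuration. First, please replace the model function model(features) with this one: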

        MySubSampleBN (x, depth, stride) =
        {
            s = Splice ((MaxPoolingLayer {(1:1), stride = (stride:stride)} (x) : ConstantTensor (0, (1:1:depth/stride))), axis = 3)  # sub-sample and pad: [W x H x depth/2] --> [W/2 x H/2 x depth]
            b = BatchNormalizationLayer {spatialRank = 2, normalizationTimeConstant = 4096} (s)
        }.b
        MyConvBN (x, depth, initValueScale, stride) =
        {
            c = ConvolutionalLayer {depth, (3:3), pad = true, stride = (stride:stride), bias = false,
                                    init = "gaussian", initValueScale = initValueScale} (x)
            b = BatchNormalizationLayer {spatialRank = 2, normalizationTimeConstant = 4096} (c)
        }.b
        ResNetNode (x, depth) =
        {
            c1 = MyConvBN (x,  depth, 7.07, 1)
            r1 = ReLU (c1)
            c2 = MyConvBN (r1, depth, 7.07, 1)
            r  = ReLU (c2)
        }.r
        ResNetIncNode (x, depth) =
        {
            c1 = MyConvBN (x,  depth, 7.07, 2)  # note the 2
            r1 = ReLU (c1)
            c2 = MyConvBN (r1, depth, 7.07, 1)
            r  = ReLU (c2)
        }.r
        model (features) =
        {
            conv1 = ReLU (MyConvBN (features, 16, 0.26, 1))
            rn1   = ResNetNode (ResNetNode (ResNetNode (conv1, 16), 16), 16)

            rn2_1 = ResNetIncNode (rn1, 32)
            rn2   = ResNetNode (ResNetNode (rn2_1, 32), 32)

            rn3_1 = ResNetIncNode (rn2, 64)
            rn3   = ResNetNode (ResNetNode (rn3_1, 64), 64)

            pool = AveragePoolingLayer {(8:8)} (rn3)

            z = LinearLayer {labelDim, init = "gaussian", initValueScale = 0.4} (pool)
        }.z

And change the SGD configuration to:

SGD = {
    epochSize = 50000

    maxEpochs = 160 ; minibatchSize = 128
    learningRatesPerSample = 0.0078125*80:0.00078125*40:0.000078125
    momentumAsTimeConstant = 1200
    L2RegWeight = 0.0001

    firstMBsToShowResult = 10 ; numMBsToShowResult = 500
}

Your task is to modify ResNetNode() and ResNetIncNode() so that they implement the structure laid out in the following award-worthy ASCII art:

            ResNetNode                   ResNetIncNode

                |                              |
         +------+------+             +---------+----------+
         |             |             |                    |
         V             |             V                    V
    +----------+       |      +--------------+   +----------------+
    | Conv, BN |       |      | Conv x 2, BN |   | SubSample, BN  |
    +----------+       |      +--------------+   +----------------+
         |             |             |                    |
         V             |             V                    |
     +-------+         |         +-------+                |
     | ReLU  |         |         | ReLU  |                |
     +-------+         |         +-------+                |
         |             |             |                    |
         V             |             V                    |
    +----------+       |        +----------+              |
    | Conv, BN |       |        | Conv, BN |              |
    +----------+       |        +----------+              |
         |             |             |                    |
         |    +---+    |             |       +---+        |
         +--->| + |<---+             +------>+ + +<-------+
              +---+                          +---+
                |                              |
                V                              V
            +-------+                      +-------+
            | ReLU  |                      | ReLU  |
            +-------+                      +-------+
                |                              |
                V                              V

Please confirm with the validation output in the log that you did it right.
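The two diagrams differ only in what travels along the skip path: the input itself, or a sub-sampled and zero-padded copy whose shape matches the strided convolution branch. The Python sketch below tracks shapes only, following the comment in MySubSampleBN ([W x H x depth/2] --> [W/2 x H/2 x depth]); the helper names are hypothetical, and the strided-output formula assumes padded convolutions.

```python
def conv_branch_shape(shape, depth, stride):
    """Shape after a padded 3x3 convolution with the given stride."""
    w, h, _ = shape
    return ((w + stride - 1) // stride, (h + stride - 1) // stride, depth)

def subsample_pad_shape(shape, depth, stride):
    """Shape after 1x1 pooling with a stride, then zero-padding the
    channel axis with depth/stride extra channels (as in MySubSampleBN)."""
    w, h, d = shape
    pooled = ((w - 1) // stride + 1, (h - 1) // stride + 1, d)
    return (pooled[0], pooled[1], d + depth // stride)

def can_add(a, b):
    """An element-wise sum needs identical shapes on both branches."""
    return a == b

x = (32, 32, 16)                               # output of the first stack
main = conv_branch_shape(conv_branch_shape(x, 32, 2), 32, 1)
skip = subsample_pad_shape(x, 32, 2)
print(main, skip, can_add(main, skip))         # both branches have one shape
print(can_add(main, x))                        # the raw input would not fit
```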

This will take a long time to complete. The expected output will look like this:

Finished Epoch[ 1 of 160]: [Training] ce = 1.57037109 * 50000; errs = 58.940% * 50000
Finished Epoch[ 2 of 160]: [Training] ce = 1.06968234 * 50000; errs = 38.166% * 50000
Finished Epoch[ 3 of 160]: [Training] ce = 0.85858969 * 50000; errs = 30.316% * 50000

whereas the incorrect model without the skip connections looks more like this:

Finished Epoch[ 1 of 160]: [Training] ce = 1.72901219 * 50000; errs = 66.232% * 50000
Finished Epoch[ 2 of 160]: [Training] ce = 1.30180430 * 50000; errs = 47.424% * 50000
Finished Epoch[ 3 of 160]: [Training] ce = 1.04641961 * 50000; errs = 37.568% * 50000

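Please see the solution here.

Task 5: Automatically Generating Many Layers

Lastly, the best-performing ResNet has 152 layers. Since it would be very tedious and error-prone to write 152 individual expressions, we will now modify the definition to automatically generate a stack of ResNetNode().

Your task is to write a function with this signature: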

ResNetNodeStack (x, depth, L)

where L denotes how many ResNetNodes should be stacked, so that we can replace the expression for rn1 above by a parameterized call:

rn1   = ResNetNodeStack (conv1, 16, 3)  # 3 means 3 such nodes

and likewise for rn2 and rn3. The tools you need are the conditional expression:

z = if cond then x else y

and recursion.
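For readers less familiar with recursion, here is the same pair of tools, a conditional expression and a recursive call, in a Python analogy that is unrelated to any particular network. The function repeat_apply is a hypothetical name; it applies a one-argument function n times.

```python
def repeat_apply(f, x, n):
    """Apply f to x n times: f(f(...f(x)...))."""
    # Conditional expression: the base case returns x unchanged,
    # otherwise wrap one more application around the shorter stack.
    return x if n == 0 else f(repeat_apply(f, x, n - 1))

def add_layer(trace):
    """Stand-in for a layer: records that one more layer was applied."""
    return trace + ["layer%d" % (len(trace) + 1)]

print(repeat_apply(add_layer, [], 3))
```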

This training will run for about half an hour on a Titan-X. If you did it right, the log will contain this message early on:

Training 200410 parameters in 51 out of 51 parameter tensors and 112 nodes with gradient:

For reference, we include a pre-trained version of this model. You can measure the error rate with this command:

cntk  configFile=ImageHandsOn.ResNet.cntk  command=Eval

and you should see a result like this:

Final Results: Minibatch[1-625]: errs = 8.400% * 10000; top5Errs = 0.290% * 10000

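This error rate is very close to that reported in the original ResNet paper (https://arxiv.org/pdf/1512.03385v1.pdf, Table 6).

Please see the solution here.

Task 6: Parallel Training

Lastly, if you have multiple GPUs, CNTK allows you to parallelize the training using MPI (Message-Passing Interface). This model is too small to expect a speed-up without further tuning, for example of the minibatch size (the current minibatch-size setting is too small to fully utilize the available GPU cores). Nevertheless, let us go through the motions, so that you know how to do it once you move on to real-world workloads.

Please add the following lines to the SGD block: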

SGD = {
    ...
    parallelTrain = {
        parallelizationMethod = "DataParallelSGD"
        parallelizationStartEpoch = 2
        distributedMBReading = true
        dataParallelSGD = { gradientBits = 1 }
    }
}

and then execute this command:

mpiexec -np 4 cntk  configFile=ImageHandsOn.cntk  stderr=Models/log  parallelTrain=true
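Conceptually, data-parallel SGD splits each minibatch across the workers, lets every worker compute gradients on its share, and sums the partial gradients before the model is updated. The pure-Python sketch below shows that idea for a one-parameter least-squares model; it deliberately leaves out MPI and the 1-bit gradient quantization selected by gradientBits = 1, and all names in it are hypothetical.

```python
def gradient_sum(w, samples):
    """Summed gradient of 0.5 * (w*x - y)^2 over the given (x, y) samples."""
    return sum((w * x - y) * x for x, y in samples)

def shard(samples, num_workers):
    """Round-robin split of one minibatch into per-worker shares."""
    return [samples[i::num_workers] for i in range(num_workers)]

minibatch = [(float(i), 3.0 * i) for i in range(1, 9)]  # y = 3x, 8 samples
w = 1.0

single = gradient_sum(w, minibatch)
partials = [gradient_sum(w, part) for part in shard(minibatch, 4)]
combined = sum(partials)  # what an all-reduce across 4 workers would yield

print(single, combined)
```

Because the summed gradient is the same either way, the parallel run follows the same update rule as the single-worker run; the quantization step, which this sketch omits, is where the two can differ.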

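What's Next?

This tutorial has practiced taking an existing configuration and modifying it in specific ways:

adding a predefined operation (dropout)
extracting repetitive parts into reusable modules (functions)
refactoring (to insert batch normalization)
custom network structures (ResNet skip connection)
parameterizing repetitive structures using recursion

and we have seen how to speed up training by parallelization.

So where do we go from here? You may have already discovered that the pattern used in these examples, which we call the graph-building style, can be quite error-prone. Can you spot the error?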

model (features) =
{
    l1 = ConvolutionalLayer {32, (5:5), pad = true, activation = ReLU,
                             init = "gaussian", initValueScale = 0.0043} (featNorm)
    p1 = MaxPoolingLayer {(3:3), stride = (2:2)} (l1)
    l2 = ConvolutionalLayer {64, (5:5), pad = true, activation = ReLU,
                             init = "gaussian", initValueScale = 1.414} (p1)
    p2 = MaxPoolingLayer {(3:3), stride = (2:2)} (l1)
    d1 = DenseLayer {64, activation = ReLU, init = "gaussian", initValueScale = 12} (p2)
    z  = LinearLayer {10, init = "gaussian", initValueScale = 1.5} (d1)
}.z

One way to avoid this error is to use function composition. The following is an alternative, more concise way of writing the same model:

model = Sequential (
    ConvolutionalLayer {32, (5:5), pad = true, activation = ReLU,
                        init = "gaussian", initValueScale = 0.0043} :
    MaxPoolingLayer {(3:3), stride = (2:2)} :
    ConvolutionalLayer {64, (5:5), pad = true, activation = ReLU,
                        init = "gaussian", initValueScale = 1.414} :
    MaxPoolingLayer {(3:3), stride = (2:2)} :
    DenseLayer {64, activation = ReLU, init = "gaussian", initValueScale = 12} :
    LinearLayer {10, init = "gaussian", initValueScale = 1.5}
)

This style will be introduced and used in the next hands-on tutorial, Text Understanding with Recurrent Networks.
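The idea behind Sequential() can be illustrated in a few lines of Python: compose a list of one-argument functions so that each output feeds the next input, which leaves no intermediate variable names to mix up. This is an analogy built on a hypothetical sequential helper, not CNTK's implementation.

```python
from functools import reduce

def sequential(*layers):
    """Compose layers left to right: sequential(f, g)(x) == g(f(x))."""
    return lambda x: reduce(lambda value, layer: layer(value), layers, x)

# Stand-in "layers" that just record the order in which they ran.
def named(name):
    return lambda trace: trace + [name]

model = sequential(named("conv1"), named("pool1"), named("conv2"),
                   named("pool2"), named("dense"), named("linear"))
print(model([]))
```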

Solutions

Solution 1: Adding Dropout

Modify the model definition as follows:

p3 = MaxPoolingLayer {(3:3), stride = (2:2)} (l3)
d1 = DenseLayer {64, activation = ReLU, init = "gaussian", initValueScale = 12} (p3)
d1_d = Dropout (d1)    ##### added
z  = LinearLayer {10, init = "gaussian", initValueScale = 1.5} (d1_d)  ##### d1 -> d1_d

and the SGD section:

SGD = {
    ...
    dropoutRate = 0*5:0.5   ##### added: no dropout for the first 5 epochs, then 50% (the task text asks for 1 epoch, i.e. 0*1:0.5)
    ...
}

Solution 2: Simplify Model Definition by Extracting Repetitive Parts into a Function

Add a function definition as follows:

MyLayer (x, depth, initValueScale) =
{
    c = ConvolutionalLayer {depth, (5:5), pad = true, activation = ReLU,
                            init = "gaussian", initValueScale = initValueScale} (x)
    p = MaxPoolingLayer {(3:3), stride = (2:2)} (c)
}.p

and update the model definition to use it:

featNorm = features - Constant (128)
p1 = MyLayer (featNorm, 32, 0.0043)  ##### replaced
p2 = MyLayer (p1,       32, 1.414)   ##### replaced
p3 = MyLayer (p2,       64, 1.414)   ##### replaced
d1 = DenseLayer {64, activation = ReLU, init = "gaussian", initValueScale = 12} (p3)

Solution 3: Adding BatchNormalization

Modify MyLayer():

MyLayer (x, depth, initValueScale) =
{
    c = ConvolutionalLayer {depth, (5:5), pad = true,  ##### no activation=ReLU
                            init = "gaussian", initValueScale = initValueScale} (x)
    b = BatchNormalizationLayer {spatialRank = 2} (c)
    r = ReLU (b)   ##### now called explicitly
    p = MaxPoolingLayer {(3:3), stride = (2:2)} (r)
}.p

and use it. Also insert batch normalization before z:

d1 = DenseLayer {64, init = "gaussian", initValueScale = 12} (p3)
d1_bnr = ReLU (BatchNormalizationLayer {} (d1))  ##### added BN and explicit ReLU
d1_d = Dropout (d1_bnr)                          ##### d1 -> d1_bnr
z  = LinearLayer {10, init = "gaussian", initValueScale = 1.5} (d1_d)

And update these parameters in the SGD section:

SGD = {
    ....
    learningRatesPerSample = 0.00046875*7:0.00015625*10:0.000046875*10:0.000015625
    momentumAsTimeConstant = 0
    L2RegWeight = 0
    ...
}

Solution 4: Convert to Residual Net

The correct implementations of ResNetNode() and ResNetIncNode() are:

    ResNetNode (x, depth) =
    {
        c1 = MyConvBN (x,  depth, 7.07, 1)
        r1 = ReLU (c1)
        c2 = MyConvBN (r1, depth, 7.07, 1)
        r  = ReLU (x + c2)   ##### added skip connection
    }.r
    ResNetIncNode (x, depth) =
    {
        c1 = MyConvBN (x,  depth, 7.07, 2)  # note the 2
        r1 = ReLU (c1)
        c2 = MyConvBN (r1, depth, 7.07, 1)

        xs = MySubSampleBN (x, depth, 2)

        r  = ReLU (xs + c2)   ##### added skip connection
    }.r

Solution 5: Automatically Generating Many Layers

This is the implementation:

    ResNetNodeStack (x, depth, L) =
    {
        r = if L == 0
            then x
            else ResNetNode (ResNetNodeStack (x, depth, L-1), depth)
    }.r

or, shorter:

    ResNetNodeStack (x, depth, L) =
        if L == 0
        then x
        else ResNetNode (ResNetNodeStack (x, depth, L-1), depth)

You also need to modify your model function:

        conv1 = ReLU (MyConvBN (features, 16, 0.26, 1))
        rn1   = ResNetNodeStack (conv1, 16, 3)  ##### replaced

        rn2_1 = ResNetIncNode (rn1, 32)
        rn2   = ResNetNodeStack (rn2_1, 32, 2)  ##### replaced

        rn3_1 = ResNetIncNode (rn2, 64)
        rn3   = ResNetNodeStack (rn3_1, 64, 2)  ##### replaced
