BrainScript epochSize in CNTK
For Python users, see here.
epochSize is the number of label samples (tensors along a dynamic axis) in each epoch. In CNTK, it is the number of label samples after which specific additional actions are taken, including:
- saving a checkpoint model (training can be restarted from here)
- cross-validation
- learning-rate control
- minibatch-scaling
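As a rough mental model of the list above, the following Python sketch counts label samples across minibatches and reports where an epoch would end. It is not CNTK's implementation: the function name `epoch_ends` and the way a minibatch that straddles an epoch boundary is handled are assumptions made for illustration only.

```python
def epoch_ends(label_samples_per_minibatch, epoch_size):
    """Return (minibatch_index, epoch_number) pairs at which an epoch ends.

    Hypothetical helper: it only illustrates that the epoch counter
    advances on label samples, not on minibatches or sequences.
    """
    seen = 0      # label samples seen since the last epoch end
    epoch = 0     # number of completed epochs
    ends = []
    for index, num_labels in enumerate(label_samples_per_minibatch):
        seen += num_labels
        # Assumption: leftover samples carry over into the next epoch.
        while seen >= epoch_size:
            epoch += 1
            seen -= epoch_size
            # At this point a trainer would checkpoint, cross-validate,
            # and adjust the learning rate and minibatch size.
            ends.append((index, epoch))
    return ends


# Eight minibatches of 256 label samples each, epoch size of 1024
print(epoch_ends([256] * 8, 1024))  # [(3, 1), (7, 2)]
```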
Note that the definition of the number of label samples is similar to the number of samples used for minibatchSize (minibatch_size_in_samples). The definition of epochSize differs from the definition of minibatchSize in that epochSize counts label samples, not input samples.
So, importantly, for sequential data, a sample is an individual item of a sequence. Hence, CNTK's epochSize does not refer to a number of sequences, but to the number of sequence items across the sequence labels that constitute the minibatch.
Equally important, epochSize counts label samples, not input samples, and the number of labels per sequence is not necessarily the number of input samples. For example, each sequence may have many samples but only one label, in which case epochSize acts like a number of sequences. Alternatively, every sample in a sequence may have its own label, in which case epochSize acts exactly like minibatchSize in that every sample (not sequence) is counted.
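The difference between the two cases can be checked with simple counting. The sketch below uses a made-up toy dataset in which each sequence is described by a pair (input samples, label samples); the helper name `count_samples` and the data are hypothetical and are not part of CNTK.

```python
def count_samples(sequences):
    """Return (num_sequences, input_samples, label_samples).

    Each entry of `sequences` is a hypothetical pair
    (number of input samples, number of label samples).
    """
    num_sequences = len(sequences)
    input_samples = sum(n_in for n_in, _ in sequences)
    label_samples = sum(n_label for _, n_label in sequences)
    return num_sequences, input_samples, label_samples


# One label per sequence (for example, sequence classification):
# the label-sample count equals the number of sequences.
classification = [(5, 1), (3, 1), (7, 1)]
print(count_samples(classification))  # (3, 15, 3)

# One label per item (for example, tagging):
# the label-sample count equals the number of input samples.
tagging = [(5, 5), (3, 3), (7, 7)]
print(count_samples(tagging))  # (3, 15, 15)
```

In the first case, an epochSize of 3 would cover the whole toy dataset; in the second, it would take an epochSize of 15.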
For smaller datasets, epochSize is often set equal to the dataset size. In BrainScript, you can specify 0 to denote that. In Python, you can specify cntk.io.INFINITELY_REPEAT for that. In Python only, you can also set it to cntk.io.FULL_DATA_SWEEP, in which case processing stops after one pass over the whole dataset.
For large datasets, you may want to let checkpointing guide your choice of epochSize. For example, if you want to lose at most 30 minutes of computation in case of a power outage or network glitch, you would want a checkpoint (from which the training can be resumed) to be created about every 30 minutes. Choose epochSize to be the number of samples that takes about 30 minutes to compute.
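The arithmetic behind this choice is straightforward once you have measured the training throughput. The sketch below assumes a throughput of 2000 label samples per second, which is an invented figure; the function name `epoch_size_for_interval` is likewise hypothetical. Substitute the rate that your own training run reports.

```python
def epoch_size_for_interval(samples_per_second, checkpoint_minutes):
    """Number of label samples processed in `checkpoint_minutes` minutes."""
    return int(samples_per_second * checkpoint_minutes * 60)


# Assumed throughput of 2000 samples/s, one checkpoint about every 30 minutes
epoch_size = epoch_size_for_interval(samples_per_second=2000, checkpoint_minutes=30)
print(epoch_size)  # 3600000
```

The resulting value would then be used as epochSize. Since throughput can vary with hardware and model size, it is worth re-measuring after any significant change to the setup.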