Binary Operations
Elementwise binary operators.
ElementTimes (x, y)
x .* y
Minus (x, y)
x - y
Plus (x, y)
x + y
LogPlus (x, y)
Less (x, y)
Equal (x, y)
Greater (x, y)
GreaterEqual (x, y)
NotEqual (x, y)
LessEqual (x, y)
BS.Boolean.And (x, y)
BS.Boolean.Or (x, y)
BS.Boolean.Xor (x, y)
Parameters
x: left input
y: right input
The dimensions of x and y must match (subject to broadcasting rules, see below).
For the three Boolean operations, both inputs are expected to be either 0 or 1; otherwise the behavior of the functions is unspecified, and will in fact change in future versions.
Sparse values are currently not supported.
Return value
These functions return the result of the corresponding operations. The relation operators (Equal() etc.) and the three Boolean operations return values that are either 0 or 1.
The output dimension or tensor shape is identical to that of the inputs, subject to broadcasting, see below.
Descriptions
These are the common binary operators.
They are applied elementwise.
(Note that BrainScript's * operator is not elementwise, but stands for the matrix product. This is different, for example, from Python's numpy library.)
The dimensions of the inputs must be identical, with the exception of broadcasting.
Broadcasting semantics
Broadcasting, a concept that CNTK models after Python's numpy library, means that a dimension in one of the inputs can be 1 where the other input's is not.
In that case, the input with the 1-dimension will be copied n times, where n is the corresponding dimension of the other input.
If the tensor ranks do not match, the missing dimensions of the input with fewer dimensions are assumed to be 1, which triggers broadcasting.
For example, adding a [13 x 1] tensor to a [1 x 42] tensor would yield a [13 x 42] tensor that contains the sums of all combinations.
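The [13 x 1] plus [1 x 42] case can be mimicked in plain Python to make the copying explicit. This is only an illustrative sketch, not CNTK code: the helper name `broadcast_add` is hypothetical, tensors are modeled as nested lists, and only the rank-2 case is handled.

```python
def broadcast_add(a, b):
    """Add two rank-2 nested lists, broadcasting any dimension of size 1.

    Hypothetical helper for illustration only; not part of CNTK.
    """
    rows_a, cols_a = len(a), len(a[0])
    rows_b, cols_b = len(b), len(b[0])

    # Each dimension must either match or be 1 in one of the inputs.
    for dim_a, dim_b in ((rows_a, rows_b), (cols_a, cols_b)):
        if dim_a != dim_b and 1 not in (dim_a, dim_b):
            raise ValueError(f"incompatible dimensions: {dim_a} vs {dim_b}")

    rows, cols = max(rows_a, rows_b), max(cols_a, cols_b)

    def pick(t, n_rows, n_cols, i, j):
        # A dimension of size 1 is "copied": always read index 0 along it.
        return t[i if n_rows > 1 else 0][j if n_cols > 1 else 0]

    return [
        [pick(a, rows_a, cols_a, i, j) + pick(b, rows_b, cols_b, i, j)
         for j in range(cols)]
        for i in range(rows)
    ]


column = [[float(i)] for i in range(13)]   # shape [13 x 1]
row = [[float(j) for j in range(42)]]      # shape [1 x 42]
result = broadcast_add(column, row)        # shape [13 x 42]
```

Here `result[i][j]` holds `column[i][0] + row[0][j]`, that is, the sum for every combination of one element from each input.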
Relation Operations
The relation operators (Equal() etc.) are not differentiable; their gradient is always considered 0.
They can be used for flags, e.g. as a condition argument in the If() operation.
LogPlus()
The LogPlus() operation computes the sum of values represented in logarithmic form.
That is, it computes:
LogPlus (x, y) = Log (Exp (x) + Exp (y))
where x and y are logarithms of values.
This operation is useful when dealing with probabilities,
which are often so small that only a logarithmic representation
allows for appropriate numeric accuracy.
Note: Another common name for this operation is log-add-exp, e.g. in the NumPy/SciPy ecosystem (numpy.logaddexp).
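To illustrate why the logarithmic form helps, the following Python sketch evaluates the same formula for scalars. It is not CNTK's implementation: the name `log_plus` is hypothetical, and factoring out the larger argument is a common stabilization trick assumed here, not something the text above specifies.

```python
import math


def log_plus(x, y):
    """Return log(exp(x) + exp(y)) for scalars x and y.

    Hypothetical helper for illustration only; not part of CNTK.
    """
    m = max(x, y)
    if math.isinf(m):
        # Both inputs are -inf (log of 0 + 0), or one of them is +inf.
        return m
    # Factor out the larger term so the remaining exponent is <= 0
    # and exp() cannot overflow.
    return m + math.log1p(math.exp(-abs(x - y)))


# Adding the probabilities 0.25 and 0.5 in the log domain.
log_p = log_plus(math.log(0.25), math.log(0.5))

# Very small probabilities: exp(-1000.0) underflows to 0.0 in double
# precision, so evaluating log(exp(x) + exp(y)) directly would fail here.
tiny = log_plus(-1000.0, -1000.0)
```

In this sketch, `log_p` matches `math.log(0.75)` up to rounding, and `tiny` equals `-1000.0 + math.log(2.0)` even though neither probability is representable in linear form.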
Examples
Standard Sigmoid Layer
This layer uses the elementwise binary +:
z = Sigmoid (W * x + b)
Note that * above is not elementwise, but stands for the matrix product.
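The distinction between the matrix product and the elementwise operations can be spelled out in plain Python. This is a sketch under assumptions, not BrainScript: `sigmoid_layer` is a hypothetical name, W is modeled as a list of rows, and x and b are flat lists.

```python
import math


def sigmoid_layer(W, x, b):
    """Compute Sigmoid(W * x + b) for a matrix W and vectors x and b.

    Hypothetical helper for illustration only; not part of CNTK.
    """
    z = []
    for w_row, b_i in zip(W, b):
        # Matrix product: each output is the dot product of a row of W with x.
        dot = sum(w_ij * x_j for w_ij, x_j in zip(w_row, x))
        # Elementwise plus, followed by the elementwise sigmoid.
        z.append(1.0 / (1.0 + math.exp(-(dot + b_i))))
    return z


W = [[1.0, -1.0], [0.5, 0.5]]
x = [2.0, 2.0]
b = [0.0, -2.0]
z = sigmoid_layer(W, x, b)
```

With these values, W * x is [0, 2] and adding b gives [0, 0], so both outputs are Sigmoid(0) = 0.5.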
Alternative Implementation of Softmax Activation
The Softmax() activation function can be written using broadcasting Minus:
MySoftmax (z) = Exp (z - ReduceLogSum (z))
Here, ReduceLogSum() reduces the vector z to a scalar by computing its logarithmic sum (the logarithm of the sum of the exponentiated elements). Through broadcasting semantics of subtraction, this scalar is then subtracted from every input value.
This implements the division by the sum over all values in the Softmax function.
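The identity behind this is Exp (z - Log (Sum (Exp (z)))) = Exp (z) / Sum (Exp (z)). A plain-Python sketch of the same computation follows. The function names are hypothetical stand-ins for the BrainScript operations, and subtracting the maximum inside `reduce_log_sum` is an added stabilization step assumed here.

```python
import math


def reduce_log_sum(z):
    """Return log(sum(exp(z_i))) as a scalar (hypothetical stand-in)."""
    m = max(z)
    # Subtracting the maximum keeps every exponent <= 0.
    return m + math.log(sum(math.exp(v - m) for v in z))


def my_softmax(z):
    """Compute exp(z - reduce_log_sum(z)) elementwise."""
    log_sum = reduce_log_sum(z)
    # The scalar log_sum is subtracted from every element, which is
    # what broadcasting does for the Minus operation.
    return [math.exp(v - log_sum) for v in z]


probs = my_softmax([1.0, 2.0, 3.0])
```

The entries of `probs` are positive and sum to 1 up to rounding, as expected of a softmax output.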
Elementwise Max of Two Inputs
The elementwise maximum of two inputs can be computed as a combination of Greater() and If():
MyElementwiseMax (a, b) = If (Greater (a, b), a, b)
This also works with broadcasting. For example, the linear rectifier (ReLU) can be written this way, using a scalar constant as the second input:
MyReLU (x) = MyElementwiseMax (x, Constant(0))
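The same composition can be traced in plain Python. This sketch is not CNTK code: `greater`, `if_op`, `my_elementwise_max` and `my_relu` are hypothetical stand-ins operating on flat lists, and broadcasting is limited to a scalar second input.

```python
def greater(a, b):
    """Return a flag per element: 1.0 where a > b, else 0.0."""
    return [1.0 if x > y else 0.0 for x, y in zip(a, b)]


def if_op(cond, then_val, else_val):
    """Select then_val where the flag is non-zero, else else_val."""
    return [t if c != 0 else e for c, t, e in zip(cond, then_val, else_val)]


def my_elementwise_max(a, b):
    # Broadcast a scalar second input to the length of a.
    if not isinstance(b, list):
        b = [b] * len(a)
    return if_op(greater(a, b), a, b)


def my_relu(x):
    # The constant 0.0 plays the role of Constant(0).
    return my_elementwise_max(x, 0.0)


rectified = my_relu([-2.0, 0.0, 3.5])
```

Negative entries are replaced by the constant, so `rectified` is `[0.0, 0.0, 3.5]`.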