The previous post in this series introduced array_views, why you should prefer using them in your C++ AMP code, and some key semantic aspects of array_view. In this post we will look at the implicit synchronization behavior on destruction of array_views.
Implicit synchronization on destruction
Our earlier blog post on array_view synchronization lists the common operations that result in implicit synchronization of an array_view’s contents across different locations (accelerator_views and/or the host CPU). While most instances that trigger an implicit synchronization are fairly intuitive and easy to reason about, the semantics around the implicit synchronization upon destruction of an array_view are somewhat subtle.
When the last array_view associated with a data source is destructed, any pending modifications on a location other than the home storage location of the data source are synchronized back to the data source (unless the modifications were explicitly discarded using the discard_data API).
This ensures that modifications to the data are not inadvertently lost. The implicit synchronization happens only on destruction of the last array_view associated with a data source (that is, a concurrency::array object or a top-level array_view created from CPU data). Synchronizing on destruction of every array_view would be undesirable, since it would cause expensive and unnecessary data transfers while other array_views on that data source are still alive.
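The "last view synchronizes" rule can be pictured with a small toy model in standard C++. This is only a sketch and not how the C++ AMP runtime is implemented: ToySource, ToyView and run_toy_demo are hypothetical names invented for illustration, and a std::shared_ptr stands in for whatever bookkeeping the runtime actually performs.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical toy types for illustration only: NOT the C++ AMP runtime.
struct ToySource {
    std::vector<int>& home;    // home storage location (for example, CPU memory)
    std::vector<int> remote;   // pretend accelerator copy of the data
    bool dirty = false;        // remote holds modifications not yet copied home
    int& syncs;                // counts copy-backs, purely for the demo

    ToySource(std::vector<int>& h, int& counter)
        : home(h), remote(h), syncs(counter) {}

    // Runs only when the last ToyView sharing this source is destroyed.
    ~ToySource() {
        if (dirty) {
            home = remote;     // pending modifications are copied back
            ++syncs;
        }
    }
};

class ToyView {
public:
    ToyView(std::vector<int>& home, int& counter)
        : src_(std::make_shared<ToySource>(home, counter)) {}

    // Copies of a ToyView share the same underlying source.
    void write(std::size_t i, int v) { src_->remote[i] = v; src_->dirty = true; }

    // Mimics discarding pending modifications so that nothing is copied back.
    void discard() { src_->dirty = false; }

private:
    std::shared_ptr<ToySource> src_;
};

// Returns the number of copy-backs observed after each view goes out of scope.
std::vector<int> run_toy_demo(std::vector<int>& data) {
    int syncs = 0;
    std::vector<int> observed;
    {
        ToyView first(data, syncs);
        {
            ToyView second = first;   // second view of the same source
            second.write(0, 42);
        }                             // not the last view: no copy-back yet
        observed.push_back(syncs);
    }                                 // last view destroyed: copy-back happens
    observed.push_back(syncs);
    return observed;
}
```

In this model, destroying the inner view transfers nothing because another view of the same source is still alive. The single copy-back happens only when the outer view goes away.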
As mentioned in the introductory post of this series, the runtime relies on the type of the array_view to determine whether the contents of an array_view captured in a parallel_for_each may be modified on the accelerator_view by the parallel_for_each invocation. An array_view<const T> object captured in a parallel_for_each indicates to the runtime that the array_view is only read from and will not be modified on the accelerator_view. An array_view<T> object captured in a parallel_for_each indicates read-write access. The compiler attempts to analyze the array_view usage in the parallel_for_each kernel and inform the runtime if it is only read from, even if the type of the array_view object indicates read-write access. However, the compiler is not guaranteed to determine this in every case, so it is highly recommended that programmers use an array_view<const T> object for read-only data, to explicitly communicate the read-only usage intent to the runtime.
Another important detail with respect to implicit synchronization of modifications on array_view destruction is that the destructor swallows any exceptions thrown during the implicit synchronization, following the general C++ rule that destructors should not let exceptions escape.
Guidelines regarding implicit synchronization on destruction
Guideline A: Use an array_view<const T> object when the array_view is only read from (not written to) in a parallel_for_each or in your CPU code.
The use of array_view<const T> leads to self-documenting code and also prevents the runtime from assuming that the data is being modified on an accelerator_view. Not doing so may result in an unexpected implicit synchronization on destruction of the last array_view associated with that data source. There are other reasons why you should explicitly indicate the read-onliness of the array_view, but let us save those for another post.
Guideline B: Do not rely on implicit synchronization on destruction; always synchronize explicitly before the array_view is destroyed, unless the array_view contents have been discarded using discard_data.
This ensures that any exceptions encountered during the synchronization are propagated to the application.
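The difference between the two paths can be sketched with a toy class in standard C++. This is an illustration under stated assumptions, not the C++ AMP implementation: FailingView, implicit_path and explicit_path are hypothetical names, and the copy-back is rigged to fail so that the effect is visible.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical toy type for illustration only: NOT the C++ AMP runtime.
class FailingView {
public:
    // Marks the view as holding modifications that still need to be copied back.
    void write() { pending_ = true; }

    // Explicit synchronization: a failure propagates to the caller.
    void synchronize() {
        if (pending_) {
            pending_ = false;  // avoid retrying the copy-back in the destructor
            throw std::runtime_error("copy-back failed");
        }
    }

    // Implicit synchronization: a destructor should not let exceptions
    // escape, so any failure is swallowed here.
    ~FailingView() {
        try {
            synchronize();
        } catch (...) {
            // The error is lost; the application never hears about it.
        }
    }

private:
    bool pending_ = false;
};

// Relies on the destructor: the failure goes unnoticed.
std::string implicit_path() {
    try {
        FailingView v;
        v.write();
    } catch (const std::exception&) {
        return "caught";
    }
    return "silent";
}

// Synchronizes explicitly: the failure reaches the caller.
std::string explicit_path() {
    try {
        FailingView v;
        v.write();
        v.synchronize();
    } catch (const std::exception& e) {
        return std::string("caught: ") + e.what();
    }
    return "silent";
}
```

In this sketch implicit_path() reports nothing even though the copy-back failed, while explicit_path() surfaces the error to the caller, which is the behavior Guideline B is after.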
Let us see these guidelines in action:
template <typename T>
void VectorAddition(const T *A, const T *B, T *C, int numElements)
{
    // Guideline A: Explicitly specify read-onliness by creating array_view<const T>.
    // Even if the compiler fails to infer read-onliness, the contents are then not
    // unnecessarily synchronized from the accelerator_view to the CPU data source
    // on destruction of the views.
    array_view<const T> viewA(numElements, A);
    array_view<const T> viewB(numElements, B);

    // viewC is only written to, so its existing contents need not be copied
    // to the accelerator_view.
    array_view<T> viewC(numElements, C);
    viewC.discard_data();

    parallel_for_each(viewC.extent, [=](index<1> idx) restrict(amp) {
        viewC(idx) = viewA(idx) + viewB(idx);
    });

    // Guideline B: Explicitly synchronize instead of relying on the implicit
    // synchronization on destruction, which would swallow exceptions.
    viewC.synchronize();
}
In closing
In this post we looked at the implicit synchronization behavior for array_views on destruction. Subsequent posts will dive into other functional and performance aspects of array_view - stay tuned!
I would love to hear your feedback, comments and questions below or in our MSDN forum.