concurrency::array_view - array_views on staging arrays
The previous posts in this series on C++ AMP array_view covered:
- Introduction to array_view and some of its key semantic aspects
- Implicit synchronization on destruction of array_views
- array_view discard_data function
- Caching and coherence policies underlying array_view implementation
In this post we will look at using array_views with staging arrays.
array_views with a staging array as data source
As described in a previous post, C++ AMP provides staging arrays for efficient data transfers between the host and accelerators. A staging array can only be accessed on the accelerator_view where it is allocated, and it additionally has an associated accelerator_view (returned by the get_associated_accelerator_view method of concurrency::array) to or from which it can be copied efficiently. When a staging array is used as the host memory data source for an array_view, implicit data transfers between the staging array and its associated accelerator_view are faster than for an array_view over regular (non-staging) host memory, where an extra intermediate copy to a temporary staging buffer is performed.
Staging arrays have limitations that you must be aware of if you use them as the data source for array_views. It is NOT safe to access a staging array while a copy from (or to) that staging array is in progress. Hence, for an array_view with a staging array as its data source, any operation that may transfer data from the staging array to its associated accelerator_view (or vice versa) must not execute concurrently with another operation that accesses the array_view on the CPU, or on another accelerator_view where the array_view is not already cached. Any such concurrent operations have undefined behavior (for example, they may cause an access violation).
Guidelines for using a staging array as an array_view data source
Guideline A: Consider using staging arrays as your array_view data source if the view is to be accessed only on the host plus exactly one accelerator_view.
accelerator_view cpuAv = accelerator(accelerator::cpu_accelerator).default_view;
// Guideline A: Use a staging array as the data source for an array_view
// to be used in a parallel_for_each computation, for faster transfer of data
// between the CPU and the accelerator
// Before (regular host memory as the data source):
//   std::vector<float> sourceVec(size);
//   float *hostPtr = sourceVec.data();
// After (staging array as the data source):
concurrency::array<float> sourceArray(size, cpuAv, accelerator().default_view);
float *hostPtr = sourceArray.data();
std::generate(hostPtr, hostPtr + size, rand);
// Using a staging array as the data source for the array_view
// results in faster transfer of data from the CPU to the accelerator_view
// where the parallel_for_each kernel executes
// Before: array_view<float> dataView(size, sourceVec);
array_view<float> dataView(sourceArray);
parallel_for_each(dataView.extent, [=](index<1> idx) restrict(amp) {
    dataView(idx) = fast_math::cos(dataView(idx));
});
// Using a staging array as the data source for the array_view
// also results in faster transfer of data from the accelerator_view
// to the CPU
dataView.synchronize();
Guideline B: Exercise extreme caution when using array_views over staging arrays in multi-threaded CPU code that can access such array_views concurrently from multiple threads. As described earlier, such concurrent accesses have undefined behavior and may result in fatal errors.
accelerator_view cpuAv = accelerator(accelerator::cpu_accelerator).default_view;
concurrency::array<float> sourceArray(size, cpuAv, accelerator().default_view);
float *hostPtr = sourceArray.data();
std::generate(hostPtr, hostPtr + size, rand);
array_view<const float> sourceView(sourceArray);
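// Use a named array: wrapping a temporary in parentheses here would parse
// as a function declaration, and the view would have no live data source
concurrency::array<float> outputArray(size);
array_view<float> outputView(outputArray);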
std::vector<float> sourceCopy(size);
concurrency::task<void> t([&]() {
    for (int i = 0; i < size; ++i) {
        sourceCopy[i] = sourceView[i];
    }
});
// Guideline B violation: An array_view over a staging array should
// not be concurrently accessed on the CPU as in the concurrency::task above
// (or another accelerator_view) with an operation that transfers data from
// the staging array to the associated_accelerator_view of the staging array
// (the parallel_for_each invocation results in such a transfer here)
parallel_for_each(sourceView.extent, [=](index<1> idx) restrict(amp) {
    outputView(idx) = fast_math::cos(sourceView(idx));
});
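One way to stay within Guideline B is to make sure the CPU-side read has finished before starting the operation that triggers the transfer. The sketch below shows only that ordering, using standard C++ (std::async and std::future) as a portable stand-in. The names copyThenCompute and Result are hypothetical, the source vector plays the role of the staging array, and the plain loop at the end stands in for the parallel_for_each. It makes no C++ AMP calls, so treat it as an illustration of the pattern rather than a drop-in fix.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical result type for this sketch (not part of C++ AMP).
struct Result {
    std::vector<float> sourceCopy;  // what the CPU-side reader copied
    std::vector<float> output;      // what the stand-in "kernel" produced
};

// Hypothetical helper: runs the CPU read on another thread, waits for it to
// finish, and only then starts the step that stands in for parallel_for_each.
inline Result copyThenCompute(const std::vector<float>& source) {
    Result r;
    r.sourceCopy.resize(source.size());
    r.output.resize(source.size());

    // CPU-side reader, analogous to the concurrency::task in the example above.
    std::future<void> reader = std::async(std::launch::async, [&]() {
        for (std::size_t i = 0; i < source.size(); ++i) {
            r.sourceCopy[i] = source[i];
        }
    });

    // Key step: block until the reader is done, so the CPU access and the
    // following operation never overlap.
    reader.get();
    assert(r.sourceCopy.size() == source.size());

    // Stand-in for the parallel_for_each that would trigger the implicit
    // transfer from the staging array to its associated accelerator_view.
    for (std::size_t i = 0; i < source.size(); ++i) {
        r.output[i] = std::cos(source[i]);
    }
    return r;
}
```

In the Guideline B example itself, the equivalent step would be to call t.wait() on the concurrency::task before invoking parallel_for_each, so that the CPU access and the implicit transfer cannot overlap.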
In closing
In this post we looked at some key aspects of using staging arrays as the data source for array_views. Subsequent posts will dive into other functional and performance aspects of array_view - stay tuned!
I would love to hear your feedback, comments and questions below or in our MSDN forum.