writeonly becomes discard_data for C++ AMP array_view

In my admittedly long introduction to array_view, I mentioned that constructing an array_view with a const value type (e.g. array_view<const int, 2> av) can yield perf gains: it lets the compiler know that you will not make changes to this data, so the data does not need to be copied back from the accelerator after the parallel_for_each computation completes.

In that same blog post I hinted that “There is also a similar mechanism to achieve the reverse, i.e. not to copy the data of an array_view to the GPU.” In this blog post I’ll share a change in that area for C++ AMP in the Visual Studio 11 Beta.

concurrency::writeonly<> in Developer Preview

If you have looked at our classic matrix multiply or the other C++ AMP samples, you’ll know that you can do something like the following:

array_view<writeonly<int>, 2> av(N, M, my_vector);
parallel_for_each(av.extent, [=](index<2> idx) restrict(amp)
{
    // lambda code
});
av.synchronize();

The good thing about the approach above is that the data behind av (in my_vector) will not get copied to the accelerator on which the lambda passed to parallel_for_each executes.

The not-so-great things are that the writeonly syntax is not intuitive to everyone, that you have to make that perf decision the moment you construct the array_view, and that you cannot read from the array_view in the lambda (e.g. after you have written some data to it there) or even read from it outside the lambda on the host (that is what writeonly implies, right?).

array_view::discard_data() in Beta

In the Visual Studio 11 Beta, concurrency::writeonly goes away. The replacement is a member function on array_view called discard_data(). So the code above becomes:

array_view<int, 2> av(N, M, my_vector);
av.discard_data();
parallel_for_each(av.extent, [=](index<2> idx) restrict(amp)
{
    // lambda code
});
av.synchronize();

You can call this method on the host at any point on any array_view instance, as long as you call it before the parallel_for_each invocation. It remains an optimization hint: the data will not be copied to the accelerator, but nothing prevents you from reading from the array_view in the lambda or after the parallel_for_each computation completes. Of course, if you touch the array_view and then capture it again in the lambda of another parallel_for_each, you have to call discard_data again.
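To make the copy bookkeeping above concrete, here is a minimal host-only sketch in standard C++. It is not C++ AMP and not how the runtime is implemented: ToyView, launch, copies_in, copies_out, DemoResult and run_demo are all hypothetical names invented for this illustration, and the toy assumes (as a simplification) that a copy to the "accelerator" happens on every launch unless discard_data was called first. It only shows the two points made above: the hint skips the copy-in without forbidding reads, and it has to be repeated before each launch that should skip the copy.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical host-only stand-in for an array_view. NOT the C++ AMP runtime;
// it only counts the copies that the post talks about.
class ToyView {
public:
    explicit ToyView(std::vector<int>& host)
        : host_(host), device_(host.size(), 0) {}

    // The hint: current contents need not be copied to the "accelerator".
    void discard_data() { discard_ = true; }

    // Stand-in for parallel_for_each: runs kernel once per element
    // against the "accelerator" copy of the data.
    template <typename Kernel>
    void launch(Kernel kernel) {
        if (!discard_) {          // simplification: always copy in unless hinted
            device_ = host_;
            ++copies_in_;
        }
        discard_ = false;         // the hint covers this one launch only
        for (std::size_t i = 0; i < device_.size(); ++i) {
            kernel(device_, i);
        }
    }

    // Copy the results back to the host container.
    void synchronize() {
        host_ = device_;
        ++copies_out_;
    }

    int copies_in() const { return copies_in_; }
    int copies_out() const { return copies_out_; }

private:
    std::vector<int>& host_;
    std::vector<int> device_;
    bool discard_ = false;
    int copies_in_ = 0;
    int copies_out_ = 0;
};

struct DemoResult {
    int copies_in_after_first;   // copy-ins seen after the hinted launch
    int copies_in_after_second;  // copy-ins seen after the un-hinted launch
    int copies_out;              // number of synchronize() calls
    int sample;                  // final value of element 3 on the host
};

inline DemoResult run_demo() {
    std::vector<int> my_vector(8, -1);
    ToyView av(my_vector);

    av.discard_data();           // old contents (-1) are never copied in
    av.launch([](std::vector<int>& d, std::size_t i) {
        d[i] = static_cast<int>(i);  // write first...
        d[i] = d[i] * 2;             // ...then read what was just written
    });
    av.synchronize();
    assert(my_vector[3] == 6);   // reading on the host afterwards is fine
    const int first = av.copies_in();

    // No discard_data() this time, so the toy copies the data in again.
    av.launch([](std::vector<int>& d, std::size_t i) { d[i] += 1; });
    av.synchronize();

    return DemoResult{first, av.copies_in(), av.copies_out(), my_vector[3]};
}
```

Calling run_demo() from your own main shows the difference between the two launches: the first one, preceded by discard_data(), records no copy-in, while the second one, which needs the values produced by the first, records one.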

The change is fairly mechanical, so you should be able to quickly update your existing code.

Important note for post-Beta change

In the Beta bits, you will find that every array_view constructor has an optional Boolean last parameter named _Discard_original_data that is false by default. Do not ever change the value of that parameter, because it is going away post-Beta (it is already gone on my machine). Sorry, this was a last-minute change that we did not have time to get into the Beta. There is little point in this post delving into why we changed our minds, but suffice it to say that you should ignore that parameter in the Beta if you want your code to compile unchanged with the final RTM build of Visual Studio.