

Asynchronous Operations and Continuations in C++ AMP

We have seen in previous blog posts that C++ AMP provides a comprehensive set of asynchronous APIs which enable users to continue performing useful work on the current CPU thread while an operation (such as transferring data between host and accelerator, synchronizing an array_view etc.) is concurrently (and asynchronously) executed on the accelerator. In the Beta bits, all the asynchronous APIs return a std::shared_future<void> object that can be used to wait for the accelerator operation when the results of the operation are needed.

Additionally, in many real-life scenarios you need to perform a dependent operation once an asynchronous operation completes, for instance, updating the UI of a GUI application after some computation. To achieve this, you have to “wait” for the asynchronous operation to complete before executing the dependent tasks. This can leave threads idling for the duration of the asynchronous operation, which is often undesirable.

Continuations

A better alternative to explicitly waiting for dependencies to finish would be to chain the dependent tasks using “continuations”. A continuation is an asynchronous task that is invoked by another task, known as the antecedent, when the antecedent completes. Continuations enable “wait-free” composition of tasks. Unfortunately, C++11 futures do not support continuations. We chose std::shared_future<void> as the return type of the C++ AMP asynchronous APIs in order to interoperate with standard C++ asynchronous features and to avoid inventing a new type for the purpose. However, given the importance of this scenario, and to avoid needlessly tying up CPU threads, we are introducing a new return type for all C++ AMP asynchronous APIs, viz. concurrency::completion_future. It has an interface analogous to std::shared_future<void> and introduces just two additional members, then() and to_task(). Following is the interface of this new type:

class completion_future
{
public:
    completion_future();
    completion_future(const completion_future& rhs);
    completion_future(completion_future&& rhs);
    ~completion_future();

    completion_future& operator=(const completion_future& rhs);
    completion_future& operator=(completion_future&& rhs);

    void get() const;
    bool valid() const;
    void wait() const;

    template <class Rep, class Period>
    std::future_status::future_status wait_for(
        const std::chrono::duration<Rep, Period>& rel_time
        ) const;

    template <class Clock, class Duration>
    std::future_status::future_status wait_until(
        const std::chrono::time_point<Clock, Duration>& abs_time
        ) const;

    operator std::shared_future<void>() const;

    template <typename _Functor>
    void then(const _Functor &_Func) const;

    concurrency::task<void> to_task() const;
};

The concurrency::completion_future type supports implicit conversion to std::shared_future<void>, which allows objects of this type to be passed (by value) wherever a std::shared_future<void> is expected. Hence, any existing C++ AMP code that depends on C++ AMP asynchronous APIs returning a std::shared_future<void> object will continue to compile without modification. In addition to providing the same interface as std::shared_future<void>, the concurrency::completion_future type also provides a then() member method to chain a continuation to a C++ AMP asynchronous operation.

#include <amp.h>
#include <chrono>
#include <iostream>
#include <thread>

using namespace concurrency;

int main()
{
    array<int, 1> arr(512);
    array_view<int, 1> arr_v(arr);

    // Kick off computation on the accelerator
    parallel_for_each(arr.extent, [=] (index<1> idx) restrict(amp) {
        arr_v[idx] = 10;
    });

    // Issue an asynchronous synchronization of
    // the results and chain the dependent task
    arr_v.synchronize_async().then([](){
        // Perform the dependent task here
        std::cout << "Hi there" << std::endl;
    });

    // Continue performing other useful work on the current CPU thread.
    // (The sleep stands in for that work and keeps main() alive long
    // enough for the continuation to run in this demo.)
    std::this_thread::sleep_for(std::chrono::seconds(1));
    return 0;
}
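Because C++ AMP needs a Microsoft compiler and an accelerator, the example above cannot be tried everywhere. As a rough, portable illustration of the two behaviors just described (implicit conversion to std::shared_future<void>, and a then() that does not block the caller), here is a minimal sketch in standard C++11. The class mini_completion_future and the functions legacy_wait and run_then_demo are hypothetical names, and running each continuation on a detached helper thread is a simplifying assumption; this is not how C++ AMP schedules continuations.

```cpp
#include <cassert>
#include <future>
#include <memory>
#include <thread>
#include <utility>

// HYPOTHETICAL stand-in for concurrency::completion_future, standard C++ only.
// It is NOT the C++ AMP implementation.
class mini_completion_future {
public:
    explicit mini_completion_future(std::shared_future<void> f)
        : fut_(std::move(f)) {}

    void get() const { fut_.get(); }   // rethrows a stored exception
    void wait() const { fut_.wait(); }
    bool valid() const { return fut_.valid(); }

    // Implicit conversion: usable wherever std::shared_future<void> is expected.
    operator std::shared_future<void>() const { return fut_; }

    // then(): a helper thread waits for the antecedent and then invokes the
    // functor, so the calling thread is never blocked. If the antecedent
    // stored an exception, the continuation is skipped.
    template <typename Functor>
    void then(const Functor& func) const {
        std::shared_future<void> antecedent = fut_;
        std::thread([antecedent, func]() {
            try {
                antecedent.get();
            } catch (...) {
                return;  // antecedent failed: do not run the continuation
            }
            func();
        }).detach();
    }

private:
    std::shared_future<void> fut_;
};

// Pre-existing code written against std::shared_future<void>.
void legacy_wait(std::shared_future<void> f) {
    assert(f.valid());
    f.wait();
}

// Usage: chain a continuation, then complete the antecedent.
int run_then_demo() {
    std::promise<void> op;  // plays the role of the accelerator operation
    mini_completion_future cf(op.get_future().share());

    // shared_ptr keeps the promise alive for the detached helper thread.
    auto done = std::make_shared<std::promise<int>>();
    std::future<int> result = done->get_future();
    cf.then([done]() { done->set_value(42); });

    op.set_value();       // the "operation" finishes; the continuation may now run
    legacy_wait(cf);      // implicit conversion to std::shared_future<void>
    return result.get();  // blocks until the continuation has run
}
```

Calling run_then_demo() returns 42 once the continuation has run. Passing cf to legacy_wait compiles only because of the conversion operator, which mirrors why existing code written against std::shared_future<void> keeps compiling.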

Further, this type also provides a to_task() member method that returns a concurrency::task<void> object. The concurrency::task<void> type already provides a then() member method, which can also be used to specify or chain continuations. Each continuation chained to a concurrency::task<void> object returns another task object, which can either be waited on or used to chain further continuations.

concurrency::task<void> tsk = copy_async(src, dest).to_task();
tsk.then([]() { /* work */ }).then([]() { /* more work */ }); // And so on

This enables seamless “wait-free” composition of PPL and C++ AMP asynchronous tasks, which is particularly useful when developing hybrid CPU-GPU applications. You can read more about PPL tasks and continuations on our blog.
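The chaining behavior of task<void>::then(), where each call returns a new object that can be waited on or chained further, can be mimicked with standard futures. The sketch below is only an analogy under stated assumptions: then and run_chain_demo are hypothetical helpers, each link runs on its own std::async thread, and none of this reflects how PPL actually schedules tasks.

```cpp
#include <cassert>
#include <future>
#include <string>

// HYPOTHETICAL helper: attaches a continuation to a std::shared_future<void>
// and returns a NEW future for that continuation, so calls can be chained
// much like concurrency::task<void>::then(). Each link runs on its own
// std::async thread; this is an analogy, not how PPL schedules tasks.
template <typename Functor>
std::shared_future<void> then(std::shared_future<void> antecedent, Functor func) {
    return std::async(std::launch::async, [antecedent, func]() mutable {
        antecedent.get();  // waits; rethrows a stored exception, so func is skipped
        func();
    }).share();
}

std::string run_chain_demo() {
    std::string trace;
    std::promise<void> copy_done;  // stands in for the copy_async(...) operation
    std::shared_future<void> start = copy_done.get_future().share();

    // Two chained links; the second waits on the future returned by the first.
    std::shared_future<void> last =
        then(then(start, [&trace]() { trace += "A"; }),
             [&trace]() { trace += "B"; });

    assert(trace.empty());  // nothing runs while the antecedent is pending
    copy_done.set_value();  // the "operation" completes; the chain can proceed
    last.get();             // wait for the end of the chain
    return trace;
}
```

run_chain_demo() returns "AB": the second link starts only after the first has finished. Because antecedent.get() rethrows, an exception in any link skips the remaining links and surfaces from last.get(), loosely resembling how an exception cancels chained continuations.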

Exception Forwarding

A question that may come to mind is: what happens if the asynchronous operation encounters an exception? In that case:

  1. The exception is forwarded to, and stored in, the concurrency::completion_future object returned by the asynchronous API. It is thrown when the completion_future::get() method is called.
  2. Any continuation chained using the completion_future::then() method is cancelled.
  3. The exception is also forwarded to the PPL task (concurrency::task<void>) object returned by the completion_future::to_task() method, and any continuation chained to that task is cancelled. You can read more about PPL continuation chaining on our blog.

// Chain a continuation to copy_async
completion_future cf = copy_async(src, dest); // If this encounters an exception,
cf.then([] () {/*some work*/});               // then this lambda is not invoked
cf.get();                                     // This will throw the stored exception
cf.to_task().wait();                          // This would also throw the stored exception

Note that only exceptions that occur after the actual asynchronous operation has been kicked off are forwarded to the concurrency::completion_future object (and, through it, to any task obtained from to_task()). If an exception occurs before the asynchronous operation is kicked off (e.g. while validating the arguments), it is thrown on the same thread that invoked the asynchronous API (copy_async in the example above), and the asynchronous operation never starts.
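The two failure paths can be sketched with standard C++ futures. hypothetical_copy_async and failure_modes_demo are made-up names standing in for copy_async; the argument check (destination too small) and the fail_during_copy flag are assumptions used only to trigger each path, not part of the C++ AMP API.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <stdexcept>
#include <vector>

// HYPOTHETICAL stand-in for copy_async, showing the two failure paths.
// NOTE: src and dest are captured by reference, so the caller must keep
// them alive until the returned future is ready.
std::shared_future<void> hypothetical_copy_async(const std::vector<int>& src,
                                                 std::vector<int>& dest,
                                                 bool fail_during_copy) {
    // Validation runs on the calling thread before any asynchronous work
    // starts; a failure here is thrown directly to the caller.
    if (dest.size() < src.size()) {
        throw std::invalid_argument("destination is too small");
    }
    // Failures inside the asynchronous work are stored in the future.
    return std::async(std::launch::async, [&src, &dest, fail_during_copy]() {
        if (fail_during_copy) {
            throw std::runtime_error("copy failed");
        }
        for (std::size_t i = 0; i < src.size(); ++i) {
            dest[i] = src[i];
        }
    }).share();
}

bool failure_modes_demo() {
    std::vector<int> src(4, 7), too_small(2), dest(4);

    // 1. Invalid arguments: thrown on this thread; nothing was started.
    bool thrown_synchronously = false;
    try {
        hypothetical_copy_async(src, too_small, false);
    } catch (const std::invalid_argument&) {
        thrown_synchronously = true;
    }

    // 2. Failure after the work started: stored, then rethrown by get().
    std::shared_future<void> failed = hypothetical_copy_async(src, dest, true);
    bool rethrown_by_get = false;
    try {
        failed.get();
    } catch (const std::runtime_error&) {
        rethrown_by_get = true;
    }

    // 3. Success: get() returns normally and the data has been copied.
    hypothetical_copy_async(src, dest, false).get();
    assert(dest[3] == 7);

    return thrown_synchronously && rethrown_by_get;
}
```

failure_modes_demo() returns true: the invalid-argument error reaches the caller directly and no asynchronous work is started, while the runtime error raised after the work began is stored in the future and rethrown by get().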

A note about the C++ AMP open specification

The C++ AMP open specification has been updated with the semantics of the API enhancement described in this blog post. Of the two methods offered by concurrency::completion_future (then() and to_task()), to_task() is a Microsoft-specific extension and is not part of the open specification.

Feedback welcome

I hope you now have a better understanding of how to take advantage of the capabilities provided by the C++ AMP asynchronous APIs. As usual, I would love to read your comments below or in our MSDN forum.

Comments

  • Anonymous
    May 31, 2012
    Out of curiosity, why does then take a const _Functor& rather than a _Functor&&? Presumably you have to persist the passed-in functor - why force a copy rather than allowing movable functors? Not every functor passed in will be a lambda, you know...

  • Anonymous
    June 05, 2012
    Hi ildjarn, That's a valid question. We will consider using the rvalue reference overload for task-related functions in future releases. For now, if you need to use a functor that has an expensive copy constructor, the workaround is to use the PImpl idiom (hide the implementation in a different object that is behind a shared_ptr, allowing copies to be fast). You can read more about the PImpl idiom here: msdn.microsoft.com/.../hh438477(v=VS.110).aspx Thanks, Hasibur

  • Anonymous
    June 07, 2012
    To be clear, because of reference-collapsing, you wouldn't want/need an additional overload, just a single member function template taking _Functor&&. When persisted with std::forward<_Functor>, it will be moved or copied automatically according to whether an lvalue or rvalue was passed in.
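The reference-collapsing point in the last comment can be checked with a small standalone sketch. The names counting_functor, future_sketch and forwarding_demo are hypothetical; the sketch only shows which constructor runs when a functor passed to a single forwarding-reference then() is persisted, and does not reproduce completion_future::then().

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Functor that counts how often it is copied or moved (illustration only).
struct counting_functor {
    static int copies;
    static int moves;
    counting_functor() {}
    counting_functor(const counting_functor&) { ++copies; }
    counting_functor(counting_functor&&) noexcept { ++moves; }
    void operator()() const {}
};
int counting_functor::copies = 0;
int counting_functor::moves = 0;

// HYPOTHETICAL type: a single then() template taking a forwarding reference.
// For an lvalue argument, Functor deduces to counting_functor& and
// std::forward yields an lvalue (copy); for an rvalue, Functor deduces to
// counting_functor and std::forward yields an rvalue (move).
struct future_sketch {
    template <typename Functor>
    void then(Functor&& func) const {
        // "Persist" the functor as an independent object of the decayed type.
        typename std::decay<Functor>::type stored(std::forward<Functor>(func));
        stored();  // a real implementation would keep it until the antecedent completes
    }
};

bool forwarding_demo() {
    counting_functor::copies = 0;
    counting_functor::moves = 0;
    future_sketch fut;

    counting_functor named;
    fut.then(named);               // lvalue: copied once
    assert(counting_functor::copies == 1 && counting_functor::moves == 0);

    fut.then(counting_functor{});  // rvalue: moved once, never copied
    return counting_functor::copies == 1 && counting_functor::moves == 1;
}
```

forwarding_demo() returns true: the named functor is copied exactly once and the temporary is moved exactly once, with no second overload needed.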