What’s new in Beta 2 for the Concurrency Runtime, Parallel Pattern Library and Asynchronous Agents Library

Last week Visual Studio 2010 Beta 2 was released for download. Since Beta 1, the team has been busy adding functionality to make you more productive at expressing parallelism in your applications, and improving the quality and performance of our runtime and programming models.

Here’s a guide to what’s new in Beta 2: we’ve added two new concurrent containers, significantly enhanced our online documentation, and improved the debug experience with better visualizations and more intuitive views in the parallel debug windows. We’ve also modified a small number of APIs, which you should be aware of if you are using Beta 1.

Concurrent containers – concurrent_queue and concurrent_vector

We’ve alluded more than once to concurrent_queue and concurrent_vector in our videos and live talks. They are finally in the box, and with them come two new header files: concurrent_queue.h and concurrent_vector.h.

concurrent_queue<T> is very similar to std::queue<T>. It offers push and try_pop interfaces, plus ‘unsafe’ iterators and size accessors (these aren’t thread safe during concurrent pushes and pops).
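
The reason the interface is try_pop rather than a separate empty()/front()/pop() sequence is that a check followed by a pop can race with another thread that pops in between. Here is a minimal portable sketch of that interface shape, built on std::queue and a mutex. The names locked_queue, drain_and_sum and locked_queue_demo are hypothetical, and this is not how concurrent_queue is implemented; it only illustrates the calling pattern.

```cpp
#include <cassert>
#include <mutex>
#include <queue>

// Hypothetical stand-in for illustration only; NOT the concurrent_queue implementation.
template <class T>
class locked_queue {
public:
    void push(const T& value) {
        std::lock_guard<std::mutex> guard(lock_);
        items_.push(value);
    }

    // Returns false if the queue was empty. Otherwise copies the front item
    // into 'out' and removes it, with the check and the pop under one lock.
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> guard(lock_);
        if (items_.empty())
            return false;
        out = items_.front();
        items_.pop();
        return true;
    }

private:
    std::mutex lock_;
    std::queue<T> items_;
};

// Pops until try_pop reports empty and returns the sum of the popped items.
inline int drain_and_sum(locked_queue<int>& q) {
    int total = 0;
    int item = 0;
    while (q.try_pop(item))
        total += item;
    return total;
}

// Small self-check showing the calling pattern.
inline void locked_queue_demo() {
    locked_queue<int> q;
    q.push(1);
    q.push(2);
    q.push(3);
    assert(drain_and_sum(q) == 6);
    int leftover = 0;
    assert(!q.try_pop(leftover)); // nothing left after draining
}
```

With the real container the loop looks the same: declare an item, then call try_pop until it returns false.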

concurrent_vector<T> is most similar to a std::vector<T> and it offers a push_back method that is internally synchronized across threads and allows efficient thread safe growth of the vector. Like std::vector, concurrent_vector has random access iterators, but unlike std::vector, the guarantee of contiguous storage is removed and there are no insert and erase methods.
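
One practical consequence of losing the contiguity guarantee: code that hands &v[0] and a length to a pointer-based API can’t be used directly on a concurrent_vector. A minimal sketch of one workaround, copying through the iterators into a std::vector first, is below. To keep it compilable without the Beta 2 headers it uses std::deque as a stand-in (an assumption made purely for illustration; like concurrent_vector it has random access iterators but no contiguous storage). sum_buffer, to_contiguous and sum_non_contiguous are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical pointer-based API that requires one contiguous block of ints.
inline long sum_buffer(const int* data, std::size_t count) {
    assert(data != nullptr || count == 0);
    long total = 0;
    for (std::size_t i = 0; i < count; ++i)
        total += data[i];
    return total;
}

// Copies any iterator range of ints into contiguous storage.
template <class InIt>
std::vector<int> to_contiguous(InIt first, InIt last) {
    return std::vector<int>(first, last);
}

// std::deque stands in for concurrent_vector here (assumption for illustration).
inline long sum_non_contiguous(const std::deque<int>& items) {
    // Do NOT pass &items[0] with items.size(): elements may live in separate blocks.
    std::vector<int> flat = to_contiguous(items.begin(), items.end());
    return flat.empty() ? 0 : sum_buffer(&flat[0], flat.size());
}
```

The copy costs one extra pass over the data, so it only makes sense once the parallel growth phase is finished.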

The interfaces to concurrent_queue and concurrent_vector will be incredibly familiar if you are a user of Intel’s Threading Building Blocks (the interfaces are identical), and a very big thank you goes out to their team for their assistance with this.

Here’s a brief example that uses both concurrent_queue and concurrent_vector in a parallel loop:

#include <ppl.h>
#include <concurrent_vector.h>
#include <concurrent_queue.h>
#include <iostream>
using namespace Concurrency;
using namespace std;
int main()
{
    concurrent_vector<int> odds;
    concurrent_queue<int> evens;
    parallel_for(0,100,1,[&odds,&evens](int i){
        if (i%2 == 0)
            evens.push(i);
        else
            odds.push_back(i);
    }); 
    cout << "We expect 100 items: " << evens.unsafe_size() + odds.size() << endl;
}

Debugging enhancements in Beta 2

VS 2010 Beta 2 also includes significant enhancements for parallel debugging. The Locals window now includes visualizers for all first-class objects in the PPL and the Agents Library, so when you look at an instance of a concurrent_queue or an unbounded_buffer in the Locals window, both look a lot like a std::queue instead of exposing implementation details. The Parallel Tasks and Parallel Stacks windows have also been enhanced; in addition to the videos, try the C++ code in the MSDN walkthrough and Daniel Moth’s blog. Finally, like the rest of the C Runtime, we’ve made the source code for the Concurrency Runtime available as part of the install, so if you need to debug deeper or really want to see how things work internally, you now can.

Documentation updated with more How To’s and walkthroughs

Our offline and online documentation has been significantly updated for Beta2. We’ve added multiple conceptual topics and expanded our How To topics significantly to include information not just on PPL and the Agents Library but on the underlying Concurrency Runtime and how to manage and use scheduler instances. Any feedback on these topics is greatly appreciated.

API updates to task_group and agent

For Beta 2, we’ve made a very small number of API updates to task_group, structured_task_group and our agent classes. The change to agent is simple to describe: we’ve simplified the state management and removed the parameter from the agent::done method, so it’s now sufficient to call this->done() instead of this->done(agent_done).

The change to task_group and structured_task_group is additive: we’ve added the method run_and_wait, which takes a functor and runs it inline on the current thread or task. This offers the major benefit of being able to compose tasks and nest their cancellation. One of the easiest ways to see this in action is to implement a parallel search algorithm, such as a parallel version of the new C++0x library function ‘all_of’:

template<class InIt, class Pr>
inline bool parallel_all_of(InIt first, InIt last, Pr pred)
{
   // 'typename' is required because value_type is a dependent type
   typedef typename std::iterator_traits<InIt>::value_type Item_type;

   //create a structured task group
   structured_task_group tasks;

   // cancel the whole group as soon as one item fails the predicate
   auto for_each_predicate = [&pred,&tasks](const Item_type& cur){
      if (!pred(cur))
         tasks.cancel();
   };

   auto task = make_task([&](){
      parallel_for_each(first,last,for_each_predicate);
   });

   // run_and_wait reports 'canceled' if cancel() was called
   return tasks.run_and_wait(task) != canceled;
}

Here we are placing a call to parallel_for_each inside of a structured_task_group and cancelling the work when the predicate is not true. This has the effect of cancelling all nested tasks in the structured_task_group, potentially saving work if the predicate is long running and expensive to compute. We can use this in our example above like this (but don’t expect to see significant speedups over std::all_of for this example):

if (parallel_all_of(odds.begin(),odds.end(),[](int i){return i % 2 != 0;}))
    cout << "success!" << endl;

So try Beta 2 if you haven’t already

That’s about it for now, so download VS 2010 Beta 2 if you haven’t already, and if you’re at PDC or TechEd Europe next month, stop by our booth or come to one of our talks.

-Rick

Comments

  • Anonymous
    October 29, 2009
    in terms of performance, which would be better, combinable<T> or concurrent_containers(e.g. concurrent_queue, concurrent_vector) ? I guess combinable<T> would involve less locks and thus better, can you confirm it? Thanks.

  • Anonymous
    October 30, 2009
    Both the containers and combinable should perform well. I have a simple object pool class that is implemented both as a wrapper around a concurrent_queue<T> and a combinable<queue<T>> and performance is similar in both cases. If you have a particular scenario in mind, I would recommend trying it both ways to see if there's a difference.