concurrent_vector and concurrent_queue explained
As a user of Visual Studio, you might have had a chance to use the concurrent data structures introduced in the Beta 2 release – concurrent_vector<T> and concurrent_queue<T>. A detailed discussion about the design, usage, and methods of concurrent_queue can be found here. A similar discussion of concurrent_vector will be posted in the near future.
In this post, we will go over the inner workings of these two containers, specifically how they are laid out in memory.
concurrent_vector
The concurrent_vector class can be thought of as a parallel version of std::vector that allows thread-safe growth. Unlike std::vector, however, growing a concurrent_vector does not invalidate iterators into it; on the other hand, concurrent_vector does not support the pop_back() operation.
In std::vector, storage is dynamically allocated, and its elements follow a strict linear sequence. Because the elements are guaranteed to occupy contiguous memory locations, they are all relocated when the container grows beyond its current capacity. concurrent_vector, however, does not guarantee that all of its elements are stored in contiguous memory locations. Consequently, existing elements are not relocated when more memory needs to be allocated. Instead, elements are stored in a series of separate contiguous arrays. When an empty concurrent_vector is instantiated, no memory is allocated to start with. As elements are added to the container, arrays of increasing sizes are allocated as required. The sizes of the allocated arrays are always powers of 2, and they increase as the container grows.
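To see this growth behavior in action, here is a minimal usage sketch based on the Concurrency namespace that ships with Visual Studio 2010 (headers <concurrent_vector.h> and <ppl.h>); the element count of 1000 is arbitrary.

#include <concurrent_vector.h>
#include <ppl.h>
#include <iostream>

int main()
{
    Concurrency::concurrent_vector<int> v;

    // Multiple threads can append concurrently; existing elements are
    // never moved while the container grows.
    Concurrency::parallel_for(0, 1000, [&v](int i)
    {
        v.push_back(i);
    });

    std::cout << "size: " << v.size() << std::endl; // prints 1000
    return 0;
}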
If you look at the raw view of concurrent_vector in a debugger, you will see that it contains a segment table member called _My_segment, which looks like an array of arrays. (To enable raw view, go to Tools -> Options -> Debugging and check the “Show raw structure of objects in variables windows” checkbox). In the segment table, _My_segment[0] points to the first 2 elements of the container, _My_segment[1] to the next 2 elements, _My_segment[2] to the next 4 elements, _My_segment[3] to the next 8 elements, etc. See Figure 1 (not to scale).
Figure 1
This arrangement of the segment table is independent of the sizes of the actual storage arrays contained within concurrent_vector, and is simply a logical view of the elements. In other words, the sizes of the logical arrays as seen by the segment table are always 2, 2, 4, 8, 16, 32, 64, 128, etc. However, the sizes of the actual arrays allocated from the heap may very well be 16, 16, 32, 64, 128, etc. The size of the first array is determined by the first reservation, growth or assignment operation.
For example, take a concurrent_vector<int> v that is instantiated with an initial size of 10. The actual size of the first array is the lowest power of 2 that can fit 10 elements, i.e., 16. As the container grows, the sizes of the subsequent arrays that are allocated are 16, 32, 64, etc. (increasing powers of 2). The segment table of v is populated such that _My_segment[0] points to the first 2 elements of the first array, _My_segment[1] points to the next 2 elements of the same first array, _My_segment[2] points to the next 4 elements, again in the first array, _My_segment[3] points to the next 8 elements, still in the first array, _My_segment[4] points to the second array (of size 16), _My_segment[5] points to the third array (of size 32), etc. See Figure 2.
Figure 2
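To make the logical view concrete, the small sketch below computes which slot of the segment table an element index falls into, and the offset within that segment, following the 2, 2, 4, 8, 16, ... progression described above. The function is purely illustrative and is not part of the concurrent_vector interface; _My_segment is an internal member name used here only to match the figures.

#include <cstddef>
#include <iostream>

// Illustration only: maps an element index to its logical segment and
// the offset within that segment, for segment sizes 2, 2, 4, 8, 16, ...
void logical_segment(size_t index, size_t& segment, size_t& offset)
{
    if (index < 2)
    {
        // _My_segment[0] covers elements 0 and 1.
        segment = 0;
        offset = index;
        return;
    }

    // For index >= 2, segment k starts at element 2^k and holds 2^k elements.
    segment = 0;
    size_t start = 1;
    while (start * 2 <= index)
    {
        ++segment;
        start *= 2;
    }
    offset = index - start;
}

int main()
{
    for (size_t i = 0; i < 20; ++i)
    {
        size_t segment, offset;
        logical_segment(i, segment, offset);
        std::cout << "element " << i << " -> _My_segment[" << segment
                  << "], offset " << offset << std::endl;
    }
    return 0;
}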
When shrink_to_fit() is called, the storage is defragmented and the internal layout is optimized. However, it is not guaranteed that all elements of v are in one contiguous array. When clear() is called, all arrays are deallocated.
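For reference, here is a short sketch of those two calls; note that, unlike push_back, shrink_to_fit and clear are not meant to be called concurrently with other operations on the same container.

#include <concurrent_vector.h>

int main()
{
    // Instantiated with an initial size of 10, so the first storage array
    // holds 16 elements, as described above.
    Concurrency::concurrent_vector<int> v(10);

    for (int i = 0; i < 100; ++i)
    {
        v.push_back(i);
    }

    v.shrink_to_fit(); // defragments and optimizes the internal arrays
    v.clear();         // destroys the elements and deallocates the arrays

    return 0;
}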
This, in short, is how the elements of concurrent_vector are laid out in memory.
concurrent_queue
concurrent_queue can be thought of as a parallel version of std::queue that offers thread-safe addition and removal of elements. The elements contained in a concurrent_queue are held in blocks of memory called pages. As elements are enqueued and dequeued, pages are added and deleted. The size of a single page, and the number of elements a page can hold, vary with the size of the element type being stored. See Figure 3.
Figure 3
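Before digging into the page layout, here is a minimal usage sketch (again assuming the Concurrency namespace from Visual Studio 2010, this time header <concurrent_queue.h>); the counts are arbitrary.

#include <concurrent_queue.h>
#include <iostream>

int main()
{
    Concurrency::concurrent_queue<int> q;

    // Pages are allocated lazily as elements are enqueued.
    for (int i = 0; i < 300; ++i)
    {
        q.push(i);
    }

    // try_pop does not block; it returns false once the queue is empty.
    // Pages whose elements have all been dequeued are deallocated.
    int value;
    while (q.try_pop(value))
    {
    }

    std::cout << "empty: " << std::boolalpha << q.empty() << std::endl;
    return 0;
}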
The concurrent_queue class is implemented as an array of 8 micro-queues, where each micro-queue is a linked list of pages. This layout is illustrated in Figure 4. Take the example of a concurrent_queue<int> q, to which we enqueue integers in the order 0, 1, 2, 3, etc. such that q[i] is equal to i. When q is instantiated, it is empty and no pages are allocated at first. As elements are enqueued, they are added to the first available spot in the first available page in a specific micro-queue. The formula used to select which micro-queue the ith element is added to is i*3%8. Therefore, element 0 is added to the micro-queue at _Array[0], element 1 to _Array[3], element 2 to _Array[6], element 3 to _Array[1], and so on. If there is no available page in the micro-queue that was selected, a new page is allocated and linked to the end of the micro-queue. The _Page data structure has a member variable that keeps track of the next page in the micro-queue (denoted by N in Figure 4). For a concurrent_queue<int>, the number of integers stored in each page is 32 (from Figure 3). Since the number of micro-queues in a concurrent_queue is always fixed at 8, the first layer of pages in q will hold the first 256 integers that are enqueued (0 through 255). When more integers are added to q, a new layer of pages is allocated.
Figure 4
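The snippet below simply evaluates that selection formula for the first few elements, showing how consecutive items fan out across the 8 micro-queues (the _Array name is internal and used here only to match the figure).

#include <iostream>

int main()
{
    // The ith enqueued element is placed in micro-queue (i * 3) % 8.
    for (int i = 0; i < 16; ++i)
    {
        std::cout << "element " << i << " -> _Array[" << (i * 3) % 8 << "]"
                  << std::endl;
    }
    return 0;
}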
There are two member variables in concurrent_queue that keep track of the number of elements enqueued and dequeued through its lifetime: _Tail_counter and _Head_counter. When an element is enqueued, _Tail_counter is incremented. When an element is dequeued, _Head_counter is incremented. The number of elements in the queue at any given time is the difference between _Tail_counter and _Head_counter.
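Conceptually, the bookkeeping looks like the sketch below; the real counters are maintained with atomic operations and also encode the position within the current page, so this is only an approximation of the idea.

#include <cstddef>
#include <iostream>

// Simplified model of the queue's counters; not the actual implementation.
struct queue_counters
{
    size_t tail_counter; // incremented on every enqueue
    size_t head_counter; // incremented on every dequeue

    size_t size() const
    {
        return tail_counter - head_counter;
    }
};

int main()
{
    queue_counters c = { 0, 0 };

    c.tail_counter += 300; // 300 elements enqueued so far
    c.head_counter += 248; // 248 elements dequeued so far

    std::cout << "elements in queue: " << c.size() << std::endl; // prints 52
    return 0;
}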
When enough elements are dequeued from q such that a particular page becomes empty, the page is destroyed and its memory is deallocated. In the above example, when elements 0 – 247 are dequeued, each page in the first layer of pages ends up containing exactly 1 element. When element 248 is dequeued, the first page in _Array[0] becomes empty, and is therefore deallocated. When element 249 is dequeued, the first page in _Array[3] is deallocated. See Figure 5.
Figure 5
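A quick arithmetic check makes this concrete: of the first 256 enqueued integers, exactly 32 map to each micro-queue, which is exactly one page for concurrent_queue<int>, and element 248 is the last first-layer element belonging to _Array[0].

#include <iostream>

int main()
{
    int per_queue[8] = { 0 };

    // Count how many of the first 256 elements land in each micro-queue.
    for (int i = 0; i < 256; ++i)
    {
        ++per_queue[(i * 3) % 8];
    }

    for (int q = 0; q < 8; ++q)
    {
        std::cout << "_Array[" << q << "]: " << per_queue[q]
                  << " of the first 256 elements" << std::endl; // 32 each
    }

    // 248 * 3 % 8 == 0, so dequeuing element 248 empties the first page of
    // _Array[0]; 249 * 3 % 8 == 3, which then empties the first page of _Array[3].
    return 0;
}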
As more elements are enqueued and dequeued from q, new pages are added and older pages are deallocated. This, in short, is how concurrent_queue is laid out in memory.
A note on visualizers
So far, we have discussed the internals of concurrent_vector and concurrent_queue. Chances are that when you use one of them in a program, you are less interested in the implementation details of the data structure and more interested in seeing the elements it contains, just as you would for std::vector or std::queue in the debugger's Locals window. In the Beta 2 release, we have included debugger visualizers for concurrent_vector and concurrent_queue so that they appear like their corresponding STL data structures.
That’s it for now. Feel free to post a comment if there’s something more you would like to know about concurrent data structures or visualizers in Visual Studio 2010.
- Raghu Simha.
Comments
Anonymous
January 18, 2010
I would be interested in hearing more about the visualizer support in VS 2010. We rely upon them heavily for debugging our large projects, and in VS 2005 visualizers are both undocumented and unstable. For example, visualizers will work for a time, and then suddenly begin crashing VS every time a breakpoint is hit; deleting the visualizers always fixes the issue. And this is after installing all known service packs and hotfixes. Has any work been done on visualizer extensions, stability, or documentation for VS 2010? If so, I'd be very excited to hear about them. -chris
Anonymous
January 18, 2010
Unfortunately, visualizer support hasn't changed significantly for Visual Studio 2010. Visualizers are still undocumented, and custom visualizers are still unsupported. This means that while we do attempt to enable them to work, we don't make any guarantees, nor do we test Visual Studio with any version of autoexp.dat other than the one it ships with. We did, however, make a few minor changes that include fixing a few crashes, making visualizers work on pointers, and adding support for the __log2() function.
Anonymous
January 19, 2010
Thanks for the info Raghu. It's a shame that more work isn't being put into supporting visualizers, but bugfixes are appreciated. In the event that we suspect visualizers are crashing the IDE, what's the right way to go about generating a bug report? For example, is there some way to generate a dump that we can then send to Microsoft to help them track down the problem? And to whom should we send it? -chris
Anonymous
January 19, 2010
Chris, we are aware of some of the C++ visualizer issues and plan on addressing them in a future version of Visual Studio. Meanwhile, the best way to report issues with Visual Studio, particularly if you have a set of repro steps that can be documented, is via the Microsoft Connect website: https://connect.microsoft.com/site/sitehome.aspx?SiteID=210
Anonymous
January 20, 2010
I'd be interested in knowing some of the rationale behind the design of these containers. Do these layouts aid concurrency? Why was the guarantee of contiguous memory dropped for concurrent_vector?
Anonymous
February 01, 2010
pjackson, sorry for the delayed response to your question. The guarantee of contiguous memory was dropped for concurrent_vector because keeping elements contiguous would require relocating them as the vector grows, which would invalidate iterators that other threads might be concurrently using. In general, the internal layouts aid in maximizing concurrency by enabling fast growth, allowing concurrent operations, and minimizing false sharing.