
volatile and MemoryBarrier()...

One thing I love about my job is that I learn something new all the time. Today I got a little bit smarter about volatile. One of the devs on the Indigo team was asking about the double check locking pattern. Today the Design Guidelines doc says:

 

public sealed class Singleton {
   private Singleton() {}
   private static volatile Singleton value;
   private static object syncRoot = new Object();

   public static Singleton Value {
      get {
         if (Singleton.value == null) {
            lock (syncRoot) {
               if (Singleton.value == null) {
                  Singleton.value = new Singleton();
               }
            }
         }
         return Singleton.value;
      }
   }
}

He wanted to know if “volatile” was really needed. Turns out the answer is “sorta”. Vance, a dev lead on the CLR JIT team, explained that the issue is around the CLR memory model… Essentially, the memory model allows non-volatile reads/writes to be reordered as long as that change cannot be noticed from the point of view of a single thread. The issue is, of course, that there is often more than one thread (like the finalizer thread, worker threads, threadpool threads, etc.). volatile essentially prevents that optimization. As a side note, notice some other folks have a little problem in this space. A major mitigation here is that x86 chips don’t take advantage of this opportunity… but it will theoretically cause problems on IA64. As I was writing this I noticed that Vance already did a very good write-up a while ago…

That part I knew… what was news to me is that there is a better way than volatile, and that is with an explicit memory barrier before accessing the data member. We have an API for that: System.Threading.Thread.MemoryBarrier(). This is more efficient than using volatile because a volatile field requires all accesses to be barriers, and that defeats some performance optimizations.

So, here is the “fixed” double check locking example:

public sealed class Singleton {
   private Singleton() {}
   private static Singleton value;
   private static object syncRoot = new Object();

   public static Singleton Value {
      get {
         if (Singleton.value == null) {
            lock (syncRoot) {
               if (Singleton.value == null) {
                  Singleton newVal = new Singleton();
                  // Ensure all writes used to construct the new value have been flushed.
                  System.Threading.Thread.MemoryBarrier();
                  Singleton.value = newVal; // publish the new value
               }
            }
         }
         return Singleton.value;
      }
   }
}

I have not completely internalized this yet, but my bet is it is still better to just make “value” volatile to ensure code correctness at the cost of (possibly) minor perf costs.
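For reference, here is the simplest alternative, which sidesteps the memory model questions entirely by taking the lock on every access. It is slower under contention, but correct on any memory model (a sketch, not from the Design Guidelines doc):

```csharp
public sealed class Singleton {
   private Singleton() {}
   private static Singleton value;
   private static object syncRoot = new Object();

   public static Singleton Value {
      get {
         // Monitor.Enter/Exit (which "lock" expands to) include the needed
         // barriers, so no volatile or MemoryBarrier() call is required.
         lock (syncRoot) {
            if (Singleton.value == null) {
               Singleton.value = new Singleton();
            }
            return Singleton.value;
         }
      }
   }
}
```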

Thoughts?

 

Update: Vance has posted some new information about how this works in 2.0:

https://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/default.aspx

 

Update 2: Even more great information from Joe

https://www.bluebytesoftware.com/blog/2007/06/09/ALazyInitializationPrimitiveForNET.aspx

 

Update 3: This gets even better with 3.5, again from Joe!

https://www.bluebytesoftware.com/blog/PermaLink,guid,a2787ef6-ade6-4818-846a-2b2fd8bb752b.aspx
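For readers on .NET 4.0 or later: the lazy-initialization ideas discussed in Joe's posts are available in the framework as System.Lazy&lt;T&gt;, which hides all of the barrier details. A minimal sketch:

```csharp
using System;
using System.Threading;

public sealed class Singleton {
   private static readonly Lazy<Singleton> lazy =
      new Lazy<Singleton>(() => new Singleton(),
                          LazyThreadSafetyMode.ExecutionAndPublication);

   private Singleton() {}

   public static Singleton Value {
      get { return lazy.Value; }
   }
}
```

In ExecutionAndPublication mode, Lazy&lt;T&gt; locks only during first initialization; subsequent reads are lock-free and correctly fenced.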

Comments

  • Anonymous
    May 12, 2004
    The comment has been removed

  • Anonymous
    May 12, 2004
    This link describes the memory barrier issue in great detail.

    http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375


  • Anonymous
    May 12, 2004
    Could it be that your "fixed" example is broken? Should there not also be a memory barrier prior to the first read of Singleton.value (in the condition of the if statement)? Otherwise, it might be that the field values of the new object are read from cache, right?

  • Anonymous
    May 12, 2004
An msdn article (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/singletondespatt.asp) on singletons says that double-checking is built-in with C#. Which is correct?

  • Anonymous
    May 12, 2004
    Sorry, Here's the link:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/singletondespatt.asp

  • Anonymous
    May 13, 2004
Ken, both are correct. If all you need to get an instance of the class is to write "new Singleton()", go with the framework feature. However, this is not always the case. For example, you may need to check configuration to decide which class to instantiate. Then you do the double-checking by hand.

  • Anonymous
    May 13, 2004
    The comment has been removed

  • Anonymous
    May 13, 2004
    I have a somewhat unique perspective on this question because I was an operating system programmer on one of the first large-scale multiprocessors (Sequent) in the 1980s, and now I'm an application programmer.

    So. Chris Brumme wrote all about this in the context of the CLR, too:

    http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/480d3a6d-1aa8-4694-96db-c69f01d7ff2b

    As a person who has sat on both sides of this issue, I agree totally with Chris' comments about this subject, and disagree with Vance's. The CLR specifies a memory model that is a poor tradeoff for the real world. Your job is to enable reliable applications, not prevent them.

    Don't drink that performance improvement Kool-aid that the hardware guys are serving. CPU performance is not the limiting factor on apps these days: programmer productivity and reliability are far more important.

    Any environment in which double-check locking doesn't work in the natural way is simply broken. The CLR team should specify a default memory consistency model which is as strong as existing x86 implementations. If you want to allow that model to be broken by a select few people using #pragmas or similar kinds of hints, fine. Just don't inflict that complexity on the rest of us who are busy trying to use the CLR to add business value in the real world.

    Jeff Berkowitz
    pdxjjb at hotmail.com

  • Anonymous
    May 13, 2004
    The comment has been removed

  • Anonymous
    May 13, 2004

To answer Brian's question, yes, only a write barrier (not a full read-write memory barrier) is needed. This is in fact what is suggested in

    http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375

    However, this API has not made its way into the BCL yet, so you have to do a full memory barrier at the present time. Since this happens on the rare code path this is not a big deal in this case.

    To comment on Jeff Berkowitz's issue, I can say two things:

    1) First, I completely agree with his tradeoff (you just want straightforward things to work without any tricky issues). This argues for NOT using double check locking in the vast majority of cases (only those places you know you need the scaling). Thus you might see it in libraries, but you would expect to see it only VERY rarely in actual applications. As mentioned in the article above, simply placing a lock around the whole method is simple and will always work without these subtleties.

    2) For the reasons Jeff mentions (people program to the simpler model whether we tell them about it or not), when the CLR comes out on weak memory model machines, we are very likely by default going to support the strong x86 model. Only by opting in to the weaker model will you have to worry about this.

    Having said this, I am not comfortable telling people 'you can skip the memory barrier, because the runtime will make it right'. Multiprocessors are already common, and the memory model is a MAJOR issue for scaling. In 10 years we could easily be regretting the decision above.

    Thus I prefer to actually tell people: if you want simplicity, just put locks around the whole thing. If that is not good enough, you really should step up and write the code properly for a weak model. (There really are not that many lock-free patterns like the one above, and if you follow recipes, you should be fine.)

  • Anonymous
    May 16, 2004
    I think the implementation without using volatile is missing one memory barrier. According to
    http://www.google.com/groups?q=g:thl1857306645d&dq=&hl=en&lr=&ie=UTF-8&selm=1998May28.082712%40bose.com&rnum=3
    memory barriers are required for both read and write code paths. The read path extracted from the code is:

    if ( Singleton.value == null ) { // false
        // not executed
    }
    return Singleton.value;

    There is no memory barrier on this path. In the CLR memory model as described in Chris Brumme's blog (http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx), only volatile loads are considered "acquire", but normal loads can be reordered.

    The correct implementation will be:

    public sealed class Singleton {
        private Singleton() {}
        private static Singleton value;
        private static object sync = new object();

        public static Singleton Value {
            get {
                Singleton temp = Singleton.value;
                System.Threading.Thread.MemoryBarrier(); // this is important

                if ( temp == null ) {
                    lock ( sync ) {
                        if ( Singleton.value == null ) {
                            temp = new Singleton();
                            System.Threading.Thread.MemoryBarrier();
                            Singleton.value = temp;
                        }
                    }
                }

                return Singleton.value;
            }
        }
    }

    Let me expand on the performance of the two implementations of the double checked locking pattern. Obviously we want to make the read path faster and don't care about the write path because the write path is taken only once. The read path extracted from the code is:

    // using volatile (Singleton.value is volatile)
    get {
        if ( Singleton.value == null ) {
            // ... not taken
        }
        return Singleton.value;
    }

    // using memory barriers
    get {
        Singleton temp = Singleton.value;
        System.Threading.Thread.MemoryBarrier();
        if ( temp == null ) {
            // ... not taken
        }
        return Singleton.value;
    }

    The volatile load in the first code has the acquire semantics and is equivalent to the non-volatile load plus the memory barrier in the second code. There are two volatile loads in the first code and only one memory barrier in the second. So I expect the code with memory barriers to perform faster than the code that uses volatiles. But as any performance speculations it has to be taken with a grain of salt. I haven't done any measurements here.

  • Anonymous
    May 17, 2004
    I agree with Alexei and Bart in that a read barrier is also needed on the read path.

    Here's another example from MSDN that is broken in this regard:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp

    The fact that this code is wrong was confirmed by Neill Clift in a comp.programming.threads post:

    http://groups.google.com/groups?q=fetchcomputedvalue&hl=en&lr=&ie=UTF-8&selm=3f79e7ee%241%40news.microsoft.com&rnum=1

    The explanation provided by MSDN is also wrong by the way - CPU caches have nothing to do with this problem.

  • Anonymous
    May 18, 2004
    Don't use Double-check locking. You'll get it wrong.

    Prove otherwise by coming up with an example that isn't wrong, put money on it, and someone will disprove it.

  • Anonymous
    May 28, 2004
    Among other things, you need to understand weak memory models.

  • Anonymous
    May 28, 2004
    Alexei, I don't understand the need for a memory barrier in the "read path".

    We are making the following writes:

    1. Write member variables in new Singleton constructor
    2. Write new Singleton to temp
    3. Write temp to Singleton.value

    We are making the following reads:

    A. Read Singleton.value
    B. Read member of Singleton.value

    The problem we have is that a thread executing the reads may see the writes in the wrong order, that is, it might see 3 before it sees 1. But the effect of putting a full memory barrier means that 3 cannot move above 1 in any respect. So if A sees the results of 3, then B must see the results of 1.

  • Anonymous
    May 28, 2004
    More specifically... you are correct if the memory barrier between 2 and 3 is only a release barrier. In that case, there is nothing to keep 3 from moving up to before A, and that's why you need an acquire barrier between A and B.

    But, to be entirely accurate, what you need is an acquire barrier between A and 3. Putting a full memory barrier between 2 and 3 is certainly sufficient.

  • Anonymous
    May 28, 2004
    The comment has been removed

  • Anonymous
    May 28, 2004
    I might be totally wrong about this, but here's how I understand it.

    > 1. Write member variables in new Singleton constructor
    > 2. Write new Singleton to temp
    > 3. Write temp to Singleton.value

    > A. Read Singleton.value
    > B. Read member of Singleton.value

    If there is no memory barrier between the last two memory accesses, B can be fetched before A.

    So if you observe reads and writes to the main memory, you might see the following sequence:

    B
    1
    2
    3
    A

    and that would be a problem.

  • Anonymous
    May 28, 2004
    The comment has been removed

  • Anonymous
    June 03, 2004
    I would like to see Alexei reply to John Doty's comment, "Alexei, I don't understand the need for a memory barrier in the 'read path'". Or others. Cheers!

  • Anonymous
    June 03, 2004
    The following two seem to suggest (in my mind) that Alexei is right, and that the only way to do this is a lock around the whole thing, or read and write memory barriers, which the lock gives you (I think). Are these papers wrong, or could some guru clear this up once and for all in terms of the .NET memory model? Cheers!

    Andrew Birrell
    "An Introduction to Programming with C# Threads"
    http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf

    The "Double-Checked Locking is Broken" Declaration (IBM)
    http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.

  • Anonymous
    June 03, 2004
    The latter paper relates to the Java memory model, not the .NET one. The former paper seems to rely on the latter paper.

    It would make sense to me that a read memory barrier is not required here, as read B depends on A as far as I can see. (That is, until you've completed A, you don't know which piece of memory B needs to read.)

  • Anonymous
    June 04, 2004
    The comment has been removed

  • Anonymous
    June 04, 2004
    If 1, 2 and 3 happen after the value read in A has been published, then A will return null, and you'll enter the lock, which is fine, surely?

  • Anonymous
    June 04, 2004
    The comment has been removed

  • Anonymous
    June 05, 2004
    Ah, I think I see what you mean.

    Although there's a memory barrier after 1 and 2, that only stops those writes from being delayed. It doesn't prevent the write in 3 from being brought forward. Is that what you meant?

    Oh, it's all too much... I'll stick with locking :)

  • Anonymous
    June 05, 2004
    The comment has been removed

  • Anonymous
    June 06, 2004
    Yes exactly Jon, that's what I meant.

    I believe a memory barrier's semantics are that no write can cross it, up or down, but I am not sure about it. If it doesn't have this semantic, then what is the point of the barrier in the first place?

    Assuming it has this meaning, then putting a barrier there will make sure that 1 and 2 are definitely seen before 3, which is what we want.


    But I do agree with you, normal locking is good enough. If you have a design that falls over on this, meaning it doesn't scale or has a performance bottleneck here (in the singleton), then redesign the code instead. Rule number one in a threaded application is to not have more threads than you have parallel work, where most threads should work on their own instances of data. Given, of course, that they must share some.

    And finally, don't use singletons. If your design relies on one, then the design is most likely not suffering from performance issues, so go for the full locking scheme. If the program is performance critical, then you don't want to have dynamic behavior anywhere and you want to know the characteristics of your program at all times. I put it very black and white, of course.

  • Anonymous
    June 06, 2004
    > It would make sense to me that a read memory
    > barrier is not required here, as read B depends
    > on A as far as I can see. (That is, until you've
    > completed A, you don't know which piece of
    > memory B needs to read.)

    It's true in this particular case however in general there can be no dependency between reads.

    For example in the MSDN article discussed above (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp) both reads are from global variables so as far as I can tell there's nothing preventing the CPU from reordering them. In this case you have to use another memory barrier on the read path.

  • Anonymous
    June 06, 2004
    It seems to me that the MemoryBarrier documentation is sorely lacking. If it just has the semantics of a volatile read and a volatile write, then it has very little use at all - reads can still come later than the barrier, and writes can still come earlier than the barrier. As Niclas says, only if it's a bidirectional barrier is it useful. Assuming I'm not missing something, that is...

    Pavel: certainly in the general case it won't be true. I think we're really after working out just how optimised the singleton implementation here can be though. If the memory barrier before the assignment of value is bidirectional, I think we can get away with just that. If it's not, just using MemoryBarrier calls will never do enough.

  • Anonymous
    June 07, 2004
    The comment has been removed

  • Anonymous
    June 09, 2004
    Does this spin version work? Why or why not? Cheers!

    public sealed class Singleton
    {
        private static int spinLock = 0; // lock not owned.
        private static Singleton value = null;
        private Singleton() {}

        public static Singleton Value()
        {
            // Get spin lock.
            while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
                Thread.Sleep(0);

            // Do we have any mbarrier issues?
            if ( value == null )
                value = new Singleton();

            Interlocked.Exchange(ref spinLock, 0);
            return value;
        }
    }

    This would help answer a few related questions for me on how Interlocked works with mem barriers and cache, etc. TIA -- William
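One pattern sometimes suggested alongside the spin-lock question above (a sketch, not an authoritative answer to William's question): publish through Interlocked.CompareExchange, which performs a full fence on .NET. Note that under a race this may construct extra Singleton instances and keep only the first one published, so it only fits types whose construction is cheap and side-effect free.

```csharp
using System;
using System.Threading;

public sealed class Singleton
{
    private static Singleton value;

    private Singleton() {}

    public static Singleton Value
    {
        get
        {
            if (value == null)
            {
                Singleton candidate = new Singleton();
                // CompareExchange implies a full memory barrier, so every
                // write made by the constructor is flushed before the
                // instance becomes reachable through 'value'. If another
                // thread won the race, its instance stays and ours is dropped.
                Interlocked.CompareExchange(ref value, candidate, null);
            }
            return value;
        }
    }
}
```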

  • Anonymous
    June 15, 2004
    With a thread local variable, we can ignore other threads.
    We do not need synchronization.
    With a thread local variable, we can get simple, safe, quick code.

    public sealed class Singleton {
        private Singleton() {}
        private static object syncRoot = new Object();

        // thread shared variable
        private static Singleton sharedvalue;

        // thread local variable for the singleton
        [ThreadStatic]
        private static Singleton value;

        public static Singleton Value {
            get {
                if (Singleton.value == null) {
                    lock (syncRoot) {
                        if (Singleton.sharedvalue == null) {
                            Singleton.sharedvalue = new Singleton();
                        }
                        Singleton.value = Singleton.sharedvalue;
                    }
                }
                return Singleton.value;
            }
        }
    }

    Give me some comments, please.

  • Anonymous
    June 16, 2004
    Tatsuhiko, in this case you don't really have a singleton. Each thread will get a single copy of the Singleton class. Hence, it really doesn't follow the semantics of the singleton pattern.

  • Anonymous
    June 16, 2004
    But I think, in this case, Singleton is a class, not a struct.
    So threads will use the very same instance of the singleton class.
    That is what we want to do, I think.

    Using a thread local variable, we can get back DCL performance.

    Perhaps I have less understanding than you
    about the semantics of the singleton pattern.
    But I understand that we want to share the same instance among many threads,
    and we hate the overhead of synchronization on every access to the singleton.

    With a thread local variable, we can share the same instance among many threads
    without the overhead of synchronization except at initialization.
    It is pragmatic enough to use a thread local variable, I think.

    Anyway, I like responses to my article, so I thank you for your comment.

  • Anonymous
    June 20, 2004
    Tatsuhiko, I think the reason this technique has not received more attention (I first heard of it a couple of years ago) is because thread local variables have had their own performance problems, at least in common implementations of Java at the time. If I remember correctly, under low contention, simply using a lock was faster.

    Again, that was a couple of years ago, and on the JVM, not the CLR. I'd be interested if you or anyone else has info to share about the performance of thread local storage on .NET.

  • Anonymous
    June 21, 2004
    Help me out here because I'm not getting the ThreadStatic approach. The docs clearly say this about ThreadStaticAttribute:

    A static (Shared in Visual Basic) field marked with ThreadStaticAttribute is not shared between threads. Each executing thread has a separate instance of the field, and independently sets and gets values for that field. If the field is accessed on a different thread, it will contain a different value.

    So again, how can this be a "Singleton" for all threads in an AppDomain when each thread winds up getting its own "Singleton"?

  • Anonymous
    June 22, 2004
    Joe, thank you for your advice.
    I did not think of the performance of using the thread local variable itself.
    I hope that thread local variable access cost on the CLR is lower than on the JVM.

    Keith, in my example, the real singleton instance is set to 'sharedvalue',
    which is a normal static field, not ThreadStatic.
    The ThreadStatic field named 'value' is a cache of 'sharedvalue'.
    That's all.

    The flow is:

    (1) The beginning states are:

    Singleton.sharedvalue == null
    ThreadA's Singleton.value == null
    ThreadB's Singleton.value == null

    (2) ThreadA accesses Singleton.value.
    (2.1) Singleton.sharedvalue is initialized.
    (2.2) ThreadA's Singleton.value is initialized.

    So the results are:
    Singleton.sharedvalue == somevalue;
    ThreadA's Singleton.value == somevalue;
    ThreadB's Singleton.value == null;

    In this state, different threads get different values from Singleton.value.
    The description of ThreadStatic in the docs you read indicates this situation.


    (3) ThreadB accesses Singleton.value.
    The results are:
    Singleton.sharedvalue == somevalue;
    ThreadA's Singleton.value == somevalue;
    ThreadB's Singleton.value == somevalue;

    In this state, different threads get the same value from Singleton.value.
    The threads share the same instance of Singleton.

    I hope this explanation makes clear what I meant.
    Anyway, thank you for your comments.

  • Anonymous
    June 23, 2004
    Oh I see. I should have looked a little more closely at your sample. You wind up with a little extra checking per thread (value == null), but that happens only once per thread so it doesn't hurt perf much.

    However, don't you need to either use the MemoryBarrier trick or use volatile on the sharedValue field? Since sharedValue is not ThreadStatic, it is shared amongst all threads. So wouldn't it be possible for the assignment of sharedValue to happen after the read (sharedValue == null) on an MP system?

  • Anonymous
    June 23, 2004
    The comment has been removed

  • Anonymous
    July 31, 2004
    Tatsuhiko Machida:

    It is indeed a valid technique, but I too have been living under the impression that thread local storage is as bad as taking a lock (which is what used to be done to update the thread global area to create the thread local areas).



    --------

    I am glad to see that the barrier after read A wasn't needed. However, I now see, with new eyes, why it was there in the first place: he was using a temp variable in the read path too!! Bad, bad. We are trying to synchronize the pointer; we don't want another silly value to sync as well =)


    Interesting thread.

  • Anonymous
    September 21, 2004
    Imagine my surprise then after finally tracking down a threading issue only to discover that the bug was in fact caused by the .NET synchronized Hashtable and it turns out that the synchronized Hashtable is in fact not thread safe, but thread safe

  • Anonymous
    December 08, 2004
    Brad Abram's blog entry on volatilty and memory barriers

  • Anonymous
    August 23, 2005
    In preparation for today's patterns WebCast, I spent the last few days looking a little more extensively at...

  • Anonymous
    August 25, 2005

    This post comes from Dirk's web log:
    Das Singleton - das unbekannte Wesen (The singleton, the unknown creature)
    In preparation for today's patterns WebCast, I spent the last...



  • Anonymous
    February 06, 2006
    Notes on the February Atlanta C# Users Group.

  • Anonymous
    May 03, 2006
    If you have developed traditional Windows Client/Server applications on single-CPU machines for all your...

  • Anonymous
    May 27, 2006
    PingBack from http://www.theoldmonk.net/blog/2006/05/25/double-check-lock-and-multithreading/

  • Anonymous
    September 01, 2006
    Brad Abrams on volatile and MemoryBarrier(). Someone sent this to the team and I couldn't...

  • Anonymous
    March 08, 2007
    The previous two chapters mainly covered some preliminary knowledge. Starting with this chapter, we really begin the study of the singleton pattern. Singleton: first look at the most common implementation of the singleton pattern, also the approach many people commonly use. The following implementation of the Singleton design pattern uses

  • Anonymous
    July 15, 2007
    PingBack from http://www.ekampf.com/blog/2007/07/15/WhatsWrongWithThisCode1Discussion.aspx

  • Anonymous
    July 15, 2007
    The Singleton implementation in the snippet I gave works fine as a lazy, thread-safe Singleton as it



  • Anonymous
    July 17, 2008
    You've been kicked (a good thing) - Trackback from DotNetKicks.com