volatile and MemoryBarrier()...

One thing I love about my job is that I learn something new all the time. Today I got a little bit smarter about volatile. One of the devs on the Indigo team was asking about the double check locking pattern. Today the Design Guidelines doc says:

 

public sealed class Singleton {
   private Singleton() {}
   private static volatile Singleton value;
   private static object syncRoot = new Object();

   public static Singleton Value {
      get {
         if (Singleton.value == null) {
            lock (syncRoot) {
               if (Singleton.value == null) {
                  Singleton.value = new Singleton();
               }
            }
         }
         return Singleton.value;
      }
   }
}

He wanted to know if “volatile” was really needed. Turns out the answer is “sorta”. Vance, a dev lead on the CLR JIT team, explained that the issue is around the CLR memory model… Essentially the memory model allows non-volatile reads and writes to be reordered as long as that change cannot be noticed from the point of view of a single thread. The issue is, of course, that there is often more than one thread (the finalizer thread, worker threads, threadpool threads, etc.). volatile essentially prevents that optimization. As a side note, notice that some other folks have a little problem in this space too. A major mitigation here is that x86 chips don’t take advantage of this opportunity… but it will theoretically cause problems on IA64. As I was writing this I noticed that Vance already did a very good write-up a while ago…
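To make the reordering concern concrete, here is a minimal sketch of the publication problem in isolation. The class and member names (PublicationSketch, payload, ready, Run) are made up for illustration and do not come from the Design Guidelines. Marking the flag volatile means the write to payload may not move after the write to ready, and the read of payload may not move before the read of ready; without volatile, the memory model described above would permit both moves.

```csharp
using System.Threading;

public static class PublicationSketch
{
    private static int payload;           // ordinary field, written first
    private static volatile bool ready;   // volatile flag, written last

    public static void Writer()
    {
        payload = 42;   // (1) ordinary write
        ready = true;   // (2) volatile write: (1) may not be moved after it
    }

    public static int Reader()
    {
        // Volatile read: the read of payload below may not be moved before it.
        while (!ready)
        {
            Thread.Sleep(0);   // yield while waiting for the writer
        }
        return payload;
    }

    // Hypothetical driver: runs one reader and one writer and returns what the reader saw.
    public static int Run()
    {
        payload = 0;
        ready = false;
        int seen = 0;
        Thread reader = new Thread(() => seen = Reader());
        Thread writer = new Thread(Writer);
        reader.Start();
        writer.Start();
        reader.Join();
        writer.Join();
        return seen;
    }
}
```

The singleton case has the same shape: the constructor's field writes play the role of payload, and the static reference plays the role of ready.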

That part I knew… what was news to me is that there is a better way to do this than volatile, and that is with an explicit memory barrier before publishing the data member. We have an API for that: System.Threading.Thread.MemoryBarrier(). This is more efficient than using volatile because a volatile field requires all accesses to be barriers, and this affects some performance optimizations.

So, here is the “fixed” double check locking example.

public sealed class Singleton {
   private Singleton() {}
   private static Singleton value;
   private static object syncRoot = new Object();

   public static Singleton Value {
      get {
         if (Singleton.value == null) {
            lock (syncRoot) {
               if (Singleton.value == null) {
                  Singleton newVal = new Singleton();
                  // Ensure all writes used to construct new value have been flushed.
                  System.Threading.Thread.MemoryBarrier();
                  Singleton.value = newVal; // publish the new value
               }
            }
         }
         return Singleton.value;
      }
   }
}

I have not completely internalized this yet, but my bet is that it is still better to just make “value” volatile to ensure code correctness at the cost of (possibly) minor perf costs.
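For readers on later framework versions, there is a middle ground between the two listings. The sketch below assumes the System.Threading.Volatile class, which was added in .NET Framework 4.5 and did not exist when this post was written. The field stays non-volatile, and only the two accesses that matter are ordered. The class name VolatileApiSingleton is made up for illustration.

```csharp
using System.Threading;

public sealed class VolatileApiSingleton
{
    // Deliberately not declared volatile; ordering is requested per access below.
    private static VolatileApiSingleton value;
    private static readonly object syncRoot = new object();

    private VolatileApiSingleton() {}

    public static VolatileApiSingleton Value
    {
        get
        {
            // Acquire read: later reads through 'local' cannot move before it.
            VolatileApiSingleton local = Volatile.Read(ref value);
            if (local == null)
            {
                lock (syncRoot)
                {
                    local = value;   // the lock already orders this read
                    if (local == null)
                    {
                        local = new VolatileApiSingleton();
                        // Release write: the constructor's writes cannot move after it.
                        Volatile.Write(ref value, local);
                    }
                }
            }
            return local;
        }
    }
}
```

Returning the local copy also means the fast path reads the shared field once rather than twice.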

Thoughts?

 

Update: Vance has some new information about how this works in 2.0.

https://msdn.microsoft.com/msdnmag/issues/05/10/MemoryModels/default.aspx

 

Update 2: Even more great information from Joe

https://www.bluebytesoftware.com/blog/2007/06/09/ALazyInitializationPrimitiveForNET.aspx

 

Update 3: This gets even better with 3.5, again from Joe!

https://www.bluebytesoftware.com/blog/PermaLink,guid,a2787ef6-ade6-4818-846a-2b2fd8bb752b.aspx

Comments

  • Anonymous
    May 12, 2004
    The comment has been removed

  • Anonymous
    May 12, 2004
    This link describes the memory barrier issue in great detail.

    http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375


  • Anonymous
    May 12, 2004
    Could it be that your "fixed" example is broken? Should there not also be a memory barrier prior to the first read of Singleton.value (in the condition of the if statement)? Otherwise, it might be that the field values of the new object are read from cache, right?

  • Anonymous
    May 12, 2004
    An MSDN article on singletons (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/singletondespatt.asp) says that double-checking is built in with C#. Which is correct?

  • Anonymous
    May 12, 2004
    Sorry, Here's the link:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dnbda/html/singletondespatt.asp

  • Anonymous
    May 13, 2004
    Ken, both are correct. If all you need to do to get an instance of the class is write "new Singleton()", go with the framework feature. However, this is not always the case. For example, you may need to check configuration to decide which class to instantiate. Then you do the double-checking by hand.
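The "framework feature" mentioned here is presumably static field initialization, which the CLR guarantees to run exactly once per AppDomain. A minimal sketch of that form follows; the class name StaticInitSingleton is made up for illustration.

```csharp
public sealed class StaticInitSingleton
{
    // The runtime executes this initializer once, with its own synchronization,
    // before the field is first used, so no explicit lock or barrier appears here.
    private static readonly StaticInitSingleton value = new StaticInitSingleton();

    private StaticInitSingleton() {}

    public static StaticInitSingleton Value
    {
        get { return value; }
    }
}
```

This covers the simple case. It does not help when the instance to create depends on something that is only known later, which is the case the comment describes.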

  • Anonymous
    May 13, 2004
    The comment has been removed

  • Anonymous
    May 13, 2004
    I have a somewhat unique perspective on this question because I was an operating system programmer on one of the first large-scale multiprocessors (Sequent) in the 1980s, and now I'm an application programmer.

    So. Chris Brumme wrote all about this in the context of the CLR, too:

    http://blogs.gotdotnet.com/cbrumme/PermaLink.aspx/480d3a6d-1aa8-4694-96db-c69f01d7ff2b

    As a person who has sat on both sides of this issue, I agree totally with Chris' comments about this subject, and disagree with Vance's. The CLR specifies a memory model that is a poor tradeoff for the real world. Your job is to enable reliable applications, not prevent them.

    Don't drink that performance improvement Kool-aid that the hardware guys are serving. CPU performance is not the limiting factor on apps these days: programmer productivity and reliability are far more important.

    Any environment in which double-check locking doesn't work in the natural way is simply broken. The CLR team should specify a default memory consistency model which is as strong as existing x86 implementations. If you want to allow that model to be broken by a select few people using #pragmas or similar kinds of hints, fine. Just don't inflict that complexity on the rest of us who are busy trying to use the CLR to add business value in the real world.

    Jeff Berkowitz
    pdxjjb at hotmail.com

  • Anonymous
    May 13, 2004
    The comment has been removed

  • Anonymous
    May 13, 2004

    To answer Brian's question: yes, only a write barrier (not a full read-write memory barrier) is needed. This is in fact what is suggested in

    http://discuss.develop.com/archives/wa.exe?A2=ind0203B&L=DOTNET&P=R375

    However, this API has not made its way into the BCL yet, so you have to do a full memory barrier at the present time. Since this happens on the rare code path this is not a big deal in this case.

    To comment on Jeff Berkowitz's issue, I can say two things:

    1) First, I completely agree with his tradeoff (you just want straightforward things to work without any tricky issues). This argues for NOT using double check locking in the vast majority of cases (only in those places where you know you need the scaling). Thus you might see it in libraries, but you would expect to see it only VERY rarely in actual applications. As mentioned in the article above, simply placing a lock around the whole method is simple and will always work without these subtleties.

    2) For the reasons Jeff mentions (people program to the simpler model whether we tell them about it or not), when the CLR comes out on weak memory model machines, we are very likely going to support the strong x86 model by default. Only by opting in to the weaker model will you have to worry about this.

    Having said this, I am not comfortable telling people 'you can skip the memory barrier, because the runtime will make it right'. Multiprocessors are already common, and the memory model is a MAJOR issue for scaling. In 10 years we could easily be regretting the decision above.

    Thus I prefer to actually tell people: if you want simplicity, just put locks around the whole thing. If that is not good enough, you really should step up and write the code properly for a weak model. (There really are not that many lock-free patterns like the one above, and if you follow recipes, you should also be fine.)
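The "just put locks around the whole thing" option can be sketched as follows. The class name LockedSingleton is made up for illustration. Every access takes the lock, so there is no unsynchronized read whose ordering needs to be reasoned about.

```csharp
public sealed class LockedSingleton
{
    private static LockedSingleton value;
    private static readonly object syncRoot = new object();

    private LockedSingleton() {}

    public static LockedSingleton Value
    {
        get
        {
            // Both the null check and the publication happen under the lock.
            lock (syncRoot)
            {
                if (value == null)
                {
                    value = new LockedSingleton();
                }
                return value;
            }
        }
    }
}
```

The cost is one lock acquisition per access, which is the scaling tradeoff discussed in point 1 above.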

  • Anonymous
    May 16, 2004
    I think the implementation without using volatile is missing one memory barrier. According to
    http://www.google.com/groups?q=g:thl1857306645d&dq=&hl=en&lr=&ie=UTF-8&selm=1998May28.082712%40bose.com&rnum=3
    memory barriers are required for both read and write code paths. The read path extracted from the code is:

    if ( Singleton.value == null ) // false
    { /* not executed */ }
    return Singleton.value;

    There is no memory barrier on this path. In the CLR memory model as described in Chris Brumme's blog (http://blogs.msdn.com/cbrumme/archive/2003/05/17/51445.aspx), only volatile loads are considered "acquire", but normal loads can be reordered.

    The correct implementation will be:

    public sealed class Singleton {
        private Singleton() {}
        private static Singleton value;
        private static object sync = new object();

        public static Singleton Value {
            get {
                Singleton temp = Singleton.value;
                System.Threading.Thread.MemoryBarrier(); // this is important

                if ( temp == null ) {
                    lock ( sync ) {
                        if ( Singleton.value == null ) {
                            temp = new Singleton();
                            System.Threading.Thread.MemoryBarrier();
                            Singleton.value = temp;
                        }
                    }
                }

                return Singleton.value;
            }
        }
    }

    Let me expand on the performance of the two implementations of the double checked locking pattern. Obviously we want to make the read path faster and don't care about the write path because the write path is taken only once. The read path extracted from the code is:

    // using volatile (Singleton.value is volatile)
    get {
        if ( Singleton.value == null ) {
            // ... not taken
        }
        return Singleton.value;
    }

    // using memory barriers
    get {
        Singleton temp = Singleton.value;
        System.Threading.Thread.MemoryBarrier();
        if ( temp == null ) {
            // ... not taken
        }
        return Singleton.value;
    }

    The volatile load in the first version has acquire semantics and is equivalent to the non-volatile load plus the memory barrier in the second version. There are two volatile loads in the first version and only one memory barrier in the second. So I expect the code with memory barriers to perform faster than the code that uses volatile. But as with any performance speculation, this has to be taken with a grain of salt. I haven't done any measurements here.

  • Anonymous
    May 17, 2004
    I agree with Alexei and Bart in that a read barrier is also needed on the read path.

    Here's another example from MSDN that is broken in this regard:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp

    The fact that this code is wrong was confirmed by Neill Clift in a comp.programming.threads post:

    http://groups.google.com/groups?q=fetchcomputedvalue&hl=en&lr=&ie=UTF-8&selm=3f79e7ee%241%40news.microsoft.com&rnum=1

    The explanation provided by MSDN is also wrong by the way - CPU caches have nothing to do with this problem.

  • Anonymous
    May 18, 2004
    Don't use Double-check locking. You'll get it wrong.

    Prove otherwise by coming up with an example that isn't wrong and putting money on it; someone will disprove it.

  • Anonymous
    May 28, 2004
    Among other things, you need to understand weak memory models.

  • Anonymous
    May 28, 2004
    Alexei, I don't understand the need for a memory barrier in the "read path".

    We are making the following writes:

    1. Write member variables in new Singleton constructor
    2. Write new Singleton to temp
    3. Write temp to Singleton.value

    We are making the following reads:

    A. Read Singleton.value
    B. Read member of Singleton.value

    The problem we have is that a thread executing the reads may see the writes in the wrong order, that is, it might see 3 before it sees 1. But the effect of putting a full memory barrier means that 3 cannot move above 1 in any respect. So if A sees the results of 3, then B must see the results of 1.

  • Anonymous
    May 28, 2004
    More specifically... you are correct if the memory barrier between 2 and 3 is only a release barrier. In that case, there is nothing to keep 3 from moving up to before A, and that's why you need an acquire barrier between A and B.

    But, to be entirely accurate, what you need is an acquire barrier between A and 3. Putting a full memory barrier between 2 and 3 is certainly sufficient.

  • Anonymous
    May 28, 2004
    The comment has been removed

  • Anonymous
    May 28, 2004
    I might be totally wrong about this, but here's how I understand it.

    > 1. Write member variables in new Singleton constructor
    > 2. Write new Singleton to temp
    > 3. Write temp to Singleton.value

    > A. Read Singleton.value
    > B. Read member of Singleton.value

    If there is no memory barrier between the last two memory accesses, B can be fetched before A.

    So if you observe reads and writes to the main memory, you might see the following sequence:

    B
    1
    2
    3
    A

    and that would be a problem.

  • Anonymous
    May 28, 2004
    The comment has been removed

  • Anonymous
    June 03, 2004
    I would like to see Alexei's reply to John Doty's comment ("Alexei, I don't understand the need for a memory barrier in the 'read path'"), or anyone else's. Cheers!

  • Anonymous
    June 03, 2004
    The following two seem to suggest (In my mind) that Alexei is right and the only way to do this is a lock around the whole thing or read and write memory barrier which the lock gives you (I think.) Are these papers wrong or could some guru clear this up once and for all in terms of the .Net memory model? Cheers!

    Andrew Birrell
    "An Introduction to Programming with C# Threads"
    http://research.microsoft.com/~birrell/papers/ThreadsCSharp.pdf

    The "Double-Checked Locking is Broken" Declaration (IBM)
    http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.

  • Anonymous
    June 03, 2004
    The latter paper relates to the Java memory model, not the .NET one. The former paper seems to rely on the latter paper.

    It would make sense to me that a read memory barrier is not required here, as read B depends on A as far as I can see. (That is, until you've completed A, you don't know which piece of memory B needs to read.)

  • Anonymous
    June 04, 2004
    The comment has been removed

  • Anonymous
    June 04, 2004
    If 1, 2 and 3 happen after the value read in A has been published, then A will return null, and you'll enter the lock, which is fine, surely?

  • Anonymous
    June 04, 2004
    The comment has been removed

  • Anonymous
    June 05, 2004
    Ah, I think I see what you mean.

    Although there's a memory barrier after 1 and 2, that only stops those writes from being delayed - it doesn't prevent the write to three from being brought forward. Is that what you meant?

    Oh, it's all too much... I'll stick with locking :)

  • Anonymous
    June 05, 2004
    The comment has been removed

  • Anonymous
    June 06, 2004
    Yes exactly Jon, that's what I meant.

    I believe the memory barrier's semantics are that no write can cross it, up or down, but I am not sure about it. If it doesn't have these semantics, then what is the point of the barrier in the first place?

    Assuming it has this meaning, putting a barrier there will make sure that 1 and 2 are definitely seen before 3, which is what we want.


    But I do agree with you: normal locking is good enough. If you have a design that fails on this, I mean one that doesn't scale or has a performance bottleneck here (in the singleton), then redesign the code instead. Rule number one in a threaded application is to not have more threads than you have parallel work, where most threads should work on their own instances of data. Given, of course, that they must share some.

    And finally, don't use singletons. If your design relies on one, then the design is most likely not suffering from performance issues, so go for the full locking scheme. If the program is performance critical, then you don't want dynamic behaviour anywhere and you want to know the characteristics of your program at all times. I put it very black and white, of course.

  • Anonymous
    June 06, 2004
    > It would make sense to me that a read memory
    > barrier is not required here, as read B depends
    > on A as far as I can see. (That is, until you've
    > completed A, you don't know which piece of
    > memory B needs to read.)

    It's true in this particular case; however, in general there may be no dependency between reads.

    For example in the MSDN article discussed above (http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/synchronization_and_multiprocessor_issues.asp) both reads are from global variables so as far as I can tell there's nothing preventing the CPU from reordering them. In this case you have to use another memory barrier on the read path.

  • Anonymous
    June 06, 2004
    It seems to me that the MemoryBarrier documentation is sorely lacking. If it just has the semantics of a volatile read and a volatile write, then it has very little use at all - reads can still come later than the barrier, and writes can still come earlier than the barrier. As Niclas says, only if it's a bidirectional barrier is it useful. Assuming I'm not missing something, that is...

    Pavel: certainly in the general case it won't be true. I think we're really after working out just how optimised the singleton implementation here can be though. If the memory barrier before the assignment of value is bidirectional, I think we can get away with just that. If it's not, just using MemoryBarrier calls will never do enough.

  • Anonymous
    June 07, 2004
    The comment has been removed

  • Anonymous
    June 09, 2004
    Does this spin version work? Why or why not? Cheers!

    using System.Threading;

    public sealed class Singleton
    {
        private static int spinLock = 0; // 0 = lock not owned, 1 = owned
        private static Singleton value = null;
        private Singleton() {}

        public static Singleton Value()
        {
            // Get spin lock.
            while ( Interlocked.Exchange(ref spinLock, 1) != 0 )
                Thread.Sleep(0);

            // Do we have any mbarrier issues?
            if ( value == null )
                value = new Singleton();

            Interlocked.Exchange(ref spinLock, 0);
            return value;
        }
    }

    This would help answer a few related questions for me on how Interlocked works with mem barriers and cache, etc. TIA -- William
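On the question of how Interlocked interacts with memory barriers: Interlocked operations are generally described as acting as full fences, which is why they are usable as building blocks for lock-free publication. The sketch below is one such recipe, offered as an illustration rather than as an answer from the thread. It assumes the generic Interlocked.CompareExchange overload (.NET 2.0 and later) and Volatile.Read (.NET Framework 4.5 and later). The class name CasSingleton is made up.

```csharp
using System.Threading;

public sealed class CasSingleton
{
    private static CasSingleton value;

    private CasSingleton() {}

    public static CasSingleton Value
    {
        get
        {
            CasSingleton local = Volatile.Read(ref value);
            if (local == null)
            {
                // Several threads may construct a candidate at the same time,
                // but only the first compare-and-swap installs one.
                CasSingleton candidate = new CasSingleton();
                Interlocked.CompareExchange(ref value, candidate, null);

                // Re-read so every thread returns the instance that won.
                local = Volatile.Read(ref value);
            }
            return local;
        }
    }
}
```

Unlike the lock-based versions, the constructor can run more than once under contention, so this only fits types whose construction is cheap and free of side effects.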

  • Anonymous
    June 15, 2004
    With a thread-local variable, we can ignore other threads.
    We do not need synchronization.
    With a thread-local variable, we can get simple, safe, quick code.

    public sealed class Singleton {
        private Singleton() {}
        private static object syncRoot = new Object();

        // variable shared by all threads
        private static Singleton sharedvalue;

        // thread-local variable caching the singleton
        [ThreadStatic]
        private static Singleton value;

        public static Singleton Value {
            get {
                if (Singleton.value == null) {
                    lock (syncRoot) {
                        if (Singleton.sharedvalue == null) {
                            Singleton.sharedvalue = new Singleton();
                        }
                        Singleton.value = Singleton.sharedvalue;
                    }
                }
                return Singleton.value;
            }
        }
    }

    Please give me your comments.

  • Anonymous
    June 16, 2004
    Tatsuhiko, in this case you don't really have a singleton. Each thread will get a single copy of the Singleton class. Hence, it really doesn't follow the semantics of the singleton pattern.

  • Anonymous
    June 16, 2004
    But I think that in this case the Singleton class is a class, not a struct.
    So all threads will use the very same instance of the Singleton class.
    That is what we want to do, I think.

    Using a thread-local variable, we can get back DCL performance.

    Perhaps I understand the semantics of the singleton pattern less well than you do.
    But I understand that we want to share the same instance among many threads,
    and that we hate the overhead of synchronizing on every access to the singleton.

    With a thread-local variable, we can share the same instance among many threads
    without the overhead of synchronization, except during initialization.
    I think using a thread-local variable is pragmatic enough.

    Anyway, I like getting responses to my post, so thank you for your comment.

  • Anonymous
    June 20, 2004
    Tatsuhiko, I think the reason this technique has not received more attention (I first heard of it a couple of years ago) is because thread local variables have had their own performance problems, at least in common implementations of Java at the time. If I remember correctly, under low contention, simply using a lock was faster.

    Again, that was a couple of years ago, and on the JVM, not the CLR. I'd be interested if you or anyone else has info to share about the performance of thread local storage on .NET.

  • Anonymous
    June 21, 2004
    Help me out here because I'm not getting the ThreadStatic approach. The docs clearly say this about ThreadStaticAttribute:

    A static (Shared in Visual Basic) field marked with ThreadStaticAttribute is not shared between threads. Each executing thread has a separate instance of the field, and independently sets and gets values for that field. If the field is accessed on a different thread, it will contain a different value.

    So again, how can this be a "Singleton" for all threads in an AppDomain when each thread winds up getting its own "Singleton"?

  • Anonymous
    June 22, 2004
    Joe, thank you for your advice.
    I had not thought about the performance of the thread-local variable itself.
    I hope that the cost of accessing a thread-local variable is lower on the CLR than on the JVM.

    Keith, in my example the real singleton instance is stored in 'sharedvalue',
    which is a normal static field, not ThreadStatic.
    The ThreadStatic field named 'value' is a cache of 'sharedvalue'.
    That's all.

    the flow is .....

    (1) The beginning states are:

    Singleton.sharedvalue == null
    ThreadA's Singleton.value == null
    ThreadB's Singleton.value == null

    (2) ThreadA access Singleton.value.
    (2.1) Singleton.sharedvalue is initialized.
    (2.2) ThreadA's Singleton.value is initialized.

    so,results are
    Singleton.sharedvalue == somevalue;
    ThreadA's Singleton.value == somevalue;
    ThreadB's Singleton.value == null;

    In this state, different threads get different values from Singleton.value.
    The description of ThreadStatic in the docs you read refers to this situation.


    (3) ThreadB access Singleton.value.
    results are
    Singleton.sharedvalue == somevalue;
    ThreadA's Singleton.value == somevalue;
    ThreadB's Singleton.value == somevalue;

    In this state, different threads get the same value from Singleton.value.
    The threads share the same instance of Singleton.

    I hope this explanation makes clear what I meant.
    Anyway, thank you for your comments.

  • Anonymous
    June 23, 2004
    Oh I see. I should have looked a little more closely at your sample. You wind up with a little extra checking per thread (value == null) but that happens only once per thread so it doesn't hurt perf much.

    However, don't you need to either use the MemoryBarrier trick or use volatile on the sharedValue field? Since sharedValue is not ThreadStatic, it is shared amongst all threads. So wouldn't it be possible for the assignment of sharedValue to happen after the read (sharedValue == null) on an MP system?

  • Anonymous
    June 23, 2004
    The comment has been removed

  • Anonymous
    July 31, 2004
    Tatsuhiko Machida:

    It is indeed a valid technique, but I too have been living under the impression that thread local storage is as bad as taking a lock (a lock being what is used to update the thread-global area when creating the thread-local areas).



    --------

    I am glad to see that the barrier after read A wasn't needed, however I now see, with new eyes, why it was there in the first place, he was using a temp variable in the read path too!! bad bad, we are trying to synchronize the pointer, we don't want another silly value to synch as well =)


    Interesting thread.

  • Anonymous
    September 21, 2004
    Imagine my surprise then after finally tracking down a threading issue only to discover that the bug was in fact caused by the .NET synchronized Hashtable and it turns out that the synchronized Hashtable is in fact not thread safe, but thread safe

  • Anonymous
    December 08, 2004
    Brad Abrams's blog entry on volatility and memory barriers

  • Anonymous
    August 23, 2005
    Over the last few days, in preparation for today's Patterns webcast, I have spent a little more time...

  • Anonymous
    August 25, 2005

    This post comes from Dirk's web log
    The Singleton: the unknown creature
    Over the last...

  • Anonymous
    February 06, 2006
    Notes on the February Atlanta C# Users Group.

  • Anonymous
    May 03, 2006
    If you have developed traditional Windows Client/Server applications on single-CPU machines for all your...

  • Anonymous
    September 01, 2006
    Brad Abrams on volatile and MemoryBarrier(). Someone sent this to the team and I couldn't...

  • Anonymous
    March 08, 2007
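    The previous two chapters mainly covered some preliminary knowledge; starting with this chapter, we will really begin studying the singleton pattern. Singleton: first, look at one of the most common implementations of the singleton pattern, which is also the approach many people use. The following implementation of the Singleton design pattern uses...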

  • Anonymous
    July 15, 2007
    The Singleton implementation in the snippet I gave works fine as a lazy, thread-safe Singleton as it
