MSMQ prefers to be unique
If you want to quickly create a few test machines or roll out dozens of branch offices, DON'T clone MSMQ. The end result will be messages that just stick in the outgoing queue for no obvious reason and then, when you've got fed up waiting, get delivered to the destination just as inexplicably. In some situations the messages may just disappear without a trace.
If you look in the registry at HKLM\Software\Microsoft\MSMQ\Parameters\Machine Cache you will find a binary value called QMId. This value, as you'd guess, is the ID of the queue manager and MSMQ uses it to distinguish between different machines (IP address and computer name are not reliably unique). All message communication makes use of the value but, more importantly here, MSMQ also uses the QMId as the key to a cache it keeps for performance reasons.
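To check whether several machines share a QMId, it helps to display the binary value in a readable form. The following Python sketch assumes the value is a 16-byte GUID stored in the usual Windows byte order; the function names are invented for this example and are not part of any MSMQ tooling.

```python
import uuid
from collections import defaultdict


def read_local_qmid():
    """Return the raw QMId bytes from the local registry (Windows only)."""
    import winreg  # imported here so the other helpers also load on non-Windows hosts

    key_path = r"Software\Microsoft\MSMQ\Parameters\Machine Cache"
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, "QMId")
    return value


def format_qmid(raw):
    """Render a 16-byte QMId as a GUID string.

    Assumption: the bytes use the Windows GUID layout, where the first
    three fields are little-endian.
    """
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return str(uuid.UUID(bytes_le=bytes(raw)))


def find_shared_qmids(qmids_by_host):
    """Return {qmid: [hosts]} for every QMId reported by more than one host."""
    hosts_by_qmid = defaultdict(list)
    for host, raw in sorted(qmids_by_host.items()):
        hosts_by_qmid[format_qmid(raw)].append(host)
    return {q: hosts for q, hosts in hosts_by_qmid.items() if len(hosts) > 1}
```

Collecting the output of read_local_qmid() from each machine and passing the results to find_shared_qmids() would list any clones that still share an ID.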
MSMQ maintains a temporary cache in memory of a received message's QMId property and the IP address of the sender. The queue manager uses the cache to find the IP address of the sender so it can correctly address the acknowledgement messages that underpin MSMQ's delivery system. The cache entries do have a lifetime and the queue manager will refresh or purge them at regular intervals.
Should a system have cloned machines running MSMQ then they will effectively all be sharing the same QMId row in the cache table. Every time a queue manager receives a message from one of these machines, it will acknowledge back to the IP address it found in the cache and there is a chance that this will mean a delivery to the wrong sender. The receiver of the misrouted message will simply discard it as the queue manager cannot find anything locally that the acknowledgement corresponds to.
How this manifests itself will depend on how many clones there are and how busy the system is. For example, if there are only a handful of MSMQ machines and they just send a few times an hour then it is very unlikely that there will be a delay. The cached entries will have timed out and been removed before the next clone machine sends a message. On the other hand, a busy system could end up with a queue (no pun intended) of client machines all waiting on the cached entry to expire before their outgoing messages can finally be delivered.
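The effect can be modelled with a toy cache. This is only a sketch of the behaviour described above, not MSMQ's actual implementation: the class name, the 60-second lifetime and the rule that a cache hit does not extend an entry's lifetime are all assumptions made for illustration.

```python
class AckAddressCache:
    """Toy model of a receiver-side cache mapping QMId -> sender IP."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds   # hypothetical entry lifetime
        self._entries = {}       # qmid -> (ip, expires_at)

    def ack_address(self, qmid, sender_ip, now):
        """Return the IP an acknowledgement would be sent to."""
        entry = self._entries.get(qmid)
        if entry is not None and now < entry[1]:
            # Live entry: trust the cached IP, even if a clone with
            # the same QMId sent this message from another address.
            return entry[0]
        # No entry, or it has expired: cache the real sender.
        self._entries[qmid] = (sender_ip, now + self.ttl)
        return sender_ip


cache = AckAddressCache(ttl_seconds=60)
first = cache.ack_address("shared-qmid", "10.0.0.1", now=0)    # original machine
second = cache.ack_address("shared-qmid", "10.0.0.2", now=10)  # clone, entry still live
third = cache.ack_address("shared-qmid", "10.0.0.2", now=61)   # clone, entry expired
print(first, second, third)  # 10.0.0.1 10.0.0.1 10.0.0.2
```

The second acknowledgement goes to the wrong machine, so the clone's message stays in its outgoing queue until the cached entry expires and a later attempt is acknowledged correctly.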
What to do
If your clients are Active Directory-integrated then the QMId is stored in the msmqConfiguration object and the recommended path is to reinstall Message Queuing to generate a unique value for that machine.
Workgroup mode, though, is less drastic:
- Stop the MSMQ Service
- Delete the QMId value completely
- Add a SysPrep DWORD (under HKLM\Software\Microsoft\MSMQ\Parameters) and set it to 1
- Start the MSMQ Service
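The four steps above could be scripted along the following lines. Treat this as a sketch rather than a supported tool: it assumes the service's short name is MSMQ, uses the standard net and reg commands, and the function names are invented for the example. It defaults to a dry run that only prints the commands. Note that one commenter below reports that messages waiting in the outgoing queues disappeared after this procedure, so try it in a lab first.

```python
import subprocess

MSMQ_PARAMS = r"HKLM\Software\Microsoft\MSMQ\Parameters"


def qmid_reset_commands():
    """Build the command list for the workgroup-mode QMId reset."""
    return [
        # 1. Stop the MSMQ service (/y answers yes if dependent services must stop too).
        ["net", "stop", "MSMQ", "/y"],
        # 2. Delete the QMId value completely.
        ["reg", "delete", MSMQ_PARAMS + r"\Machine Cache", "/v", "QMId", "/f"],
        # 3. Add the SysPrep DWORD and set it to 1.
        ["reg", "add", MSMQ_PARAMS, "/v", "SysPrep", "/t", "REG_DWORD", "/d", "1", "/f"],
        # 4. Start the MSMQ service again.
        ["net", "start", "MSMQ"],
    ]


def reset_qmid(dry_run=True):
    """Print the commands, or run them in order when dry_run is False."""
    for cmd in qmid_reset_commands():
        if dry_run:
            print(cmd)
        else:
            # check=True stops at the first failing step.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    reset_qmid(dry_run=True)
```

Running the script with dry_run=False needs an elevated prompt on the affected workgroup machine.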
To add a sense of risk to cloning MSMQ, there is also KB article 830639, "Access Violation Occurs in the MSMQ Service", the hotfix for which is not currently included in any service packs but is part of Update Rollup 1 for Microsoft Windows 2000 Service Pack 4.
But, in all cases, prevention is better than cure so always install MSMQ (manually or scripted) after you have finished deploying the operating system to the machine and not before.
Comments
Anonymous
February 07, 2007
Great info. We saw this strange behavior with some ghosted machines but could not figure out the cause and effect. Now we know, thanks.
Anonymous
April 03, 2008
Thank you! This info has saved me. I cloned a lot of machines and we started to lose 40% of messages into thin air. We also load balance, so svr1 outgoing points at the LB VIP, which passes to the svr2/3/4 incoming queues with NAT. A note on "Workgroup mode, though, is less drastic": I tried it in the lab and it removed all our queues (30) and triggers (30). We do have scripts to recreate them, thankfully. Is there any way to reset the QMId without losing config?
Anonymous
April 03, 2008
Ignore me, SysPrep does not remove queues, triggers or rules. I tested on an outgoing queue server instead of an incoming server.
Anonymous
January 30, 2009
May sound like a strange question but what if you have many clients spread around the world sending messages
Anonymous
September 22, 2009
Great blog! How is the QMId generated? I'm running into a problem where managed computers are booting up and their QMId is the same for all servers. Obviously, I didn't follow your advice from this article and I am stuck. What I want to do is generate a 16-byte ID and assign it to my machines, so that even if I reimage, HOST-A always has the same value. How would I do that? Could I use the UUID of the computer? Thanks.
Anonymous
September 23, 2009
Hi, In the past, the MSMQ client used CoCreateGuid (http://msdn.microsoft.com/en-us/library/ms688568(VS.85).aspx) to generate a value for QMId. Maybe MSMQ will use a different method call in more recent versions but functionally it will be the same. As long as you are using MSMQ in workgroup mode and not AD-integrated then you should be OK. The client will complain if the MSMQ object for this computer already exists in Active Directory with a different QMId. Note that this approach won't be supported by Microsoft - the only official way to generate a new QMId is to reinstall MSMQ.
Cheers
John Breakwell (MSFT)
Anonymous
November 26, 2012
Thanks for the really useful blog post. Just wanted to add a lesson we learned. We had the following situation:
- server A was a sender of messages
- we cloned A into B
- when we started B up, messages sent OK from B, but not from A
- we could see that the messages were building up in the outbound queues of A
We could find no way of forcing the messages to get sent from A, and in fact could not get it to start sending messages at all. We uninstalled and reinstalled MSMQ on B and restarted the Message Queuing service on the recipient server (in the hope that this would clear the in-memory cache mentioned above) but to no avail. As a final desperate measure, we tried the "Workgroup Method" of assigning a new QMId. The method worked, but all the messages in the outbound queues disappeared.