ARP Caching and SQL Server Failover Cluster install failures
Now I’m definitely not a networking guy, so even the more basic networking concepts fill me with wonder. One area I learned about recently, in connection with the scenario below, is Address Resolution Protocol (ARP) caching.
Consider the following scenario… You’re performing a SQL Server Failover Cluster installation and it is failing with a message like “IP Address xyz is already in use. To continue, specify a different IP address.” You validate that this IP is not being used elsewhere and when you ping it, you don’t get a response.
Enter ARP and ARP caching as one area to validate. ARP is a protocol that maps IP addresses to Ethernet MAC addresses. A network router sitting between a client and a server can in turn have an ARP cache, which holds a list of IP/MAC mappings. The cached mappings can be dynamic (created automatically) or static (created manually). Mappings can also have timeouts associated with them, so that disused entries are removed after a set period of time. You can view the ARP cache on your own PC (I did this on Windows 7, for example) by running “arp -a” at the command prompt. And as you can imagine, it is possible to have bad IP/MAC mappings.
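To make the IP/MAC mapping idea concrete, here is a minimal Python sketch that parses the text printed by “arp -a” into a lookup table. It assumes the English-language Windows column layout (Internet Address, Physical Address, Type); the sample output and every address in it are made up for illustration, and the names parse_arp_table and SAMPLE are hypothetical.

```python
import re

# One data row of Windows-style `arp -a` output:
# an IPv4 address, a dash-separated MAC, and the entry type.
# Assumption: English-language output; other locales may label types differently.
ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})\s+"
    r"(dynamic|static)\s*$"
)


def parse_arp_table(text):
    """Return {ip: (mac, entry_type)} for every data row in `arp -a` output."""
    entries = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            ip, mac, kind = match.groups()
            # If several interfaces list the same IP, the last row wins.
            entries[ip] = (mac.lower(), kind)
    return entries


# Hypothetical sample output; the addresses are invented for this example.
SAMPLE = """
Interface: 192.168.1.10 --- 0xb
  Internet Address      Physical Address      Type
  192.168.1.1           00-11-22-33-44-55     dynamic
  192.168.1.50          00-aa-bb-cc-dd-ee     dynamic
  192.168.1.255         ff-ff-ff-ff-ff-ff     static
"""

table = parse_arp_table(SAMPLE)
print(table.get("192.168.1.50"))
```

To try this against a live cache rather than the sample, the output of subprocess.run(["arp", "-a"], capture_output=True, text=True).stdout could be passed to parse_arp_table instead.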
There is also the concept of Proxy ARP, where a network device answers ARP requests on behalf of another destination and then forwards the traffic accordingly. A related concept is ARP cache poisoning where, in one manifestation of it, an ARP cache is tampered with so that an IP address points to the wrong MAC address, enabling a man-in-the-middle attack.
Back to SQL Server Failover Cluster install failures… The whole ARP caching concept is of particular interest to me in the context of SQL Server installations because I haven’t yet seen it listed on standard SQL Server clustering / install checklists as something to validate (perhaps because the aforementioned issue shouldn’t be too common).
So if you’re seeing an “already in use” message for your SQL FC install and you’re running out of ideas – ask your networking folks about ARP caching and look for bad entries that potentially need to be cleared out in order to continue with your install.
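As a rough illustration of that check, the sketch below looks up the IP you intend to give the cluster in “arp -a” output and combines the result with whether a ping was answered. It is only a triage aid under stated assumptions: it reads Windows-style output from the local host, so it cannot see a stale entry held on a router or switch, and the function names, sample output and addresses are all hypothetical.

```python
import re


def cached_mac_for(ip, arp_output):
    """Return the MAC cached for `ip` in Windows-style `arp -a` text, or None."""
    pattern = re.compile(
        r"^\s*" + re.escape(ip)
        + r"\s+((?:[0-9A-Fa-f]{2}-){5}[0-9A-Fa-f]{2})\s+\w+\s*$",
        re.MULTILINE,
    )
    match = pattern.search(arp_output)
    return match.group(1).lower() if match else None


def triage(ip, arp_output, ping_replied):
    """Summarize what the local ARP cache suggests about a candidate IP."""
    mac = cached_mac_for(ip, arp_output)
    if mac and not ping_replied:
        # A mapping exists but nothing answers: worth asking whether it is stale.
        return (f"{ip} is cached as {mac} but does not answer ping: "
                f"possible stale entry (candidate for `arp -d {ip}`)")
    if mac:
        return f"{ip} resolves to {mac} and answers ping: address is likely in use"
    return f"{ip} has no cached ARP entry on this host"


# Hypothetical sample output; the addresses are invented for this example.
SAMPLE = """
Interface: 192.168.1.10 --- 0xb
  Internet Address      Physical Address      Type
  192.168.1.1           00-11-22-33-44-55     dynamic
  192.168.1.50          00-aa-bb-cc-dd-ee     dynamic
"""

print(triage("192.168.1.50", SAMPLE, ping_replied=False))
```

The message only suggests “arp -d” (the Windows option for deleting a cache entry) rather than running it, since clearing entries, especially on shared network gear, is a call for your networking folks.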
Comments
Anonymous
March 27, 2011
I've definitely seen ARP cache issues cause a multitude of problems, to the point where we had to hard-code entries for a particular situation. This shouldn't just be a consideration for SQL Server FC, though, but for FC configurations in general.
Anonymous
March 27, 2011
Good point, Brian. Thanks.
Anonymous
March 29, 2011
Good blog post, Joe. There are also some inherent issues with other things like CNO/VCO and OUs that get overlooked and can trip up installations as well.
Anonymous
March 29, 2011
Thanks Allan - good point too.