

When should I evict a cluster node?


I thought I’d post a quick blog on this topic since we run into cases where evicting a cluster node is used as a troubleshooting step. That being said, evicting a node should NEVER be a primary troubleshooting step.

Evicting a node to try and resolve a cluster issue may get you deeper in the hole and ultimately make the issue more complex than it started out.  As an example, you originally started with a failover issue.  You evict the node but now you can’t get the node back into the cluster. Since you can no longer add the node back, you have this secondary issue that must be resolved before you can address your original problem.

In my experience of working many cluster issues, I have never resolved an issue by evicting a node. The only times you should ever evict a node are under the following scenarios.

  • Replacing a node with different hardware.
  • Reinstalling the operating system.
  • Permanently removing a node from a cluster.
  • Renaming a node of a cluster.

Let’s take a look at some very common scenarios where I’ve seen evicting a node used improperly.

Cluster service won’t start on node 2 of a cluster. Node 2 is evicted from the cluster. Whatever kept the cluster service from starting is still there, but now that same problem also prevents node 2 from coming back into the cluster.

Resources don’t fail over to node 2. Every time a failover occurs, the disks don’t come online and the resources fail back to node 1. One of the nodes is evicted and then added back to the cluster. None of this addresses the disk issue, so the problem remains.

If the reason for the disk failure is an Error 2, then the drives are not seen properly by the evicted node. So when you try to add the evicted node back in and take the defaults, the join could fail with this error in CLCFGSRV.LOG:

Major Task ID: {B8C4066E-0246-4358-9DE5-25603EDD0CA0}
Minor Task ID: {3BB53C9E-E14A-4196-9066-5400FB8860C9}
Progress (min, max, current): 0, 1, 1
Description:
Checking that all nodes have access to the quorum resource
Status: 0x800713de
The quorum disk could not be located by the cluster service.
Additional Information:
For more information, visit Help and Support Services at
https://go.microsoft.com/fwlink/?LinkId=4441.
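
The status value in that log entry is a standard 32-bit HRESULT, so it can be split into its fields to confirm it matches the description text. The following is a minimal sketch, not part of the original post: the helper name `decode_hresult` is hypothetical, and the bit layout used (failure flag in bit 31, facility in bits 16 to 26, code in the low 16 bits) is the general HRESULT layout, where facility 7 means the code is a Win32 error number.

```python
FACILITY_WIN32 = 7  # facility value meaning "the code field is a Win32 error number"


def decode_hresult(status: int) -> dict:
    """Split a 32-bit HRESULT into failure flag, facility, and code (hypothetical helper)."""
    return {
        "failure": bool((status >> 31) & 0x1),  # bit 31: set when the call failed
        "facility": (status >> 16) & 0x7FF,     # bits 16-26: source of the error
        "code": status & 0xFFFF,                # bits 0-15: the error number itself
    }


# Status value copied from the CLCFGSRV.LOG entry above.
fields = decode_hresult(0x800713DE)
print(fields)
print("Win32 error number:", fields["code"] if fields["facility"] == FACILITY_WIN32 else "n/a")
```

Decoded this way, the status is a failure carrying Win32 error 5086, which is the number behind the "quorum disk could not be located" text in the log. Searching the cluster logs for the decimal number as well as the hex status can help when tracing the failed join.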

I could go on and on but the point I am trying to make is that unless you fall into the four specific scenarios I mention, don’t evict your cluster nodes. Your Microsoft Support Engineers thank you and your users will thank you.

Jeff Hughes
Senior Support Escalation Engineer
Microsoft Enterprise Platforms Support

Comments

  • Anonymous
    January 01, 2003
    @Dennis, I was able to add my server back into the cluster by disabling the cluster service as you instructed. @jeff, really? It was a little late when I found your article. Perhaps you might want to revise it?

  • Anonymous
    January 01, 2003
    Or, it could be your security settings.  You would need to review the cluster log to see the errors when the join takes place.  If it is a 1722 or 1726, it is the connectivity issue I just explained.  If it is a 5, it is an access denied.  If you click on my name in the tag list, I created a blog about access denied errors and cluster administrator.  It has the info you need on what to look for and how to fix it.

  • Anonymous
    January 01, 2003
    Good post, Jeff. I'll be forwarding this one on to co-workers for sure, as I've seen evict node sometimes used as one of the first steps taken. Thanks, Mike

  • Anonymous
    January 01, 2003
    To fix this issue, all you have to do is set the server’s cluster service to disabled.

  • Anonymous
    January 01, 2003
    What is going on is that when the cluster service starts, it will always try to join the cluster first. If it cannot join a cluster, it will try to form the cluster. However, in your case, the cluster is already running on the other node, which has the quorum/witness disk. This node tries to get it and cannot, since it is already owned. So the result is that the cluster service terminates and gives the error about the drive. The focus of your troubleshooting should be why the node cannot join. In almost all cases, it is that the nodes cannot communicate over port 3343. Something is blocking the port. It could be antivirus, a firewall, etc.

  • Anonymous
    January 01, 2003
    P.S. The LinkId=4441 link is dead.

  • Anonymous
    July 31, 2010
    Yes, the new interface is very cool, but it hung on me when I opened a PDF online. I don't know whether that was because of Acrobat Reader.

  • Anonymous
    October 06, 2013
    Great, and many thanks for writing this blog. I have the following questions, though, and would appreciate your input. What are the differences between Evict Node in Windows Server 2008 and Remove Node in SQL Server 2008 R2? What are the differences (extra steps, etc.) when adding back a) the evicted node and b) the removed node? Besides patching, what other situations would require removing a SQL Server node? I would appreciate it if you could shoot me an email at naprico@hotmail.com listing the URL of your response to these questions. Many thanks again. Best, Naprico

  • Anonymous
    October 10, 2013
    How can you fix the following situation? There is a cluster with two nodes, A and B. Node A is up and running. The cluster service on node B can't start, with a message related to signature problems on the quorum. I tried to force the cluster service to start without a quorum, but without success. This is a 2003 cluster.

  • Anonymous
    March 22, 2014
    Hi, I am in a very critical situation. My secondary cluster server is not working. I am not able to see the quorum drive, the shared drive, or the solution services on it, but I am able to see them on the primary cluster server. This happened when I tried to upgrade the solution without stopping the cluster services. Can you help me with how to re-add the secondary server to the cluster?

    Please help, this is a do-or-die situation for me.

  • Anonymous
    July 01, 2015
    Nice post. However, I don't understand how evicting a node can be a troubleshooting step. I can only see it making things worse.

  • Anonymous
    June 04, 2016
    What's up, I check your new stuff like every week. Your humoristic style is awesome, keep it up!