Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes.

Although Exchange 2010 no longer deploys a cluster resource model, we still use the Windows Failover Clustering service for certain functions.

When a Windows 2008 / 2008 R2 cluster is created, the cluster core resources are grouped together in the ‘Cluster Group’.  The Cluster Group is a hidden group that contains the following resources:

  • Cluster Name:  This is the cluster name object (CNO).  Exchange 2010 uses the name of the DAG to create this resource.  The name of the DAG is always the name of the cluster and the CNO.
  • Cluster IPv4 Addresses:  These are the IPv4 addresses that are associated with the DAG.  If the members of the DAG span multiple subnets, there will be multiple IPv4 resources.
  • File Share Witness:  This is the quorum resource that is created using the witness server and witness directory settings of the DAG.  This resource should only be present when there is an even number of DAG members.

You can see the cluster core resources in Failover Cluster Manager by selecting the cluster name in the upper left-hand pane.  In the center pane, expand the Cluster Core Resources section.

image

The cluster core resource group can also be seen using cluster.exe (or, in Windows 2008 R2, the failover cluster PowerShell cmdlets).

Windows 2008 / Windows 2008 R2:  Cluster.exe DAG.company.com group

cluster.exe dag.company.com group
Listing status for all available resource groups:

Group Node Status
-------------------- --------------- ------
Cluster Group DAG-1 Online
Available Storage DAG-1 Offline

Windows 2008 R2:  Get-ClusterGroup -Cluster DAG.company.com

PS C:\Users\Administrator> Get-ClusterGroup -Cluster DAG.company.com

Name OwnerNode State
---- --------- -----
Cluster Group dag-1 Online
Available Storage dag-1 Offline

From an Exchange 2010 perspective, you do not really need to manage the cluster core resources.  As members join and leave the cluster, this resource group is automatically moved to a remaining member.  Each member of the DAG should be able to arbitrate and fully bring online the cluster core resources.

When a cluster is created in Windows 2008 or Windows 2008 R2, the cluster service enumerates all network ports found on the nodes.  These network ports are then combined into cluster networks.  You can view the cluster networks in Failover Cluster Manager by expanding the cluster name and then expanding Networks.

image

You can also view the cluster networks using cluster.exe or PowerShell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network

cluster.exe dag.company.com network
Listing status for all available networks:

Network Status
---------------------------------------- -----------
Cluster Network 2 Up
Cluster Network 4 Up
Cluster Network 1 Up

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com

Get-ClusterNetwork -Cluster DAG.company.com

Name State
---- -----
Cluster Network 1 Up
Cluster Network 2 Up
Cluster Network 4 Up

A cluster network has three settings:

  • Do not allow cluster network communications on this network
  • Allow cluster network communications on this network
    • Allow clients to connect through this network

You can see these settings in Failover Cluster Manager by opening the properties of a cluster network.

image

You can also view the network role by using either cluster.exe or PowerShell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network "Cluster Network 1" /prop

cluster dag.company.com network "Cluster Network 1" /prop

Listing properties for 'Cluster Network 1':

T Network Name Value
-- -------------------- ------------------------------ -----------
SR Cluster Network 1 Name Cluster Network 1
MR Cluster Network 1 IPv6Addresses
MR Cluster Network 1 IPv6PrefixLengths
MR Cluster Network 1 IPv4Addresses 10.0.0.0
MR Cluster Network 1 IPv4PrefixLengths 24
SR Cluster Network 1 Address 10.0.0.0
SR Cluster Network 1 AddressMask 255.255.255.0
S Cluster Network 1 Description
D Cluster Network 1 Role 3 (0x3)
D Cluster Network 1 Metric 1200 (0x4b0)
D Cluster Network 1 AutoMetric 1 (0x1)

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com | fl name,role

Get-ClusterNetwork -Cluster DAG-1.company.com | fl name,role

Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1
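If you capture this output to a text file and want to check the roles from a script rather than by eye, the Name / Role pairs are easy to parse.  The following is a minimal Python sketch and not part of any Exchange or cluster tooling; the function name parse_name_role and the SAMPLE constant (copied from the output above) are illustrative assumptions.

```python
def parse_name_role(text):
    """Parse 'Name : ...' / 'Role : ...' pairs from Format-List style output.

    Returns a list of dicts such as {"name": "Cluster Network 1", "role": 3}.
    """
    networks = []
    current = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank line or a line without a "key : value" shape
        key, value = key.strip().lower(), value.strip()
        if key == "name":
            current = {"name": value}
        elif key == "role" and "name" in current:
            current["role"] = int(value)
            networks.append(current)
            current = {}
    return networks


# Sample text copied from the Get-ClusterNetwork output shown above.
SAMPLE = """\
Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1
"""

if __name__ == "__main__":
    for net in parse_name_role(SAMPLE):
        print(net["name"], net["role"])
```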

The role of the networks can also be viewed in the registry of each node.  This information is located at:  HKEY_LOCAL_MACHINE\Cluster\Networks.  Each cluster network is represented by a subkey which is the GUID of the network.  Expanding the GUID, you will see sub-values including Name and Role.

[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
@="class mscs::TmNetworkInfo"
"Id"="2cd2b920-0a2a-4851-bb24-de02d4a70b7e"
"Name"="Cluster Network 2"
"Signature"="NETW"
"Description"=""
"Role"=dword:00000001
"Priority"=dword:ffffffff
"Transport"="TCP/IP"
"Ignore"=dword:00000000
"Address"="192.168.0.0"
"AddressMask"="255.255.255.0"
"IPv6Address"=""
"State"=dword:00000003
"Metric"=dword:0000044c
"AutoMetric"=dword:00000001

The role value can contain three different values depending on the cluster network settings.  The values are:

  • 0:  Do not allow cluster network communications on this network
  • 1:  Allow cluster network communications on this network
  • 3:  Allow cluster network communications on this network and allow clients to connect through this network

In order for an IPv4 resource to be brought online, it must be associated with a network that is configured to both “Allow cluster network communications on this network” and “Allow clients to connect through this network”.  If for any reason the “Allow clients to connect through this network” option is not enabled, the IPv4 resource associated with that network cannot be brought online.
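The role values and the rule above can be expressed in a few lines.  This is only a sketch in Python for illustration: the names ROLE_DESCRIPTIONS, describe_role, can_host_ip_resource and parse_reg_dword are assumptions of mine, and the mapping covers only the three values discussed in this post.

```python
# Role values as described in this post (other values are reported as unknown).
ROLE_DESCRIPTIONS = {
    0: "Do not allow cluster network communications on this network",
    1: "Allow cluster network communications on this network",
    3: "Allow cluster network communications and allow clients to connect "
       "through this network",
}


def describe_role(role):
    """Return the Failover Cluster Manager wording for a numeric role value."""
    return ROLE_DESCRIPTIONS.get(role, "Unknown role value {}".format(role))


def can_host_ip_resource(role):
    """An IPv4 resource needs cluster communication and client access (role 3)."""
    return role == 3


def parse_reg_dword(text):
    """Convert a registry export value such as 'dword:00000001' to an int."""
    prefix = "dword:"
    if not text.startswith(prefix):
        raise ValueError("not a dword value: {!r}".format(text))
    return int(text[len(prefix):], 16)


if __name__ == "__main__":
    role = parse_reg_dword("dword:00000001")  # value from the export above
    print(describe_role(role))
    print("Can host an IPv4 resource:", can_host_ip_resource(role))
```

With the registry value shown earlier for Cluster Network 2 (Role 1), the check reports that the network cannot host an IPv4 resource.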

On an Exchange 2010 DAG member, when attempting to move the cluster core resources to another DAG member, the resources may fail to come online.  Specifically, the IPv4 resource fails to come online, which in turn causes the network name resource to fail (because it depends on the IPv4 resource).

If using Failover Cluster Manager and attempting to bring online the IPv4 resource in the cluster core resources group, the following pop up error is displayed:

image

A review of the system log shows event 1223:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 5/10/2010 1:14:42 PM
Event ID: 1223
Task Category: IP Address Resource
Level: Error
Keywords:
User: SYSTEM
Computer: dagNode.company.com
Description:

Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' cannot be brought online because the cluster network 'Cluster Network 2' is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.

Event 1223 indicates that the effective setting for Cluster Network 2 is “Allow cluster network communications on this network” without “Allow clients to connect through this network”.  However, when you review the settings for Cluster Network 2 in Failover Cluster Manager, you might see that both “Allow cluster network communications on this network” and “Allow clients to connect through this network” are enabled.

The Microsoft Exchange Replication Service helps maintain the cluster network configuration.  There is an issue in the current Replication Service where these settings are not changed.  This leaves the effective setting inside the cluster different from the setting displayed in the Failover Cluster Management tools.

Workaround:

A quick workaround for this issue is to reset the state of the network.  There are multiple ways to accomplish this, and each is outlined below.  Before proceeding with any of them, note the cluster network named in the event above, since that is the network that needs to be reset (in this example, Cluster Network 2).
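If you are working from exported event text, the network name can be pulled out of the event 1223 description with a regular expression.  This is a minimal Python sketch; the pattern is built only from the event wording quoted above, and the names PATTERN, EVENT_1223 and parse_event_1223 are illustrative assumptions.

```python
import re

# Description text copied from the event 1223 example in this post.
EVENT_1223 = (
    "Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' "
    "cannot be brought online because the cluster network 'Cluster Network 2' "
    "is not configured to allow client access."
)

# Pattern derived from the wording above; other events will not match it.
PATTERN = re.compile(
    r"Cluster IP address resource '(?P<resource>[^']+)' cannot be brought "
    r"online because the cluster network '(?P<network>[^']+)' is not "
    r"configured to allow client access"
)


def parse_event_1223(description):
    """Return (resource_name, network_name), or None if the text does not match."""
    match = PATTERN.search(description)
    if match is None:
        return None
    return match.group("resource"), match.group("network")


if __name__ == "__main__":
    print(parse_event_1223(EVENT_1223))
```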

Windows 2008 / Windows 2008 R2 – Using Failover Cluster Management Tool

The network state can be reset using Failover Cluster Manager

  • Launch Failover Cluster Management
  • Expand the cluster \ networks.

image

  • Get the properties of the cluster network in question.
  • Uncheck the box to “Allow clients to connect through this network”.

image

  • Press <apply> - you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.
  • The network is disabled for “Allow clients to connect through this network”. 

Next we need to enable the network for “Allow clients to connect through this network”.

  • Get the properties of the cluster network.
  • Check the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.

The network has been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

Windows 2008 / Windows 2008 R2: Using cluster.exe

  • Launch a command prompt with administrative privileges.
  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=1

  • The network is disabled for “Allow clients to connect through this network”. 

Next, we need to enable the network for “Allow clients to connect through this network”.

  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=3

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.
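The reset is always the same two-step pattern: set the role to 1, then set it back to 3.  As a small illustration, the Python sketch below builds the two cluster.exe command lines for a given cluster and network name, using straight quotes around the network name.  It only produces the strings and does not run anything; the function name reset_commands is an assumption of mine.

```python
def reset_commands(cluster, network):
    """Build the two cluster.exe command lines that reset a network's role.

    The first command disables client access (role=1) and the second
    re-enables it (role=3), matching the steps described above.
    """
    template = 'cluster.exe {cluster} network "{network}" /prop role={role}'
    return [
        template.format(cluster=cluster, network=network, role=role)
        for role in (1, 3)
    ]


if __name__ == "__main__":
    for command in reset_commands("dag.company.com", "Cluster Network 2"):
        print(command)
```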

Windows 2008 R2: Using PowerShell

  • Launch PowerShell with administrative privileges.
  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=1}

  • The network is disabled for “Allow clients to connect through this network”. 

Next, enable the network for “Allow clients to connect through this network”.

  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.Role=3}

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

 

LONG TERM FIX

This issue will be fixed in Exchange 2010 Service Pack 1. The issue will not be fixed in Exchange 2010 RTM.

==========================================

Updated – 6/2/2010

Updated to list Exchange 2010 SP1 confirmed to contain fix. 

==========================================

Comments

  • Anonymous
    January 01, 2003
    @Gaz: Can you provide some more information for me.  This is not something that you can reproduce.  If you manually change the cluster settings you can force this issue to occur but it is not covered under the fix described in this blog post.  Are you saying that you have experienced this issue on an SP1 DAG? TIMMCMIC

  • Anonymous
    January 01, 2003
    @GAZ Are you sure your issue is not fixed by SP1?  We have had no reported cases of any issues post SP1 when the instructions are followed.  As a reminder, if you had the issue pre-SP1, SP1 alone does not fix the issue; the workaround must be followed.  The issue is simply prevented from recurring. Also, just because you have a resource offline does not mean you have this issue; there are multiple reasons a resource may be reporting offline. TIMMCMIC

  • Anonymous
    January 01, 2003
    @Tim Unfortunately no, I had already installed EX2010-SP1 when I installed BE2010 which is to be my last step before moving real mailboxes to the EX2010 server. McCue

  • Anonymous
    January 01, 2003
    @Phil: I am not aware of what BE2010 is. TIMMCMIC

  • Anonymous
    January 01, 2003
    @Tim, We're still encountering this problem numerous times in SP2 with Update Rollup 4. We've to manually bring the cluster online!

  • Anonymous
    January 01, 2003
    @James: We do not expect this to fix any underlying cluster network communications problems. TIMMCMIC

  • Anonymous
    January 01, 2003
    @mfahey Assuming you actually have this issue if you followed the workaround then you should have been fine.  Take a look at my collapsing DAG networks blog post as this could be another reason your cluster networks are not maintained correctly. TIMMCMIC

  • Anonymous
    January 01, 2003
    @McCue Thanks for posting.  Can you confirm whether or not the issue was present prior to upgrading to SP1? TIMMCMIC

  • Anonymous
    January 01, 2003
    Tim, This is still a problem with Service pack 1.  Today I had to go to the failover cluster manager and remove the check, click apply, then add the check and click apply and finally I can bring the DAG online to both ping it and use Backup Exec to select the DAG.   McCue

  • Anonymous
    January 01, 2003
    @Joe: This is correct but you should not have to reset the IP address to correct the issue outlined in this blog. TIMMCMIC

  • Anonymous
    January 01, 2003
    @Sadda: If you can bring the cluster core resources online then this is not your issue.  This issue would prevent you from manually bringing the cluster core resources online. I would suggest reviewing the application and system logs for events regarding the cluster core resources. TIMMCMIC

  • Anonymous
    January 01, 2003
    The solutions above do not work. I do not have SP1 installed yet. When bringing the IP address online I get error code 0x80071737. When bringing the DAG name online I get error code 0x80071736: The resource failed to come online due to the failure of one or more provider resources. Any other fixes for this?

  • Anonymous
    January 01, 2003
    @curropar...

    I can assure you that in the specific instance cited here this was an exchange issue. If you have had this issue outside of Exchange it would be caused by other factors.

    TIMMCMIC

  • Anonymous
    September 06, 2010
    Can anyone confirm this has 100% fixed in e2010sp1?

  • Anonymous
    September 17, 2010
    This is not fixed in SP1. I have tested this myself.

  • Anonymous
    September 20, 2010
    This is an issue before SP1 as well.  I have Exchange 2010 and BE2010 R2, and cluster was offline after a reboot.

  • Anonymous
    September 23, 2010
    To set an IP address for the DAG, use the following Exchange shell command: Set-DatabaseAvailabilityGroup -Identity DAGGroupName -DatabaseAvailabilityGroupIpAddresses 192.168.x.x Confirm with Get-DatabaseAvailabilityGroup -Identity DAGGroupName | fl

  • Anonymous
    December 30, 2010
    Is this related to BE2010? I had this same issue with Exchange 2010 (not SP1) but not until after BE2010 was installed. Thanks for the fix !

  • Anonymous
    January 13, 2011
    @Tim I'm pretty sure BE2010 is Backup Exec 2010

  • Anonymous
    January 27, 2011
    With Exchange 2010 SP1 UR0 this is not fixed.

  • Anonymous
    January 27, 2011
    If the issue existed prior to upgrading then you will have to follow the workaround.  SP1 will prevent the issue from recurring. TIMMCMIC

  • Anonymous
    February 14, 2011
    I have run the fix a number of times after installing SP1 for Exchange 2010 and one of my two DAG addresses are reporting as offline. Is there any other fix available?

  • Anonymous
    May 23, 2011
    So, again, another issue that is meant to be fixed by SP1 isn't. Currently I have a DAG member that is reporting as offline, failed. Doesn't MSFT test anything anymore?

  • Anonymous
    October 04, 2011
    I had the same experience as @MattP_75, I had to change the role in registry: http://lokna.no/?p=998

  • Anonymous
    May 29, 2013
    I had the same problem where the cluster IP Address wouldn't come online.  I tried the workaround but it didn't seem to make a difference.  It wasn't until I changed the setting to "Do not allow cluster network communication on this network" that the cluster was able to come online.  After changing that setting it must have reset the network because it immediately went to an online state and the setting was reverted back to "Allow cluster network communication on this network" and "Allow clients to connect through this network".

  • Anonymous
    May 29, 2013
    Oh, and this was with Exchange 2010 SP2.

  • Anonymous
    July 23, 2013
    I also have the same issue. I have Exchange 2010 SP3 RU1 on the server. When I select the "Allow clients to connect through this network" option and click OK, after a few seconds the option is deselected again automatically. I don't know why this is misbehaving.

  • Anonymous
    August 05, 2013
    Also fixed partitioned network and event ID's 1129, 1126 and 1564

  • Anonymous
    September 09, 2013
    I have Exchange 2010 SP3 RU1 with DAG and see the same issue where the resource would not come online. Any fix for this?

  • Anonymous
    January 28, 2014
    This is not fixed for me. We're at Exchange 2010 SP2 and it's the same problem.

  • Anonymous
    August 07, 2014
    Or service pack 3 rollup 5

  • Anonymous
    October 01, 2014
    I have faced same issue but i am using Exchange 2013 SP1.

  • Anonymous
    November 02, 2014
    worked fine for me exchange 2010 without sp

  • Anonymous
    May 22, 2015
    Hi, this has been an issue for me: although it's not an Exchange, just a file server, it failed in the same way, with the same error code and the same events on the log. It's Windows 2008 R2 Enterprise SP1 x64. I didn't have the time to look for a solution on the internet (it's a cluster, it's supposed to be HA!) , so I'd to delete the File Server Witness roles, created them again and create the shares (it's more than 120 shares!). Luckily, I did an export of all shares the week before!

    So basically, I meant this is not an Exchange issue, but Windows Server 2008 issue. Don't expect this to be solved by any patch or SP for Exchange.

  • Anonymous
    September 07, 2015
    I'm also having this issue in Exchange 2013 CU7 on Server 2012 R2. I can manually change the Role dword value in the registry from 1 to 3 and bring the cluster resources online. However, the following day Exchange appears to have reverted this setting... Still looking for a fix.

  • Anonymous
    October 18, 2015
    @Roeland...

    This would indicate that the DAG networks are not collapsed correctly or that you do not have the correct flags on the DAG networks.

    This assumes that you are not trying to set a secondary network to allow cluster IP addresses as that will never work.

    TIMMCMIC