Database Mirroring Breakdown
Originally published on: Database Mirroring Breakdown
Hi Friends,
Recently one of my mirroring environment reported breakdown and as usual it was thought that it will require with lots of troubleshooting to figure out and fix the issue.
In my case I thought to start eradicating the common causes first for this breakdown.
Technically speaking, mirroring is actually a "Self-Managed" solution offered by MicrosoftSQL Server.
It is very simple in nature and if one had right level of configurations (i.e. valid credentials etc.) in place, mirroring is not so complicated as many might perceives it. It is more friendly for those who understands the security context and basics of Database and log files (LSNs etc.)
So, coming back to the main topic, mine is a SQL Server 2008-R2 64-bit RTM with a mirrored and witness server of same configuration. Database size was not so huge and there were total 2 databases mirrored to different partners (one of them was without witness i.e. manual fail-over).
So when I looked at the mirroring status I saw that the Principal was reporting:
XYZ (Principal, DISCONNECTED / In Recovery) --Here XYZ is for my Principal database name.
Mirroring Partner also had same status for the mirrored database at partner end:
XYZ (Mirror, DISCONNECTED / In Recovery) --Here XYZ is for my mirrored database name.
Immediately after looking at it, what I did was running the below command first on principal and then on mirroring Partner in order to recycle the Endpoints:
ALTER ENDPOINT Mirroring STATE=STOPPED (Principal and then on Partner)
ALTER ENDPOINT Mirroring STATE=STARTED (Principal and then on Partner)
This was post checking the error logs and part of initial few steps which I took.
In my case, it was not my overconfidence but because of the fact that a day before mirroring was broken the same way and after some trouble shoot I figured out and recycled the Endpoints to restore it back. so, I was sure that nothing else could go wrong than a security configuration.
Although, the previous day was a different scenario altogether and restarting ENDPOINT was the simplest solution for that scenario.
So, I thought the problem might reoccurred and I directly, recycled the Endpoint (anyway I had nothing to lose as the mirroring was already broken and if recycling of endpoints couldn't help restoring the session, that will not harm either.
While checking through the error logs in details the below is what it was reporting from Principal, Partner and Witness error logs:
Principal:
Message:
Database Mirroring login attempt by user 'domain\userprincipal.' failed with error: 'Connection handshake failed. The login 'domain\userprincipal' does not have CONNECT permission on the endpoint. State 84.'. [CLIENT: xxx.xxx.xxx.xx]
Witness:
Message:
Database Mirroring login attempt by user 'domain\usermirror.' failed with error: 'Connection handshake failed. The login 'domain\usermirror' does not have CONNECT permission on the endpoint. State 84.'. [CLIENT: xxx.xxx.xxx.xx]
Mirroring Partner:
Message
Database Mirroring login attempt failed with error: 'Connection handshake failed. An OS call failed: (8009030c) 0x8009030c(The logon attempt failed). State 67.'. [CLIENT: xxx.xxx.xxx.xx]
To explain, all the above errors are related to the permissions and authentication (easy to understand from the error message itself) but the challenge was to check from WHERE they were missing?
Well, to narrow down that, I first thought of checking that what SQL Server is complaining really missing or its just part of the generic error that many of the times bound to occur. And for this I query sys.server_permissions and sys.endpoints tables.
Below is the query that I used:
SELECT EPS.name, SPS.STATE,
CONVERT(nvarchar(38),
SUSER_NAME(SPS.grantor_principal_id))AS [GRANTED BY],
SPS.TYPE AS PERMISSION,
CONVERT(nvarchar(46),SUSER_NAME(SPS.grantee_principal_id))AS[GRANTED TO]
FROM sys.server_permissions SPS , sys.endpoints EPS
WHERE SPS.major_id = EPS.endpoint_id
ORDER BY Permission,[GRANTED BY], [GRANTED TO]
the result of this was little surprising given that the error logs were clearly complaining of the handshake failing although this query was stating all the users and permissions to be desired order which is required for the successful handshake or Authentication. So ideally ,SQL Server should not atleast complain about the "CONNECT" permission on the Endpoints.
However, I remapped the permissions for both Endpoint on Principal and Witness as below:
GRANT CONNECT ON ENDPOINT**::**Mirroring TO [domain\user] --actual user in my environment
I then check back the error logs and saw that error message has now been changed on Witness was changed to same as the error message we had on Partner i.e.
Message
Database Mirroring login attempt failed with error: 'Connection handshake failed. An OS call failed: (8009030c) 0x8009030c(The logon attempt failed). State 67.'. [CLIENT: xxx.xxx.xxx.xx]
This Error message had my attention since beginning however, I wanted to validate the permission on Endpoint first and then jump on this one.* *
*
*Now, as this error message explains itself "An OS call failed: (8009030c) 0x8009030c(Thelogon attempt failed)." What this means is the authentication is failing at the Domain controller level itself (mine was configured through domain account) and SQL Server has not even started authenticated yet (the code 0x8009030c pretty much said the things. You can decode it through MSDN or other sources for windows event logs detailing).
This interest me to check why the authentication for OS calls was failing. Sometimes, It happens because of a reasons that credentials are changed but SQL Server is still looking for the old credentials (perhaps from the OS memory buffer which is why I thought of restarting the ENDPOINTS in the first place itself but that didn't help) or in other words was not aware of the new set of credentials and in order to establish the handshake its sending the old set of credentials which is denied by the Operating system.
To verify this, I went into the Windows Events Logs (security) and figured out a the most expecting thing that the password of the SQL Server Service account was recently updated (damn it....breached all the process and directly updated it in silos.). Perhaps, not done through a change request as the environment wasn't production.
The rest was pretty much a cake walk, recycling the SQL Server service account revealed that the service was using an account with old password as it failed start-up saying that*"Service has failed to start due to a login failure".*
Not only SQL Server but all windows services has this behavior that one need to update the password manually in the service's properties. My request to Microsoft to integrate this to some level (if not fully) perhaps in next releases of windows/Active Directory.
I got the new passwords and update it to the SQL Server Service account and started the services.
the next part was just to recycle the Endpoints as I mentioned above.
This had resolved the mirroring issue at this scenarios and my databases are now synchronized.
Food for Thoughts:
When Mirroring starts complaining about the OS call failure with the error code starting with '0x' the first things to check is if the password has been updated correctly for all the accounts viz. Principal, Mirrored Partner and Witness.
Mirroring never requires all the accounts to be 'SA'* on each server OR*
all the partners should run under the same domain accounts
In fact, this will be the easiest way to configure database mirroring and even a system administrator can configure mirroring with this* *setup.
Database Mirroring can be configured with different service accounts and the minimum permission that these accounts needs to have is "CONNECT" permission on each others' Endpoints that's it.
So, what is your Mirroring Breakdown Scenario. share here and be part of the discussion and get help from the other readers.
Disclaimer: All inputs provided here are my personal opinion and are neither verified nor approved by Microsoft before it is published. All information, and code samples(if any) is provided "AS IS" with no warranties and confers no rights hence carefully test them before directly executing in Production Environment.