How to Troubleshoot Local Continuous Replication Issues
Microsoft Exchange Server 2007 will reach end of support on April 11, 2017. To stay supported, you will need to upgrade. For more information, see Resources to help you upgrade your Office 2007 servers and clients.
Applies to: Exchange Server 2007, Exchange Server 2007 SP1, Exchange Server 2007 SP2, Exchange Server 2007 SP3
This topic discusses how to troubleshoot issues that you may experience when running Microsoft Exchange Server 2007 in an LCR environment. The procedures in this topic address the following issues:
The Get-StorageGroupCopyStatus cmdlet reports that the database has failed and is not seeded.
The Get-StorageGroupCopyStatus cmdlet reports that the database has failed. The FailedMessage value provides specific information about the source of the failure.
Alerts, performance counters, or the Get-StorageGroupCopyStatus cmdlet indicate that copy or replay queues are backed up for a storage group copy.
The Get-StorageGroupCopyStatus cmdlet reports a stale time for the LastInspectedLogTime value.
Seeding is failing.
The Restore-StorageGroupCopy cmdlet reports that Exx.log is not available.
For issues other than those listed here, review the event log to determine the cause and the course of action needed to recover. After you identify the time of the failure, other event logs may help you better understand the problem. For more information about tools that may assist in troubleshooting LCR issues, see Tools for Troubleshooting Issues with High Availability Deployments.
Before You Begin
To perform these procedures, the account you use must be delegated the Exchange Server Administrator role and must be a member of the local Administrators group for the target server. For more information about permissions, delegating roles, and the rights that are required to administer Exchange 2007, see Permission Considerations.
Procedure
Get-StorageGroupCopyStatus cmdlet reports that the database has failed and is not seeded
Possible causes A configuration problem, or the passive copy does not have a valid baseline database. This issue could also be caused by not enabling the storage group on the local computer.
Resolution Do the following:
Verify that storage for the copy is correctly configured and operational. If you find and correct an error, you can trigger a new check of the copy by suspending and resuming the storage group copy.
Verify that the LCR copy's paths are correctly configured. You can do this by using the Get-StorageGroup cmdlet in the Exchange Management Shell. For more information about using the Get-StorageGroup cmdlet to view configuration information, see How to View Local Continuous Replication Configuration Settings.
Use the Update-StorageGroupCopy cmdlet to seed the storage group copy.
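As an illustration, the checks above might be run as follows in the Exchange Management Shell. This is a sketch rather than a prescribed procedure. The server name EXCH01 and the storage group name First Storage Group are placeholders. Whether replication resumes automatically after seeding may depend on your service pack level, so confirm the result with Get-StorageGroupCopyStatus.

```powershell
# Placeholder identity (assumption): replace with your own server\storage group.
$sg = "EXCH01\First Storage Group"

# Review the active and copy paths configured for the storage group and its database.
Get-StorageGroup -Identity $sg | Format-List Name, HasLocalCopy, LogFolderPath, SystemFolderPath, CopyLogFolderPath, CopySystemFolderPath
Get-MailboxDatabase -StorageGroup $sg | Format-List Name, EdbFilePath, CopyEdbFilePath

# After correcting a storage or path error, suspend and resume the copy to trigger a new check.
Suspend-StorageGroupCopy -Identity $sg
Resume-StorageGroupCopy -Identity $sg

# If the copy still has no valid baseline, suspend it and reseed.
# -DeleteExistingFiles removes the existing passive copy files before seeding.
Suspend-StorageGroupCopy -Identity $sg
Update-StorageGroupCopy -Identity $sg -DeleteExistingFiles

# Confirm the outcome.
Get-StorageGroupCopyStatus -Identity $sg | Format-List SummaryCopyStatus, FailedMessage
```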
Get-StorageGroupCopyStatus cmdlet reports that the database has failed, and the FailedMessage value provides specific information about the source of the failure
Possible causes Many potential causes could result in a passive copy being determined as failed. The FailedMessage value specifically identifies the detected problem.
Resolution Run the Get-StorageGroupCopyStatus cmdlet and pipe the output to Format-List (fl) to obtain the complete FailedMessage value. This string identifies the specific problem that was detected. Analyze the information it provides, and then resolve the identified condition. If the reported condition is a corrupted or missing log, try to find an uncorrupted log with the correct generation number. If the correct log cannot be found, use the Update-StorageGroupCopy cmdlet to reseed. If the message implies that the logs on the source are not available, remove the share on the source's log directory, and then restart the Microsoft Exchange Replication service on the computer.
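For example, the full FailedMessage text can be read with a command along these lines. The identity shown is a placeholder, and the service restart is included only for the case described above, where the message points to unavailable source logs.

```powershell
# Placeholder identity (assumption): replace with your own server\storage group.
$sg = "EXCH01\First Storage Group"

# Format-List (alias fl) shows the whole FailedMessage string;
# the default table view can truncate it.
Get-StorageGroupCopyStatus -Identity $sg | Format-List Identity, SummaryCopyStatus, Failed, FailedMessage

# Only if the message indicates that source logs are unavailable, and after removing
# the share on the source log directory: restart the Microsoft Exchange Replication service.
Restart-Service MSExchangeRepl
```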
Alerts, performance counters, or the Get-StorageGroupCopyStatus cmdlet indicate that copy or replay queues are building up for a passive copy
Possible causes A backlog of log copying or replay activity could indicate either a problem or a transitional condition in a recovery process. A transitional condition occurs when a passive copy is recently resumed after it has been suspended for a significant period of time. If the condition is not transitional, the issue could be caused by one of the following:
A configuration issue exists.
Replication activity is suspended.
The Microsoft Exchange Replication service is stopped.
Storage has failed or is offline.
Resolution Determine whether there is an actual problem or a transitional condition by doing the following:
Verify that the Microsoft Exchange Replication service is running. You can do this by using the Services snap-in. If this service is stopped, you must start it.
Run the Get-StorageGroupCopyStatus cmdlet in the Exchange Management Shell, pipe the output to Format-List (fl), and determine whether the passive copy is suspended. If it is suspended, verify that the files of the passive copy are present, and then resume the passive copy by using the Resume-StorageGroupCopy cmdlet.
In the same Get-StorageGroupCopyStatus output, determine whether the copy is healthy. If the copy has failed, review the list of status fields to determine the corrective action that is necessary.
Watch the replication performance counters over a period of several minutes to determine whether progress is being made. Specifically, look at the replay generation number and the inspection generation number. If the copy queue length keeps increasing, but the replay queue length is short or decreasing, there may be an issue with the network file share on the active copy or with the active server itself. Verify that the active storage group copy's log directory has a network file share defined on it that is named with the GUID of the storage group. You can determine the GUID of the storage group by piping the output of the Get-StorageGroupCopyStatus cmdlet to Format-List (fl) in the Exchange Management Shell.
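The service check and the queue observation described above could be approximated from the Exchange Management Shell as follows. This is a rough sketch: the identity is a placeholder, and the choice of five samples one minute apart is arbitrary. It reads the queue lengths from Get-StorageGroupCopyStatus rather than from the performance counters themselves.

```powershell
# Placeholder identity (assumption): replace with your own server\storage group.
$sg = "EXCH01\First Storage Group"

# Check whether the Microsoft Exchange Replication service is running.
Get-Service MSExchangeRepl
# Start-Service MSExchangeRepl   # uncomment only if the service is stopped

# Check whether the copy is suspended or failed.
Get-StorageGroupCopyStatus -Identity $sg | Format-List SummaryCopyStatus, Suspend, SuspendComment, Failed, FailedMessage

# Sample the queue lengths once a minute to see whether the backlog is draining.
1..5 | ForEach-Object {
    $s = Get-StorageGroupCopyStatus -Identity $sg
    "{0:T}  copy queue: {1}  replay queue: {2}" -f (Get-Date), $s.CopyQueueLength, $s.ReplayQueueLength
    Start-Sleep -Seconds 60
}
```

Queue lengths that fall steadily across the samples suggest a transitional condition. A copy queue that keeps growing while the replay queue stays short points toward the file share or active server checks described above.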
Get-StorageGroupCopyStatus reports a stale time for LastInspectedLogTime
Possible causes There are three possible causes of this issue:
The active copy's database is dismounted.
The active copy is mounted, but it is not changing at a significant rate. Therefore, logs are not being produced by the active copy.
The Microsoft Exchange Replication service is not running.
Resolution Determine which of the three causes is occurring. You can make this determination by doing the following:
Determine whether the database is dismounted by using the Exchange Management Console or by running the Get-StorageGroupCopyStatus cmdlet in the Exchange Management Shell. If it is dismounted, it must be mounted and a new log file generation sequence must be created before the LastInspectedLogTime value will change.
Verify that the Microsoft Exchange Replication service is running. If this service is stopped, you must start it.
After verifying that the database is mounted, check whether the database is generating logs. Look in the active database's log directory and identify the log file with the highest generation number. Check the time stamp on that log. It should match the LastInspectedLogTime.
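A minimal sketch of these checks follows. The identity and log directory are placeholders. Reading the mount state from Get-MailboxDatabase -Status is one possible way to do it. The file name pattern assumes that closed logs are named with the storage group prefix (for example E00) followed by an eight-digit hexadecimal generation number, so verify that against the files in your own log directory.

```powershell
# Placeholder values (assumptions): replace with your own identity and log path.
$sg     = "EXCH01\First Storage Group"
$logDir = "D:\ExchangeLogs\First Storage Group"

# Check whether the active database is mounted.
Get-MailboxDatabase -StorageGroup $sg -Status | Format-List Name, Mounted

# Find the closed log with the highest generation number. The current log (Exx.log)
# has no generation number in its name and is excluded by the pattern.
$newest = Get-ChildItem -Path $logDir -Filter "E*.log" |
    Where-Object { $_.Name -match '^E[0-9A-F]{2}[0-9A-F]{8}\.log$' } |
    Sort-Object Name |
    Select-Object -Last 1

# Compare its time stamp with the value reported for the passive copy.
$status = Get-StorageGroupCopyStatus -Identity $sg
"Newest closed log:    {0}  ({1})" -f $newest.Name, $newest.LastWriteTime
"LastInspectedLogTime: {0}" -f $status.LastInspectedLogTime
```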
Seeding is failing
Possible causes A backup is in progress on the active copy or a communication issue exists.
Resolution Verify that a backup of the affected storage group or database is not in progress.
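One way to check for a running backup from the Exchange Management Shell is sketched below. The identity is a placeholder, and the example covers mailbox databases only.

```powershell
# Placeholder identity (assumption): replace with your own server\storage group.
$sg = "EXCH01\First Storage Group"

# BackupInProgress is populated only when -Status is specified.
Get-MailboxDatabase -StorageGroup $sg -Status | Format-List Name, Mounted, BackupInProgress
```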
Restore-StorageGroupCopy cmdlet reports Exx.log is not available
Possible causes The Restore-StorageGroupCopy cmdlet prompts you to determine if it should continue with a missing Exx.log.
Resolution If you expect the activation to produce a database that has lost no data, respond No to the prompt. If Exx.log is not available at the time of the Restore-StorageGroupCopy cmdlet operation, the recovery will be lossy. If you respond No, you must resolve any issues that are preventing access to the production logs. When those issues are corrected, you can run the Restore-StorageGroupCopy cmdlet again.
For More Information
For more information about the Exchange Management Shell cmdlets mentioned in this topic, see the following topics: