Monitoring Active Directory

Article
12/09/2009

Monitoring the distributed Active Directory service and the services that it relies upon helps maintain consistent directory data and the needed level of service throughout the forest. You can monitor important indicators to discover and resolve minor problems before they develop into potentially lengthy service outages. Most large organizations with many domains or remote physical sites require an automated monitoring system such as Microsoft Operations Manager 2000 (MOM) to monitor important indicators. An automated monitoring system provides the necessary consolidation and timely problem resolution to administer Active Directory successfully.

Benefits for End-Users
Benefits for Administrators
Risks of not Monitoring Active Directory
Levels of Monitoring
Active Directory Monitoring During the Deployment Phase
Service-Level Baseline
Requirements for Monitoring
Relationship between Monitoring and Troubleshooting
Reports
Frequency of Monitoring Tasks
Daily Monitoring Tasks
Weekly Monitoring Tasks
Monthly Monitoring Tasks

Benefits for End-Users

Monitoring Active Directory helps resolve issues in a timely manner, and users experience the following benefits:

Improved reliability of productivity applications that rely on back-end servers, such as e-mail.
Quicker logon time and more reliable resource usage.
Decreased help desk support issues.

Benefits for Administrators

Monitoring Active Directory provides administrators with a centralized view of Active Directory across the entire forest. By monitoring important indicators, administrators can realize the following benefits:

Higher customer satisfaction, because issues can be resolved before users notice problems.
Increased service levels, due to improved reliability and system understanding.
Greater schedule flexibility and ability to prioritize workload, due to early notification of problems, allowing resolution of issues while they are still a lower priority.
Increased ability for the system to cope with periodic service outages.

Monitoring Active Directory also assures administrators that:

All necessary services that support Active Directory are running on each domain controller.
Data is consistent across all domain controllers and end-to-end replication completes in accordance with your service level agreements.
Lightweight Directory Access Protocol (LDAP) queries respond quickly.
Domain controllers do not experience high CPU usage.
The central monitoring console collects all events that can adversely affect Active Directory.

Risks of not Monitoring Active Directory

Systematic monitoring is necessary to ensure consistent service delivery in a large environment with many domain controllers, domains, or physical sites. As a distributed service, Active Directory relies upon many interdependent services distributed across many devices and in many remote locations. As you increase the size of your network to take advantage of the scalability of Active Directory, monitoring becomes more important. It helps you avoid potentially serious problems, including:

Logon failure. Logon failure can occur throughout the domain or forest if a trust relationship or name resolution fails, or if a global catalog server cannot determine universal group membership.
Account lockout. User and service accounts can become locked out if the PDC emulator is unavailable in the domain or replication fails between several domain controllers.
Domain controller failure. If the drive containing the Ntds.dit file runs out of disk space, the domain controller stops functioning.
Application failure. Applications that are critical to your business, such as Microsoft Exchange or another e-mail application, can fail if address book queries into the directory fail.
Inconsistent directory data. If replication fails for an extended period of time, objects known as lingering objects and re-animated objects can be created in the directory, and eliminating them might require extensive diagnosis and time.
Account creation failure. A domain controller is unable to create user or computer accounts if it exhausts its supply of relative IDs and the RID master is unavailable.
Security policy failure. If the SYSVOL shared folder does not replicate properly, Group Policy objects and security policies are not properly applied to clients.

Levels of Monitoring

Use a cost-benefit analysis to determine the degree or level of monitoring that you need for your environment. Compare the cost of formalizing a monitoring solution with the costs associated with service outages and the time that is required to diagnose and resolve problems that might occur. The level of monitoring also depends on the size of your organization and your service level needs.
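
The comparison described above can be sketched as a small calculation. This is only an illustration of the arithmetic, not a costing method prescribed by this guide. The function names and every figure below are hypothetical placeholders that you would replace with your own estimates.

```python
def expected_annual_outage_cost(outage_hours_per_year, cost_per_outage_hour):
    """Estimated yearly cost of outages (lost productivity plus repair time)."""
    return outage_hours_per_year * cost_per_outage_hour


def monitoring_net_benefit(annual_monitoring_cost,
                           outage_hours_without, outage_hours_with,
                           cost_per_outage_hour):
    """Outage cost avoided by monitoring, minus what monitoring itself costs.

    A positive result suggests that monitoring is expected to pay for itself.
    """
    cost_without = expected_annual_outage_cost(outage_hours_without, cost_per_outage_hour)
    cost_with = expected_annual_outage_cost(outage_hours_with, cost_per_outage_hour)
    return (cost_without - cost_with) - annual_monitoring_cost


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
benefit = monitoring_net_benefit(
    annual_monitoring_cost=20_000,
    outage_hours_without=12,
    outage_hours_with=3,
    cost_per_outage_hour=5_000,
)
print(benefit)  # (12 - 3) * 5000 - 20000 = 25000
```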

Organizations with few domains and domain controllers, or that do not provide a critical level of service, might only need to periodically check the health of a single domain controller by using the built-in tools provided in Windows 2000 Server.

Larger organizations that have many domains, domain controllers, or sites, or that provide a critical service and cannot afford the cost of lost productivity due to a service outage, need to use an enterprise-level monitoring solution such as MOM.

Enterprise-level monitoring solutions use agents or local services to collect the monitoring data and consolidate the results on a central console. Enterprise-level monitoring solutions also take advantage of the physical network topology to reduce network traffic and increase performance. In a complex environment, directory administrators need enterprise-level monitoring to derive meaningful data, analyze it, and make good decisions. For more information about MOM, see https://www.microsoft.com/mom/.

Active Directory Monitoring During the Deployment Phase

As a best practice, deploy monitoring with the first domain controller. By integrating monitoring into the design and deployment process, you can avoid many of the problems that arise during deployment. Because monitoring solutions require network connectivity between the monitored servers and the management consoles, you must account for particular TCP/IP ports and bandwidth usage.
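
One way to check the connectivity requirement during a pilot is to test whether a monitored server accepts TCP connections on the ports that your monitoring product needs. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the ports to test are not listed here because they depend on the monitoring product; take them from its documentation.

```python
import socket


def can_connect(host, port, timeout_seconds=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_ports(host, ports, timeout_seconds=3.0):
    """Return the ports on host that did not accept a connection."""
    return [port for port in ports if not can_connect(host, port, timeout_seconds)]
```

Calling `unreachable_ports` with a domain controller name and the list of required ports returns an empty list when every port is reachable. A successful TCP connection shows only that the path is open, not that the agent behind it is healthy.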

As with any sophisticated service, implement a monitoring solution such as MOM in a lab before you deploy it in a production environment.

Service-Level Baseline

A baseline represents service level needs as performance data. By setting thresholds to indicate when the baseline boundaries are exceeded, your monitoring solution can generate alerts to inform the administrator of degraded performance and jeopardized service levels. For example, you can use performance indicators to set a baseline and monitor for low disk space on the disk drives that contain the Active Directory database and log files, and you can monitor CPU usage of a domain controller. You can also monitor critical services running on a domain controller. Monitoring these indicators allows the administrator to ensure adequate performance.

To determine an accurate baseline, monitor and collect data for a time period that is long enough to represent peak and low usage. For example, monitor during the time in the morning when the greatest number of users log on. Monitor for an interval that is long enough to span your password change policy and any month-end or other periodic processing that you perform. Also, collect data when network demands are low to determine this minimal level. Be sure to collect data when your environment is functioning properly. To accurately assess what is acceptable for your environment, remove data caused by network outages or other failures when you establish your baseline.
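
The advice to leave out data collected during outages can be illustrated with a short sketch. It assumes that each reading has already been tagged to show whether it was taken while the environment was failing; the `build_baseline` function, the tagging scheme, and the sample values are all hypothetical.

```python
from statistics import mean


def build_baseline(samples):
    """Summarize the healthy readings of one performance indicator.

    samples: iterable of (value, during_outage) pairs, where during_outage
    marks readings taken while the environment was not functioning properly.
    """
    healthy = [value for value, during_outage in samples if not during_outage]
    if not healthy:
        raise ValueError("no healthy samples to build a baseline from")
    return {"low": min(healthy), "peak": max(healthy), "average": mean(healthy)}


# Hypothetical CPU usage readings in percent; the 98 was recorded during an outage
# and is therefore excluded from the baseline.
cpu_samples = [(12, False), (35, False), (70, False), (98, True), (23, False)]
baseline = build_baseline(cpu_samples)
print(baseline)
```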

The baseline that you establish for your environment can change over time as you add new applications, users, hardware, and domain infrastructure to the environment, and as the expectations of users change. Over time, the directory administrator might look for trends and changes that occur, and take actions designed to meet the increased demands on the system and maintain the desired level of service. Such actions might include fine-tuning the software configuration and adding new hardware.

Choosing the thresholds at which alerts notify the administrator that the baseline has been exceeded is a delicate balance: thresholds that are too sensitive produce too much information, and thresholds that are not sensitive enough produce too little. The vendor of your monitoring solution, such as MOM, can provide general performance thresholds, but you must periodically adjust these thresholds to meet your service level requirements. To adjust these thresholds, first collect and analyze the monitoring data to determine what is acceptable or usual activity for your environment. After you gather a good data sample and consider your service level needs, you can set meaningful thresholds that trigger alerts.

To determine thresholds:

For each performance indicator, collect monitoring data and determine the minimum, maximum and average values.
Analyze the data with respect to your service level needs.
Adjust thresholds to trigger alerts when indicators cross the parameters for acceptable service levels.
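
The three steps above can be sketched for a single indicator where higher values are worse, such as response time. This is one possible heuristic, not the method that MOM or any other product uses. The function names, the 25 percent margin, and the idea of capping the threshold at a service level limit are assumptions made for illustration.

```python
from statistics import mean


def summarize(values):
    """Step 1: minimum, maximum, and average of the collected data."""
    return min(values), max(values), mean(values)


def choose_threshold(values, service_level_limit, margin=0.25):
    """Steps 2 and 3: pick an alert threshold for a 'higher is worse' indicator.

    Starts somewhat above the observed maximum so that usual activity does not
    raise alerts, but never goes beyond the limit implied by the service level.
    The default margin is an arbitrary starting point.
    """
    _, observed_max, _ = summarize(values)
    return min(observed_max * (1 + margin), service_level_limit)


def should_alert(value, threshold):
    """Return True when a new reading reaches or crosses the threshold."""
    return value >= threshold
```

With hypothetical response times of 20, 30, and 40 milliseconds and a service level limit of 100 milliseconds, `choose_threshold([20, 30, 40], 100)` yields 50.0, so a later reading of 60 would raise an alert.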

As you become more familiar with the monitoring solution you choose, it becomes easier to correlate the thresholds that trigger the alerts to your service level delivery. If you are uncertain, it is usually better to set the thresholds low to view a greater number of alerts. As you understand the alerts you receive and determine why you receive them, you can increase the threshold at which alerts are generated, thereby reducing the amount of information that you receive from your monitoring solution. MOM uses thresholds that are a reasonable starting point and work for the majority of medium-sized customers. Larger organizations might need to increase the thresholds.

Requirements for Monitoring

Managing an enterprise-level directory requires monitoring many important indicators. Failure to monitor all of the important indicators can create gaps in coverage. Use any monitoring solution that best suits your needs, but monitor the necessary important indicators to ensure that all aspects of Active Directory are functioning properly. MOM monitors all of the important indicators.

For more information about monitoring Active Directory, see https://www.microsoft.com/windows2000/technologies/directory/ad/default.asp.

For more information about MOM, see https://www.microsoft.com/mom/.

For more information about installing MOM, see https://www.microsoft.com/mom/docs/DeployGuide.doc.

Relationship between Monitoring and Troubleshooting

The goal of a comprehensive monitoring solution is to monitor all of the important indicators and provide alerts that are concise, highly relevant, and lead an operator to resolve the problem. Ideally, the monitoring solution alerts the operator only when a problem requires action. In this case, monitoring alerts are the first indicator that a problem exists. If the operator cannot easily resolve the problem that generated an alert, you might want to create a help desk ticket to begin troubleshooting and root-cause analysis. Your monitoring solution can initiate your troubleshooting processes or flowcharts.

Monitoring helps ensure that the Active Directory service is available for service requests. Active Directory is designed to be fault tolerant and can continue to operate if individual servers are unavailable for periodic maintenance or while operators troubleshoot them. You can assure a high degree of reliability by monitoring the distributed services that make up Active Directory, and resolving issues as they develop.

In addition to providing increased service availability, the relationship between monitoring and troubleshooting increases your understanding of the root causes of most problems that arise. As your environment becomes more reliable, monitoring alerts more precisely indicate the cause of new problems that arise.

Reports

Many important problems do not cause alerts, but they still require periodic attention. Your monitoring solution might generate reports that display data over time and present patterns that indicate problems. Review the reports to resolve issues before they generate alerts.
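
As an illustration of the kind of pattern that a report can reveal before any alert is raised, the sketch below fits a straight line to free disk space readings and projects when the drive would fill. It is a simplified example, not a description of any MOM report. The function names and sample readings are hypothetical, and the projection starts from the most recent reading rather than from the fitted line.

```python
def linear_slope(points):
    """Least-squares slope of (x, y) points, for example (day, free_gb)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in points)
    denominator = sum((x - mean_x) ** 2 for x, _ in points)
    if denominator == 0:
        raise ValueError("all x values are identical")
    return numerator / denominator


def days_until_empty(points):
    """Project the days left until free space reaches zero.

    Returns None when free space is not shrinking.
    """
    slope = linear_slope(points)
    if slope >= 0:
        return None
    _, last_free = points[-1]
    return last_free / -slope


# Hypothetical weekly readings: (day number, free space in GB).
readings = [(0, 100), (7, 93), (14, 86), (21, 79)]
print(days_until_empty(readings))
```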

Frequency of Monitoring Tasks

You can perform the daily, weekly, and monthly tasks as specified in the following tables, but you must adjust the frequency to meet the needs of your particular environment and monitoring solution.

Daily Monitoring Tasks

Table 1.5 Daily Tasks and Their Importance

Tasks	Importance
Verify that all domain controllers are communicating with the central monitoring console or collector.	Communication failure between the domain controller and the monitoring infrastructure prevents you from receiving alerts, so you cannot examine and resolve them.
View and examine all new alerts on each domain controller, resolving them in a timely fashion.	This precaution helps you avoid service outages.
Resolve alerts indicating the following services are not running: FRS, Net Logon, KDC, W32Time, ISMSERV. MOM reports these as Active Directory Essential Services.	Active Directory depends on these services. They must be running on every domain controller.
Resolve alerts indicating SYSVOL is not shared.	Active Directory cannot apply Group Policy unless SYSVOL is shared.
Resolve alerts indicating that the domain controller is not advertising itself.	Domain controllers must register DNS records to be able to respond to LDAP and other service requests.
Resolve alerts indicating time synchronization problems.	The Kerberos authentication protocol requires that time be synchronized between all domain controllers and clients that use it.
Resolve all other alerts in order of severity. If alerts are given error, warning, and information status similar to the event log, resolve alerts marked error first.	The highest priority alerts indicate the most serious risk to your service level.
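
Two of the daily tasks in the table, checking the essential services and working through alerts by severity, can be sketched as follows. The code works on data that you have already collected and does not query Windows itself. The short service names are the ones commonly seen on domain controllers, but treat them as an assumption and verify them on your own systems. The alert dictionary layout and the function names are hypothetical.

```python
# Assumed short service names, mapped to the names used in the table above.
ESSENTIAL_SERVICES = {
    "NtFrs": "FRS",
    "Netlogon": "Net Logon",
    "Kdc": "KDC",
    "W32Time": "W32Time",
    "IsmServ": "ISMSERV",
}

# Lower number means the alert is handled earlier.
SEVERITY_ORDER = {"error": 0, "warning": 1, "information": 2}


def stopped_essential_services(service_states):
    """Return the essential services that are not reported as running.

    service_states maps a service name to a state string such as 'running'.
    A service that is missing from the mapping is treated as not running.
    """
    return [
        name for name in ESSENTIAL_SERVICES
        if service_states.get(name, "").lower() != "running"
    ]


def by_severity(alerts):
    """Sort alert dicts so that errors come first, then warnings, then information."""
    unknown_rank = len(SEVERITY_ORDER)  # unrecognized severities sort last
    return sorted(alerts, key=lambda alert: SEVERITY_ORDER.get(alert["severity"], unknown_rank))


# Hypothetical snapshot of one domain controller.
states = {"NtFrs": "running", "Netlogon": "Running", "Kdc": "stopped", "W32Time": "running"}
print(stopped_essential_services(states))
```

Because `sorted` is stable, alerts of equal severity stay in the order in which they arrived.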

Weekly Monitoring Tasks

Table 1.6 Weekly Tasks and Their Importance

Tasks	Importance
Review the Time Synchronization Report to detect intermittent problems and resolve time-related alerts.	The Kerberos authentication protocol requires that time be synchronized between all domain controllers and clients that use it.
Review the Authentication Report to help resolve problems generated by computer accounts with expired passwords.	Expired passwords must be reset to allow the computers to authenticate and participate in the domain.
Review the Duplicate Service Principal Name Report to list all security principals that have a service principal name conflict.	User or computer accounts cannot be authenticated or log on if they share an SPN with another account.
Review a report of the top alerts generated by the Active Directory monitoring indicators and resolve those items that occur most frequently.	This report shows the alerts that occur most often. Focusing on the top alert generators significantly reduces the number of alerts seen by the operator.
Review the report that lists all trust relationships in the forest and check for obsolete, unintended, or broken trusts.	Authentication between domains or forests requires trust relationships.
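
If your monitoring solution can export its alerts but does not provide a top-alerts report, a rough equivalent can be computed from the export. The sketch below assumes that each alert is a dictionary with a `rule` key naming the indicator that raised it; that layout and the function name are hypothetical.

```python
from collections import Counter


def top_alert_sources(alerts, limit=5):
    """Return (rule name, count) pairs for the most frequent alerts, highest count first."""
    return Counter(alert["rule"] for alert in alerts).most_common(limit)


# Hypothetical week of exported alerts.
week = [
    {"rule": "time skew"},
    {"rule": "replication latency"},
    {"rule": "time skew"},
    {"rule": "low disk space"},
    {"rule": "time skew"},
    {"rule": "replication latency"},
]
print(top_alert_sources(week, limit=2))
```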

Monthly Monitoring Tasks

Table 1.7 Monthly Tasks and Their Importance

Tasks	Importance
Verify that all domain controllers are running with the same service pack and hot fix patches.	Potential issues can arise if distributed services are running with different versions of software.
Review all Active Directory reports and adjust thresholds as needed. Examine each report and determine which reports, data, and alerts are important for your environment and service level agreement.	Examining the data that is relevant to your environment allows you to correlate the thresholds that trigger alerts with your service level delivery.
Review the Replication Monitoring Report to verify that replication throughout the forest occurs within acceptable limits.	Timely replication helps assure that you meet your service level agreements.
Review the Active Directory response time reports.	Services must respond quickly for the system to function properly and for applications such as e-mail to work properly.
Review the domain controller disk space reports.	The drives containing the Active Directory database and log files must have sufficient free space to accommodate growth and routine processing.
Review all performance-related reports. These reports are called Health Monitoring reports in MOM.	These reports can help you determine the baseline for your environment and adjust thresholds.
Review all performance-related reports for capacity planning purposes to ensure that you have enough capacity for current and expected growth. These reports are called Health Monitoring reports in MOM.	These reports help you track growth trends in your environment and plan for future hardware and software needs.
Adjust performance counter thresholds or disable rules that are not applicable to your environment or that generate irrelevant alerts.	Monitoring indicators must be adjusted to suit your environment. The goal is to provide alerts that are concise, highly relevant, and lead an operator to resolve the problem.
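
The first monthly task, confirming that every domain controller runs the same service pack and hot fixes, amounts to looking for outliers in an inventory. The sketch below assumes that you already have a mapping from domain controller name to a patch level string; the function name, the inventory layout, and the sample names are hypothetical. It treats the most common patch level as the reference, which is not necessarily the level that you intend to run.

```python
from collections import Counter


def version_outliers(dc_versions):
    """Return the most common patch level and the controllers that differ from it.

    dc_versions maps a domain controller name to its patch level string.
    """
    if not dc_versions:
        return None, {}
    most_common_level, _ = Counter(dc_versions.values()).most_common(1)[0]
    outliers = {
        dc: level for dc, level in dc_versions.items()
        if level != most_common_level
    }
    return most_common_level, outliers


# Hypothetical inventory.
inventory = {"DC01": "SP4", "DC02": "SP4", "DC03": "SP3", "DC04": "SP4"}
print(version_outliers(inventory))
```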
