SCOM Agent problem

Alex_T 20 Reputation points
2024-11-26T12:49:44.5666667+00:00

We have a SCOM server for monitoring the local domain infrastructure. At one point, one of the agents went gray, and under "Agent Health State" we see errors like:

The System Center Management Health Service has stopped on a computer.

This alert is generated by a Health Service Watcher. This object runs on the All Management Servers Resource Pool and monitors the health of all System Center Management Health Services in a Management Group. When a System Center Management Health Service fails to heartbeat, a set of diagnostics is run and recoveries are then executed to attempt to fix this problem with the remote agent.

Causes

This can happen when:

• The System Center Management Health Service has been stopped.

• The System Center Management Health Service failed to start up correctly.

• The System Center Management Health Service has been set to Manual/Disabled and the machine was rebooted.

Also, there are no errors about this client VM in the event log on the SCOM server.

On the client VM, we have Error event ID 20070:

The OpsMgr Connector connected to SCOM.mydomain.com, but the connection was closed immediately after authentication occurred. The most likely cause of this error is that the agent is not authorized to communicate with the server, or the server has not received configuration. Check the event log on the server for the presence of 20000 events, indicating that agents which are not approved are attempting to connect.
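The 20070 text says to look for event 20000 on the management server. As a sketch, assuming the default "Operations Manager" event log name and the built-in Windows `wevtutil` tool, that lookup can be scripted from Python. `build_event_query` and `query_events` are hypothetical helper names, not part of any SCOM tooling.

```python
import subprocess
import sys
from typing import List

# Event log name used by SCOM agents and management servers (assumed default).
LOG_NAME = "Operations Manager"


def build_event_query(event_id: int, max_events: int = 20) -> List[str]:
    """Build a wevtutil command listing the newest events with a given ID."""
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", LOG_NAME,
        f"/q:{xpath}",       # XPath filter on the event ID
        f"/c:{max_events}",  # maximum number of events to return
        "/rd:true",          # newest events first
        "/f:text",           # plain-text output instead of XML
    ]


def query_events(event_id: int, max_events: int = 20) -> str:
    """Run the query locally (Windows only) and return wevtutil's output."""
    if sys.platform != "win32":
        raise OSError("wevtutil is only available on Windows")
    result = subprocess.run(
        build_event_query(event_id, max_events),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, `print(query_events(20000))` on the management server, or `print(query_events(20070))` on the client VM.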

Both the SCOM server and the client VM run Windows Server 2019; the firewall is off, and the antivirus is Microsoft Defender (onboarded).

Other VMs with the same configuration are green and work.

What I have done:

  1. Checked the "Health Service" service on the client VM and restarted it.
  2. Checked ports from SCOM to the client VM (135/TCP, 137/UDP, 138/UDP, 139/TCP, 445/TCP, RPC/DCOM high ports 49152-65535/TCP, ICMP): open.
  3. Checked port 5723/TCP from the client VM to SCOM: open.
  4. Tried to repair the agent from SCOM, but it did not work at all; the repair stayed pending for 2 days.
  5. Deleted and reinstalled the agent from SCOM (approved this VM in SCOM for connection after each installation).
  6. Deleted and reinstalled the agent manually with logging (installation completed successfully).
  7. Checked the DNS suffix on the client VM.
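TCP checks like steps 2 and 3 can be scripted. A minimal sketch, assuming Python is available on the machine doing the check; `port_open` is a hypothetical helper that only tests whether a TCP handshake succeeds (it says nothing about UDP ports or ICMP).

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, `port_open("SCOM.mydomain.com", 5723)` on the client VM. Event 20070 already says the connector connected and was closed right after authentication, so a successful check here is expected and points toward an authorization or identity problem rather than a blocked port.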

Any ideas how it can be fixed?


Accepted answer
  1. XinGuo-MSFT 20,156 Reputation points
    2024-11-27T02:41:33.24+00:00

    Hi,

    It sounds like you've already tried many of the common troubleshooting steps for resolving SCOM agent connectivity issues.

    Here are a few additional steps you can consider:

    • Stop the HealthService on the client VM.
    • Delete the Health Service State folder located at C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State.
    • Restart the HealthService.
    • Uninstall the SCOM agent, remove all related folders, and then reinstall the agent. Make sure to approve the agent in the SCOM console after installation.
    • Consider removing the client from the domain and then rejoining it. This can reset domain-related issues that sometimes cause connectivity and authorization problems for the SCOM agent.
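The first three steps above can be scripted. A minimal sketch, assuming the default agent path given in the answer and the Windows service name `HealthService`; `reset_health_service_state` is a hypothetical helper, and its `run` parameter exists only so the sequence can be exercised without a real service. It would need to run elevated on the client VM.

```python
import shutil
import subprocess
from pathlib import Path

# Default path from the answer above; adjust if the agent lives elsewhere.
STATE_DIR = Path(r"C:\Program Files\Microsoft Monitoring Agent\Agent\Health Service State")
SERVICE_NAME = "HealthService"


def reset_health_service_state(state_dir: Path = STATE_DIR, run=subprocess.run) -> None:
    """Stop the agent service, delete its cached state folder, start it again."""
    # "net stop" exits non-zero if the service is already stopped, so don't fail here.
    run(["net", "stop", SERVICE_NAME], check=False)
    try:
        if state_dir.exists():
            # The agent recreates this folder when the service starts again.
            shutil.rmtree(state_dir)
    finally:
        # Always try to bring the service back, even if the delete failed.
        run(["net", "start", SERVICE_NAME], check=True)
```

The start is checked so that a failure to restart the service is not silent, while the stop tolerates a service that is already stopped.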

2 additional answers

  1. Jing Zhou 7,675 Reputation points Microsoft Vendor
    2024-11-28T07:41:56.8933333+00:00

    Hello,

    Thank you for posting in Q&A forum.

    To further troubleshoot this issue, please kindly try below steps:

    1. Please check "Approved Computers" in the SCOM console and make sure the agent is authorized and approved to communicate with the server.

    2. Please re-approve the agent and check whether the issue is resolved.

    3. Verify that the Health Service on the client VM is configured correctly.

    4. Check whether any firewall or antivirus is blocking the communication.

    I hope the information above is helpful.

    If you have any questions or concerns, please feel free to let us know.

    Best regards,

    Jill Zhou

  2. Alex_T 20 Reputation points
    2024-12-03T09:34:46.9433333+00:00

    Problem solved. The server's system properties did not contain the full FQDN, despite the parameters received from DHCP. I added the primary DNS suffix to the registry as the "NV Domain" value in Computer\HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters.

    After a restart, everything started working properly.
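The condition this answer fixed can be checked for on other agents. A minimal sketch, assuming `mydomain.com` (taken from the SCOM server name in the question) is the expected domain; `missing_primary_suffix`, `read_nv_domain` and `this_host_missing_suffix` are hypothetical helpers. On Windows, `read_nv_domain` reads the same `NV Domain` value the author set; elsewhere it returns `None`.

```python
import socket
import sys
from typing import Optional

# Registry key that holds the primary DNS suffix ("NV Domain") on Windows.
TCPIP_PARAMS = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def missing_primary_suffix(fqdn: str, expected_domain: str) -> bool:
    """Return True if fqdn does not end with ".<expected_domain>"."""
    name = fqdn.rstrip(".").lower()
    domain = expected_domain.rstrip(".").lower()
    return not name.endswith("." + domain)


def read_nv_domain() -> Optional[str]:
    """Return the "NV Domain" registry value, or None if unset or not on Windows."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard library module

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, TCPIP_PARAMS) as key:
            value, _value_type = winreg.QueryValueEx(key, "NV Domain")
    except FileNotFoundError:
        return None
    return value or None


def this_host_missing_suffix(expected_domain: str) -> bool:
    """Check the local machine's FQDN as reported by the resolver."""
    return missing_primary_suffix(socket.getfqdn(), expected_domain)
```

For example, `this_host_missing_suffix("mydomain.com")` returning `True`, or `read_nv_domain()` returning `None` on a domain-joined Windows VM, would be consistent with the symptom described above.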
