Hello
It looks like you're dealing with a time sync issue where the system clock on your Windows Server 2019 cluster nodes (and possibly the Domain Controller) is resetting to an incorrect date and time (e.g., March 5, 1906) for short periods. This issue can cause data integrity problems, especially with database systems like PostgreSQL.
Based on the event logs you provided and the context, here are a few potential causes and troubleshooting steps to help resolve this issue:
- Time Synchronization and NTP Configuration
The most likely issue seems to be with the NTP synchronization. You mentioned that the Domain Controller is set to use NTP time, but the cluster nodes are receiving incorrect time. It's important to confirm that both the Domain Controller and the cluster nodes are using a reliable and stable time source. Here are some things to check:
Check Time Configuration on the Domain Controller:
Ensure the Domain Controller that holds the PDC emulator role is synchronized with a reliable external NTP server.
Run w32tm /query /status on the DC to verify the current time source.
Confirm that the reported source is a valid NTP server and not the local clock (for example "Local CMOS Clock" or "Free-running System Clock") or a misconfigured peer.
Use a public NTP server (e.g., time.windows.com or pool.ntp.org) or a hardware time source if possible.
Check NTP on the Cluster Nodes:
Ensure that the cluster nodes are correctly configured to sync with the Domain Controller as their NTP source.
Run w32tm /query /source on the cluster nodes to verify that the NTP source is set to the Domain Controller.
You can also force the NTP sync by running w32tm /resync on the cluster nodes and checking if the time corrects itself.
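If the nodes are not syncing correctly and they are domain members, point them back at the domain hierarchy:
w32tm /config /syncfromflags:domhier /update
If you need to pin a node to a specific server instead, use a manual peer list. Do not add /reliable:YES on member servers; that flag marks the machine itself as a reliable time source and is meant for the PDC emulator:
w32tm /config /manualpeerlist:"<DC_IP_or_NTP_Server>" /syncfromflags:manual /update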
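If you want to run the source check across several nodes, the output of w32tm /query /status can be parsed with a short script. The following is only a minimal sketch in Python: it assumes English-language output in "Key: value" form, and the host name dc01.example.local in the usage note is a made-up placeholder, not something taken from your logs.

```python
import subprocess

# Source strings that mean the machine is NOT syncing from the network.
LOCAL_SOURCES = ("local cmos clock", "free-running system clock")


def parse_w32tm_status(text):
    """Turn 'Key: value' lines from `w32tm /query /status` into a dict."""
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values containing ':' stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def source_matches(text, expected_source):
    """Return True if the reported Source equals expected_source (case-insensitive)."""
    source = parse_w32tm_status(text).get("Source", "")
    # A manually configured peer may be shown with flags, e.g. "host,0x8".
    name = source.split(",")[0].strip().lower()
    return name == expected_source.lower()


def uses_local_clock(text):
    """Return True if the node reports its own clock as the time source."""
    source = parse_w32tm_status(text).get("Source", "").lower()
    return source in LOCAL_SOURCES


def read_local_status():
    """Run w32tm on the local Windows machine and return its raw output."""
    result = subprocess.run(
        ["w32tm", "/query", "/status"], capture_output=True, text=True
    )
    return result.stdout
```

On a node you could then call source_matches(read_local_status(), "dc01.example.local") with your real DC name, and treat a False result or uses_local_clock(...) returning True as a reason to look at that node's configuration.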
- W32Time Service
The W32Time service (Windows Time service) might be having issues, especially in a cluster environment where time synchronization is critical. Event ID 261 is logged when the time service sets (steps) the system clock, so each occurrence marks a moment at which the clock was actually changed.
Reset the Windows Time Service:
Try restarting the Windows Time service on both the Domain Controller and the cluster nodes:
net stop w32time
net start w32time
On the PDC emulator only, you can also mark the service as a reliable time source for the domain while pointing it at your upstream NTP server:
w32tm /config /manualpeerlist:"<your NTP server>" /syncfromflags:manual /reliable:YES /update
After resetting, monitor the event logs to see if the issue continues.
- Clock Drift / Hardware Issues
Since you mentioned that the nodes have been up for almost 3 years, it's possible that hardware-related issues (e.g., clock drift) might be contributing to the problem. You should check for:
System clock issues: Ensure that the system's hardware clock (CMOS) is functioning correctly. Sometimes, a failing CMOS battery can cause time resets.
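To measure the offset between a node's clock and a reference server (this compares the two clocks over the network; it does not test the CMOS clock directly), run the following on the node, using the address of the DC or NTP server:
w32tm /stripchart /computer:<your_server_IP> /samples:5 /dataonly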
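Because the bad time only appears for short periods, it can help to leave a small watcher running on one node that notes exactly when the clock jumps. The sketch below is one possible approach in Python, not a Windows feature: it compares the wall clock with the monotonic clock, which is not affected by time adjustments, and reports any interval where the two disagree by more than a threshold. The function names and the 5-second threshold are my own choices for illustration.

```python
import time


def find_clock_jumps(samples, threshold_s=5.0):
    """Find wall-clock jumps in ordered (monotonic_seconds, wall_seconds) pairs.

    Returns a list of (index, jump_seconds). A negative jump means the wall
    clock moved backwards relative to the monotonic clock.
    """
    jumps = []
    for i in range(1, len(samples)):
        mono_delta = samples[i][0] - samples[i - 1][0]
        wall_delta = samples[i][1] - samples[i - 1][1]
        # With a healthy clock both deltas are nearly equal.
        jump = wall_delta - mono_delta
        if abs(jump) > threshold_s:
            jumps.append((i, jump))
    return jumps


def collect_samples(count, interval_s=1.0):
    """Record `count` pairs of (monotonic, wall) readings, one per interval."""
    samples = []
    for _ in range(count):
        samples.append((time.monotonic(), time.time()))
        time.sleep(interval_s)
    return samples
```

For example, find_clock_jumps(collect_samples(3600)) would watch for about an hour. Any jump it reports can then be matched against the Event ID 261 entries from the same period.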
- PostgreSQL and Time Handling
PostgreSQL takes now() and CURRENT_TIMESTAMP from the operating system clock, so any row written while the clock was wrong carries the wrong timestamp. That would explain the odd values (like "March 5th, 1906") in your data.
Check PostgreSQL configuration: Run SHOW timezone; and confirm the server and client time zone settings are what you expect (UTC is the simplest choice), so that time zone conversion can be ruled out as a cause.
Verify the stored data: Look for rows whose timestamps fall outside the plausible range for your application, and compare them with the times of the Event ID 261 entries.
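To find rows that were written while the clock was wrong, one simple approach is to flag any timestamp that falls outside a plausible window. Here is a rough sketch in Python. It assumes you have already fetched (id, timestamp) pairs from your own table with timezone-aware values; the function name, the 5-minute tolerance, and the cutoff date in the usage note are placeholders I made up.

```python
from datetime import datetime, timedelta, timezone


def find_suspect_rows(rows, earliest, now=None,
                      future_tolerance=timedelta(minutes=5)):
    """Return ids of rows whose timestamp is outside [earliest, now + tolerance].

    rows: iterable of (row_id, timestamp) with timezone-aware datetimes.
    earliest: oldest timestamp that could legitimately exist, e.g. go-live date.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    latest = now + future_tolerance
    suspects = []
    for row_id, ts in rows:
        # Anything before go-live or noticeably in the future is suspicious.
        if ts < earliest or ts > latest:
            suspects.append(row_id)
    return suspects
```

With earliest set to something like datetime(2019, 1, 1, tzinfo=timezone.utc), a row stamped in 1906 would be returned, while normal rows would not. Run it only after the clock on the machine doing the check is known to be correct, since the upper bound depends on the current time.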
- Event ID 259 and 261 Analysis
The logs you've shared show Event ID 259, which is a periodic status entry confirming that the system is receiving time data from its NTP source (in your case, 10.255.2.1). Event ID 261 then shows a drastic change to a past date (1906-03-06T01:56:38.759Z). This suggests that the time service accepted a bad sample from a source or was configured with a conflicting source.
Check for time service misconfiguration: If there’s a time source conflict or misconfiguration in the NTP client setup, the system may attempt to adjust the time based on incorrect or outdated information.
- Cluster Nodes & Long Uptime
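Since the cluster nodes have been up for almost 3 years, cumulative uptime-related problems are also worth considering, including:
Software bugs that affect time synchronization on long-running systems, some of which may already be fixed in updates that are waiting for a reboot.
Memory issues or kernel bugs related to long-running services.
Patching the nodes and restarting them one at a time is a low-risk way to rule these out.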
I hope the above information is helpful to you.
Best regards
Runjie Zhai