Windows Server Patching: Best Practices

As we all know, applying security and cumulative updates in a Windows environment is essential to protect it from external attacks. Updates also fix bugs identified in previous versions and improve stability and performance.

I am writing this article because we have seen many administrators struggle to carry out patching in their own organization or in a customer environment. Most of the struggle comes not from technical challenges but from operational ones, for example server downtime, scheduling, and change management.

I am taking this opportunity to share the best practices I have followed while performing Windows patching. They may help you organize and structure the patching activity in your organization.

Audience

Windows Server administrators and anyone who performs server patching with any of the patching tools available on the market.

Note

This article covers the Windows Server environment and applies to operating system patching, for example Windows Server 2008, 2012, and 2016. It does not cover Microsoft or other applications installed on the servers, such as Microsoft Exchange, SharePoint, or SQL Server. For application patching, an application specialist needs to carry out separate testing before deployment to the production environment.

Downtime

Downtime is the key factor in the entire patching activity. Many of us have trouble getting downtime from the business. Every organization handles it differently, based on business approval. A few methods that can be used are described below.

Following one of these methods can reduce the effort spent on getting downtime from the business.

  1. Follow the standard maintenance window. Some organizations already define a standard maintenance window for each environment, and the same window can be used for patching. Example: a standard maintenance window on weekdays after business hours (10 PM to 3 AM) for the DEV and UAT environments, and on weekends for the production and DR environments. Servers can be restarted within the standard maintenance window.
  2. Or use a standard four-hour patching window that is approved only for patching activity. This window should be agreed on by all business units. Example: second Friday (DEV), second Saturday (UAT), third Saturday (Production), and fourth Saturday (DR).
  3. Or, as another example, count the weeks starting from the first week after the second Tuesday of the month: (Week 1) DEV, (Week 2) UAT, (Week 3) Production, (Week 4) DR. The DR week can fall in the first week of the following month.
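The dates for method 3 can be worked out with a few lines of script. The following is a minimal sketch in Python, not a definitive implementation. The function names `nth_weekday` and `patch_windows` are made up for illustration, and the sketch assumes that each environment is patched on a Saturday, with Week 1 being the Saturday right after the second Tuesday.

```python
from datetime import date, timedelta

TUESDAY = 1  # date.weekday(): Monday=0 ... Sunday=6
ENVIRONMENTS = ["DEV", "UAT", "Production", "DR"]


def nth_weekday(year, month, weekday, n):
    """Return the date of the n-th occurrence of a weekday in a month."""
    first = date(year, month, 1)
    offset = (weekday - first.weekday()) % 7  # days until the first occurrence
    return first + timedelta(days=offset + 7 * (n - 1))


def patch_windows(year, month):
    """Map each environment to its patching Saturday for the given month.

    Assumption: Week 1 is the Saturday right after the second Tuesday,
    and each following environment is patched one week later.
    """
    second_tuesday = nth_weekday(year, month, TUESDAY, 2)
    week1_saturday = second_tuesday + timedelta(days=4)  # Tuesday + 4 days = Saturday
    return {
        env: week1_saturday + timedelta(weeks=i)
        for i, env in enumerate(ENVIRONMENTS)
    }


if __name__ == "__main__":
    for env, day in patch_windows(2017, 9).items():
        print(f"{env:<11}{day.isoformat()}")
```

For September 2017, the second Tuesday is 12 September, so this sketch puts DEV on 16 September, UAT on 23 September, Production on 30 September, and DR on 7 October, which is the first week of the following month.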

Scheduling

As mentioned above, getting downtime is always a challenge for administrators. There are a few points to consider when preparing a schedule that lines up with the agreed downtime.

The scenario below is based on downtime method (3) from the “Downtime” section above.

  1. It’s recommended to perform Windows patching monthly rather than quarterly.
  2. List the servers that are in scope for patching. If your organization has segregated environments such as DEV/UAT/Production/DR, prepare the schedule starting with DEV, then UAT, Production, and DR. With this schedule you can patch all servers within a four-week span. If you don’t have a tool to prepare schedules, you can prepare one in an Excel sheet and share it with all stakeholders and server/application owners by email. This notification is important because it reminds them of the upcoming patching activity, so they can do any pre-work needed on the application side. It also helps if they want to exclude a server from patching because of a scheduled application release. (Excluding servers from patching is not recommended unless there is a genuine business need.) Even if you have excluded a server, make sure you agree on the next downtime window with the application team to cover its patching.
  3. Microsoft releases patches on the second Tuesday of every month. After that, you can identify the relevant patches and get Security Team/CISO approval (the approval process may vary by organization). After approval, perform initial testing on every version of Windows OS in scope (Windows Server 2008, Windows Server 2012, Windows Server 2016). This confirms that the OS boots and comes up without issues, MMC snap-ins work as expected, no errors are reported in the Windows event logs, all automatic Windows services are running, server utilization and performance are normal, and so on.
  4. As shown in the schedule below, the first week of patching can start on the second Saturday and covers the DEV servers, followed by UAT in the second week, Production in the third, and DR in the fourth. Each application team needs to carry out application-level testing after DEV/UAT patching is complete, before patching proceeds to the Production and DR environments. This helps avoid production impact.
  5. It’s also the administrator’s responsibility to notify all stakeholders and server/application owners by email once patching is complete, so that they can carry out further application-level testing and confirm that the applications hosted on the servers work as expected.
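If no scheduling tool is available, the schedule that is shared with stakeholders can also be generated as a CSV file, which opens directly in Excel. The following is a minimal sketch in Python. The server names, owner addresses, field names, and the functions `build_schedule` and `to_csv` are all hypothetical, and the patch dates are supplied by the caller.

```python
import csv
import io
from datetime import date

# Patching order described above: DEV first, DR last.
ENV_ORDER = {"DEV": 0, "UAT": 1, "Production": 2, "DR": 3}
FIELDS = ["server", "environment", "owner", "patch_date"]


def build_schedule(servers, windows):
    """Return one schedule row per server, sorted by environment, then name."""
    ordered = sorted(servers, key=lambda s: (ENV_ORDER[s["env"]], s["name"]))
    return [
        {
            "server": s["name"],
            "environment": s["env"],
            "owner": s["owner"],
            "patch_date": windows[s["env"]].isoformat(),
        }
        for s in ordered
    ]


def to_csv(rows):
    """Render schedule rows as CSV text that can be attached to an email."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical inventory and agreed patch dates, for illustration only.
    servers = [
        {"name": "PRD-FS01", "env": "Production", "owner": "files@example.com"},
        {"name": "DEV-APP01", "env": "DEV", "owner": "app@example.com"},
    ]
    windows = {"DEV": date(2017, 9, 16), "Production": date(2017, 9, 30)}
    print(to_csv(build_schedule(servers, windows)))
```

Keeping the owner in each row makes it easy to filter the sheet per application team before sending the notification.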

Change Management

Change management is another important factor in patching. It raises awareness of upcoming changes in the environment and also helps from an audit point of view. Every organization has its own defined process based on business needs. It's recommended to use a standard change template, since patching is a mandatory activity performed every month. A standard template reduces the change initiator's work of drafting the change description, change tasks, and so on.

Compliance and Reporting

It’s very important to carry out a compliance check after patching is complete. Measuring the implemented work is always beneficial to the organization from a security audit point of view.

It’s recommended to run the patching compliance check immediately after patching is complete. For example, if you have four hours of downtime, run the compliance scan in the second or third hour so that you can re-patch any missed servers within the same downtime under the approved change. If you miss checking compliance within the same downtime window, you may need to request new downtime from the business and raise a separate change ticket.

If your compliance mechanism only provides compliance data after 24 to 48 hours, it’s recommended to patch the missed servers in the upcoming downtime windows. Do not carry a backlog for long, because it affects overall compliance by the end of the monthly cycle.
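The compliance figure itself is simple arithmetic: compliant servers divided by servers in scope. As a minimal sketch in Python, assuming the scan results have been exported as a mapping from server name to the list of approved patches still missing (the function name, server names, and patch ID below are hypothetical placeholders):

```python
def compliance_summary(scan_results):
    """Summarize a compliance scan.

    scan_results maps a server name to the list of approved patches
    that are still missing on it (an empty list means compliant).
    """
    total = len(scan_results)
    to_repatch = sorted(name for name, missing in scan_results.items() if missing)
    compliant = total - len(to_repatch)
    percent = round(100.0 * compliant / total, 1) if total else 0.0
    return {
        "total": total,
        "compliant": compliant,
        "percent": percent,
        "to_repatch": to_repatch,  # candidates for re-patching in the same window
    }


if __name__ == "__main__":
    # Hypothetical scan export, for illustration only.
    scan = {
        "DEV-APP01": [],
        "DEV-APP02": [],
        "DEV-DB01": ["KB0000001"],  # placeholder ID, not a real update
        "DEV-WEB01": [],
    }
    print(compliance_summary(scan))
```

Running the summary in the second or third hour of the window gives the list of servers to re-patch while the approved change is still open.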

Additional Notes

  1. Make sure you perform a daily health check of the patching tool agent (the agent depends on the patching tool you use, for example Microsoft SCCM or HPSA). All agents should report as healthy. Remediate unhealthy agents immediately, because an unhealthy agent may fail to patch the server, which affects patching compliance.
  2. If any issue is encountered with an application during DEV/UAT testing, exclude the production and DR servers from patching until the issue is fixed on DEV/UAT.
  3. If patches are uninstalled because of a reported issue, make sure the application team consults the application vendor about a solution and compatibility, because servers can’t be left unpatched for long.

Microsoft zero days/Out-of-band patches

Microsoft zero-day/out-of-band (OOB) patches can be deployed once the internal security team has completed its risk assessment. Microsoft recommends deploying OOB patches as soon as possible to avoid external attacks.

If the security team confirms that the patches must be deployed within the next 48 hours, define the scope by identifying the servers that run the software or product affected by the vulnerability. For example, if the vulnerability is in Internet Explorer 9, identify how many servers in the environment run IE9. This data can be fetched from the compliance tool you use in your environment. If you use Microsoft SCCM, you can create a custom report with a custom query to fetch it. If you don’t have any tool, you will have to use a script. The last option is to collect the information manually, but that is tedious if you have many servers.

Assume that after the assessment, 100 of your 4000 servers are running IE9. In this case, plan to patch these 100 servers as a priority. Since the timeline is short, you may need to contact the server owners/application owners to get explicit approval for a server reboot. After approval, the servers can be patched and rebooted after business hours to minimize business impact. If the standard change process cannot meet the change management requirement in time, you may need to raise an emergency change request.

Apart from these 100 impacted servers, the rest of the servers can be patched as per your standard patching schedule.
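When no reporting tool is available, the scoping step can be scripted once a software inventory has been exported. The following is a minimal sketch in Python, assuming the inventory is a list of records with a server name and a mapping of installed products to versions. The record layout and the function name `servers_in_scope` are assumptions for illustration, not the export format of any particular tool.

```python
def servers_in_scope(inventory, product, version):
    """Return the names of servers that run `product` at exactly `version`."""
    return sorted(
        server["name"]
        for server in inventory
        if server.get("software", {}).get(product) == version
    )


if __name__ == "__main__":
    # Hypothetical inventory export, for illustration only.
    inventory = [
        {"name": "SRV0001", "software": {"Internet Explorer": "9"}},
        {"name": "SRV0002", "software": {"Internet Explorer": "11"}},
        {"name": "SRV0003", "software": {}},
    ]
    priority = servers_in_scope(inventory, "Internet Explorer", "9")
    print(f"{len(priority)} of {len(inventory)} servers need priority patching")
    print(priority)
```

The resulting list is the priority group for the OOB patch; every server not on it stays on the standard schedule.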

Sometimes the installed antivirus software can mitigate the vulnerability. In this situation, make the decision together with the security team. As long as the installed antivirus is protecting your environment, you can patch the servers in the regular patching schedule. Make sure you have confirmation from the antivirus vendor about the security coverage.

Check the compliance status once patching is complete.

Windows Fail-over Cluster patching

You can use the method below to patch a Windows Failover Cluster, unless you are using the Cluster-Aware Updating (CAU) feature introduced in Windows Server 2012.

Consider a two-node Windows Failover Cluster running the File Server role.

  1. Move all the running resources from Node1 to Node2.
  2. Make sure that after the move to Node2, all resources are online and all the shares are accessible.
  3. Install patches on Node1, then restart Node1.
  4. Move all the resources from Node2 to Node1. Make sure they are online and all the shares are accessible.
  5. Install patches on Node2, then restart Node2.
  6. Rebalance all the resources onto their preferred cluster nodes. Check the cluster log to make sure everything is healthy.

Microsoft recommends running all the cluster nodes on the same patch level.

You can automate the patching and restart, provided the resource moves are handled as pre-work before the scheduled patch deployment.
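The order of operations in the manual method can be sketched as a small orchestration routine. The following Python sketch is only an illustration of the sequence, not a real cluster integration. The four callables (`move_resources`, `verify_online`, `patch_and_restart`, `rebalance`) are hypothetical hooks that you would wire to your own tooling, and extending the two-node procedure to more nodes by draining each node to the next one in the list is an assumption of this sketch.

```python
def patch_cluster(nodes, move_resources, verify_online, patch_and_restart, rebalance):
    """Patch cluster nodes one at a time, draining each node first.

    All four callables are hypothetical hooks supplied by the caller:
      move_resources(source, target)  moves resources off `source`
      verify_online(node)             returns True if resources on `node` are healthy
      patch_and_restart(node)         installs patches and reboots `node`
      rebalance()                     moves resources back to preferred owners
    """
    if len(nodes) < 2:
        raise ValueError("a rolling patch needs at least two nodes")
    for i, node in enumerate(nodes):
        target = nodes[(i + 1) % len(nodes)]  # next node takes over the resources
        move_resources(node, target)
        if not verify_online(target):
            # Stop before patching so the problem can be investigated.
            raise RuntimeError(f"resources not healthy on {target}; {node} not patched")
        patch_and_restart(node)
    rebalance()


if __name__ == "__main__":
    # Dry run that only prints the sequence for a two-node cluster.
    patch_cluster(
        ["Node1", "Node2"],
        move_resources=lambda src, dst: print(f"move resources {src} -> {dst}"),
        verify_online=lambda node: True,
        patch_and_restart=lambda node: print(f"patch and restart {node}"),
        rebalance=lambda: print("rebalance resources to preferred nodes"),
    )
```

The verification step sits between the move and the patch on purpose: if the resources do not come online on the other node, the routine stops before any restart, so the shares stay on a node that has not been touched.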

I hope this article is helpful to you. Your comments and feedback are important.

Note

I suggest you consult your senior staff, technical lead, or technical manager before following any of these approaches. The best practices above are shared based on my experience.

 

Credits

This article was originally written on my blog: http://arbanwintech.blogspot.in/2017/09/windows-servers-patching-best-practices.html