This blog post provides detailed guidance on setting up a Patch Management process on the Microsoft Azure cloud.
For all Cloud IaaS deployments, having a Patch Management process is essential. It is as important as the Patch Management process at your on-premises data center.
Why do we need Patch Management?
There are many compelling reasons, like:
- Plugging security vulnerabilities in the OS or installed application software
- Proactive protection against newer threats and malware
- Fixing existing platform/software bugs
- Performance and stability improvements
- Addressing known issues
- Meeting compliance requirements (like SOX)
- And many more…
In this post, we will look at Patch Management for Cloud IaaS deployments, specifically on Microsoft Azure and for Windows Server based Azure VMs. We will not specifically cover Linux based Azure VMs here, but the same base guidance applies to them equally.
Much of what we discuss here also applies to other Cloud IaaS platforms like AWS or GCP. We will also occasionally reference the traditional on-premises patch management process, wherever required.
Fundamentally, the Cloud IaaS model is a virtualized abstraction of physical infrastructure. It is built on underlying clusters of physical host servers of various capacities/capabilities. The responsibility of patching these underlying physical host servers rests with the Cloud provider. In the case of Azure, Microsoft holds this responsibility.
However, for VMs provisioned on the Cloud IaaS layer, VM maintenance is the sole responsibility of the customer. This shared responsibility model is the same across all Cloud providers, like Azure, AWS, GCP etc.
Now let’s focus on Patch Management from Microsoft Azure perspective.
Microsoft regularly updates the VM images it has published in the Azure Marketplace with the latest patches. These images are thoroughly tested for stability before being published in the Marketplace. However, Microsoft does not make the frequency/schedule for updating these VM images public. Hence, whenever you create a new VM based on an image from the Azure Marketplace, you would be lucky to get one which has just been updated with the latest patches. That would save you from applying any additional updates, but the chance is rare. In most cases, depending on how long ago Microsoft last updated the image, you will have to download a larger or smaller delta of the applicable patches.
Given that both Windows Server OS and the Azure platform are Microsoft products, it would have been ideal if Microsoft had a native automated patch management service in Azure.
[Update Start: 28th Jan 2018]
However, Microsoft does not currently have a full-featured, standard, Azure-native service offering for patching/update management. At best, what they offer is a revised “Update Management” solution (still in preview as of the date of this version of the post) through Azure Automation, which is linked with another Microsoft service called Microsoft Operations Management Suite (OMS). This new Update Management solution collects update-related data from all the VMs (Windows/Linux) deployed in Azure and/or on-premises (hybrid setup) through Microsoft OMS agents installed on those VMs, and pushes that data to OMS. Thereafter, you can use OMS to monitor the update status of the monitored VMs, see which ones are missing updates, and push installation of those missing updates. However, the OMS based Update Management solution currently lacks many critical features/capabilities essential for a good update/patch management solution, and is a no-go option for production IaaS deployments.
[Update End: 28th Jan 2018]
Microsoft still expects customers to either do the patch management manually themselves (using native tools like WSUS, MBSA, PowerShell etc.), or use commercial patch management systems. This strategy does not make things any easier for customers. However, it does indirectly help promote an ecosystem of ISVs, who build such products to be sold commercially.
You can see my feedback to Microsoft around this concern at the Official Azure User Voice forum here: Azure User Voice. Once I get a response on this feedback, I will update this post with the response.
Organizations considering either migrating their existing on-premises workloads to Azure, or building net new Cloud Infrastructure, will necessarily need to consider having a Cloud Patch management process.
- Orgs that already have a mature patch management process on-premises might assume that all they need to do is follow the same process on Azure. While that is true to some extent, they will still need to revisit their existing process and fine-tune it for the Azure IaaS model
- Orgs that do not already have a mature patch management process can follow the guidance in this post to help them establish one for their Azure IaaS environment.
Let’s look at the following step-wise approach an Organization should consider, for establishing a patch management process on Azure (or any Cloud IaaS for that matter):
- Prepare Patch Inventory
- Perform VM Baselining
- Discover Patch Notification & Repository Channels
- Setup Patch Management System
- Patch Testing & Authorization
- Patch Monitoring
Stage 1 – Prepare Patch Inventory
You should first create a Patch Inventory, which should capture the following information for your IaaS deployment:
- A list of all patches, past and present, for each VM Server OS version – You can start with patches applied to the VM Baseline you prepare. See Stage 2 below.
- All patches which failed during testing, and were never applied in production – When? With reason(s)
- All patches which failed during testing, but were later fixed and applied in production – Why/How/When?
- Details of any patch related support Incidents raised with Microsoft PSS or an external Support provider
- Authorization status for each patch – This will come after Patch testing stage
- Production impact of applying each patch – This will come from Patch testing stage
- Justification for applying the patch in production – This will come from Patch testing stage
- Approvals for applying patch in production – This will come after patch testing stage
Additionally, you should prepare another related inventory for the production VMs in your environment, which should capture the following information:
- List of all Azure production VMs deployed in the concerned Azure IaaS Solution
- For each production Azure VM:
  - Configuration information like Server OS/version, software/versions installed
  - Role, function, business and security criticality
  - Access/ownership information
  - All patches applied to the VM in chronological order – From VM provisioning until the current date
  - For each patch successfully applied – Testing date and outcome status
  - Any patch rollbacks performed – Why/How/When?
  - All rollbacks performed due to issues arising from failed/rogue patches applied – Why/How/When?
  - Known security issues, and newly discovered ones
  - Change tracking/history for any changes in security levels
These inventory items should be updated regularly at a predefined frequency, which will depend on the patching cycle you want to follow. Inputs for this inventory will also come from later stages in the Patch management process, such as the Patch Testing stage.
The inventory data points listed above are not exhaustive, but should give you a fair idea of what level of inventory you must have before embarking on incorporating a patch management process on Azure.
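As a rough illustration, the two inventories described above could be modelled as simple records. This is only a minimal sketch: every class name, field name, and patch ID below is made up for illustration, and a real inventory would more likely live in a database or CMDB.

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum
from typing import List, Optional


class Outcome(Enum):
    """Testing outcome of a patch (values are illustrative)."""
    NOT_TESTED = "not tested"
    PASSED = "passed"
    FAILED = "failed"
    FAILED_THEN_FIXED = "failed, later fixed"


@dataclass
class PatchRecord:
    """One entry in the patch inventory."""
    patch_id: str                           # hypothetical ID, e.g. a KB number
    os_version: str
    outcome: Outcome = Outcome.NOT_TESTED
    failure_reason: Optional[str] = None    # why it failed in testing, if it did
    support_incident: Optional[str] = None  # reference to a support case, if any
    authorized: bool = False                # set after the testing stage
    production_impact: Optional[str] = None
    justification: Optional[str] = None
    approved_by: Optional[str] = None


@dataclass
class AppliedPatch:
    """A patch applied to one VM, with optional rollback details."""
    patch_id: str
    applied_on: date
    rolled_back: bool = False
    rollback_reason: Optional[str] = None


@dataclass
class VmRecord:
    """One entry in the production VM inventory."""
    name: str
    os_version: str
    role: str
    owner: str
    criticality: str
    applied: List[AppliedPatch] = field(default_factory=list)

    def patch_history(self) -> List[AppliedPatch]:
        """Return applied patches in chronological order."""
        return sorted(self.applied, key=lambda p: p.applied_on)


vm = VmRecord("web-01", "Windows Server 2016", "web", "ops-team", "high")
vm.applied.append(AppliedPatch("KB0000002", date(2018, 1, 20)))
vm.applied.append(AppliedPatch("KB0000001", date(2018, 1, 10)))
print([p.patch_id for p in vm.patch_history()])
```

Keeping the per-VM history separate from the per-patch record mirrors the two inventories above: one answers "what do we know about this patch?", the other "what happened on this VM?".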
Stage 2 – Perform VM Baselining
Baselining VMs refers to building an initial stable configuration of the VMs, established at a specific point in time. This means that the VM Server OS, the application software installed within, and any initial configurations done on either of these are thoroughly tested, found stable, and standardized for use as a base VM configuration. Baselining VMs enables us to reliably restore them from any future state to a previously stable state, and helps in probing and rectifying any potential problems with a later version. It also helps to minimize the number of patches/updates we need to deploy on the VMs, and gives us the ability to monitor compliance at a granular level.
For baselining Azure VMs, you should consider the following high-level process:
- Group the Azure VMs in your Azure IaaS deployment into different Asset categories
- Prepare and maintain standard VM baselines for each category, which should have similar Azure VM Server OS/version, application software/version, and patches
  - You could either have a single VM baseline for all Asset categories in your deployment, or have different VM baselines for each Asset category
  - Whether you need a single or multiple VM baselines primarily depends on the differences between VM and application software configuration across different Asset categories, and how certain patches affect different baselines differently
- Prioritize distribution of patches to Azure VMs on the basis of Asset categories
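The baselining steps above can be sketched in a few lines. The sketch assumes a baseline is simply the set of patch IDs expected for an Asset category, and that each category carries a numeric priority. The category names, priorities, and patch IDs are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical baselines: Asset category -> patch IDs included in that baseline.
BASELINES: Dict[str, Set[str]] = {
    "web": {"KB0000001", "KB0000002"},
    "database": {"KB0000001", "KB0000003"},
}

# Hypothetical priorities: a lower number means "patch this category first".
CATEGORY_PRIORITY: Dict[str, int] = {"database": 1, "web": 2}


def baseline_drift(category: str, installed: Set[str]) -> Dict[str, Set[str]]:
    """Compare the patches installed on a VM with its category baseline."""
    baseline = BASELINES[category]
    return {
        "missing_from_vm": baseline - installed,   # in the baseline, not on the VM
        "beyond_baseline": installed - baseline,   # on the VM, not in the baseline
    }


def distribution_order(vm_categories: Dict[str, str]) -> List[str]:
    """Order VM names by category priority, then by name for stable output."""
    return sorted(
        vm_categories,
        key=lambda vm: (CATEGORY_PRIORITY[vm_categories[vm]], vm),
    )


print(baseline_drift("web", {"KB0000001", "KB0000009"}))
print(distribution_order({"web-01": "web", "sql-01": "database", "web-02": "web"}))
```

If the drift for most VMs in two categories looks the same, that is a hint a single shared baseline may be enough. If it differs widely, separate baselines per category are probably easier to maintain.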
Stage 3 – Discover Patch Notifications and Repository Channels
Next, you need to discover and set up channels for getting regularly notified about new patches for the VM Server OS/version and the application software/version installed within.
You will also need a remote repository source/mechanism to download these patches on an Update server (where they will be tested first against VM Asset categories), through an automated mechanism preferably.
For the Windows Server OS running on Azure VM, and any other Microsoft Application Software(s) Installed within, you can get regular notifications through Microsoft Security Bulletin Service from Microsoft Security Response Center (MSRC). You can then automatically trigger download of these patches/updates through existing native services/tools (like WSUS, MBSA etc.)
However, for non-Microsoft application software installed in the Azure VM, this will vary greatly. It will depend on each software vendor's update notification channel (whether one exists, what frequency it operates on, and in what form) as well as on their download mechanism.
Stage 4 – Setup Patch Management System
After you have discovered and set up the patch notification and repository channels, the next step is to look at setting up a patch management system.
Before you move forward on selecting a patch management system, you should:
- Determine one or more locations (a.k.a. Update Servers), where the patches would be downloaded for further distribution. For an Azure IaaS environment, you could have these Update Server(s) located on Azure itself, or on-premises (in case of a Hybrid setup), or at both places. You will need to carefully decide where these Update Server(s) should be for your specific scenario, and this will depend heavily on your current infrastructure architecture. Some common scenarios are described below:
  - Cloud only scenario: If your entire infrastructure is on Azure, you will obviously have the Update Server(s) on Azure itself. If your deployment is spread across Azure regions/subscriptions, a better idea would be to have an Update Server for each region/subscription combination
  - Hybrid Cloud scenario: If the majority of your servers (>50%) are on-premises, and a smaller but still significant share (>10%) is on Azure, or vice-versa, you should consider having an Update Server both on-premises and on Azure. If one location has only a minimal share of the servers (<10%) and the other has the large majority (>=90%), you are better off having an Update Server only at the location with the majority of the servers, and distributing the patches/updates from there to the location with minimal servers
  - Remember this – If you have Update Server(s) on-premises and push patches/updates to Azure, or vice-versa, you will generate considerable traffic across that boundary, with reliability, latency, and cost implications
- Ensure that you maintain a patch inventory for production (based on stable criteria), and for pre-production environments, as given in Stage 1 above – This will simplify the overall patch management process
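The placement rule of thumb above can be written down as a small function. This is a sketch of that heuristic only. The function name and the handling of a share of exactly 10% are assumptions, and a real decision should also weigh network cost, latency, and region/subscription layout.

```python
from typing import List


def update_server_locations(on_prem_servers: int, azure_servers: int,
                            minority_threshold: float = 0.10) -> List[str]:
    """Suggest where to place Update Servers based on server counts.

    Assumption: a location holding less than `minority_threshold` of all
    servers does not get its own Update Server. The text does not say which
    way a share of exactly 10% should go, so this sketch treats it as "both".
    """
    total = on_prem_servers + azure_servers
    if total <= 0:
        raise ValueError("at least one server is required")

    # Cloud only (or on-premises only): a single location, nothing to decide.
    if on_prem_servers == 0:
        return ["azure"]
    if azure_servers == 0:
        return ["on-premises"]

    minority_share = min(on_prem_servers, azure_servers) / total
    if minority_share < minority_threshold:
        # One side is tiny: serve it from the majority location.
        return ["azure"] if azure_servers > on_prem_servers else ["on-premises"]

    # Both sides are significant: place an Update Server at each.
    return ["azure", "on-premises"]


print(update_server_locations(95, 5))    # minority on Azure is 5%
print(update_server_locations(60, 40))   # both locations are significant
```

Exposing the threshold as a parameter makes it easy to tune the rule, since 10% is a rule of thumb, not a hard limit.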
There are a number of tools/solutions available for Patch management, a few from Microsoft, and several from commercial vendors. Some of these tools/solutions support only Windows Server OS, and others also support Linux Server OS. You could use any of these tools/solutions for your Azure IaaS environment. However, your choice will depend on factors like implementation effort, time, cost of deployment, licensing, support options etc.
A few such popular tools/solutions are listed below:
- Microsoft Baseline Security Analyzer and WSUS – Free
- System Center Configuration Manager (SCCM) – Paid
- Microsoft OMS – Paid
- SolarWinds Patch Manager – Paid
- Shavlik Protect + Empower, and Shavlik Patch – Paid
- LANDesk Patch Manager – Paid
- GFI LanGuard – Paid
- PDQ Deploy Pro – Paid
Some of these tools offer limited support for a few of the stages detailed in this post, but none of them supports the whole defined process end-to-end.
Stage 5 – Patch Testing & Authorization
You need to establish a mandatory Patch Testing process as part of the overall Patch Management process. Let us look at why.
Imagine a scenario where you apply a new patch on one or more VMs in your Azure IaaS environment. You then discover that suddenly one or many things have stopped working. Maybe you are unable to RDP into the VMs, or an installed application starts misbehaving, or a host of other problems surface. These are some of the many common issues which frequently occur when you don’t test patches before applying them to production VMs.
Testing patches before applying them to production Azure VMs is a mandatory step that you need to follow rigorously. Not doing so may have very serious implications for your deployment.
- You should NEVER apply any patches directly to the Azure VMs in your Production environments. It is a BIG RISK, whichever way you look at it.
- You should first test patches on Azure VMs in a test (Pre-Production/Staging) environment on Azure, with configuration/roles equivalent to those of the Production environment Azure VMs. You might ask why the test environment VMs need the same configuration/roles. The reason is that applying patches to different VM configurations can produce different, unpredictable outcomes.
However, misses do happen in real life, and a few untested patches may very well make their way to production Azure VMs. Also, if the testing process is not thorough, problematic patches can easily escape undetected to production, causing issues.
When untested patches make their way to the production environment, they may fail and also break the current configuration/operations of the VMs. Your patch management process should have the ability to roll back and restore those Azure VMs to an earlier restore point. Not being able to do so can seriously compromise the intended functioning of the concerned VMs.
For VM rollbacks to be possible, you need to already be performing regular backups of your Azure VMs. A couple of options for taking backups are Azure Recovery Services Vault and System Center DPM.
All patch testing activity should be recorded in a separate testing repository, and should reference/record against the existing Patch Inventory from Stage 1.
Depending on whether a patch passed or failed during testing, an authorization status should be assigned to it in the Patch Inventory. This authorization status will determine if a patch is ready to be applied to the target VMs (or VM Asset categories), needs to be deferred for future testing, or is rejected.
After successful authorization of each patch, you also need to assess and record the impact it will have when applied to an individual VM or a VM Asset category in your deployment. Possible impacts include forced downtime, dependency on other patches/components, order of applying etc.
As a final step, each patch will need to undergo an approval process, based on the justification you give for why it is important to apply it to the production servers. This information will also be captured in the Patch Inventory.
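The authorization and approval steps above can be sketched as a small decision helper. The status names mirror the three outcomes described (ready to apply, deferred, rejected). The function names and the `retest_planned` flag are assumptions made for illustration.

```python
from enum import Enum


class AuthStatus(Enum):
    AUTHORIZED = "authorized"   # passed testing, may proceed to approval
    DEFERRED = "deferred"       # failed for now, will be tested again later
    REJECTED = "rejected"       # failed and will not be applied


def authorization_status(passed_testing: bool,
                         retest_planned: bool = False) -> AuthStatus:
    """Map a test result to an authorization status for the Patch Inventory."""
    if passed_testing:
        return AuthStatus.AUTHORIZED
    return AuthStatus.DEFERRED if retest_planned else AuthStatus.REJECTED


def ready_for_production(status: AuthStatus, impact: str,
                         justification: str, approved: bool) -> bool:
    """A patch goes to production only when every gate has been passed:
    authorized, impact recorded, justification recorded, and approved."""
    return (
        status is AuthStatus.AUTHORIZED
        and bool(impact.strip())
        and bool(justification.strip())
        and approved
    )


status = authorization_status(passed_testing=True)
print(ready_for_production(status, "reboot required", "fixes RDP issue", True))
```

Treating an empty impact or justification as a blocker keeps the inventory complete: a patch cannot reach production without those fields being filled in.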
Stage 6 – Patch Monitoring
Once you have the Patch testing process set up, you will then need to set up a Patch monitoring process. Here you will need to regularly probe all your Azure VMs to identify the following:
- Missing Updates
- Installed Updates
- Failed Updates
- Incomplete Updates
Once you have the above information, you will need to compare it against the list of authorized/approved patches in the Patch Inventory. This way you will be able to find out which patches need to be applied/reapplied, where, when, and in what order. Thereafter, you can schedule their manual/automated deployment accordingly.
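The comparison described above is essentially set arithmetic. Here is a minimal sketch, assuming the scan results for one VM are available as sets of patch IDs and the authorized list is kept in the order the patches should be applied. The function name and patch IDs are hypothetical.

```python
from typing import Dict, List, Set


def patches_to_deploy(authorized_in_order: List[str],
                      installed: Set[str],
                      failed: Set[str],
                      incomplete: Set[str]) -> Dict[str, List[str]]:
    """Compare one VM's scan results with the authorized patch list.

    Assumes each patch appears in at most one of installed/failed/incomplete.
    """
    apply: List[str] = []
    reapply: List[str] = []
    for patch in authorized_in_order:        # preserves the order of applying
        if patch in failed or patch in incomplete:
            reapply.append(patch)            # attempted before, needs another run
        elif patch not in installed:
            apply.append(patch)              # authorized but missing on the VM

    # Installed on the VM but never authorized: worth flagging in an audit.
    unauthorized = sorted(installed - set(authorized_in_order))
    return {
        "apply": apply,
        "reapply": reapply,
        "unauthorized_installed": unauthorized,
    }


result = patches_to_deploy(
    ["KB0000001", "KB0000002", "KB0000003", "KB0000004"],
    installed={"KB0000001", "KB0000009"},
    failed={"KB0000003"},
    incomplete=set(),
)
print(result)
```

Running this per VM and recording the result in the Patch Inventory gives you the "which, where, and in what order" answers. The "when" still comes from your patching schedule.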
Additionally, you should consider performing the following activities on a schedule as part of patch monitoring:
- Perform a regular audit of installed vs authorized updates for your Azure VMs
- Regularly review your patch inventory, and update the installation status/progress for all patches on the Azure VMs in your deployment.
The intent of this post was to give you a good understanding of how to plan for incorporating patch management into your Azure deployments.
I hope you enjoyed reading this post. I would really appreciate any feedback/thoughts/comments/questions you may have, which you can share through the comments below or by direct mail.
A reader asked me an interesting question today, after reading this post.
His question was:
“Why don’t we enable auto-update on all Cloud/Azure VMs, and let them update themselves whenever the need be? Windows already has this mechanism of Auto Updates, and same can be scheduled similarly on Linux too. If any updates fail, we can always restore from the Backups, isn’t it?”
My response was:
“We should never allow auto-updates on Windows or Linux servers in production, whether on-premises or in the Cloud. If we do, we expose our production deployment to a huge risk, as an update-related failure may occur at any time and render our production environment unusable. Disallowing auto-updates is standard practice at most organizations, for both their on-premises and Cloud deployments. You could perhaps enable auto-updates in a dev/test environment, because the impact there is minimal.
Furthermore, all good infrastructure deployments, in the Cloud or on-premises, will either never give VMs direct access to the Internet, or only give restricted access secured behind proxies/bastions/WAFs. So auto-updates over the Internet would not be available anyway.”