Why is Patch Management Important?
Patch Management, or rather the lack thereof, is a problem that plagues many organizations, leading to vast numbers of security breaches every single day. In fact, depending on which survey you’re looking at, anywhere from 27% to 34% of all cyber security breaches are the result of missing patches for known vulnerabilities. The tragedy is that these breaches could have been prevented. My hope is that this article will give you a clear understanding of what patches are, what successful Patch Management might look like, examples of high-visibility failures in the industry due to poor Patch Management, and recommendations for improving your own organization’s approach.
What Exactly Do We Mean by Patches and Patch Management?
In IT, we talk about patches all the time. But what exactly is a patch?
In simple terms, a patch is a piece of code designed to fix a problem identified in an existing software product. In software development, a patch might be referred to as a “bug fix.”
A patch can also be an update. When Microsoft rolls out new updates to your computer at home on a Tuesday night, those updates are patches. However, updates, as with any other security control or mechanism, can sometimes cause more harm than good. An example would be a computer getting stuck in an infinite boot cycle while trying to apply updates, then having to roll them back due to failure. Patches can also cause errors and performance issues. For instance, say an update modified the print spooler, resulting in the spooler hogging 100% of CPU resources. In this example, the update created major performance degradation. Believe me, that’s not a fun time.
Patch Management is the approach and resulting processes your organization uses to manage how and when you update your IT products and how you mitigate the risks of updates causing unforeseen issues. Each organization’s approach will be a little bit different, but a successful Patch Management process, in support of an enterprise Patch Management strategy, could look something like the workflow pictured here:
What Happens When We Don’t Have a Great Patch Management Practice In Place?
Let’s take a look at a couple of famous examples of what happens when your Patch Management process doesn’t keep up with organizational needs and cyber risks – or worse, doesn’t exist in the first place.
Equifax
In 2017, private information held at Equifax, one of the world’s largest credit reporting companies, was breached. It wasn’t until several months later that Equifax notified the public of the breach. By the time of the announcement, the damage to Equifax’s reputation, and to the confidentiality of customer data, had already been done.
Back in March of that year, the Apache Software Foundation had published an advisory regarding a remote code execution (RCE) vulnerability in its Apache Struts software. These types of vulnerabilities are nasty because they allow an attacker to execute code on vulnerable systems remotely. The advisory included a link to a patch that could be applied to remediate the vulnerability. Unfortunately, the vulnerability went unmitigated for weeks, during which time Chinese hackers were able to execute code remotely and install backdoors into the Equifax network, allowing them to pivot laterally to a web server, where they harvested credentials to move vertically within the network. Eventually, the hackers were able to access a SQL database where they captured the personal information of more than 140 million Americans (as shown below).
In the aftermath of this major breach, Equifax saw several members of its executive team depart. The company faced fines from the Federal Trade Commission, and individual executives faced insider trading charges from the Securities and Exchange Commission for dumping their shares of company stock before Equifax formally announced the breach. It was a public relations disaster, and the last thing any organization wants to deal with.
All of this happened because of an unpatched server.
WannaCry
Also in 2017 (apparently a big year for failures in Patch Management), the news began reporting that a major ransomware attack was “worming” its way through the UK’s National Health Service (NHS). A series of events (again, caused by issues in Patch Management) rapidly unfolded, leading to one of the worst ransomware attacks the world has ever seen – this time with a real-world, physical impact.
An unidentified (but suspected Russia-affiliated) hacking group known as the Shadow Brokers leaked a number of vulnerabilities and exploits they had stolen from the NSA. In March of that year, the NSA warned Microsoft that one such vulnerability had been stolen, and Microsoft released a patch for it the same month. In April, however, the Shadow Brokers released EternalBlue, an exploit for the very vulnerability Microsoft had patched a month earlier. In May, WannaCry began spreading like wildfire using the EternalBlue exploit – a vulnerability that had been public for a month and, for many organizations, could and should have been patched two months prior.
The result? WannaCry caused total work stoppages in hospitals and clinics throughout the UK, preventing administrative staff from scheduling appointments, ordering supplies, and carrying out other key functions. The attack quickly rendered more than 200,000 computers across 150 countries useless while IT departments scrambled to prevent further spread and recover encrypted data. This devastating attack could have been prevented if the NHS and other affected organizations had strong Patch Management processes in place.
How Do We Start Improving Our Own Patch Management Process?
To implement a successful Patch Management program, you not only need to be disciplined about applying patches; your organization also needs to know what IT assets it has. Without that knowledge, it will be difficult – if not impossible – to know what needs to be patched. This article provides a good explainer on IT Asset and Service Configuration Management best practices. The sections below home in on areas your organization can develop to strengthen its Patch Management strategy: effectively managing assets and changes, establishing reporting, and looking for ways to automate.
Tip 1: Determine What Assets We Have
Together, your IT assets show what your environment is made up of – hardware, software, and so on. Any device – workstation, laptop, server, switch, or router – is an asset. So is any piece of software deployed on your network: commercial off-the-shelf (COTS) software, proprietary in-house applications, and products and services (SaaS, PaaS, etc.) from vendors and third parties that provide dedicated support or services for cloud operations.
We also need to be aware of any support contracts with third-party vendors, software providers, hardware manufacturers, and the like. Within our Asset Management implementation, we should have a link or mapping between each asset and any support contracts – and, specifically, any product update portals the vendor provides.
To give a specific example, that means we might want to link our Windows licensing information to the Microsoft Update Catalog. Microsoft releases new updates on Tuesdays (hence the saying “Patch Tuesday”), and since the release of Windows 10, it has typically rolled out cumulative updates on the second Tuesday of each month. It could be helpful to align our patch schedule with those of our major suppliers, improving our efficiency as we seek to make patching a simple and easy process.
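If you want to anchor your patch calendar to that cadence programmatically, a short Python sketch like the one below can compute Patch Tuesday for any month (the function name and example are my own illustration, not part of any Microsoft tooling):

```python
import calendar
from datetime import date

def patch_tuesday(year: int, month: int) -> date:
    """Return the second Tuesday of the given month ("Patch Tuesday")."""
    # monthcalendar() returns the month as Monday-first weeks,
    # padding days outside the month with 0.
    weeks = calendar.monthcalendar(year, month)
    tuesdays = [week[calendar.TUESDAY] for week in weeks
                if week[calendar.TUESDAY] != 0]
    return date(year, month, tuesdays[1])  # index 1 = second Tuesday

print(patch_tuesday(2017, 3))  # 2017-03-14, the Patch Tuesday that included MS17-010
```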
Thus, to truly implement an effective Patch Management program, we must have a complete awareness of all our IT assets. Many IT Service Management products provide this functionality and offer a robust suite of Asset Management tools within their platform.
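To make Tip 1 concrete, here is a minimal sketch in Python of what an asset record linking devices to support contracts and update portals might look like. The asset names, contract references, and helper function are purely illustrative; in practice this data lives in your ITSM/CMDB tool:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Asset:
    name: str                        # e.g., a hostname or license identifier
    asset_type: str                  # workstation, server, switch, SaaS, ...
    vendor: str
    support_contract: Optional[str]  # contract/license reference, if any
    update_portal: Optional[str]     # where the vendor publishes patches

inventory = [
    Asset("FINANCE-WS-042", "workstation", "Microsoft",
          support_contract="VL-1234",
          update_portal="https://www.catalog.update.microsoft.com"),
    Asset("LEGACY-APP-01", "in-house application", "internal",
          support_contract=None, update_portal=None),
]

def patching_blind_spots(assets: list) -> list:
    """Assets with no known update source are blind spots for Patch Management."""
    return [a.name for a in assets if a.update_portal is None]

print(patching_blind_spots(inventory))  # ['LEGACY-APP-01']
```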
Tip 2: Enable Our Organization to Quickly Apply Needed Changes
Once we’ve established a strong Asset Management program, we also need to account for how changes will be introduced rapidly and safely as needed, and implementing a solid Change Enablement practice can help us achieve that. Note: past iterations of ITIL best practice referred to this process as Change Management, and many other frameworks still call it Change Management, but in ITIL 4 the terminology has transitioned to Change Enablement. Making changes should be easy, simple, and straightforward, rather than something the process prevents, as so many Change Management processes unfortunately do.
For a typical Patch Management practice, software updates and patches are considered what’s called a Standard Change – a low-risk, well-defined type of change. Because Standard Changes are generally well understood and low risk, our teams will often pre-authorize them so they don’t need to be reviewed each and every time there’s an update or patch. By the time these changes are deployed, someone familiar with the change and with the authority to approve it has already done so, and no further review is needed.
The alternative would be labeling updates and patches as Normal Changes, meaning they would need to be reviewed by the appropriate party every single time. That would be time-consuming and unnecessary, particularly when making these kinds of changes frequently. We want to enable important updates and patches to be handled in a responsible and timely fashion.
In some cases, particularly due to zero-day attacks, an Emergency Change may be necessary, but for the purposes of Patch Management and applying preventative controls, a Standard Change is often all we need to worry about.
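As a rough illustration of that triage, here is how the three change types might be encoded in Python. The classification rules are deliberately simplistic and are my own sketch, not an ITIL-prescribed algorithm:

```python
from enum import Enum

class ChangeType(Enum):
    STANDARD = "pre-authorized; low-risk and well-defined (routine patches)"
    NORMAL = "reviewed and approved by a Change Authority before deployment"
    EMERGENCY = "expedited handling, e.g., a zero-day under active exploitation"

def classify_patch(pre_authorized: bool, actively_exploited: bool) -> ChangeType:
    """Naively triage a patch into a change type."""
    if actively_exploited:
        return ChangeType.EMERGENCY  # zero-day: bypass the routine queue
    if pre_authorized:
        return ChangeType.STANDARD   # deploy without further review
    return ChangeType.NORMAL         # anything novel gets a human review

print(classify_patch(pre_authorized=True, actively_exploited=False))
# ChangeType.STANDARD
```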
Tip 3: Establish a Series of Patch Testing Phases
Patch testing is extremely helpful in introducing patches safely. The specifics depend heavily on the environment and infrastructure available (on-premises, cloud, hybrid, etc.), but the general path should look similar regardless of the technical implementation.
In a prior role as a Service Desk Manager, I implemented a multi-phased deployment of patches. Phase 1 rolled out updates to completely isolated, separate infrastructure – most of it virtual – to verify that the updates did not immediately break the operating systems or software they were applied to. That phase began as soon as a vendor released a patch, and concluded as soon as we verified that all Phase 1 components were unharmed.
Immediately after finishing Phase 1, we would start the “IT” phase, where each Team Lead in IT had a device at their workstation dedicated to patch testing. This allowed the Team Leads to run a few quality assurance tests before we began rolling the patch out to end users. This phase usually lasted one business day.
The third and final phase before “Release” was what our organization called the “Tech Evangelist” phase. Each department director outside of IT designated one or two people within their department as “Tech Evangelists” – the department’s most tech-savvy users, and those most heavily engaged in technical support issues. We would deploy our final test stage to their workstations, then delay any further deployment for two business days to ensure no issues were identified by any department’s Tech Evangelists.
Once the final phase concluded, we would deploy the patches to “All Systems” in our organization. One key point worth noting: whatever testing procedures you apply, make sure you are testing on relevant hardware and operating systems.
For example, if your users are running Windows 10 20H2 but your tests are conducted on a Windows 10 1709 build, your tests may not be fully relevant. Each build of the operating system has its own intricacies that a patch may impact differently, leaving you open to surprises – errors or performance issues that weren’t observed in your testing phases – when you deploy to production devices.
If, during any phase of patch testing, you encounter a failure, error, or other unwanted or unexpected result, it’s imperative to pause the rollout and troubleshoot to identify the root cause. If the update was the cause, you may need to delay the rollout and monitor the relevant vendor advisories for bug fixes. If you do delay an update, be certain the reason is well documented, and return to your Change Enablement practice to re-classify that particular update as a “Normal Change” or “Emergency Change” requiring further review and approval by the appropriate Change Authority before rolling it out. Results of each testing phase should be thoroughly documented in whatever ITSM/change-tracking product your organization uses.
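Pulling the phases together, a testing pipeline along these lines can be represented as data and gated programmatically. This is a minimal sketch, assuming each phase reports a simple pass/fail; the device-group labels are placeholders for whatever collections your tooling uses:

```python
from dataclasses import dataclass

@dataclass
class Phase:
    name: str        # label for the testing phase
    targets: str     # device group receiving the patch
    soak_days: int   # business days to wait before promoting further

ROLLOUT = [
    Phase("Isolated Lab",    "virtual-test-infrastructure",        soak_days=0),
    Phase("IT",              "it-team-lead-test-devices",          soak_days=1),
    Phase("Tech Evangelist", "department-evangelist-workstations", soak_days=2),
    Phase("All Systems",     "all-managed-devices",                soak_days=0),
]

def promote(phase_index: int, tests_passed: bool) -> int:
    """Advance to the next phase only if the current one reported no issues;
    otherwise halt and send the update back through Change Enablement."""
    if not tests_passed:
        raise RuntimeError(f"Halt rollout: failure in {ROLLOUT[phase_index].name}")
    return phase_index + 1
```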
How to Measure Your Patch Management With Report Cards: Anything Less Than an “A” Is a Failure!
Once you’ve rolled out or improved your organization’s Patch Management program, you want to ensure all patches apply successfully. Naturally, some instances will fail due to unexpected circumstances – for example, power failure, user interference, or some other IT sorcery. In a perfect world, you would have 100% compliance for all of your security updates. However, our world isn’t perfect, so what should we realistically expect?
To truly mitigate the risks associated with unpatched systems, and to provide reasonable assurance to stakeholders that our organization is patched, we should be striving for 90% compliance or better. In one of my previous roles, anything less than 90% was considered a failure and required a thorough report explaining the root cause of the low compliance rate, a decision to either remediate or accept the percentage, and a justification of that decision. This matters to a successful risk management and Change Enablement practice: it shows we have tracked the patch through its lifecycle and given real consideration to the risk of the patch not being applied. In most cases I experienced, the root cause was devices that were powered off or not in use by end users, so we accepted the percentage – thanks to our automated procedures, the patches would be applied as soon as those devices came back online.
Why would anything less than 90% compliance be a failure though? To understand that, we need to pull out our dictionary again, and look at “Attack Surface.” NIST defines Attack Surface as:
The set of points on the boundary of a system, a system element, or an environment where an attacker can try to enter, cause an effect on, or extract data from, that system, system element, or environment.
To simplify, anything connected to the internet is part of your attack surface. The greater your patching compliance percentage, the more you have reduced your potential attack surface. From a risk management standpoint, anything less than 90% compliance means the remainder (10% or more of your devices) is unpatched attack surface that could be the entry point of a breach.
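The arithmetic here is simple enough to build straight into your reporting. A small sketch (the device counts are made up for illustration):

```python
def compliance_report(patched: int, total: int, threshold: float = 0.90) -> None:
    """Report patch compliance and flag anything below the target threshold."""
    rate = patched / total
    print(f"Compliance: {rate:.1%} ({patched}/{total} devices)")
    if rate < threshold:
        # The shortfall is, in effect, residual attack surface.
        print(f"FAILURE: {1 - rate:.1%} of devices remain unpatched – "
              "document the root cause, then remediate or formally accept the risk.")

compliance_report(patched=876, total=1000)  # 87.6% – triggers the failure report
```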
Using Automation For Patch Management
You may say to yourself, “Worley, this concept is great if you work in a small business of 100 employees. How do we scale it up to 1,000 or 10,000 people, or even 100,000?”
That’s a great question. The answer is automation, using a Patch Management and/or configuration management tool that allows your organization to scale up. Any Patch Management tool worth spending money and time on will let you define groups of users and devices, policies, reporting metrics, and maintenance windows, so you can apply your dedicated Patch Management process in an automated fashion, with minimal input required from IT personnel.
There are many tools available; some well-known favorites are Microsoft Configuration Manager (formerly System Center Configuration Manager, or SCCM) and Intune. Some ITSM and ISMS products offer the ability to automate patching, while others provide the option to integrate your Patch Management tools with the ITSM product.
In working with SCCM, I have been able to build packages or applications that point to a specific product installation. This functionality allows you to manage not just your Microsoft Windows updates, but also to deploy third-party applications such as Adobe Reader or Google Chrome more easily and rapidly.
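To sketch the underlying model in the abstract (this is not SCCM’s actual API, just an illustration of the group-and-window concept those tools implement), a Patch Management tool maintains device groups with maintenance windows and only deploys inside them:

```python
from dataclasses import dataclass, field
from datetime import time

@dataclass
class DeviceGroup:
    name: str
    window_start: time  # local time the maintenance window opens
    window_end: time    # local time the window closes
    devices: list = field(default_factory=list)

# Hypothetical groups mirroring the phased rollout; in a real tool these would
# be collections with policies and reporting attached.
groups = [
    DeviceGroup("it-test",          time(18, 0), time(22, 0)),
    DeviceGroup("tech-evangelists", time(18, 0), time(22, 0)),
    DeviceGroup("all-systems",      time(1, 0),  time(5, 0)),
]

def may_deploy(group: DeviceGroup, now: time) -> bool:
    """Only deploy to a group inside its approved maintenance window."""
    return group.window_start <= now <= group.window_end

print(may_deploy(groups[0], time(19, 30)))  # True – inside the 18:00-22:00 window
```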
Conclusion – Don’t Let Missed Patches Penetrate Your Organization’s Armor
In my experience, many organizations are missing the mark with how they manage patches and, as a result, how they manage security and protect their staff and customers. The good news is that, done well, Patch Management is a simple, effective mitigation strategy against malicious threats. It can be challenging to know where to start, but with a tailored approach, any organization can implement a highly effective Patch Management program. ITIL 4 practices, used in conjunction with security-focused frameworks such as the NIST Risk Management Framework (RMF) or Cybersecurity Framework (CSF), are a great starting point. Next time you learn of a vulnerability in your infrastructure, don’t wait two months to patch that up!