The 2024 CrowdStrike incident caused blue screens of death (BSOD) on Microsoft Windows devices worldwide, severely disrupting operations across essential industry sectors.
While this incident may have come out of nowhere for some, third-party-related incidents are becoming increasingly common and impactful as businesses continue to increase their reliance on external vendors, products, and services. That reliance is now deep enough that a single faulty software update can cause one of the most severe IT disruptions in history.
Even more alarming, IT disruptions are not the only substantial threat organizations face at the hands of their third-party ecosystems. Recent studies suggest that nearly 30% of all data breaches stem from a third-party attack vector, costing organizations an average of $4.88 million. Despite this, 54% of businesses admit they don’t vet their third-party vendors adequately before onboarding them into their internal systems.
Now that the dust has settled and the consequences of improper third-party risk management are at the forefront of conversations surrounding operational resilience, many chief information security officers (CISOs) are searching for ways to prevent future third-party disruptions from devastating their IT systems and impacting their business continuity.
This blog explores several strategies CISOs can employ to increase their IT resilience and mitigate third-party risks before they result in operational disruptions or other severe consequences.
Learn how to gain holistic insight into your third-party attack surface with UpGuard's Vendor Risk Management tool.
To prevent CrowdStrike-type incidents in the future and significantly decrease their impact, CISOs need to adopt comprehensive strategies that reduce third-party risk and increase the resilience of their IT systems. Here are several strategies CISOs can employ:
The organizations that reinstated their operations most efficiently in the aftermath of the CrowdStrike incident were those that could quickly bring together key decision-makers in a 'war room.' A war room is a centralized command center where specialists gather to manage a crisis in real time. Many organizations make the mistake of assuming a carefully crafted incident response plan is sufficient to reduce operational disruption risks. But, as the CrowdStrike incident so delicately pointed out, you can't prepare for every possible IT disruption.
A war room is a critical safety measure that can bridge the gap between your response plans and an unplanned IT crisis.
To have the capacity to address the broadest scope of potential disruptions, you need to fill your war room with representatives of your primary risk categories. For medium to large enterprises, the list of specialized personnel should at least include your:
Other personnel that you could include in a war room besides C-Suite members include:
All your war room members should congregate regardless of the specific risk exposure a given event has triggered. If a disruption is significant enough to trigger a war room gathering, it will likely have ripple effects across multiple risk categories, requiring collaborative response efforts across multiple business functions.
Whether the gathering occurs in-person or remotely, a war room setup should enable the following:
In the case of the CrowdStrike incident, UpGuard provided customers with complete awareness of all their impacted third- and even fourth-party vendors.
Determine a third-party incident impact threshold for activating a war room gathering, as it's a significant resource maneuver. Your definition of this threshold will be a relationship between a static component (your third-party risk appetite) and a dynamic component (emerging risks in your external attack surface).

Your threshold for activating a war room is based on a combination of your third-party risk appetite and your current exposure to emerging third-party risks.
A tool like UpGuard Vendor Risk could support the dynamic component of a war room trigger definition with a real-time news feed of emerging risks in your third- and fourth-party networks.
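The relationship between the static and dynamic components of a war room trigger can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ThirdPartyIncident` fields, the 0.0 to 1.0 scoring scale, the multiplication used to estimate impact, and the `RISK_APPETITE` value are all hypothetical, and a real program would derive them from its own risk appetite statement and risk feed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThirdPartyIncident:
    vendor: str
    severity: float            # hypothetical 0.0-1.0 score from an emerging-risk feed
    vendor_criticality: float  # hypothetical 0.0-1.0 weight for your reliance on the vendor


def should_activate_war_room(incident: ThirdPartyIncident, risk_appetite: float) -> bool:
    """Return True when the estimated impact exceeds the static risk appetite."""
    # Dynamic component: how bad the event is, scaled by how much you depend on the vendor.
    impact = incident.severity * incident.vendor_criticality
    return impact > risk_appetite


RISK_APPETITE = 0.5  # static component; an assumed value for illustration

minor = ThirdPartyIncident("analytics-vendor", severity=0.6, vendor_criticality=0.3)
major = ThirdPartyIncident("endpoint-security-vendor", severity=0.9, vendor_criticality=0.9)

print(should_activate_war_room(minor, RISK_APPETITE))  # False
print(should_activate_war_room(major, RISK_APPETITE))  # True
```

Keeping the appetite as a single named value makes it easy to review the threshold separately from the logic that scores each incoming event.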

While even the most prepared third-party risk management (TPRM) program wouldn’t have prevented the faulty CrowdStrike update from happening, it would have enabled an organization to better understand which of its vendors were affected. By quickly identifying which vendors were impacted by the CrowdStrike outage, an organization could have pursued mitigation as efficiently as possible, limiting the time its operations were disabled by an out-of-service vendor.
Also, the next third-party incident your organization faces may not be a software outage. It could be a cyber attack or data breach. By deploying critical TPRM tools and strategies, your organization can better protect itself from the potential risks present across your third-party attack surface.
The most effective Third-Party Risk Management software includes the following components:
Establishing a program with these components will empower your organization to swiftly identify, mitigate, and remediate third-party risks before they damage your organization and improve your response time when unavoidable incidents occur.
Automated Third-Party Risk Management software also enables organizations to improve their operational resilience and risk management without excessive manual effort. Compared to traditional risk management workflows, Vendor Risk empowers security teams to conduct comprehensive risk assessments in half the time.
To learn more about how UpGuard can help your organization, book your FREE demo today.
In a world where we're spoiled for choice in terms of process automation options, it's tempting to become complacent, allowing all knowledge of manual approaches to atrophy. The CrowdStrike incident, however, inverted years of IT progress, suddenly popularizing an old-school approach to incident response.
Because the faulty CrowdStrike update affected the core functioning of impacted systems, most automated remediation tasks were ineffective, necessitating a time-consuming, hands-on approach to purging millions of devices of the problematic update.
To ensure your IT personnel maintain sharp manual problem-solving instincts, consider reintroducing a regular rotation of hackathons. To enhance resilience to vendor ecosystem disruptions similar to the CrowdStrike incident, choose projects that will enhance the impact of your Third-Party Risk Management program. Here are some examples.
Incident Response Simulation
Automated Remediation Tools
Enhanced Monitoring and Alerting Systems
Risk Assessment and Management Framework
Disaster Recovery Plan Development
Security Testing Automation
Multi-Cloud Resilience Strategy
Real-Time Incident Communication Platform
For inspiration on designing an integrated collaboration project, watch this video to learn how UpGuard streamlines vendor collaborations.
A key objective to prevent future disruptions similar to the CrowdStrike incident is eliminating all risk concentrations in your IT ecosystem. This can be achieved by architecting increased diversity into the layers of your production system and technology stacks. Such an approach would aim for software agents, components, or IT subsystems with the potential of causing disruption through faulty updates to safely fail without total disablement of viable service capacity.
Diversifying your tech stack through policy changes or architectural reforms also has the benefit of disrupting cyber attack pathways and supporting your cybersecurity program with an additional layer of data breach protection.
One strategy for achieving a more graceful system degradation rather than a sudden catastrophic failure is implementing separate protective security stacks on different portions of the total workload capacity.
An example of this is structuring your infrastructure such that your web and database servers are protected by their own unique set of security controls. This way, if a faulty security update disrupts your web server operations, your database server controls will continue to operate as normal. This approach reduces the risk of your overall system functionality hinging on a single point of failure.
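To make the single-point-of-failure argument concrete, the following sketch estimates how much capacity depends on each security stack and how much would keep running if one stack received a faulty update. The workload names, stack labels, and capacity percentages are hypothetical placeholders, not a recommended layout.

```python
from collections import defaultdict

# Hypothetical workload tiers mapped to (security stack, percent of total capacity).
WORKLOADS = {
    "web-servers": ("security-stack-a", 40),
    "database-servers": ("security-stack-b", 40),
    "batch-workers": ("security-stack-a", 20),
}


def capacity_by_stack(workloads):
    """Sum the percentage of capacity that depends on each security stack."""
    exposure = defaultdict(int)
    for stack, percent in workloads.values():
        exposure[stack] += percent
    return dict(exposure)


def surviving_capacity(workloads, failed_stack):
    """Percent of capacity still running if one stack's update disables its hosts."""
    return sum(percent for stack, percent in workloads.values() if stack != failed_stack)


print(capacity_by_stack(WORKLOADS))
print(surviving_capacity(WORKLOADS, "security-stack-a"))
```

A stack carrying a large share of total capacity is a risk concentration; rebalancing workloads across stacks raises the worst-case surviving capacity.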

The downside of this approach is that it may increase risk management complexity and environmental and operational risk exposures. However, in high-maturity instances (such as Configuration-as-Code, Infrastructure-as-Code, and IT change management scenarios), the additional risk exposure is smaller, making this an attractive option for dispersing risk concentrations in such cases.
If you decide to diversify your security stack, keep the following implications in mind:
One of the most vital lessons from the CrowdStrike incident is the importance of understanding your end-to-end dependency chains for critical systems. Such awareness will help risk management teams predict the likely impact of external disruptions and the effort required to reinstate regular operations.
Your dependency map should identify all interconnected components and services your critical systems rely upon to function correctly. This effort involves several steps:
Watch this video for an overview of how to keep track of all IT assets comprising your attack surface.
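Once a dependency map exists, predicting the blast radius of an external disruption is a graph traversal. The sketch below assumes a simple dictionary in which each component lists what it depends on directly; the component names are hypothetical, and a real map would come from your asset inventory.

```python
from collections import deque

# Hypothetical dependency map: each key depends directly on the listed items.
DEPENDS_ON = {
    "customer-portal": ["web-servers", "payments-vendor"],
    "web-servers": ["endpoint-agent"],
    "payments-vendor": ["endpoint-agent"],
    "reporting": ["data-warehouse"],
    "data-warehouse": [],
    "endpoint-agent": [],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    # Breadth-first search outward from the failed component.
    impacted, queue = set(), deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, ()):
            if parent not in impacted:
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(impacted_by("endpoint-agent", DEPENDS_ON)))
# ['customer-portal', 'payments-vendor', 'web-servers']
```

Note that `customer-portal` appears in the result even though it never references the failed agent directly, which is the kind of indirect exposure a dependency map is meant to surface.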
The CrowdStrike incident revealed that even the most innocuous-seeming software updates can cause significant problems to an organization’s IT infrastructure. Moving forward, CISOs need to develop a more comprehensive approach to update management.
CISOs must implement a rigorous update management program that evaluates and tests each update during pre-deployment and throughout different IT environments to detect issues before they become harmful. Staging environments, sometimes called replica environments, can be used to test the performance of updates without subjecting an organization’s actual IT system to an untested software update.
In addition, CISOs should develop procedures to reduce the immediacy of software updates across critical environments and infrastructure. One low-resource method is to categorize all software components into three separate stacks:
If most of your components fall into the second stack, you may need to separate them further into substacks to achieve a more beneficial distribution. You can assess whether delaying Stack 2 updates by four, eight, or 24 hours will increase security or continuity risk.
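One way to reason about the delay options is to ask whether a held update would still have reached production before the vendor withdrew it. The sketch below is a simplified illustration: the release and recall times are hypothetical, and it models only the timing question, not the security cost of delaying a legitimate patch.

```python
from datetime import datetime, timedelta

CANDIDATE_DELAYS_HOURS = (4, 8, 24)  # the delay options discussed above


def would_have_deployed(released_at, recalled_at, delay_hours):
    """True if an update held for `delay_hours` goes out before the vendor recalls it."""
    deploy_at = released_at + timedelta(hours=delay_hours)
    # With no recall, the update deploys as soon as the hold period ends.
    return recalled_at is None or deploy_at < recalled_at


released = datetime(2024, 1, 1, 4, 0)     # hypothetical release time
recalled = released + timedelta(hours=6)  # hypothetical recall six hours later

for delay in CANDIDATE_DELAYS_HOURS:
    print(delay, would_have_deployed(released, recalled, delay))
# 4 True
# 8 False
# 24 False
```

In this made-up scenario, a four-hour hold would not have been long enough, while the eight-hour and 24-hour holds would have kept the faulty update out of production.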
Diversifying your software solutions will increase resiliency across your entire IT infrastructure and prepare your organization to handle future disruptions effectively. Consider employing the following strategies to increase your IT resilience:
A multi-cloud strategy could significantly reduce the risk concentration of relying on a single Cloud Service Provider (CSP). This approach involves strategically distributing workloads across multiple CSPs, thereby reducing the chances of major operational disruptions due to a single CSP failing.
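As a minimal sketch of the idea, the function below routes a workload to the first healthy provider in a priority list. The provider labels and the health dictionary are hypothetical; in practice, health status would come from your own monitoring rather than a hard-coded mapping.

```python
def pick_provider(priority, healthy):
    """Return the first provider in priority order that is currently healthy."""
    for provider in priority:
        # Providers missing from the health report are treated as unavailable.
        if healthy.get(provider, False):
            return provider
    return None


PRIORITY = ["provider-a", "provider-b", "provider-c"]  # hypothetical CSP labels

print(pick_provider(PRIORITY, {"provider-a": True, "provider-b": True}))   # provider-a
print(pick_provider(PRIORITY, {"provider-a": False, "provider-b": True}))  # provider-b
print(pick_provider(PRIORITY, {}))                                         # None
```

Returning `None` when no provider is healthy forces the caller to handle a total outage explicitly instead of silently routing to a failed CSP.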
Some examples of multi-cloud strategies include:
Disruption incidents can be devastating but also present opportunities for continued improvement when used to elevate current systems and processes. One takeaway many organizations have had after CrowdStrike is the importance of developing comprehensive incident response and disaster recovery programs.
While you should calibrate your security programs to defend against the broadest array of risks, avoiding every cyber incident is impossible. A dedicated incident response plan helps you identify, mitigate, and remediate unforeseen incidents as efficiently as possible.
The best incident response plans operate across six main phases:
Related reading: How to Create an Incident Response Plan (Detailed Guide)
Outages and disruptions similar to CrowdStrike are powerful reminders of the necessity for robust infrastructure resilience and effective disaster recovery plans. Developing these plans and taking proactive measures are essential to ensure systems remain operational during unforeseen events. Disaster planning involves not only diversifying solutions but also continuously assessing and refining recovery strategies.
Regularly scheduled drills, thorough evaluations, and strategic partnerships with reliable providers can significantly enhance an organization's ability to respond to and recover from disruptions. By implementing these best practices, CIOs can ensure their infrastructure is well-prepared to handle any challenges that may arise:
The CrowdStrike incident demonstrated that even cybersecurity software—which has a reputation for being the most hardened and resilient of all software types—is susceptible to operational failures.
Addressing this underserved risk category will require adjusting your risk management lens to regard all security software components - especially those with a high potential of disrupting critical production workloads - with the same degree of prejudice as Operating Systems and general application updates.
This mindset shift will require assessing all current security components for any immediate significant disabling or disruptive impacts. You should apply these impact tests to a broad range of environments, including server workloads, which handle backend processes, and End-User Computing (EUC) environments, which directly affect user productivity.
Share the findings of your impact analysis with relevant stakeholders. Use their feedback to refine the testing processes and mitigate any identified risks before new security software components come into your production environment.
Use this opportunity to re-evaluate your current Vendor Risk Management tool and its effectiveness in mitigating third-party cyber risk exposure for your entire vendor ecosystem. After all, you're much more likely to experience a critical disruption from a third-party data breach than another faulty security software update.
To encourage threat response agility while minimizing risk exposure, your VRM tool should include integrated workflows that address the entire TPRM lifecycle and leverage automation technology to seamlessly manage vendor risk assessments at scale.
To extend your objective of dispersing risk concentrations to the vendor ecosystem, your VRM tool should also be capable of quickly adapting to new, unexpected supply chain threats, like the CrowdStrike incident, which sent shockwaves to third-party vendors globally.
Of course, the best way you can prevent third-party risks from impacting your organization is to identify and mitigate them before they become problematic. A comprehensive, all-in-one, TPRM solution like UpGuard Vendor Risk helps organizations across industries do exactly that.
The UpGuard toolkit includes automated workflows that empower security teams to better understand the security posture of their third-party ecosystem through the following: