Last updated
December 1, 2025

The 2024 CrowdStrike incident caused blue screens of death (BSOD) on Microsoft Windows devices worldwide, severely disrupting operations across essential industry sectors.

While this incident may have come out of nowhere for some, third-party incidents are becoming more common and more impactful as businesses continue to increase their reliance on external vendors, products, and services. That reliance is now so deep that a single faulty software update can cause one of the most severe IT disruptions in history.

Even more alarming, IT disruptions are not the only substantial threat organizations face at the hands of their third-party ecosystems. Recent studies suggest that nearly 30% of all data breaches stem from a third-party attack vector, costing organizations an average of $4.88 million. Despite this, 54% of businesses admit they don’t vet their third-party vendors adequately before onboarding them into their internal systems.  

Now that the dust has settled and the consequences of improper third-party risk management are at the forefront of conversations surrounding operational resilience, many chief information security officers (CISOs) are searching for ways to prevent future third-party disruptions from devastating their IT systems and impacting their business continuity.

This blog explores several strategies CISOs can employ to increase their IT resilience and mitigate third-party risks before they result in operational disruptions or other severe consequences.

Learn how to gain holistic insight into your third-party attack surface with UpGuard's Vendor Risk Management tool.

Key Strategies for CISOs to Prevent Future Disruptions

To prevent future CrowdStrike-type incidents, or at least significantly reduce their impact, CISOs need to adopt comprehensive strategies that reduce third-party risk and increase the resilience of their IT systems. Here are several strategies CISOs can employ:

1. Establish a 'War Room'

The organizations that restored their operations most efficiently in the aftermath of the CrowdStrike incident were those that could quickly bring together key decision-makers in a 'war room.' A war room is a centralized command center where specialists gather to manage a crisis in real time. Many organizations make the mistake of assuming a carefully crafted incident response plan is enough to reduce operational disruption risks. But, as the CrowdStrike incident made painfully clear, you can't prepare for every possible IT disruption.

A war room is a critical safety measure that can bridge the gap between your response plans and an unplanned IT crisis.

To have the capacity to address the broadest scope of potential disruptions, you need to fill your war room with representatives of your primary risk categories. For medium to large enterprises, the list of specialized personnel should at least include your:

  • CISO - representing cyber risk exposure
  • Information Security Officer - representing IT risk exposure
  • Chief Financial Officer - representing financial risk exposure
  • Chief Risk Officer - representing operational risk exposure

Beyond C-suite members, other personnel you could include in a war room are:

  • Head of compliance - representing compliance risk exposure
  • IT manager - representing IT and data security risk exposure
  • Cybersecurity manager - representing security and third-party risk exposure
  • Legal counsel - representing legal risk exposure

All war room members should convene regardless of which specific risk exposure a given event has triggered. If a disruption is significant enough to warrant a war room gathering, it will likely have ripple effects across multiple risk categories, requiring a collaborative response across multiple business functions.

Whether the gathering occurs in-person or remotely, a war room setup should enable the following:

  • Rapid information sharing: Efficient distribution of all critical information about the active incident, for example through impact analysis reports or vendor risk summary reports
  • Decision-making agility: The ability to make swift, informed decisions to mitigate the impact of the outage and expedite recovery efforts
  • Real-time impact and remediation monitoring: All members should have access to a real-time monitoring feed of all affected systems. If remediation action has been deployed, members should have visibility into each task's status.
  • Development and maintenance of a timeline of events: In the heat of a crisis, it can be difficult to track events occurring in near real time and look for causal relationships between them. A detailed timeline is also essential to manage future audit and compliance processes.
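A war room timeline can start as an append-only log of timestamped events that stays in chronological order even when reports arrive late. The sketch below is a minimal illustration under that assumption; the `TimelineEvent` and `IncidentTimeline` names, fields, and sample events are hypothetical and not taken from any specific tool.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(order=True)
class TimelineEvent:
    # Only the timestamp takes part in ordering comparisons
    timestamp: datetime
    source: str = field(compare=False)
    description: str = field(compare=False)


class IncidentTimeline:
    """Hypothetical append-only incident timeline kept in chronological order."""

    def __init__(self) -> None:
        self._events: list[TimelineEvent] = []

    def record(self, event: TimelineEvent) -> None:
        # insort keeps the list sorted even when reports arrive out of order
        bisect.insort(self._events, event)

    def between(self, start: datetime, end: datetime) -> list[TimelineEvent]:
        # Events in the inclusive window [start, end], e.g. to look for causal links
        return [e for e in self._events if start <= e.timestamp <= end]

    def __iter__(self):
        return iter(self._events)


# Illustrative events (hypothetical timestamps and descriptions)
utc = timezone.utc
timeline = IncidentTimeline()
timeline.record(TimelineEvent(datetime(2025, 3, 4, 6, 15, tzinfo=utc), "Service desk", "Endpoints crash on boot"))
timeline.record(TimelineEvent(datetime(2025, 3, 4, 5, 40, tzinfo=utc), "Vendor feed", "Vendor pushes agent update"))

for event in timeline:
    print(event.timestamp.isoformat(), event.source, "-", event.description)
```

Because the later-reported event is inserted in timestamp order, the timeline reads as a causal sequence, which also makes it easier to hand over for audit and compliance reviews.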

In the case of the CrowdStrike incident, UpGuard provided customers with complete awareness of all their impacted third-party and even fourth-party vendors.

Get a free trial of UpGuard >

Because convening a war room is a significant commitment of resources, define a third-party incident impact threshold for activating it. Your definition of this threshold will be a relationship between a static component (your third-party risk appetite) and a dynamic component (emerging risks in your external attack surface).


Your threshold for activating a war room is based on a combination of your third-party risk appetite and your current exposure to emerging third-party risks.

A tool like UpGuard Vendor Risk could support the dynamic component of a war room trigger definition with a real-time news feed of emerging risks in your third- and fourth-party networks.

UpGuard's newsfeed confirming vendors impacted by the CrowdStrike incident.
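One way to express a threshold that combines a static risk appetite with dynamic emerging-risk exposure is to let high exposure lower the bar for activation. The sketch below assumes a hypothetical 0 to 100 incident impact score, an emerging-exposure value between 0 and 1, and a linear `sensitivity` factor; none of these scales come from UpGuard or any standard, so treat them as placeholders for your own scoring model.

```python
from dataclasses import dataclass


@dataclass
class WarRoomTrigger:
    # Static component: incident impact score (hypothetical 0-100 scale) the
    # organization accepts without convening a war room, i.e. its risk appetite
    risk_appetite: float
    # How strongly emerging third-party risk tightens the threshold (0 to 1)
    sensitivity: float = 0.5

    def threshold(self, emerging_exposure: float) -> float:
        """Dynamic threshold: lower when current emerging-risk exposure (0 to 1) is high."""
        exposure = min(max(emerging_exposure, 0.0), 1.0)  # clamp to [0, 1]
        return self.risk_appetite * (1.0 - self.sensitivity * exposure)

    def should_activate(self, incident_impact: float, emerging_exposure: float) -> bool:
        return incident_impact >= self.threshold(emerging_exposure)


trigger = WarRoomTrigger(risk_appetite=60.0)
# Same incident impact, different external conditions:
print(trigger.should_activate(incident_impact=45.0, emerging_exposure=0.2))  # False (threshold 54)
print(trigger.should_activate(incident_impact=45.0, emerging_exposure=0.8))  # True (threshold 36)
```

The same incident can stay below the threshold on a quiet day but trigger a war room when a news feed shows widespread problems across your vendor network, which is the interplay between the static and dynamic components described above.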

2. Develop a vigilant third-party risk management program

While even the most prepared third-party risk management (TPRM) program wouldn’t have prevented the faulty CrowdStrike update, it would have helped an organization understand which of its vendors were affected. By quickly identifying the vendors impacted by the CrowdStrike outage, an organization could have pursued mitigation as efficiently as possible, limiting how long its operations were disabled by an out-of-service vendor.
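At its simplest, identifying impacted vendors means querying a vendor inventory for every vendor known to depend on the affected product. The sketch below assumes a hypothetical inventory in which each vendor record lists its known dependencies (for example, gathered from security questionnaires or monitoring) and a simple criticality rank; the vendor names and fields are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Vendor:
    name: str
    criticality: int  # 1 = most critical to operations (hypothetical scale)
    # Products/services the vendor is known to depend on
    dependencies: set[str] = field(default_factory=set)


def impacted_vendors(inventory: list[Vendor], affected_product: str) -> list[Vendor]:
    """Return vendors that depend on the affected product, most critical first."""
    hits = [v for v in inventory if affected_product in v.dependencies]
    return sorted(hits, key=lambda v: v.criticality)


inventory = [
    Vendor("Payroll SaaS", 2, {"CrowdStrike Falcon", "AWS"}),
    Vendor("Email provider", 1, {"Microsoft 365"}),
    Vendor("Helpdesk platform", 1, {"CrowdStrike Falcon"}),
]
for v in impacted_vendors(inventory, "CrowdStrike Falcon"):
    print(v.name, v.criticality)
```

Sorting by criticality gives the response team a ready-made order for outreach and mitigation; the quality of the answer depends entirely on how complete the dependency data in the inventory is.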

Also, the next third-party incident your organization faces may not be a software outage. It could be a cyber attack or data breach. By deploying critical TPRM tools and strategies, your organization can better protect itself from the potential risks present across your third-party attack surface. 

The most effective third-party risk management software includes the following components:

  • Vendor Risk Assessments
  • Vendor Security Questionnaires 
  • Continuous security monitoring 
  • Detailed reports and dashboards

Establishing a program with these components will help your organization swiftly identify, mitigate, and remediate third-party risks before they cause damage, and will improve your response time when unavoidable incidents occur.

Automated third-party risk management software also enables organizations to improve their operational resilience and risk management without excessive manual effort. Compared to traditional risk management workflows, UpGuard Vendor Risk empowers security teams to conduct comprehensive risk assessments in half the time.

To learn more about how UpGuard can help your organization, book your free demo today.

3. Don't become too dependent on automation

In a world where we're spoiled for choice in terms of process automation options, it's tempting to become complacent, allowing all knowledge of manual approaches to atrophy. The CrowdStrike incident, however, inverted years of IT progress, suddenly popularizing an old-school approach to incident response.

Because the faulty CrowdStrike update affected the core functioning of impacted systems, most automated remediation tasks were ineffective, necessitating a time-consuming, hands-on approach to purging millions of devices of the problematic update.

To ensure your IT personnel maintain sharp manual problem-solving instincts, consider reintroducing a regular rotation of hackathons. To build resilience against vendor ecosystem disruptions like the CrowdStrike incident, choose projects that strengthen your third-party risk management program. Here are some examples.

Incident Response Simulation

  • Develop and implement comprehensive incident response playbooks that integrate automated response scripts and real-time system telemetry dashboards for large-scale IT outages.

Automated Remediation Tools

  • Create sophisticated automation scripts or software agents that can detect, isolate, and remediate issues caused by faulty updates using machine learning models to predict and prevent similar incidents.

Enhanced Monitoring and Alerting Systems

  • Design and deploy advanced monitoring solutions that use AI-driven anomaly detection algorithms and real-time alerting mechanisms, and integrate them into your SIEM (Security Information and Event Management) systems.

Risk Assessment and Management Framework

  • Build robust risk assessment tools leveraging big data analytics and continuous monitoring capabilities to evaluate and visualize third-party vendor risks dynamically.

Disaster Recovery Plan Development

  • Develop detailed disaster recovery frameworks incorporating automated failover systems, continuous data replication techniques, and orchestration tools for seamless recovery processes.

Security Testing Automation

  • Create and integrate CI/CD pipeline security testing tools that automatically perform static and dynamic code analysis, vulnerability scanning, and penetration testing before deploying updates.

Multi-Cloud Resilience Strategy

  • Develop and implement workload distribution and failover strategies across multiple cloud providers using container orchestration platforms like Kubernetes and multi-cloud management tools.

Real-Time Incident Communication Platform

  • Build and deploy a real-time communication platform with incident tracking, automated notification systems, and integrated collaboration tools for efficient incident management and coordination.

For inspiration when designing an integrated collaboration project, watch this video to learn how UpGuard streamlines vendor collaboration.


4. Diversify your tech (and security) stack

A key objective in preventing future disruptions similar to the CrowdStrike incident is reducing risk concentrations in your IT ecosystem. You can achieve this by architecting greater diversity into the layers of your production systems and technology stacks. The aim is for any software agent, component, or IT subsystem that could cause disruption through a faulty update to fail safely, without disabling all of your viable service capacity.

Diversifying your tech stack through policy changes or architectural reforms also has the benefit of disrupting cyber attack pathways and supporting your cybersecurity program with an additional layer of data breach protection.

One strategy for achieving a more graceful system degradation rather than a sudden catastrophic failure is implementing separate protective security stacks on different portions of the total workload capacity.

An example of this is structuring your infrastructure such that your web and database servers are protected by their own unique set of security controls. This way, if a faulty security update disrupts your web server operations, your database server controls will continue to operate as normal. This approach reduces the risk of your overall system functionality hinging on a single point of failure.
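The benefit of separate security stacks can be quantified by asking how much capacity survives the worst single-vendor failure. The sketch below is a simplified model, assuming each workload pool is protected by exactly one security vendor and contributes a fixed, hypothetical share of capacity; the vendor names and shares are placeholders.

```python
def capacity_after_failure(pools: dict[str, tuple[str, float]], failed_vendor: str) -> float:
    """Share of workload capacity still running if one security vendor's update fails.

    `pools` maps a workload pool to (security_vendor, capacity_share).
    """
    return sum((share for vendor, share in pools.values() if vendor != failed_vendor), 0.0)


def worst_case_capacity(pools: dict[str, tuple[str, float]]) -> float:
    """Lowest surviving capacity across every single-vendor failure."""
    vendors = {vendor for vendor, _ in pools.values()}
    return min(capacity_after_failure(pools, v) for v in vendors)


# One security stack everywhere vs. separate stacks for web and database servers
single_stack = {"web": ("VendorA", 0.5), "database": ("VendorA", 0.5)}
split_stack = {"web": ("VendorA", 0.5), "database": ("VendorB", 0.5)}

print(worst_case_capacity(single_stack))  # 0.0 -> one faulty update disables everything
print(worst_case_capacity(split_stack))   # 0.5 -> the other half keeps running
```

A worst case of 0.0 signals a single point of failure; any value above zero means the system degrades gracefully instead of failing catastrophically.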


The downside of this approach is that it may increase risk management complexity as well as environmental and operational risk exposure. However, in organizations with mature practices (such as Configuration-as-Code, Infrastructure-as-Code, and IT change management), the additional risk exposure is smaller, making this an attractive option for dispersing risk concentrations.

If you decide to diversify your security stack, keep the following implications in mind:

  • Be prepared for increased costs due to managing more vendors, purchasing additional licenses, and developing the necessary internal or external capabilities to design, implement, and maintain these new security measures.
  • Every third-party component added to your security stack will expand your attack surface. However, this slight expansion may be necessary to reduce your overall risk exposure.

5. Map your end-to-end dependency chains for critical systems

One of the most vital lessons from the CrowdStrike incident is the importance of understanding the end-to-end dependency chains of your critical systems. This awareness helps risk management teams predict the likely impact of external disruptions and the effort required to restore normal operations.

Your dependency map should identify all interconnected components and services your critical systems rely upon to function correctly. This effort involves several steps:

  • Step 1 - Inventory your IT assets: Catalog all hardware, software, and network components that make up your critical systems.
  • Step 2 - Identify Interdependencies: Understand how all critical system components interact with each other. This effort should continue along the dependency chain to your vendor ecosystem, noting external dependencies on third-party services and Managed Service Providers.
  • Step 3 - Document Processes and Workflows: Produce detailed documentation of all the processes and workflows dependent on these systems. This effort will make it easier to visualize the impact of a failure at any point in the dependency chain.
  • Step 4 - Assess Criticality: Evaluate the criticality of each component and dependency. Identify which elements are essential for operations and which have redundancies or failover options.
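Once interdependencies are documented, the map can be treated as a directed graph and traversed to find everything that is directly or transitively affected when one component or vendor fails. The sketch below uses a breadth-first search over a hypothetical dependency map; the component names are illustrative and would come from your own asset inventory.

```python
from collections import deque

# Hypothetical dependency map: each component lists what it directly depends on
DEPENDS_ON = {
    "order-processing": ["payments-api", "windows-app-servers"],
    "payments-api": ["payment-gateway-vendor", "linux-db-cluster"],
    "windows-app-servers": ["endpoint-security-agent"],
    "linux-db-cluster": [],
    "payment-gateway-vendor": ["cloud-provider-x"],
    "endpoint-security-agent": [],
    "cloud-provider-x": [],
}


def impacted_by(failed: str, depends_on: dict[str, list[str]]) -> set[str]:
    """Return every component that directly or transitively depends on `failed`."""
    # Invert the graph: for each component, record who depends on it
    dependents: dict[str, list[str]] = {}
    for node, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)

    impacted: set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for parent in dependents.get(current, []):
            if parent not in impacted:  # also guards against dependency cycles
                impacted.add(parent)
                queue.append(parent)
    return impacted


print(sorted(impacted_by("endpoint-security-agent", DEPENDS_ON)))
# ['order-processing', 'windows-app-servers']
print(sorted(impacted_by("cloud-provider-x", DEPENDS_ON)))
# ['order-processing', 'payment-gateway-vendor', 'payments-api']
```

Note how a fourth-party dependency (`cloud-provider-x`, reached through the payment gateway vendor) still surfaces `order-processing` as impacted, which is exactly the kind of hidden chain Step 2 asks you to trace.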

Watch this video for an overview of how to keep track of all IT assets comprising your attack surface.

6. Establish comprehensive update management procedures

The CrowdStrike incident revealed that even the most innocuous-seeming software updates can cause significant problems to an organization’s IT infrastructure. Moving forward, CISOs need to develop a more comprehensive approach to update management. 

CISOs must implement a rigorous update management program that evaluates and tests each update before deployment and across different IT environments to detect issues before they become harmful. Staging environments, sometimes called replica environments, can be used to test the performance of updates without exposing an organization’s production IT systems to an untested software update.

In addition, CISOs should develop procedures to control how quickly software updates reach critical environments and infrastructure. One low-resource method is to categorize all software components into three separate stacks:

  • Stack 3 - Low Disruption Risk: These include components unlikely to interfere with critical system operations (for example, OS kernel operations, TCP/IP, and network driver components). Your security team will usually be able to delay updates to components in this category with little risk of disruption.
  • Stack 2 - High Disruption Risk: Delaying updates to these components carries a higher risk of disruption, so any deferral should be short and deliberate.
  • Stack 1 - Critical Security Updates: These components protect your environments against immediate threats, such as zero-day exploits, so you must apply new updates immediately despite their potential disruption risks.

If most of your components fall into the second stack, you may need to separate them further into substacks to achieve a more beneficial distribution. You can assess whether delaying Stack 2 updates by four, eight, or 24 hours will increase security or continuity risk.
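The three-stack model translates naturally into a scheduling rule: apply Stack 1 immediately, defer Stack 2 by a short, assessed window, and defer Stack 3 longer. The sketch below is illustrative only; the 4/8/24-hour Stack 2 options follow the text, while the seven-day Stack 3 deferral, the component names, and the `scheduled_install` helper are hypothetical policy choices.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from enum import Enum


class UpdateStack(Enum):
    CRITICAL_SECURITY = 1  # Stack 1: apply immediately
    HIGH_DISRUPTION = 2    # Stack 2: short, assessed deferral
    LOW_DISRUPTION = 3     # Stack 3: can be deferred with little risk


# Hypothetical policy value; tune it to your own risk appetite
LOW_DISRUPTION_DEFERRAL = timedelta(days=7)


@dataclass
class Component:
    name: str
    stack: UpdateStack
    # Only used for Stack 2 sub-stacks, e.g. 4, 8, or 24 hours
    deferral_hours: int = 8


def scheduled_install(component: Component, released_at: datetime) -> datetime:
    """Earliest time an update for `component` should reach production."""
    if component.stack is UpdateStack.CRITICAL_SECURITY:
        return released_at
    if component.stack is UpdateStack.HIGH_DISRUPTION:
        return released_at + timedelta(hours=component.deferral_hours)
    return released_at + LOW_DISRUPTION_DEFERRAL


released = datetime(2025, 1, 6, 9, 0, tzinfo=timezone.utc)
components = [
    Component("ids-signatures", UpdateStack.CRITICAL_SECURITY),
    Component("vpn-client", UpdateStack.HIGH_DISRUPTION, deferral_hours=4),
    Component("pdf-viewer", UpdateStack.LOW_DISRUPTION),
]
for c in components:
    print(c.name, scheduled_install(c, released).isoformat())
```

Giving each Stack 2 component its own `deferral_hours` is one simple way to represent the sub-stacks mentioned above without adding more categories.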

7. Enhance resilience by avoiding single points of failure

Diversifying your software solutions will increase resiliency across your entire IT infrastructure and prepare your organization to handle future disruptions effectively. Consider employing the following strategies to increase your IT resilience: 

  • Diversifying solutions: Implement redundancy and failover mechanisms to ensure critical systems remain operational despite component failures.
  • Hybrid or multi-cloud infrastructure: Adopt hybrid or multi-cloud infrastructure to reduce the risk of single points of failure and distribute workloads across multiple environments to enhance redundancy, flexibility, and disaster recovery capabilities.
  • Load balancing and geographic distribution: Utilize load balancing to distribute traffic evenly across servers and distribute resources across environments to mitigate risks associated with localized failures.

A multi-cloud strategy could significantly reduce the risk concentration of relying on a single Cloud Service Provider (CSP). This approach involves strategically distributing workloads across multiple CSPs, thereby reducing the chances of major operational disruptions due to a single CSP failing.

Some examples of multi-cloud strategies include:

  • Strategic workload distribution: The distribution of critical system workloads across multiple CSPs such that a greater weight of critical applications is assigned to CSPs with the least likelihood of failure
  • Redundancy and diversification: This is a more general approach to workload distribution that emphasizes diversification, greatly reducing the potential for a total system outage caused by a single failing CSP.
  • Failover mechanisms: Failover mechanisms automatically reroute traffic to an alternate CSP when a CSP fails. The effectiveness of this approach depends on diverting operations seamlessly, without any discernible effect on service availability. Tools such as Kubernetes or multi-cloud management platforms can monitor the health of services across different CSPs and initiate failovers without manual intervention.
  • Performance optimization: Continuously monitor the performance of applications across different CSPs, utilizing load balancing to ensure optimal resource management.
  • Cost management: Implement FinOps practices to manage and optimize costs associated with multi-cloud deployments. Use cost management tools to monitor spending across different CSPs and make informed decisions about resource allocation efficiency.
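Weighted workload distribution and failover can be combined in one rule: split traffic across healthy providers in proportion to their weights, and redistribute a failed provider's share to the rest. The sketch below is tool-agnostic and assumes hypothetical provider names, weights, and a boolean health flag; in practice, health would come from your monitoring or orchestration platform.

```python
from dataclasses import dataclass


@dataclass
class Provider:
    name: str
    weight: float      # share of workload when healthy (hypothetical)
    healthy: bool = True


def distribute(providers: list[Provider]) -> dict[str, float]:
    """Split workload across healthy providers in proportion to their weights.

    An unhealthy provider's share is redistributed (failover) to the others.
    """
    healthy = [p for p in providers if p.healthy]
    if not healthy:
        raise RuntimeError("no healthy provider available")
    total = sum(p.weight for p in healthy)
    return {p.name: p.weight / total for p in healthy}


providers = [Provider("csp-a", 0.5), Provider("csp-b", 0.3), Provider("csp-c", 0.2)]
print(distribute(providers))
providers[0].healthy = False  # simulate an outage at csp-a
print(distribute(providers))  # csp-b and csp-c absorb csp-a's share
```

Proportional redistribution keeps the relative weighting between the surviving providers, which preserves any reliability-based preferences you set during strategic workload distribution.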

8. Continually calibrate your incident response plan

Disruption incidents can be devastating but also present opportunities for continued improvement when used to elevate current systems and processes. One takeaway many organizations have had after CrowdStrike is the importance of developing comprehensive incident response and disaster recovery programs. 

While you should calibrate your security programs to defend against the broadest array of risks, avoiding every cyber incident is impossible. A dedicated incident response plan helps you identify, mitigate, and remediate unforeseen incidents as efficiently as possible. 

The best incident response plans operate across six main phases: 

  • Preparation: Establishing the architecture of your incident response plan, drafting key policies, and assembling your incident response toolbox
  • Identification: Deciding when to activate the incident response plan after your security team has identified a security incident
  • Containment: Isolating the incident and preventing further damage to other systems or environments
  • Eradication: Remediating the security incident while prioritizing continued containment and protection for critical systems
  • Recovery: Returning all affected systems to their normal, pre-incident state
  • Lessons learned: Completing incident documentation and learning how to prevent similar incidents from occurring in the future
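The six phases can be modeled as an ordered sequence so that an incident tracker refuses to skip a phase. The sketch below is a simplified, hypothetical model: it assumes strictly sequential phases and that lessons learned loop back into preparation, whereas real incidents often revisit containment and eradication several times.

```python
from enum import Enum


class IRPhase(Enum):
    PREPARATION = 1
    IDENTIFICATION = 2
    CONTAINMENT = 3
    ERADICATION = 4
    RECOVERY = 5
    LESSONS_LEARNED = 6


def next_phase(phase: IRPhase) -> IRPhase:
    """Phases run in order; lessons learned feed back into preparation (assumption)."""
    if phase is IRPhase.LESSONS_LEARNED:
        return IRPhase.PREPARATION
    return IRPhase(phase.value + 1)


class IncidentResponse:
    """Hypothetical tracker that rejects out-of-order phase transitions."""

    def __init__(self) -> None:
        self.phase = IRPhase.PREPARATION
        self.history = [self.phase]

    def advance(self, to: IRPhase) -> None:
        expected = next_phase(self.phase)
        if to is not expected:
            raise ValueError(f"cannot move from {self.phase.name} to {to.name}; expected {expected.name}")
        self.phase = to
        self.history.append(to)


ir = IncidentResponse()
ir.advance(IRPhase.IDENTIFICATION)
ir.advance(IRPhase.CONTAINMENT)
try:
    ir.advance(IRPhase.RECOVERY)  # skipping eradication is rejected
except ValueError as err:
    print(err)
```

The recorded `history` doubles as input for the lessons-learned phase, since it shows when each phase began.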

Related reading: How to Create an Incident Response Plan (Detailed Guide) 

9. Assess the effectiveness of your disaster recovery program

Outages and disruptions similar to CrowdStrike are powerful reminders of the necessity of robust infrastructure resilience and effective disaster recovery plans. Developing these plans and taking proactive measures are essential to ensure systems remain operational during unforeseen events. Disaster planning involves not only diversifying solutions but also continuously assessing and refining recovery strategies.

Regularly scheduled drills, thorough evaluations, and strategic partnerships with reliable providers can significantly enhance an organization's ability to respond to and recover from disruptions. By implementing these best practices, CIOs can ensure their infrastructure is well-prepared to handle any challenges that may arise:

  • Proactive assessment: Regularly evaluate infrastructure resilience and disaster recovery plans to ensure preparedness for future disruptions.
  • Simulated drills: Conduct regular simulated drills to test disaster recovery plans, identifying weaknesses and areas for improvement.
  • Partnerships with reliable vendors: Collaborate with reliable providers to enhance preparedness and response capabilities by leveraging their expertise and resources.
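A simulated drill is most useful when its results are compared against explicit recovery targets (often called recovery time objectives, or RTOs). The sketch below assumes hypothetical per-system targets and measured recovery times, and reports which systems missed their target and by how much, or were not recovered at all.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical recovery time targets per system
TARGETS = {
    "erp": timedelta(hours=4),
    "email": timedelta(hours=8),
    "website": timedelta(hours=1),
}


def drill_gaps(measured: dict[str, timedelta], targets: dict[str, timedelta]) -> dict[str, Optional[timedelta]]:
    """Systems that missed their recovery target during a drill.

    The value is the shortfall, or None if the system was not recovered at all.
    """
    gaps: dict[str, Optional[timedelta]] = {}
    for system, target in targets.items():
        actual = measured.get(system)
        if actual is None:
            gaps[system] = None
        elif actual > target:
            gaps[system] = actual - target
    return gaps


# Illustrative drill results
measured = {"erp": timedelta(hours=5, minutes=30), "email": timedelta(hours=2)}
for system, shortfall in drill_gaps(measured, TARGETS).items():
    print(system, "not recovered" if shortfall is None else f"missed target by {shortfall}")
```

Each entry in the result is a concrete weakness to feed back into the next round of disaster recovery planning.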

10. Comprehensive testing and impact analysis of security software components

The CrowdStrike incident demonstrated that even cybersecurity software, which has a reputation for being the most hardened and resilient of all software types, is susceptible to operational failures.

Addressing this underserved risk category will require adjusting your risk management lens to regard all security software components, especially those with a high potential of disrupting critical production workloads, with the same degree of scrutiny as operating system and general application updates.

This mindset shift will require assessing all current security components for any immediate significant disabling or disruptive impacts. You should apply these impact tests to a broad range of environments, including server workloads, which handle backend processes, and End-User Computing (EUC) environments, which directly affect user productivity.

Share the findings of your impact analysis with relevant stakeholders. Use their feedback to refine the testing processes and mitigate any identified risks before new security software components come into your production environment.
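Impact testing across server and EUC environments can be tracked as a simple matrix that blocks promotion until every required environment type has been tested and passed, and that produces a summary for stakeholders. The sketch below is a hypothetical illustration; the component names, result format, and `stakeholder_summary` helper are assumptions, not part of any particular product.

```python
from dataclasses import dataclass, field

# Environment types named in the text: server workloads and End-User Computing (EUC)
REQUIRED_ENVIRONMENTS = ("server", "euc")


@dataclass
class ImpactTest:
    component: str
    # environment type -> True if no disabling or disruptive impact was observed
    results: dict[str, bool] = field(default_factory=dict)

    def untested(self) -> list[str]:
        return [env for env in REQUIRED_ENVIRONMENTS if env not in self.results]

    def failures(self) -> list[str]:
        return [env for env, passed in self.results.items() if not passed]

    def ready_for_production(self) -> bool:
        # Promote only when every required environment was tested and passed
        return not self.untested() and not self.failures()


def stakeholder_summary(tests: list[ImpactTest]) -> str:
    lines = []
    for t in tests:
        status = "READY" if t.ready_for_production() else "BLOCKED"
        lines.append(f"{t.component}: {status} (failed: {t.failures()}, untested: {t.untested()})")
    return "\n".join(lines)


tests = [
    ImpactTest("edr-sensor-update", {"server": True, "euc": False}),
    ImpactTest("dlp-agent-update", {"server": True}),
    ImpactTest("email-gateway-rules", {"server": True, "euc": True}),
]
print(stakeholder_summary(tests))
```

Treating an untested environment as a blocker, not just a failed one, keeps a component from reaching production simply because nobody tested it on end-user devices.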

Don't limit your scope to just security vendors.

Use this opportunity to re-evaluate your current Vendor Risk Management tool and its effectiveness in mitigating third-party cyber risk exposure for your entire vendor ecosystem. After all, you're much more likely to experience a critical disruption from a third-party data breach than another faulty security software update.

To encourage threat response agility while minimizing risk exposure, your VRM tool should include integrated workflows that address the entire TPRM lifecycle and leverage automation technology to seamlessly manage vendor risk assessments at scale.

To extend your objective of dispersing risk concentrations to the vendor ecosystem, your VRM tool should also be capable of quickly adapting to new, unexpected supply chain threats, like the CrowdStrike incident, which sent shockwaves to third-party vendors globally.

Improving third-party risk visibility and mitigation with UpGuard

Of course, the best way you can prevent third-party risks from impacting your organization is to identify and mitigate them before they become problematic. A comprehensive, all-in-one TPRM solution like UpGuard Vendor Risk helps organizations across industries do exactly that.

The UpGuard toolkit includes automated workflows that empower security teams to better understand the security posture of their third-party ecosystem through the following: 

  • Vendor risk assessments: Fast, accurate, and comprehensive view of your vendors’ security posture
  • Security ratings: Objective, data-driven measurements of an organization’s cyber hygiene
  • Security questionnaires: Flexible questionnaires that accelerate the assessment process using automation and provide deep insights into a vendor’s security
  • Reports library: Tailor-made templates that support security performance communication to executive-level stakeholders 
  • Risk mitigation workflows: Comprehensive workflows to streamline risk management measures and improve overall security posture
  • Integrations: Application integrations for Jira, Slack, ServiceNow, and over 4,000 additional apps with Zapier, plus customizable API calls
  • Data leak protection: Protect your brand, intellectual property, and customer data with timely detection of data leaks and avoid data breaches
  • 24/7 continuous monitoring: Real-time notifications and new risk updates using accurate supplier data
  • Attack surface reduction: Reduce your third and fourth-party attack surface by discovering exploitable vulnerabilities and domains at risk of typosquatting
  • Trust Page: Simplify security posture communication with prospects and win more business partnerships with an UpGuard Trust Page
  • Intuitive design: Easy-to-use first-party dashboards
  • World-class customer service: Plan-based access to professional cybersecurity personnel that can help you get the most out of UpGuard
