The aftermath of CrowdStrike: Re-evaluating the importance of severity classifications

A faulty CrowdStrike update released on 19 July 2024 resulted in one of the most widespread IT outages in recent history and has continued to dominate newsfeeds globally. The undetected fault caused over 8,5 million devices running Microsoft Windows to crash, and the financial damage from the outage is estimated at more than $10 billion. The disruption affected numerous sectors, including aviation, banking, hospitality, manufacturing and retail.

31 Jul 2024. Combined Corporate & Commercial and Technology & Communications Alert.

At a glance

  • A faulty CrowdStrike update released on 19 July 2024 resulted in one of the most widespread IT outages in recent history. It caused over 8,5 million devices running Microsoft Windows to crash and estimated financial damage of more than $10 billion.
  • Microsoft addressed the outage by classifying it as a Severity 0 incident.
  • This proactive incident management assisted in the restoration of millions of affected systems, which helped to mitigate the reputational damage caused.

The aftermath of the outage will leave many affected sectors and organisations rethinking their vulnerabilities and considering a series of risk mitigation measures. One consideration will certainly be whether contracts with technology suppliers adequately cater for how material outages are proactively prevented and, when they do happen, how restoration of the system is achieved.

While CrowdStrike has suffered dire reputational damage, there were some swift and calculated interventions from Microsoft which are worth considering. There were also interventions from IT personnel globally who worked tirelessly to get their systems back online.

Severity 0 classification

Microsoft’s decision to classify the outage as a Severity 0 (Sev0) incident illustrates the importance of a service level matrix that provides for a co-ordinated remediation plan in exceptional and catastrophic circumstances.

A Sev0 classification, consistent with the impact- and urgency-based incident prioritisation described in the Information Technology Infrastructure Library (ITIL), provides for the steps that need to be taken in high-priority, urgent situations where immediate intervention is crucial to mitigate the high impact an event will have on a large number of users or on an organisation’s critical operations.
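As a rough illustration of how a severity level might be derived from impact and urgency, the sketch below scores an incident on two three-point scales and reserves Sev0 for events that take down critical operations. The scale, the thresholds, the labels Sev1 to Sev3 and the names `Level` and `classify` are all assumptions made for this example; they are not taken from ITIL, from Microsoft or from any particular agreement.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-point scale used for both impact and urgency."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def classify(impact: Level, urgency: Level,
             critical_operations_down: bool = False) -> str:
    """Return an illustrative severity label for an incident.

    The mapping below is an assumption for illustration, not a standard.
    """
    # A catastrophic event (high impact, high urgency, critical operations
    # unavailable) is escalated straight to the top tier.
    if critical_operations_down and impact is Level.HIGH and urgency is Level.HIGH:
        return "Sev0"
    score = impact + urgency  # ranges from 2 to 6
    if score >= 6:
        return "Sev1"
    if score >= 4:
        return "Sev2"
    return "Sev3"
```

In a contract, the equivalent of this function is the wording that tells both parties how to classify an incident, so that there is no debate about the applicable level while systems are down.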

The process that follows a Sev0 incident alert seeks to limit the time that it takes for the company to intervene and is aimed at mitigating the reputational damage, financial loss and potential liability of an organisation.

In Microsoft’s case, its Sev0 reaction included contacting certain senior members of the organisation in the middle of the night and getting the on-call engineers to immediately diagnose the problem and find the cause. They were then required to find ways in which this error could be rectified within the shortest period possible (see article for reference).

Microsoft maintained constant lines of communication with all of its customers, with special emphasis on its priority customers such as Amazon and Google. This included hundreds of employees in various capacities maintaining a uniform engagement with enterprise customers. There were numerous workarounds that were continuously being deployed, refined and redeployed. Some of these included manual workarounds to reboot the system in safe mode, which allowed for the deletion of the faulty file.

Limiting reputational damage

This proactive incident management assisted in restoring millions of systems affected by the outage and in mitigating the reputational damage caused. Initially, Microsoft appeared to be blamed for the outage, but the active public engagement and communications strategy associated with a Sev0 classification seemed to correct the perception that the faulty update emanated from Microsoft. Eventually Microsoft was credited with supporting CrowdStrike in cleaning up the mess which CrowdStrike had created. The $10 voucher that CrowdStrike offered to its affected customers certainly did nothing to address the negative public sentiment caused by the incident.

Had this event not been flagged as a Sev0 incident and managed proactively, far greater damage might have been suffered by Microsoft, CrowdStrike and their customers.

Typical incident management includes severity or priority classifications that set out the different incident levels, how to correctly classify an incident, and what steps must be taken thereafter. It is important for organisations to reconsider whether they need to contract for Sev0 incidents with material IT providers. While some contracts already include a severity classification, we frequently review agreements where incident management measures are not adequately addressed, or not addressed at all.

This Sev0 classification should specify who needs to be notified and their roles and responsibilities in a catastrophic event. The incident response plan should set out what is communicated to stakeholders and when. This communication should allow for transparency with stakeholders in order to mitigate the damage suffered by all interested parties. The provision of this enhanced response capability will require a targeted approach and will come at a material cost, as more resources and systems will be required to manage catastrophic events. It will also be necessary to ensure that the internal operational mechanics tie into the Sev0 processes, including a public communications strategy that can be deployed to manage customer and public perception.
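The elements described above (who is notified, how quickly the supplier must respond, how often stakeholders are updated and whether a public communications plan is triggered) can be pictured as one row per severity level in a service level matrix. The sketch below is a minimal illustration only: the roles, the time limits and the names `SeverityTerms`, `SERVICE_LEVEL_MATRIX` and `next_update_due` are hypothetical and would in practice be negotiated for each agreement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SeverityTerms:
    """Hypothetical contractual terms attached to one severity level."""
    response_target: timedelta  # time allowed before the supplier must engage
    update_interval: timedelta  # how often stakeholders receive a status update
    notify: tuple               # roles to be contacted when the level is declared
    public_statement: bool      # whether a public communications plan is triggered


# Illustrative values only; real figures would be negotiated per agreement.
SERVICE_LEVEL_MATRIX = {
    "Sev0": SeverityTerms(
        response_target=timedelta(minutes=15),
        update_interval=timedelta(minutes=30),
        notify=("executive sponsor", "on-call engineering lead",
                "customer communications lead", "legal counsel"),
        public_statement=True,
    ),
    "Sev1": SeverityTerms(
        response_target=timedelta(hours=1),
        update_interval=timedelta(hours=2),
        notify=("on-call engineering lead", "service delivery manager"),
        public_statement=False,
    ),
}


def next_update_due(severity: str, last_update: datetime) -> datetime:
    """Return when the next stakeholder update falls due for this level."""
    return last_update + SERVICE_LEVEL_MATRIX[severity].update_interval
```

Under these assumed terms, a Sev0 incident whose last stakeholder update went out at 05:00 would require the next update by 05:30, and the roles listed under `notify` would all have been contacted when the level was declared.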

This kind of catastrophic incident highlights the importance of well documented service levels and severity or priority level definitions. More importantly, there needs to be a co-ordinated operational construct to actually achieve restoration and mitigate the prospect of damages.

Companies are advised to seek legal advice when putting together, reviewing and updating service level matrices and agreements to help plan appropriately for these types of incidents.

Copyright © 2024 Cliffe Dekker Hofmeyr. All rights reserved.