July 19th, 2024, was a fateful day for CrowdStrike. As you’ll recall, when CrowdStrike launched an update that day, it took down systems across the globe. Nearly every sector was affected, from travel to healthcare to banking. Now that we’re over a month out from the incident, it’s important to look back on what happened and what it means for protecting not just IT from widespread outages, but OT as well.
What Happened?
Essentially, the outage stemmed from “a flawed update to CrowdStrike Falcon, the company’s popular endpoint detection and response (EDR) platform,” according to CIO’s explanation. The update caused Windows machines to get stuck in a reboot cycle, and it likely slipped through because a gap in the testing software failed to catch the flaw.
To dig further into what unfolded, Microsoft recently announced that it will host a conference in September with CrowdStrike and other security companies. During the conference, they will discuss ways to prevent such an event in the future as well as ways to have “applications rely more on a part of Windows called user mode instead of the more privileged kernel mode,” as CNBC reports. Simply put, a fault in kernel-mode code can crash the entire Windows system, whereas a fault in user mode is generally contained to the application that caused it.
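To make the isolation idea concrete, here is a minimal sketch in Python. It is not how Falcon or Windows is implemented; it only illustrates that when a component runs as its own user-mode process, a hard failure ends that process while the rest of the system keeps running. The function name run_isolated and the snippets passed to it are hypothetical, chosen for illustration.

```python
import subprocess
import sys


def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a separate user-mode process; return its exit code."""
    result = subprocess.run([sys.executable, "-c", snippet], capture_output=True)
    return result.returncode


# The child process aborts abruptly, standing in for a faulty component.
crashed = run_isolated("import os; os.abort()")

# This script is a separate process, so it survives and can still run other work.
healthy = run_isolated("print('ok')")

print("faulty component exit code:", crashed)
print("healthy component exit code:", healthy)
```

The faulty component reports a nonzero exit code and the script carries on. A comparable fault inside kernel-mode code has no such boundary around it, which is why it can halt the whole machine.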
Lessons Learned
Regardless of what comes out of that conference, it’s clear that we need to be better prepared to both prevent and handle such situations. That starts with understanding the complexity of the software supply chain and the increasing interconnectedness of systems. While the impact on IT was clearly evident in the case of CrowdStrike, OT was also affected.
“IT systems operating in the OT process control environment such as Windows-based HMIs, Windows-based historians and Windows-based engineering workstations (e.g, FactoryTalk, Inductive Automation Ignition, etc.,) running CrowdStrike endpoint protection also experienced impacts due to the CrowdStrike event,” writes Joe Weiss at Control Global. This underscores the need to isolate OT so that weaknesses in IT do not eventually reach it. Automated updates are risky even in IT; in OT, they should not be relied on unless each update is rigorously tested beforehand. In practice, you don’t want to use the same tools in IT and OT networks, which is why it’s important to work with a vendor such as DYNICS that specifically understands OT needs.
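As a rough illustration of what testing beforehand could look like in practice, the sketch below gates an update so it reaches OT hosts only after it has passed on a test bench and run there for a soak period. Every name in it (UpdateCandidate, approved_for_ot) and the 72-hour threshold are assumptions made for this example, not a DYNICS or CrowdStrike procedure.

```python
from dataclasses import dataclass


@dataclass
class UpdateCandidate:
    version: str
    passed_bench_test: bool  # validated on an offline OT test bench (hypothetical field)
    soak_hours: int          # hours run on the bench without faults (hypothetical field)


def approved_for_ot(update: UpdateCandidate, min_soak_hours: int = 72) -> bool:
    """Approve an update for OT only if it passed bench testing and soaked long enough."""
    return update.passed_bench_test and update.soak_hours >= min_soak_hours


fresh = UpdateCandidate(version="1.0.1", passed_bench_test=True, soak_hours=2)
vetted = UpdateCandidate(version="1.0.0", passed_bench_test=True, soak_hours=96)

print(fresh.version, "approved:", approved_for_ot(fresh))
print(vetted.version, "approved:", approved_for_ot(vetted))
```

The point of the gate is that nothing reaches production OT automatically. An update that has not both passed the bench and completed its soak period is simply held back.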
Sources:
- “CrowdStrike failure: What you need to know” – CIO https://www.cio.com/article/3476789/crowdstrike-failure-what-you-need-to-know.html
- “Microsoft plans September cybersecurity event to discuss changes after CrowdStrike outage” – Jordan Novet, CNBC https://www.cnbc.com/2024/08/23/microsoft-plans-september-cybersecurity-event-after-crowdstrike-outage.html
- “Sprawling CrowdStrike Incident Mitigation Showcases Resilience Gaps” – Jai Vijayan, Dark Reading https://www.darkreading.com/ics-ot-security/sprawling-crowdstrike-incident-mitigation-showcases-resilience-gaps
- “CrowdStrike, SolarWinds and Stuxnet demonstrated the cyber fragility of IT and OT systems” – Joe Weiss, Control Global https://www.controlglobal.com/blogs/unfettered/blog/55129634/crowdstrike-solarwinds-and-stuxnet-demonstrated-the-cyber-fragility-of-it-and-ot-systems