On July 19, 2024, the world got a stark reminder of its dependence on technology when a widespread computer outage, marked by the Windows “blue screen of death,” disrupted businesses, transportation, and other industries globally. The culprit? A faulty update from cybersecurity company CrowdStrike that crashed Windows machines worldwide.
This incident serves as a critical lesson for professionals in the audiovisual (AV) industry, highlighting the importance of thorough testing, careful implementation, and robust backup plans for software updates. It was the topic of discussion on a recent episode of A State of Control.
The Incident and Its Aftermath
The CrowdStrike update, which was intended to enhance security, instead caused widespread system failures. Initial reactions ranged from panic to speculation about potential cyberattacks. However, as the dust settled, it became clear that the root cause was much simpler and all too familiar to those in the tech industry: a coding error, possibly even a typo, that slipped through the cracks of the company’s quality assurance process.
Rich Fregosa of Fregosa Design noted, “It was kind of a computer science 101 that everybody went, ‘Whoops, they got complacent.’” This complacency, potentially driven by time pressures, budget constraints, or overconfidence, led to a cascading failure that affected systems globally.
Lessons for the AV Industry
While the scale of the CrowdStrike incident far exceeds typical AV projects, the underlying issues are remarkably similar to challenges faced in our industry. Here are key takeaways for AV professionals:
- Thorough Testing is Crucial: Paul Konikowski from Cronos emphasized the importance of comprehensive testing, including trying to “break your own program” and considering potential “bad actor” scenarios.
- Challenge Leadership When Necessary: James King from UNLV stressed the importance of pushing back against unrealistic deadlines or pressure to release untested updates. “It is our job to challenge leadership,” he stated, emphasizing the need to prioritize thorough vetting over rushed deadlines.
- Implement Gradual Rollouts: The podcast discussion highlighted the importance of phased implementations. Instead of updating all systems simultaneously, consider a staged approach to minimize potential damage.
- Maintain Backup Systems: While not always feasible, having backup systems or the ability to quickly revert changes can be crucial in mitigating the impact of failed updates.
- Document Everything: Proper documentation of system configurations, code versions, and update processes is essential for troubleshooting and maintaining system integrity over time.
- Consider the Risks of Standardization: While standardization can streamline operations, it also increases vulnerability to widespread failures. Balance the benefits of uniformity with the need for resilience.
- Prepare for Failure: As Rich Fregosa pointed out, “It is not a matter of if, it’s a matter of when.” Having a plan in place for when things go wrong is as important as trying to prevent failures in the first place.
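To make the gradual-rollout and revert ideas above concrete, here is a minimal sketch in Python. It is an illustration only, not something presented on the podcast: the `Room` class, the `staged_rollout` function, and the `health_check` callback are hypothetical names, and a real AV deployment would replace the version string with an actual firmware or program load and the check with real device tests.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Room:
    """Hypothetical stand-in for one AV system, such as a room's control processor."""
    name: str
    version: str
    history: List[str] = field(default_factory=list)  # earlier versions, kept for rollback


def staged_rollout(
    stages: List[List[Room]],
    new_version: str,
    health_check: Callable[[Room], bool],
) -> List[str]:
    """Update one stage at a time and stop at the first stage that fails.

    If any room in a stage fails its health check, every room in that
    stage is reverted to its previous version and later stages are left
    untouched. Returns the names of rooms that ended up on new_version.
    """
    updated: List[str] = []
    for stage in stages:
        # Apply the update to this stage only, remembering the old version.
        for room in stage:
            room.history.append(room.version)
            room.version = new_version

        failed = [room for room in stage if not health_check(room)]
        if failed:
            # Revert the whole stage, then halt the rollout.
            for room in stage:
                room.version = room.history.pop()
            break

        updated.extend(room.name for room in stage)
    return updated


# Example: a test lab first, then one floor, then the rest of the campus.
pilot = [Room("Lab", "1.0")]
floor = [Room("Boardroom", "1.0"), Room("Lobby", "1.0")]
campus = [Room("Auditorium", "1.0")]


def demo_check(room: Room) -> bool:
    # Assumption for the demo: the Lobby system misbehaves on version 1.1.
    return not (room.name == "Lobby" and room.version == "1.1")


still_updated = staged_rollout([pilot, floor, campus], "1.1", demo_check)
```

In this example the lab accepts the update, the floor stage is reverted because the Lobby fails its check, and the Auditorium is never touched. The failure is contained to a single stage instead of reaching every system at once.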
The Role of AV Professionals
The CrowdStrike incident underscores the critical role that AV professionals play in maintaining system integrity and reliability. It’s not just about writing code or installing equipment; it’s about being a trusted advisor who can anticipate potential issues and proactively address them.
Steve Greenblatt of Control Concepts raised an important question about whether it’s time for the AV industry to “grow up a little bit” by implementing more rigorous project requirements, including comprehensive test plans and update strategies.
Prepare
The CrowdStrike outage serves as a sobering reminder of the fragility of our technological infrastructure and the far-reaching consequences of even small errors. For AV professionals, it reinforces the need for diligence, thorough testing, and a commitment to best practices in software development and system implementation.
As our industry continues to evolve and integrate more deeply with IT systems, the lessons from this incident become increasingly relevant. By prioritizing thorough testing, gradual rollouts, and robust backup plans, AV professionals can help ensure the reliability and resilience of the systems they design and maintain.
In the end, as Rich Fregosa aptly put it, becoming a respected professional in the AV industry is about more than technical skills: it’s about “the ability to advise and be proactive and protect your clients from themselves.” This incident serves as a powerful reminder of that responsibility.