Understanding the CrowdStrike Incident of July 2024

In July 2024, the digital world was rocked by a significant event: the CrowdStrike incident. In this blog post, we’ll delve into what happened, why it happened, and how the issue is being resolved. This incident, involving CrowdStrike’s Falcon software, disrupted an estimated 8.5 million Windows computers globally (Microsoft’s figure), impacting critical services and daily operations for millions of people. Let’s explore these aspects in detail.

What Happened?

On July 19, 2024, millions of Windows computers experienced the infamous “Blue Screen of Death” (BSOD). This event didn’t just affect individual users; it disrupted businesses, airlines, hospitals, and other critical services worldwide. As a result, many people missed flights, appointments, and other important engagements, illustrating the extensive reach of this disruption.

The BSOD is a common indicator of severe system failure in Windows computers, often caused by critical errors at the kernel level, which is the core part of the operating system responsible for managing hardware and system resources.

Why Did It Happen?

To understand why this happened, we can use the analogy of a castle. Imagine a castle with multiple security layers: an outer perimeter open to visitors and an innermost keep that only the most trusted staff may enter. In a computer system, these layers are analogous to privilege rings. On x86 processors, ring zero is the most privileged level (kernel mode), where the operating system and critical drivers run, and ring three is the least privileged level (user mode), where applications operate. Windows uses only these two rings.

CrowdStrike’s Falcon software, an advanced anti-malware solution, operates at ring zero. This privileged access allows it to effectively monitor and prevent malware, but it also means that any fault in Falcon can directly impact the core functions of the operating system.

On July 19th, a dynamic content update to Falcon included a faulty file. Although the Falcon kernel driver is certified by Microsoft’s Windows Hardware Quality Labs (WHQL), dynamic content updates of this kind are delivered separately and are not themselves covered by that certification. When the Falcon driver, running in kernel mode, processed the faulty file, it crashed, and because the crash occurred in ring zero it took the whole operating system down with it, producing the widespread BSOD incidents. This highlights a gap in software quality assurance (QA) processes, especially for updates that affect core system components.

How Is It Being Resolved?

Resolving this issue involves multiple steps. Initially, CrowdStrike pushed out a corrected update. However, systems that had already experienced the BSOD required more direct intervention. The recommended approach for affected computers is to reboot into safe mode, manually locate and delete the problematic files associated with the Falcon update, and then reboot the system.
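
The manual cleanup step lends itself to scripting. The following is a minimal sketch, not CrowdStrike’s or Microsoft’s tooling. It assumes the faulty channel files match the pattern C-00000291*.sys under C:\Windows\System32\drivers\CrowdStrike, as described in CrowdStrike’s public remediation guidance; the function names are hypothetical. It defaults to a dry run that only lists what it would delete.

```python
from pathlib import Path

# Assumption: directory and file pattern follow CrowdStrike's public
# remediation guidance for the July 19, 2024 incident.
DEFAULT_DIR = Path(r"C:\Windows\System32\drivers\CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"


def find_faulty_channel_files(directory: Path = DEFAULT_DIR) -> list[Path]:
    """Return the files in `directory` that match the faulty pattern."""
    if not directory.is_dir():
        return []
    return sorted(directory.glob(FAULTY_PATTERN))


def remove_faulty_channel_files(
    directory: Path = DEFAULT_DIR, dry_run: bool = True
) -> list[Path]:
    """Return the matching files, deleting them unless dry_run is True."""
    matches = find_faulty_channel_files(directory)
    for path in matches:
        if not dry_run:
            path.unlink()
    return matches


if __name__ == "__main__":
    # Hypothetical usage: report only; pass dry_run=False to actually delete.
    for path in remove_faulty_channel_files(dry_run=True):
        print(f"would delete: {path}")
```

A script like this would still have to run from Safe Mode or a recovery environment, since an affected machine cannot finish a normal boot.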

For large-scale deployments, such as servers in data centers that may not have direct user interfaces, additional steps and possibly scripting are necessary to manage the recovery process. Furthermore, systems using security features like BitLocker require even more intricate procedures to recover.

Microsoft has also updated its recovery tools to assist IT administrators in expediting the repair process. These tools offer options like booting from a Windows Preinstallation Environment (WinPE) or recovering from safe mode to facilitate the removal of the faulty update.

Avoiding Future Incidents

To prevent such incidents in the future, enhanced QA processes for updates are crucial. This includes thorough testing of all components, not just the core software but also any dynamic updates. Additionally, reconsidering the operational mode of critical security software like Falcon might be necessary. Running such software in user mode rather than kernel mode could mitigate the risk of entire system failures, albeit potentially at the cost of some efficiency in malware detection.

The CrowdStrike incident of July 2024 serves as a stark reminder of the vulnerabilities inherent in our interconnected digital world. While the immediate causes of the incident have been addressed, it raises important questions about how to prevent similar occurrences in the future. Two critical strategies that can enhance overall security and resilience are the adoption of Secure by Design principles and the implementation of network segmentation. Let’s explore how these approaches can mitigate risks and potentially prevent incidents like the CrowdStrike disruption.

Secure by Design Principles

Secure by Design (SbD) is an approach that integrates security from the very beginning of the software development lifecycle. This principle ensures that security considerations are embedded into every stage of development, from initial design to deployment and maintenance. Here’s how SbD could have impacted the CrowdStrike incident:

Early Threat Modeling

Incorporating threat modeling at the design phase helps identify potential vulnerabilities and attack vectors. If CrowdStrike had implemented a thorough threat modeling process, it might have identified the risks associated with running their software in kernel mode (ring zero), where any failure could lead to a system-wide crash.

Code Review and Static Analysis

Regular code reviews and static analysis can catch bugs and vulnerabilities early in the development process. Comprehensive testing, including stress testing and failure mode analysis, could have identified the problematic update before it was released, preventing the blue screen of death (BSOD) incidents.

Continuous Integration and Continuous Deployment (CI/CD) with Security Checks

Integrating automated security checks into the CI/CD pipeline ensures that every code change is tested for security issues before deployment. This approach can significantly reduce the risk of deploying updates with critical vulnerabilities.
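
As an illustration of an automated pre-release check, here is a minimal sketch of a validation gate for a content update. Everything in it is hypothetical: the update format (a list of rules, each a list of string fields), the expected field count, and the function names are invented for the example and do not describe how Falcon content is actually structured or validated.

```python
from dataclasses import dataclass

# Hypothetical: the number of fields the deployed client expects per rule.
EXPECTED_FIELD_COUNT = 4


@dataclass
class ValidationResult:
    ok: bool
    errors: list[str]


def validate_content_update(rules: list[list[str]]) -> ValidationResult:
    """Collect every reason the update would fail to parse on the client."""
    errors = []
    if not rules:
        errors.append("update contains no rules")
    for index, fields in enumerate(rules):
        if len(fields) != EXPECTED_FIELD_COUNT:
            errors.append(
                f"rule {index}: expected {EXPECTED_FIELD_COUNT} fields, "
                f"got {len(fields)}"
            )
        elif any(field == "" for field in fields):
            errors.append(f"rule {index}: empty field")
    return ValidationResult(ok=not errors, errors=errors)


def release_gate(rules: list[list[str]]) -> bool:
    """Pipeline step: return True only when the update may be published."""
    result = validate_content_update(rules)
    for error in result.errors:
        print(f"BLOCKED: {error}")
    return result.ok
```

The point of the design is that the gate runs on every update, including data-only ones, and fails closed: any error blocks the release.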

Network Segmentation

Network segmentation involves dividing a network into smaller, isolated segments to limit the spread of potential threats and contain breaches. This strategy can significantly enhance the security posture of an organization by minimizing the impact of security incidents. Here’s how network segmentation could have mitigated the effects of the CrowdStrike incident:

Isolation of Critical Systems

By isolating critical systems and services into separate network segments, organizations can prevent the spread of issues from less critical areas. For instance, if critical systems in hospitals or airlines had been segmented away from general-purpose user systems, the BSOD incidents might have been contained, reducing the overall impact.
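
To make the idea of isolated segments concrete, here is a minimal sketch of a default-deny policy check between segments, using Python’s standard ipaddress module. The segment names, address ranges, and allowed flows are all hypothetical and chosen only for illustration; a real deployment would enforce this in firewalls or switches rather than in application code.

```python
import ipaddress
from typing import Optional

# Hypothetical segments and address ranges, for illustration only.
SEGMENTS = {
    "clinical": ipaddress.ip_network("10.10.0.0/16"),
    "office": ipaddress.ip_network("10.20.0.0/16"),
    "guest": ipaddress.ip_network("10.30.0.0/16"),
}

# Allowed (source segment, destination segment) pairs; all else is denied.
ALLOWED_FLOWS = {
    ("clinical", "clinical"),
    ("office", "office"),
    ("office", "clinical"),
}


def segment_of(address: str) -> Optional[str]:
    """Return the name of the segment containing `address`, or None."""
    ip = ipaddress.ip_address(address)
    for name, network in SEGMENTS.items():
        if ip in network:
            return name
    return None


def is_flow_allowed(source: str, destination: str) -> bool:
    """Default-deny check: allow only flows listed in ALLOWED_FLOWS."""
    return (segment_of(source), segment_of(destination)) in ALLOWED_FLOWS
```

With these example rules, a guest device cannot reach a clinical system, and an address outside every known segment is denied as well.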

Minimizing Attack Surfaces

Segmentation reduces the attack surface by limiting access to sensitive systems. The same principle applies to update delivery: if Falcon updates had first been restricted to a small, controlled group of machines and only then released more widely, the faulty update might have been identified and contained before reaching all systems.
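
Restricting an update to a controlled group before it reaches every system is commonly done as a staged rollout. The sketch below shows one way this could work, under stated assumptions: the wave sizes, the crash-rate threshold, and the function names are hypothetical, and hosts are assigned to waves by hashing their ID so the assignment is stable between runs.

```python
import hashlib

# Hypothetical: cumulative fraction of the fleet reached after each wave.
WAVES = [0.01, 0.10, 0.50, 1.00]
# Hypothetical: halt if more than 1% of updated hosts report a crash.
MAX_CRASH_RATE = 0.01


def host_bucket(host_id: str) -> float:
    """Map a host ID to a stable value in [0, 1) using a hash."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") / 2**32


def hosts_in_wave(host_ids: list[str], wave_index: int) -> list[str]:
    """Return every host that should have the update once this wave is active."""
    limit = WAVES[wave_index]
    return [host for host in host_ids if host_bucket(host) < limit]


def should_continue(updated: int, crashed: int) -> bool:
    """Gate between waves: continue only while the crash rate is within limits."""
    if updated == 0:
        return True
    return crashed / updated <= MAX_CRASH_RATE
```

Because each wave is a superset of the previous one, a fault that crashes the first small wave stops the rollout while most of the fleet is still untouched.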

Improved Monitoring and Incident Response

Segmentation allows for more granular monitoring and quicker incident response. Security teams can focus their efforts on specific segments, making it easier to detect anomalies and take corrective actions. This could have sped up the identification and resolution of the faulty Falcon update.

By understanding these key aspects of the CrowdStrike incident, we can appreciate the complexity of maintaining secure and reliable systems in an increasingly interconnected world. Stay vigilant and informed to navigate these challenges effectively.

Reference: https://www.youtube.com/watch?v=2TfM_BF2i-I


Understanding AAA: Authentication, Authorization, and Accounting

Hello friends, today we’ll delve into the concepts of AAA in security. AAA stands for Authentication, Authorization, and Accounting. In this post, we’ll discuss what it means to implement AAA in a system or security policy, define these terms precisely, and provide examples of how AAA is achieved in various systems. We’ll also explore some related concepts to provide a comprehensive understanding.

Introduction to AAA

Authentication

Authentication is the process of verifying the identity of a subject attempting to access a system. It involves proving that the claimed identity of a subject, which can be a user or a service, is genuine. This process can involve various methods, including password verification, biometric checks, or database lookups. For a more detailed understanding, refer to Security Engineering by Ross Anderson (3rd Edition).

Authorization

Authorization is the subsequent process that defines what an authenticated subject is allowed to do. Once the identity is verified, a set of rights or privileges is assigned to the user or service. These permissions dictate the actions that the subject can perform on certain resources or objects. To explore this further, see Computer Security: Art and Science by Matt Bishop.

Accounting

Accounting involves recording the actions performed by the subject and reviewing these records to ensure compliance and to hold subjects accountable for their actions. This process is crucial for tracking the use of resources and detecting any anomalies. For an in-depth look, refer to Security in Computing by Charles P. Pfleeger and Shari Lawrence Pfleeger (5th Edition).

Detailed Breakdown of AAA

Identification

Identification is the claim made by a subject to be a specific identity. This could be a user claiming to be a particular individual or a service claiming to represent a specific function. The system responds to this claim by performing checks to validate the identity.

Authentication Process

During authentication, the system verifies the claimed identity by posing questions, checking credentials against a database, or using biometric methods. This ensures that the subject is who they claim to be. Authentication methods and their effectiveness are extensively covered in Applied Cryptography by Bruce Schneier.
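
Checking credentials against a database usually means comparing a salted hash rather than the password itself. Here is a minimal sketch using only Python’s standard library (hashlib.pbkdf2_hmac and hmac.compare_digest). The iteration count is an illustrative assumption, not a recommendation, and the function names are hypothetical.

```python
import hashlib
import hmac
import os
from typing import Optional

# Assumption: illustrative work factor; tune it for your own hardware.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected_digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

The database stores only the salt and digest. At login, the system recomputes the digest from the submitted password and compares the two values.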

Authorization Process

Authorization occurs after successful authentication. It involves assigning permissions to the subject, which dictate the resources and actions they are allowed to access or perform. This step is critical for maintaining security and ensuring that users have appropriate access levels. The principles of authorization are detailed in Access Control Systems: Security, Identity Management and Trust Models by Messaoud Benantar.
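
One common way to assign permissions is through roles. The sketch below is a minimal role-based check; the roles, resources, and actions are hypothetical, and role-based access control is only one of several possible authorization models.

```python
# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "viewer": {("reports", "read")},
    "editor": {("reports", "read"), ("reports", "write")},
    "admin": {("reports", "read"), ("reports", "write"), ("users", "manage")},
}

# Hypothetical assignment of roles to an already authenticated user.
USER_ROLES = {"RS123": {"editor"}}


def is_authorized(user: str, resource: str, action: str) -> bool:
    """Return True if any role held by `user` grants `action` on `resource`."""
    for role in USER_ROLES.get(user, set()):
        if (resource, action) in ROLE_PERMISSIONS.get(role, set()):
            return True
    return False
```

A user with no roles, or an unknown user, is denied by default.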

Auditing and Accounting

Auditing involves recording the actions performed by subjects within the system. This log of activities is crucial for later review. Accounting is the process of reviewing these logs to ensure compliance and detect any unauthorized activities. This distinction between auditing and accounting is highlighted in the CISSP Official (ISC)2 Practice Tests by Mike Chapple and David Seidl.
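
The split between recording and reviewing can be sketched in a few lines. In this minimal example, the log format (one JSON object per line), the review rule (flag users with more than a set number of denied actions), and the function names are all assumptions made for illustration.

```python
import json
from collections import Counter
from datetime import datetime, timezone


def record_event(log: list[str], user: str, action: str, allowed: bool) -> None:
    """Auditing: append one JSON line describing what the subject did."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "allowed": allowed,
    }
    log.append(json.dumps(entry))


def review_log(log: list[str], max_denied: int = 2) -> list[str]:
    """Accounting: return users whose denied actions exceed `max_denied`."""
    denied = Counter()
    for line in log:
        entry = json.loads(line)
        if not entry["allowed"]:
            denied[entry["user"]] += 1
    return sorted(user for user, count in denied.items() if count > max_denied)
```

Here record_event plays the auditing role and review_log the accounting role; the log is an in-memory list only to keep the sketch self-contained.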

Monitoring

Monitoring involves actively looking into the audit logs, understanding them, and executing the process of accounting. It is possible to monitor a system without active auditing, but auditing cannot occur without some form of monitoring. This distinction is essential for effective security management. For further reading, consider The Practice of Network Security Monitoring: Understanding Incident Detection and Response by Richard Bejtlich.

Example Scenario

To illustrate these concepts, consider a user needing access to a computer terminal:

  1. Identification: The user claims their identity, such as by entering a username (e.g., RS123).
  2. Authentication: The system verifies this claim by checking the username against a database and requesting a password.
  3. Authorization: Once authenticated, the system assigns specific permissions to the user, such as access to certain drives or files.
  4. Auditing: The system records the user’s actions in a log file.
  5. Accounting: These logs are reviewed periodically to ensure compliance and detect any violations.

This example aligns with the best practices described in Network Security Essentials: Applications and Standards by William Stallings.
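
The five steps of the scenario can be tied together in a short end-to-end sketch. This is an illustration under stated assumptions, not a production design: the user database, the fixed demo salt, the password, the permission strings, and the function names are all hypothetical, and the audit log is an in-memory list.

```python
import hashlib
import hmac

# Hypothetical fixed salt for the demo; real systems use a random salt per user.
SALT = b"demo-salt-not-for-production"


def _digest(password: str, salt: bytes) -> bytes:
    """Derive a password digest with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


# Hypothetical user database keyed by the claimed identity (the username).
USERS = {
    "RS123": {
        "salt": SALT,
        "digest": _digest("s3cret", SALT),
        "permissions": {"drive_d:read", "drive_d:write"},
    }
}

AUDIT_LOG: list[tuple[str, str, bool]] = []


def login(username: str, password: str) -> bool:
    """Steps 1-2: look up the claimed identity, then verify the password."""
    user = USERS.get(username)  # identification
    if user is None:
        return False
    # authentication: constant-time comparison of the derived digests
    return hmac.compare_digest(_digest(password, user["salt"]), user["digest"])


def perform(username: str, permission: str) -> bool:
    """Steps 3-4: check the permission, then record the attempt."""
    allowed = permission in USERS.get(username, {}).get("permissions", set())
    AUDIT_LOG.append((username, permission, allowed))  # auditing
    return allowed


def violations() -> list[tuple[str, str, bool]]:
    """Step 5: review the log and return the denied attempts."""
    return [entry for entry in AUDIT_LOG if not entry[2]]
```

For example, after login("RS123", "s3cret") succeeds, calling perform("RS123", "drive_d:read") is allowed while perform("RS123", "drive_c:write") is denied, and violations() then returns only the denied attempt.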

Conclusion

Understanding AAA (Authentication, Authorization, and Accounting) is fundamental for implementing robust security policies in any system. By correctly applying these concepts, organizations can ensure that users are properly identified, authenticated, and authorized, and that their actions are recorded and reviewed for compliance.

If you have any comments or suggestions to improve this content, please let me know. This is my first experiment with online tutoring, and I appreciate any feedback. Thank you very much for reading!


References

  1. Anderson, R. (2020). Security Engineering: A Guide to Building Dependable Distributed Systems. John Wiley & Sons.
  2. Bishop, M. (2003). Computer Security: Art and Science. Addison-Wesley.
  3. Pfleeger, C. P., & Pfleeger, S. L. (2015). Security in Computing. Pearson.
  4. Schneier, B. (1996). Applied Cryptography: Protocols, Algorithms, and Source Code in C. Wiley.
  5. Benantar, M. (2006). Access Control Systems: Security, Identity Management and Trust Models. Springer.
  6. Chapple, M., & Seidl, D. (2018). CISSP Official (ISC)2 Practice Tests. Sybex.
  7. Bejtlich, R. (2013). The Practice of Network Security Monitoring: Understanding Incident Detection and Response. No Starch Press.
  8. Stallings, W. (2017). Network Security Essentials: Applications and Standards. Pearson.