Understanding the CrowdStrike Incident of July 2024

In July 2024, the digital world was rocked by a significant event: the CrowdStrike incident. In this blog post, we’ll delve into what happened, why it happened, and how the issue is being resolved. The incident, involving CrowdStrike’s Falcon software, disrupted approximately 8.5 million Windows computers globally, impacting critical services and daily operations for millions. Let’s explore these aspects in detail.

What Happened?

On July 19, 2024, millions of Windows computers experienced the infamous “Blue Screen of Death” (BSOD). This event didn’t just affect individual users but had widespread ramifications, disrupting businesses, airlines, hospitals, and other critical services worldwide. As a result, many missed flights, appointments, and other important engagements, illustrating the extensive reach of this disruption.

The BSOD is a common indicator of severe system failure in Windows computers, often caused by critical errors at the kernel level, which is the core part of the operating system responsible for managing hardware and system resources.

Why Did It Happen?

To understand why this happened, we can use the analogy of a castle. Imagine a castle with multiple security layers: the outer perimeter and the innermost keep. In a computer system, these layers are analogous to protection rings, with ring zero representing the most privileged part of the system (kernel mode), where the operating system and critical drivers run, and ring three representing user mode, where ordinary applications operate.

CrowdStrike’s Falcon software, an advanced anti-malware solution, operates at ring zero. This deep level of access allows it to monitor and block malware effectively, but it also means that any fault in Falcon can directly impact the core functions of the operating system.

On July 19th, a content update to Falcon (a so-called channel file) included a malformed file. Although the Falcon driver itself is certified by Microsoft’s Windows Hardware Quality Labs (WHQL), content updates like this one are not subject to the same certification, and the faulty file caused the driver, running in kernel mode, to crash, leading to the widespread BSOD incidents. This highlights a critical issue in software quality assurance (QA) processes, especially for updates that affect core system components.

How Is It Being Resolved?

Resolving this issue involves multiple steps. Initially, CrowdStrike pushed out a corrected update. However, systems that had already experienced the BSOD required more direct intervention. The recommended approach for affected computers is to reboot into safe mode, manually locate and delete the problematic files associated with the Falcon update, and then reboot the system.
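After booting into safe mode, the manual cleanup described above can be sketched in a few lines. This is a minimal illustration, not an official remediation script: the directory and the channel-file pattern below are the ones CrowdStrike publicly reported, and the function name is my own; confirm both against current vendor guidance before running anything like this.

```python
import glob
import os

# Location and file pattern reported in CrowdStrike's remediation guidance;
# verify against current vendor advice for your environment.
DRIVER_DIR = r"C:\Windows\System32\drivers\CrowdStrike"
BAD_PATTERN = "C-00000291*.sys"

def remove_faulty_channel_files(driver_dir, pattern):
    """Delete channel files matching the faulty-update pattern.

    Returns the list of paths that were removed, so the operator can
    log exactly what was deleted.
    """
    removed = []
    for path in glob.glob(os.path.join(driver_dir, pattern)):
        os.remove(path)
        removed.append(path)
    return removed
```

For fleet-scale recovery the same deletion step would be wrapped in whatever remote-execution tooling the organisation uses, which is where the scripting mentioned above comes in.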

For large-scale deployments, such as servers in data centers that may not have direct user interfaces, additional steps and possibly scripting are necessary to manage the recovery process. Furthermore, systems using security features like BitLocker require even more intricate procedures to recover.

Microsoft has also updated its recovery tools to assist IT administrators in expediting the repair process. These tools offer options like booting from a Windows Preinstallation Environment (WinPE) or recovering from safe mode to facilitate the removal of the faulty update.

Avoiding Future Incidents

To prevent such incidents in the future, enhanced QA processes for updates are crucial. This includes thorough testing of all components, not just the core software but also any dynamic updates. Additionally, reconsidering the operational mode of critical security software like Falcon might be necessary. Running such software in user mode rather than kernel mode could mitigate the risk of entire system failures, albeit potentially at the cost of some efficiency in malware detection.

The CrowdStrike incident of July 2024 serves as a stark reminder of the vulnerabilities inherent in our interconnected digital world. While the immediate causes of the incident have been addressed, it raises important questions about how to prevent similar occurrences in the future. Two critical strategies that can enhance overall security and resilience are the adoption of Secure by Design principles and the implementation of network segmentation. Let’s explore how these approaches can mitigate risks and potentially prevent incidents like the CrowdStrike disruption.

Secure by Design Principles

Secure by Design (SbD) is an approach that integrates security from the very beginning of the software development lifecycle. This principle ensures that security considerations are embedded into every stage of development, from initial design to deployment and maintenance. Here’s how SbD could have impacted the CrowdStrike incident:

Early Threat Modeling

Incorporating threat modeling at the design phase helps identify potential vulnerabilities and attack vectors. If CrowdStrike had implemented a thorough threat modeling process, it might have identified the risks associated with running their software in kernel mode (ring zero), where any failure could lead to a system-wide crash.

Code Review and Static Analysis

Regular code reviews and static analysis can catch bugs and vulnerabilities early in the development process. Comprehensive testing, including stress testing and failure mode analysis, could have identified the problematic update before it was released, preventing the blue screen of death (BSOD) incidents.

Continuous Integration and Continuous Deployment (CI/CD) with Security Checks

Integrating automated security checks into the CI/CD pipeline ensures that every code change is tested for security issues before deployment. This approach can significantly reduce the risk of deploying updates with critical vulnerabilities.

Network Segmentation

Network segmentation involves dividing a network into smaller, isolated segments to limit the spread of potential threats and contain breaches. This strategy can significantly enhance the security posture of an organization by minimizing the impact of security incidents. Here’s how network segmentation could have mitigated the effects of the CrowdStrike incident:

Isolation of Critical Systems

By isolating critical systems and services into separate network segments, organizations can prevent the spread of issues from less critical areas. For instance, if critical systems in hospitals or airlines had been segmented away from general-purpose user systems, the BSOD incidents might have been contained, reducing the overall impact.

Minimizing Attack Surfaces

Segmentation reduces the attack surface by limiting access to sensitive systems. If the CrowdStrike Falcon software had been deployed in a segmented manner, with its updates and communications restricted to a controlled environment, the faulty update might have been identified and contained before reaching all systems.

Improved Monitoring and Incident Response

Segmentation allows for more granular monitoring and quicker incident response. Security teams can focus their efforts on specific segments, making it easier to detect anomalies and take corrective actions. This could have sped up the identification and resolution of the faulty Falcon update.

By understanding these key aspects of the CrowdStrike incident, we can appreciate the complexity of maintaining secure and reliable systems in an increasingly interconnected world. Stay vigilant and informed to navigate these challenges effectively.

Reference: https://www.youtube.com/watch?v=2TfM_BF2i-I


Understanding Cryptography: A Comprehensive Overview

Cryptography might seem uninteresting or daunting if not properly introduced. For those not involved in networking, network security, or security engineering, this topic can be quite challenging. However, understanding cryptography is crucial in today’s digital world. Drawing from my own experience as an electronics and communication engineering graduate, I know that even with a technical background, grasping this topic takes time and effort.

In this blog post, I will begin decoding cryptography and provide a comprehensive overview. Together, these posts will serve as a one-stop guide to understanding the fundamentals of cryptography, including symmetric and asymmetric cryptography, key wrapping, digital signatures, digital envelopes, and public key infrastructure (PKI). Due to the complexity and depth of the topic, I will cover these aspects across multiple posts.

Introduction to Cryptography

Cryptography is the art and science of securing information by transforming it into an unreadable format. Its primary goals are to protect the confidentiality and integrity of data, two pillars of the CIA (confidentiality, integrity, availability) triad. To understand these concepts, let’s consider a simple scenario.

Imagine two users, A and B, who want to communicate securely over an insecure public network, such as the Internet. If an adversary, C, intercepts their communication, the confidentiality of the message is compromised. This is where encryption comes in. By encrypting the message, even if C intercepts it, they cannot read its contents without the decryption key.

Encryption: Ensuring Confidentiality

Encryption is a fundamental tool in cryptography used to maintain data confidentiality. It transforms plaintext (readable data) into ciphertext (unreadable data) using an encryption key. Only those with the corresponding decryption key can revert the ciphertext back to plaintext.

Example Scenario:
  1. Plaintext (M): The original message.
  2. Encryption: M is encrypted using an encryption key, resulting in ciphertext.
  3. Transmission: The ciphertext is sent over the insecure network.
  4. Decryption: The intended recipient uses the decryption key to convert the ciphertext back to plaintext.

In this scenario, encryption ensures that even if the message is intercepted by an unauthorized party, the confidentiality remains intact.
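The four-step flow above can be sketched with a toy symmetric cipher. The XOR construction below is illustrative only, not production cryptography: it is secure solely under one-time-pad conditions (a truly random key, as long as the message, never reused).

```python
import secrets

def xor_cipher(data, key):
    """Toy symmetric cipher: XOR each byte of the data with the key.

    Illustrative only. XOR is secure solely as a one-time pad: the key
    must be truly random, as long as the message, and never reused.
    """
    return bytes(b ^ k for b, k in zip(data, key))

plaintext = b"Meet at noon"                 # 1. M: the original message
key = secrets.token_bytes(len(plaintext))   # shared secret key
ciphertext = xor_cipher(plaintext, key)     # 2. encryption
# 3. ciphertext travels over the insecure network...
recovered = xor_cipher(ciphertext, key)     # 4. decryption with the same key
assert recovered == plaintext
```

Without the key, the intercepted ciphertext tells adversary C nothing about M; with it, recipient B recovers the plaintext exactly.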

Key Concepts in Cryptography

  1. Symmetric Cryptography: Uses the same key for both encryption and decryption. Examples include AES (Advanced Encryption Standard) and DES (Data Encryption Standard).
  2. Asymmetric Cryptography: Uses a pair of keys—a public key for encryption and a private key for decryption. Examples include RSA (Rivest-Shamir-Adleman) and ECC (Elliptic Curve Cryptography).
  3. Key Wrapping: A technique to securely encrypt encryption keys.
  4. Digital Signatures: Provide authenticity and integrity by allowing the recipient to verify the sender’s identity and ensure the message has not been altered.
  5. Digital Envelopes: Combine symmetric and asymmetric encryption to provide efficient and secure message transmission.
  6. Public Key Infrastructure (PKI): A framework that manages digital certificates and public-key encryption to secure communications.
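True digital signatures (item 4) require asymmetric keys such as RSA or ECC, which the Python standard library does not provide. As a stdlib-only stand-in, the sketch below uses an HMAC, a message authentication code: it gives the same integrity and authenticity guarantee when sender and recipient share a secret key, though unlike a real signature it cannot prove the sender’s identity to a third party. The function names here are my own.

```python
import hashlib
import hmac
import secrets

shared_key = secrets.token_bytes(32)     # secret known to sender and recipient
message = b"Transfer 100 credits to B"

# Sender computes a tag over the message and sends both.
tag = hmac.new(shared_key, message, hashlib.sha256).digest()

def verify(key, msg, received_tag):
    """Recompute the tag and compare in constant time.

    Returns True only if the message was produced by someone holding
    the key and has not been altered in transit.
    """
    expected = hmac.new(key, msg, hashlib.sha256).digest()
    return hmac.compare_digest(expected, received_tag)
```

If an adversary alters even one byte of the message, the recomputed tag no longer matches and verification fails.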

Practical Applications and Future Posts

In the next posts, we will dive deeper into these concepts and explore their practical applications. Understanding cryptography is essential for securing digital communications and protecting sensitive information from unauthorized access.

Stay tuned as we continue to unravel the complexities of cryptography. Best of luck with your CSSP exams. If you have any questions, comments, feedback, or suggestions, feel free to leave them below.

References

Books:

    • “Cryptography and Network Security: Principles and Practice” by William Stallings. This book provides a comprehensive introduction to the principles and practice of cryptography and network security.
    • “Applied Cryptography: Protocols, Algorithms, and Source Code in C” by Bruce Schneier. This book is a practical guide to modern cryptography and covers a wide range of cryptographic techniques and applications.

Research Papers:

    • Diffie, W., & Hellman, M. (1976). “New Directions in Cryptography.” This seminal paper introduced the concept of public-key cryptography.
    • Rivest, R. L., Shamir, A., & Adleman, L. (1978). “A Method for Obtaining Digital Signatures and Public-Key Cryptosystems.” This paper introduced the RSA algorithm, a widely used asymmetric encryption technique.

Articles:

    • “The History of Cryptography” by Paul M. Garrett. This article provides an overview of the historical development of cryptographic techniques.
    • “Understanding the CIA Triad” by Jonathan S. Weissman. This article explains the importance of confidentiality, integrity, and availability in information security.

By leveraging these resources, you can gain a deeper understanding of cryptography and its essential role in securing modern communications.

OTP Tools and the Risk of DLL Sideloading

Recently I was doing some research on OTP (one-time password) software such as Google Authenticator and Microsoft Authenticator, and came across the topic of DLL sideloading. Though the technique is quite old, I thought it would be good to share my learning outcomes.

In simple terms, imagine you have a secret code that opens a magical door in a castle. But instead of keeping this code safe, you leave it lying around where someone naughty can find it. That naughty person then uses your code to open the magical door and sneak into the castle, causing mischief.

In computer terms, a DLL (Dynamic Link Library) is a shared piece of code that helps programs run. A DLL sideloading attack is when an attacker tricks a computer into using a bad DLL instead of the good one. Just like using the wrong key for the magical door, this bad DLL can let harmful things happen on the computer, such as viruses or malicious software sneaking in. So it is important to keep our computer’s keys (DLLs) safe and not let any sneaky tricks happen!

DLL sideloading is an attack technique in which a malicious DLL file is placed in a directory that is trusted or commonly accessed by a legitimate application. When the application runs, it inadvertently loads and executes the malicious DLL instead of the legitimate one.

Reasons Why It Is Difficult to Deal With:

1. Automatic Loading: The runtime DLLs required by the one-time password (OTP) tool are automatically loaded by Windows, which means the system expects and trusts certain DLLs to be present and executable without user intervention.
2. Fixed DLL Specification: The OTP tool does not allow the user to specify which DLLs to load, relying instead on default system behavior to find and load the necessary libraries.
3. Security Environment: Ensuring that the device running the OTP tool is in an up-to-date security environment can reduce the risk. This includes maintaining the latest security patches, antivirus definitions, and security configurations.

Mitigations:

• Keep Software and OS Updated: Regularly update the operating system and all software to patch known vulnerabilities.
• Antivirus/Antimalware Tools: Use reliable antivirus and antimalware tools to detect and remove malicious DLLs.
• Application Whitelisting: Implement application whitelisting to prevent unauthorized DLLs from being loaded.
• Directory Permissions: Restrict write permissions to directories where legitimate DLLs are stored to prevent unauthorized modifications.
• Monitoring and Logging: Continuously monitor and log application behavior to detect and respond to abnormal DLL loading activities.
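The directory-permissions mitigation can be spot-checked with a short audit. This is a simplified sketch using POSIX permission bits; real Windows deployments use ACLs (inspected with tools like icacls), and the function names here are my own.

```python
import os
import stat

def world_writable(path):
    """True if the POSIX 'others' write bit is set on the path.

    A simplification: Windows directories are governed by ACLs, not
    POSIX mode bits, but the idea of the check is the same.
    """
    return bool(os.stat(path).st_mode & stat.S_IWOTH)

def audit_dll_dirs(dirs):
    """Flag directories where any local user could drop a malicious DLL."""
    return [d for d in dirs if os.path.isdir(d) and world_writable(d)]
```

Any directory this audit flags is a candidate location for a planted DLL and should have its write permissions tightened.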

The difference between path-based and signature-based DLL loading methods lies in how the operating system or application identifies and loads the required Dynamic Link Libraries (DLLs).

Path-Based DLL Loading

Description:

• Method: The operating system or application loads a DLL based on its file path. This means the system will search for the DLL in specific directories in a predetermined order until it finds a matching file name.
• Search Order: Typically, the search order might include the application’s directory, system directories (like System32), the Windows directory, and directories listed in the system’s PATH environment variable.
• Risks: Path-based loading is susceptible to DLL hijacking or sideloading attacks. If a malicious DLL with the same name as a legitimate DLL is placed in a directory that is searched earlier in the order, the malicious DLL will be loaded instead of the legitimate one.

Example: If an application needs a DLL called example.dll, it might look in:

1. The application’s own directory.
2. The system directory (e.g., C:\Windows\System32).
3. The Windows directory (e.g., C:\Windows).
4. Any directories listed in the PATH environment variable.
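The search order above, and why it is exploitable, can be simulated in a few lines. This is a conceptual sketch of the first-match-wins rule, not Windows' actual loader, and `resolve_dll` is a name I made up for illustration.

```python
import os

def resolve_dll(name, search_order):
    """Return the first path containing `name`, mimicking path-based
    loading: directories are tried in order and the earliest hit wins,
    regardless of which copy is the legitimate one.
    """
    for directory in search_order:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None  # loader would report the DLL as missing
```

The attack follows directly: if the legitimate example.dll lives in the system directory but an attacker can write an example.dll into the application's own directory, the earlier directory wins and the malicious copy is loaded.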

Signature-Based DLL Loading

Description:

• Method: The operating system or application loads a DLL based on a digital signature that verifies the identity and integrity of the DLL. This involves using cryptographic methods to ensure that the DLL has not been tampered with and is from a trusted source.
• Verification Process: The system checks the digital signature against a trusted certificate authority (CA). If the signature is valid and the DLL is from a trusted source, the DLL is loaded.
• Advantages: This method enhances security by ensuring that only DLLs from trusted sources are loaded, mitigating risks from malicious or tampered DLLs.

Example: An application might require a DLL to have a specific digital signature from a trusted CA. Before loading example.dll, the system checks its signature against the trusted CA. If the signature is valid and trusted, the DLL is loaded; otherwise, it is rejected.
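The verify-then-load idea can be sketched with a hash allowlist. This is a deliberate simplification: real Windows signature checking (Authenticode) validates a certificate chain rather than comparing fixed digests, and the function names below are my own.

```python
import hashlib

def sha256_of(path):
    """Hex digest of the file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def verify_before_load(path, trusted_hashes):
    """Allow the load only if the file's digest is on the allowlist.

    Simplified stand-in for signature verification: any tampering with
    the file changes its digest, so a planted or modified DLL is
    rejected instead of loaded.
    """
    return sha256_of(path) in trusted_hashes
```

Because even a one-byte modification changes the digest, a sideloaded DLL with the right name but the wrong contents fails the check.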

Comparison

Path-Based DLL Loading:

• Pros:
  • Simpler and faster, as it relies on the file path and name.
  • No need for complex verification processes.
• Cons:
  • Vulnerable to attacks such as DLL hijacking or sideloading.
  • Relies heavily on the correct configuration of directory paths.

Signature-Based DLL Loading:

• Pros:
  • More secure, as it ensures the integrity and authenticity of the DLL.
  • Reduces the risk of loading malicious or tampered DLLs.
• Cons:
  • Requires a valid digital signature and access to a trusted CA.
  • Slightly more complex and resource-intensive due to the need for cryptographic verification.

Mitigation Strategies

To mitigate the risks associated with path-based DLL loading:

• Use Absolute Paths: Specify absolute paths to DLLs whenever possible to avoid ambiguity.
• Directory Permissions: Secure directories by restricting write permissions to prevent unauthorized placement of malicious DLLs.
• Application Whitelisting: Implement whitelisting to allow only known and trusted DLLs to be loaded.

For signature-based DLL loading:

• Regular Updates: Ensure that certificates and signatures are kept up-to-date.
• Trusted Sources: Only use DLLs from trusted and verified sources.
• Monitor and Audit: Regularly monitor and audit DLL usage and loading processes to detect any anomalies.

By understanding and implementing these methods appropriately, organizations can significantly enhance their application’s security against DLL-related threats.

If laptops are secured and properly controlled for antivirus and patches, the likelihood of exploitation through DLL sideloading vulnerabilities is significantly reduced. However, it is essential to understand that while these measures provide a robust defense, they do not entirely eliminate the risk. Here’s why:

Factors Reducing the Risk

1. Antivirus and Antimalware Protection:
  • Real-Time Protection: Modern antivirus and antimalware solutions offer real-time protection that can detect and block known malicious DLLs before they can be executed.
  • Heuristic Analysis: These tools use heuristic and behavioral analysis to detect suspicious activities that might indicate a DLL sideloading attempt, even if the specific malware is not in their signature database.
2. Regular Patching and Updates:
  • Operating System Updates: Regularly updating the operating system ensures that known vulnerabilities, including those that might facilitate DLL sideloading, are patched.
  • Application Updates: Keeping applications up-to-date helps close security loopholes that could be exploited by malicious DLLs.
3. Controlled Environment:
  • Restricted Administrative Access: Limiting administrative privileges can prevent unauthorized installation of malicious software that might place a malicious DLL in the system.
  • Application Whitelisting: Implementing application whitelisting can ensure that only approved and trusted applications and their DLLs are executed.

Remaining Risk Factors

1. Zero-Day Exploits:
  • Unknown Vulnerabilities: Even with up-to-date systems and antivirus software, zero-day vulnerabilities (previously unknown security flaws) can be exploited by sophisticated attackers to bypass these defenses.
2. User Behavior:
  • Phishing and Social Engineering: Users might inadvertently download and execute malicious files if they are tricked by phishing attacks or other forms of social engineering.
3. Sophisticated Malware:
  • Advanced Persistent Threats (APTs): Some malware is specifically designed to evade detection by antivirus software and can employ advanced techniques to achieve DLL sideloading.

Overall Likelihood

Given the strong security measures in place (antivirus, patches, controlled environment), the likelihood of exploitation through DLL sideloading is low but not zero. The effectiveness of these measures largely depends on their consistent and proper implementation.

Mitigations to Further Reduce Risk

• Enhanced Monitoring: Implementing advanced endpoint detection and response (EDR) tools can provide deeper insights into system activities and potential threats.
• User Education: Regular training for users on recognizing phishing attempts and other social engineering tactics can reduce the likelihood of accidental malware execution.
• Regular Security Audits: Conducting periodic security audits can help identify and mitigate potential vulnerabilities that might have been overlooked.

By maintaining a vigilant and layered security approach, the risk of DLL sideloading exploitation can be minimized to a very low level.

Optus Outage Incident – Root Cause Analysis

In the past five years, Optus has made big news with four data breaches, one hacking incident, and, most recently, a widespread outage believed to stem from a configuration mishap during a software upgrade (see references 1-5). Around 4:05 am on Wednesday, November 8, 2023, Optus experienced a widespread service outage affecting a significant number of its customers. The disruption impacted various services, including mobile data, internet, and voice calls, leaving users frustrated and businesses grappling with operational challenges. The outage not only underscored the importance of robust telecommunications infrastructure but also shed light on the vulnerabilities that can arise in even the most advanced networks.

This poses a question: what makes such a giant so vulnerable to cybersecurity incidents?

Big telecommunication companies can be vulnerable to cyber attacks due to various factors. Some of the key reasons include:

1. Complex Networks: Telecommunication companies typically have complex and extensive networks with numerous interconnected systems. This complexity can create vulnerabilities, and managing such vast networks can be challenging, making it easier for attackers to find and exploit weaknesses.
2. Interconnected Infrastructure: Telecommunication systems rely on interconnected infrastructure, including routers, switches, and other critical components. If one part of the infrastructure is compromised, it can potentially impact the entire network, leading to widespread disruptions.
3. Dependence on Technology: Telecommunication companies heavily rely on technology to provide their services. This dependence on technology means that any vulnerabilities in the underlying software or hardware can be exploited by cyber attackers to gain unauthorized access or disrupt services.
4. High-Value Targets: Due to the critical nature of their services, telecommunication companies are attractive targets for cybercriminals, hacktivists, or even state-sponsored attackers. Disrupting telecommunications services can have significant economic and social consequences, making these companies high-value targets.
5. Data Sensitivity: Telecommunication companies handle vast amounts of sensitive customer data, including personal information and communication records. This makes them attractive targets for cybercriminals seeking to steal and exploit valuable data for financial gain or other malicious purposes.
6. Increasing Connectivity: As telecommunication networks become more integrated with other industries and technologies (such as the Internet of Things), the attack surface for potential threats expands. This increased connectivity can expose telecommunication companies to new and evolving cyber threats.
7. Legacy Systems: Some telecommunication companies may still be using legacy systems that were implemented before the current cybersecurity landscape evolved. These older systems might have known vulnerabilities that have not been adequately addressed or patched, making them susceptible to attacks.
8. Supply Chain Risks: Telecommunication companies often rely on a complex supply chain for hardware and software components. If any of these components have vulnerabilities, it can introduce risks into the overall system, especially if security measures are not rigorously enforced throughout the supply chain.
9. Human Factors: Insider threats or human error can also contribute to vulnerabilities. Employees with access to critical systems may inadvertently introduce security risks through actions such as falling for phishing attacks, using weak passwords, or mishandling sensitive information.

To mitigate these vulnerabilities, telecommunication companies must invest in robust cybersecurity measures, conduct regular risk assessments, stay updated on the latest threats, and implement best practices for network security. This includes employee training, regular system patching and updates, and the adoption of advanced security technologies.

We believe Optus and similar companies are aware of, and keep abreast of, the measures they should take to safeguard against the vulnerabilities listed above. Most organisations nowadays invest heavily in tools and technologies. What else is important?

In my opinion, a cybersecurity program is like a big aircraft (or several) ready to land at an airport: we should focus equally on the runway and the related on-ground safety. In an organisation, this translates to focused leadership and efficient management. No matter how sophisticated the tools and technology we deploy, unless we have leadership that foresees challenges and an efficient management stack that makes the best use of the deployed tools and technologies, a gap will remain. However small that gap is, when it is compromised it will result in big losses.

Potential Root Causes of the Outage: Though Optus announced this to be a software upgrade failure, it is hard to believe. The primary reason for my disagreement with such a conclusion is the span of the outage: voice, text, and internet were all affected. It is highly unlikely that any single upgrade would touch all three of these domains, which are domain-isolated with layer-2 and layer-3 redundancies. The following broad conclusions can be drawn.

1. Technical Glitch or Human Error? The first question on everyone’s mind during a network outage is whether it was caused by a technical glitch or human error. Optus, like any other telecommunications giant, relies on a complex network of hardware, software, and personnel to keep its services running smoothly. Initial investigations suggested that the outage might have originated from a technical malfunction in one of the critical components of the network. However, the possibility of human error, such as misconfigurations or oversight during routine maintenance, cannot be ruled out.
2. Network Overload and Capacity Issues: With the ever-increasing demand for data and connectivity, telecommunications networks face the constant challenge of expanding their capacity to meet user needs. The Optus outage could have been exacerbated by a sudden surge in network traffic or an unexpected overload on specific components, causing a strain on the infrastructure.
3. Security Concerns: In an era where cybersecurity threats are on the rise, the outage raised questions about the role of security in safeguarding critical infrastructure. While initial reports did not indicate a cyberattack, the incident prompted a reassessment of the security measures in place to protect against potential threats that could compromise the network’s integrity.
4. Supply Chain Vulnerabilities: Telecommunications providers often rely on a vast supply chain for their equipment and software. The outage might have been linked to vulnerabilities in components supplied by third-party vendors, highlighting the importance of rigorous vetting and security protocols throughout the supply chain.

Learning from the Outage: The Optus outage serves as a wake-up call for both telecommunications providers and consumers. It emphasizes the need for continuous investment in robust infrastructure, regular system audits, and comprehensive cybersecurity measures. As technology evolves, so do the challenges, and proactive steps must be taken to stay ahead of potential disruptions.

Conclusion: The recent Optus outage is a stark reminder that even industry giants are not immune to technical hiccups and unexpected disruptions. As we navigate the intricate web of modern telecommunications, it becomes imperative for providers to prioritize resilience, security, and adaptability in the face of an ever-changing digital landscape. Only through continuous improvement and investment in cutting-edge technologies can we hope to build a telecommunications infrastructure that stands the test of time.

References:

1. https://www.cyberdaily.au/commercial/9263-deja-vu-optus-suffers-data-breach-from-major-cyber-attack
2. https://www.itnews.com.au/news/optus-cyber-attack-exposes-customer-information-585567
3. https://itwire.com/security/optus-hit-by-huge-data-breach,-up-to-9m-customers-claimed-affected.html
4. https://www.databreaches.net/au-optus-under-investigation-for-white-pages-privacy-breach/
5. https://www.smh.com.au/business/companies/i-could-access-everything-optus-customers-worried-after-logging-in-as-vladmir-20190214-p50xx6.html