Ten most expensive bugs in history (part 2)

Check out more stories and lessons about the costliest software errors of our era

June 19, 2024

Testing

In our previous article, we analyzed, in chronological order, some of history's most expensive software bugs, examining their impact and the lessons learned. Now, let's look at more notorious software errors from the last decade or so that cost companies and users a fortune and often changed the industry for the long term.

 

1. From heartbeat to Heartbleed (2014)

OpenSSL is a widely used open-source technology that provides security for websites by encrypting the data exchanged between a user's browser and the website, ensuring that sensitive information remains private. It is fundamental to internet security, and it’s used by millions of websites to safeguard online transactions and user data.

The Heartbleed bug was caused by an error in the OpenSSL cryptography library that required urgent fixes across countless systems worldwide, costing hundreds of millions of dollars in the process.

It all came down to a programming error in the implementation of the TLS heartbeat extension, which is designed to keep a connection alive even when no data is being transmitted. A heartbeat request carries a small payload together with a field stating how long that payload is, and the server is supposed to echo the payload back. The vulnerable code trusted that length field without checking it against the amount of data actually received. An attacker could therefore send a tiny payload, claim it was up to 64KB long, and trick the server into responding with up to 64KB of whatever happened to sit in its memory. The request could be repeated over and over to collect more data. This memory could contain random noise, but also all sorts of sensitive data: usernames, passwords, or private encryption keys. (For a short, funny, and accurate depiction of what exactly happened, check out this great comic.)
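To make the missing check concrete, here is a toy Python model of the over-read. It is a minimal sketch, not OpenSSL's actual code (which is written in C): process memory is simulated as a single byte string in which the received payload sits right next to unrelated secret data, and the function names `heartbeat_vulnerable` and `heartbeat_fixed` are hypothetical. The fixed version simply discards any request whose claimed length exceeds what was actually received.

```python
from typing import Optional


def heartbeat_vulnerable(memory: bytes, received_len: int, claimed_len: int) -> bytes:
    """Echo back `claimed_len` bytes starting at the payload.

    The bug: `claimed_len` comes from the attacker and is never compared
    with `received_len`, so the reply includes whatever follows the payload.
    """
    return memory[:claimed_len]


def heartbeat_fixed(memory: bytes, received_len: int, claimed_len: int) -> Optional[bytes]:
    """Same handler with the missing bounds check added."""
    if claimed_len > received_len:
        return None  # silently discard the malformed request
    return memory[:claimed_len]


# Simulated memory: the 5-byte payload the client actually sent,
# followed by unrelated data that happens to live next to it.
memory = b"HELLO" + b"|password=hunter2"

print(heartbeat_vulnerable(memory, received_len=5, claimed_len=22))  # b'HELLO|password=hunter2'
print(heartbeat_fixed(memory, received_len=5, claimed_len=22))       # None
print(heartbeat_fixed(memory, received_len=5, claimed_len=5))        # b'HELLO'
```

In C, the equivalent copy reads past the end of the real payload into adjacent heap memory; the Python version can only imitate that because the "memory" here is one byte string built on purpose.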

The Heartbleed bug was resolved by releasing a patch for the vulnerability in OpenSSL. However, exploiting the bug left no trace in ordinary server logs, so system admins, website owners, and service providers had no way of knowing whether their private keys had already been stolen. As a precaution, most of them revoked and replaced their certificates to get new keys.

Total costs of the bug are estimated at around $500 million. Apart from reissuing certificates, expenses included legal settlements, fines, damage control, labor costs associated with the urgent need to audit networks for signs of exploitation, a rush of security upgrades, and the deployment of more reliable encryption technologies. The Heartbleed bug notably led to numerous security breaches, such as the theft of personal data from up to 4.5 million patients at Community Health Systems, which is a perfect example of its massive real-world implications.

2. Catastrophic data breach at Equifax (2017)

One of the most expensive data breaches in history, the Equifax incident exposed the personal information of approximately 147 million people, leading to widespread identity theft and fraud concerns.

Since we've covered this incident extensively in earlier articles, we won't go into it again here. You can find out more about the Equifax breach, what caused it, and how it could’ve been avoided here and here.

 

3. WannaCry attacks and the EternalBlue exploit (2017)

Now we’re entering action-movie territory, with parts of the story resembling a Hollywood script more than an account of a software bug. The U.S. National Security Agency (NSA) developed a powerful cyber weapon, EternalBlue, which exploited a vulnerability in the Windows implementation of the SMBv1 file-sharing protocol. Rather than disclosing the flaw to Microsoft, the agency chose to keep it for its own strategic objectives.

The story takes a dramatic turn when EternalBlue is stolen by a hacker group known as the Shadow Brokers in 2016. Only then did the NSA alert Microsoft, a move that drew criticism from some very notable figures, including Edward Snowden.

Microsoft acted quickly and released a patch for the vulnerability in March 2017, but for many, that was already too late. Trouble started in May 2017, when a group of hackers used the exploit to launch the infamous WannaCry ransomware attack. This malware encrypted data on affected computers worldwide and demanded a ransom, payable in Bitcoin, for decryption.

Even though Microsoft had released the patch, many users hadn’t found the time, or the motivation, to update their systems, and many organizations were still running older, unsupported versions of Windows. This made the impact of the ransomware far more severe. The situation could’ve been even worse if a cybersecurity researcher, Marcus Hutchins, hadn’t discovered a “kill switch” on the very first day of the attacks. Before doing anything else, the malware tried to connect to a specific, unregistered domain: if the connection failed, it went ahead and activated; if it succeeded, it stopped. By registering that domain, Hutchins made the check succeed for every new infection, which effectively stopped the malware from inflicting further damage.
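The inverted logic of the kill switch is easy to misread, so here is a minimal Python sketch of the decision it implies. It assumes the check boils down to "if the domain answers, do nothing", as described above. The domain name, `make_lookup`, and `should_activate` are hypothetical stand-ins, and no real network request is made.

```python
from typing import Callable, Set

# Hypothetical placeholder; the real WannaCry domain is not reproduced here.
KILL_SWITCH_DOMAIN = "kill-switch.example"


def make_lookup(registered: Set[str]) -> Callable[[str], bool]:
    """Stand-in for a real network request: a domain answers only if registered."""
    return lambda domain: domain in registered


def should_activate(can_connect: Callable[[str], bool]) -> bool:
    """Kill-switch logic: stand down if the domain answers, otherwise proceed."""
    return not can_connect(KILL_SWITCH_DOMAIN)


registered_domains: Set[str] = set()
print(should_activate(make_lookup(registered_domains)))  # True: nobody owns the domain yet

registered_domains.add(KILL_SWITCH_DOMAIN)  # what registering the domain achieved
print(should_activate(make_lookup(registered_domains)))  # False: new infections stand down
```

Note that this only affects machines that run the check after the domain is registered, which is why computers already encrypted stayed locked.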

However, systems that were already infected remained locked. The data on these machines could not be decrypted without meeting the ransom demands, leaving many users at a loss.

The financial damage from WannaCry was significant but very hard to calculate, with the best estimates ranging from $4 billion to $8 billion. The malware affected more than 200,000 computers in 150 countries, but even more importantly, the incident highlighted two critical realities in the digital age. Firstly, state actors need to show maximum accountability and transparency, particularly when their decisions have extensive social and economic consequences.

Secondly, it underscored the importance of regular system updates. Sometimes, the simple act of updating software makes all the difference: in a world where digital security is non-negotiable, staying up to date is not just recommended, it's crucial.

4. Moving up a gear: the NotPetya attacks (2017)

2017 was a tough year for cybersecurity experts around the world, largely because of the far-reaching consequences of the EternalBlue exploit. Another highly destructive piece of malware, NotPetya, used the very same exploit as WannaCry but combined it with additional ways of spreading, which made it even more dangerous.

The initial attack vector was M.E.Doc, a Ukrainian tax preparation software package whose update servers were compromised to distribute NotPetya disguised as a legitimate software update. Once running, NotPetya overwrote the master boot record (MBR) and encrypted key file system structures on infected computers, making them unbootable and effectively locking users out of their systems. However, unlike typical ransomware, NotPetya offered no real way to recover data, even if the ransom was paid.

Systems that had promptly installed the Microsoft patch for the EternalBlue vulnerability (released in March 2017, two months before WannaCry struck) seemed largely protected at first. However, NotPetya had other methods of spreading that could compromise even patched systems.

NotPetya was designed to use multiple techniques to propagate within networks. Besides EternalBlue, it harvested administrator credentials from infected machines and used them with legitimate Windows remote administration tools, such as PsExec and WMI, to reach other computers. This meant that once NotPetya gained a foothold on a single machine, whether an unpatched one or one that had installed the poisoned M.E.Doc update, it could spread to patched machines on the same network as well. Patching alone was not enough to keep a network safe.
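A toy breadth-first simulation shows why a single foothold was enough. It is a deliberately simplified sketch: the network, host names, and the `use_credentials` switch are hypothetical, and stolen credentials are assumed to work on every neighboring machine, which overstates reality but captures the idea.

```python
from collections import deque
from typing import Dict, List, Set


def infected_hosts(
    network: Dict[str, List[str]],
    patched: Set[str],
    start: str,
    use_credentials: bool,
) -> Set[str]:
    """Breadth-first spread from `start` (a simplified model, not real malware).

    The exploit alone reaches only unpatched neighbors. If use_credentials is
    True, stolen admin credentials are assumed to open any neighbor, patched or not.
    """
    infected = {start}
    queue = deque([start])
    while queue:
        host = queue.popleft()
        for neighbor in network.get(host, []):
            if neighbor in infected:
                continue
            if neighbor not in patched or use_credentials:
                infected.add(neighbor)
                queue.append(neighbor)
    return infected


# Hypothetical office network: only the accounting PC is unpatched.
office = {
    "accounting-pc": ["server-1", "laptop-1"],
    "server-1": ["accounting-pc", "server-2"],
    "laptop-1": ["accounting-pc"],
    "server-2": ["server-1"],
}
patched = {"server-1", "server-2", "laptop-1"}

print(sorted(infected_hosts(office, patched, "accounting-pc", use_credentials=False)))
# ['accounting-pc']
print(sorted(infected_hosts(office, patched, "accounting-pc", use_credentials=True)))
# ['accounting-pc', 'laptop-1', 'server-1', 'server-2']
```

With the exploit alone, the patched machines stop the spread; once credential reuse is added, the whole network falls.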

Given the extent of the damage, and the fact that it could not be undone even by paying the ransom, the motive behind NotPetya appears to go beyond financial gain. The initial target was Ukrainian tax software, and in 2018 the US and UK governments formally attributed the attack to the Russian military, so NotPetya is widely viewed as a politically motivated attack rather than ordinary cybercrime.

The “collateral damage” was enormous. Several multinational companies, such as Merck, Maersk, and FedEx subsidiary TNT Express, were hit by the attack. At Maersk alone, NotPetya destroyed all 49,000 end-user devices, including laptops and printers, leaving thousands of applications inaccessible and thousands of servers inoperable. The total estimated global cost of the damage approaches $10 billion.

5. Boeing 737 MAX crisis and recall (2018-19)

Now, after a string of thriller-like scenarios where government agencies, militaries, and cybercriminals go against each other to pursue financial and political gains, let’s go back to the good old engineering blunders.

The Boeing 737 MAX software issue was centered around the Maneuvering Characteristics Augmentation System (MCAS), a flight control system designed to improve the stability of the airplane so that it feels and flies like other 737s. The MCAS was introduced to compensate for the plane's larger engines, positioned further forward and higher on the wing compared to previous models, affecting the aircraft's aerodynamic behavior.

But the MCAS had a critical flaw: it could be activated by a single angle of attack (AoA) sensor reading. Without going into too much detail, the system relied on AoA sensor data to judge whether the nose of the aircraft was pitched too high, which could lead to a stall. Although the aircraft is fitted with two AoA sensors, the MCAS took its input from only one of them, so if that sensor falsely indicated a high angle, the MCAS would erroneously push the nose of the plane down to prevent a stall that wasn’t happening.
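The difference between trusting one sensor and cross-checking two can be sketched in a few lines of Python. This is a heavily simplified illustration, not Boeing’s flight control logic: the two thresholds are made-up illustrative values, and the function names are hypothetical.

```python
# Illustrative values only; they are not taken from the real MCAS.
STALL_RISK_AOA_DEG = 15.0     # hypothetical "nose too high" threshold
MAX_DISAGREEMENT_DEG = 5.5    # hypothetical allowed gap between the two sensors


def mcas_single_sensor(aoa_deg: float) -> bool:
    """Single-sensor design (simplified): one reading decides on its own."""
    return aoa_deg > STALL_RISK_AOA_DEG


def mcas_cross_checked(left_deg: float, right_deg: float) -> bool:
    """Redundant design (simplified): disagreeing sensors mean 'do not act'."""
    if abs(left_deg - right_deg) > MAX_DISAGREEMENT_DEG:
        return False  # data is unreliable, so leave control with the pilots
    return min(left_deg, right_deg) > STALL_RISK_AOA_DEG


# A broken left sensor reports an absurd angle while the right one is normal.
print(mcas_single_sensor(70.0))        # True: pushes the nose down for no reason
print(mcas_cross_checked(70.0, 5.0))   # False: disagreement detected, no action
print(mcas_cross_checked(16.0, 17.0))  # True: both sensors agree the angle is high
```

Refusing to act when redundant inputs contradict each other is a common pattern in this kind of design: acting on one faulty reading is exactly the failure the single-sensor version cannot detect.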

On two separate occasions, Lion Air Flight 610 in October 2018 and Ethiopian Airlines Flight 302 in March 2019, a faulty AoA sensor fed incorrect data to the MCAS, which concluded that the nose of the aircraft was too high and forcefully and repeatedly pushed it down. Pilots struggled to regain control as the system persistently overrode their inputs. Both flights ended in tragic crashes, resulting in a total of 346 deaths.

The response was the worldwide grounding of the Boeing 737 MAX fleet for nearly two years, extensive investigations, and calls for reforms in the aircraft certification process. Boeing faced an estimated $20 billion in financial losses, plus more than $60 billion in lost sales from more than 1,000 canceled orders. After an extensive review and testing process, Boeing reworked the MCAS so that it compares readings from both AoA sensors, activates only once per event, and never commands more nose-down input than pilots can counter, and the aircraft was allowed to fly again in late 2020.

Facing constant digital threats, companies need to be vigilant about the stability and reliability of their business software. Serious issues can come out of seemingly “nowhere”, the consequences can be devastating, and the best way to avoid disaster is to make software quality a genuine priority.

Drop us a line to find out how our software testing services can help.