Hardest QA Fails of 2023

This time of year, everyone likes to recap all the wonderful things that happened over the past twelve months. At Testery, we are lifelong students of complex technical systems and of how to keep them from failing. So instead, we're going to recap some of our favorite system and testing failures of 2023.

IKEA Pays $24M For Printing Credit Card Numbers on Receipts

It isn't clear whether this happened because of the requirements or by mistake, but earlier this year IKEA agreed to pay a $24M settlement for printing too many digits of the credit card number on its receipts.

This is a helpful reminder that well-seasoned product owners and testers should be familiar with the laws that apply to their companies and be on the lookout for features that don't respect those laws.
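A check like this can be automated. The sketch below is only an illustration, not IKEA's system: it assumes a rule of the kind found in the US Fair and Accurate Credit Transactions Act, which limits printed receipts to the last five digits of the card number. The function names, the card number, and the receipt text are all hypothetical.

```python
import re

# Assumption: the applicable rule allows at most the last five digits on a receipt.
MAX_VISIBLE_DIGITS = 5


def mask_card_number(card_number: str, visible: int = 4) -> str:
    """Return the card number with all but the last `visible` digits starred out."""
    if not 0 <= visible <= MAX_VISIBLE_DIGITS:
        raise ValueError(f"visible must be between 0 and {MAX_VISIBLE_DIGITS}")
    digits = re.sub(r"\D", "", card_number)
    visible = min(visible, len(digits))
    hidden = len(digits) - visible
    return "*" * hidden + digits[hidden:]


def exposes_too_many_digits(receipt_text: str, card_number: str,
                            limit: int = MAX_VISIBLE_DIGITS) -> bool:
    """True if the receipt shows any run of more than `limit` consecutive card digits."""
    card = re.sub(r"\D", "", card_number)
    # Ignore spaces and dashes so "4000 12" is treated the same as "400012".
    compact = re.sub(r"[ -]", "", receipt_text)
    window = limit + 1
    return any(card[i:i + window] in compact
               for i in range(len(card) - window + 1))


if __name__ == "__main__":
    card = "4000 1234 5678 9010"  # made-up test number
    good_receipt = f"CARD {mask_card_number(card)}"
    bad_receipt = "CARD 4000 12** **** 9010"
    print(exposes_too_many_digits(good_receipt, card))
    print(exposes_too_many_digits(bad_receipt, card))
```

A test along these lines, run against every receipt template, turns a legal requirement into something the build can fail on.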

You Could Be Owed up to $60 From Ikea. Here’s How to Claim It
The Swedish furniture company has agreed to pay $24 million to settle claims it violated consumer protection laws.

Rough Year for Southwest Airlines

In December of 2022, Southwest Airlines faced a major disruption. While weather was a significant cause, other airlines weren't impacted to the same degree. The extent to which software was to blame isn't clear, but industry experts suggest that "the company's uniquely complex flight coordination model and its antiquated internal scheduling systems" were a big factor.

Regardless, it's a good reminder of what happens when complex systems fail big.

"The holiday disruption led to over 16,700 flight cancellations, costing the company more than $1 billion" (https://abcnews.go.com/Business/causing-flights-meltdown-southwest-airlines/story?id=95888949)

Four months later, however, Southwest faced another major issue, and this one appeared to be more software-related in nature.

“Southwest has resumed operations after temporarily pausing flight activity this morning to work through data connection issues resulting from a firewall failure,” said Dan Landson, a spokesperson for Southwest Airlines. “Early this morning, a vendor-supplied firewall went down and connection to some operational data was unexpectedly lost.”

https://www.forbes.com/sites/suzannerowankelleher/2023/04/18/1700-delayed-southwest-flights-as-faa-lifts-national-ground-stop/?sh=67fbb05527f4

Datadog $5M Outage

You have to give Datadog a lot of credit for this one. They did a lot of things right. They were actually applying updates to their systems. They had rolling deployments. They had multiple regions.

The Pragmatic Engineer does a better job explaining what happened than I could ever hope to, so please read the article.

Inside DataDog’s $5M Outage (Real-World Engineering Challenges #8)
The observability provider was down for more than a day in March. What went wrong, how did the engineering team respond, and what can businesses learn from the incident? Exclusive.

ASUS Routers Stopped Routing

This year, a configuration file change resulted in thousands of users losing their internet access for about 48 hours.

"The mass outage, the company said, was the result of 'an error in the configuration of our server settings file.'"

It took 48 hours, but the mystery of the mass Asus router outage is solved
Asus finally responds after being castigated by users.

Janitor Turns Off Beeping Noise Causing $1M in Damages

Earlier this year, a janitor caused $1M in damages by unplugging a freezer that was constantly making a beeping sound.

It's a helpful reminder that the physical security of our devices has an impact on their ability to function properly. This holds true not only for freezers, but also for servers.

"A janitor cleaning in a laboratory at a university in Troy, New York, is accused of damaging at least $1 million in scientific research after shutting off the storage freezer while trying to turn off a constant beeping noise"

Janitor attempting to turn off beeping noise destroys decades of scientific research, causes $1M in damages
The Rensselaer Polytechnic Institute is suing a cleaning service after a janitor turned off a beeping sound on a super-cold freezer, leading to damage to research material.

U.S. Banks Fail To Process 850,000 Transactions On Time

Data masking is a helpful technique for keeping production data out of the wrong hands. In many systems, a configuration file or switch controls whether data masking is turned on or off.

"... instructions were sent to financial institutions 'with the account number and names of customers masked.'"

In this particular case, data masking was accidentally turned on in production, and the mistake was probably not caught in QA because masking was supposed to be on there. The result was hundreds of thousands of bank transactions not getting processed right away.

Be sure to keep a close eye on any settings that are different between dev, test, and production environments!
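One way to keep that close eye is a small check that runs before every release. The sketch below is a minimal illustration under assumed names: the settings dictionaries, the `mask_customer_data` key, and the `find_unexpected_differences` helper are hypothetical and not taken from any real bank's system. The idea is to list every setting that is allowed to differ in production, along with the value production must have, and to flag anything else.

```python
# Hypothetical allowlist: settings that may differ from QA, mapped to the
# value production is required to have.
EXPECTED_PRODUCTION_OVERRIDES = {
    "mask_customer_data": False,
}

_MISSING = "<missing>"


def find_unexpected_differences(qa: dict, prod: dict,
                                expected_overrides: dict) -> list:
    """Compare QA and production settings and describe every surprise."""
    problems = []
    for key in sorted(set(qa) | set(prod)):
        qa_value = qa.get(key, _MISSING)
        prod_value = prod.get(key, _MISSING)
        if key in expected_overrides:
            # This setting is allowed to differ, but only to the expected value.
            expected = expected_overrides[key]
            if prod_value != expected:
                problems.append(
                    f"{key}: production should be {expected!r}, found {prod_value!r}")
        elif qa_value != prod_value:
            # Anything not on the allowlist must match between environments.
            problems.append(
                f"{key}: QA has {qa_value!r}, production has {prod_value!r}")
    return problems


if __name__ == "__main__":
    qa_settings = {"mask_customer_data": True, "batch_size": 500}
    prod_settings = {"mask_customer_data": True, "batch_size": 500}  # masking left on
    for problem in find_unexpected_differences(
            qa_settings, prod_settings, EXPECTED_PRODUCTION_OVERRIDES):
        print(problem)
```

The design choice here is that an identical QA and production configuration is not automatically safe: a setting that is meant to differ gets checked against its required production value instead.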

Why some US bank deposits are held up days after ‘processing error’ delayed 850,000 payments
The private company that processes many bank-to-bank electronic transfers said a ‘processing error’ last week led to payment delays on roughly 850,000 transactions.