AWS Outage Exposes the Fragility of Cloud-Scale Dependence

The cloud disruption started in the US early Monday and rapidly expanded across the globe, impacting gaming, banking, and entertainment services.

MITSloan ME Editorial October 23, 2025 Reading Time: 3 Min

Topics

News

[Image source: Chetan Jha/MITSMR Middle East]

On October 20, the global cloud ecosystem faced a stark test of resilience. Websites timed out, dashboards froze, and workflows across logistics, streaming, and retail abruptly stalled as Amazon Web Services (AWS) suffered another major disruption, this time linked to what experts suspect was a Domain Name System (DNS)-level failure in the company’s US-East-1 region.

The outage rippled through the digital economy. Major consumer platforms including Snapchat, Duolingo, and Roblox were among the first to report downtime, while outage monitor Downdetector later noted issues extending to Grok, Lyft, and Hulu. Some applications remained offline for more than six hours, and more than 1,000 companies across sectors, from finance and healthcare to retail and entertainment, experienced cascading system failures.

Payment gateways stalled, internal dashboards froze, and customer-facing applications went dark, underscoring how interdependent cloud infrastructure has become.

According to Amazon’s preliminary updates, the issue originated in its US-East-1 region, a critical hub for global AWS traffic, and appeared to stem from a failure in DNS, the core system that translates domain names into IP addresses. When DNS falters, service discovery fails, halting connectivity across dependent systems.

“The AWS outage echoes past incidents like the 2021 US-East-1 and even the 2016 Dyn DNS attack,” said Santiago Pontiroli, Lead TRU Researcher at Acronis. “If the root cause was indeed a DNS failure, it shows how fragile the internet can be when discovery and routing depend on a single control plane. Most affected services had no soft-degradation path, so instead of slowing down, they went dark. From a resilience and threat perspective, this is a reminder that infrastructure risks are not limited to cyberattacks. Hybrid and multi-cloud strategies, along with DNS isolation and designing for graceful failure, are now essential to keep critical services available.”
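To make the idea of a "soft-degradation path" concrete, here is a minimal Python sketch of one possible approach: a client-side resolver that falls back to the last known good addresses when a DNS lookup fails, rather than failing outright. It is an illustration under stated assumptions, not a description of how AWS or any affected service works. The class name StaleTolerantResolver, the max_stale_seconds parameter, and the example hostname are all hypothetical.

```python
import socket
import time
from typing import Callable, Dict, List, Tuple

Resolver = Callable[[str], List[str]]


def system_resolver(hostname: str) -> List[str]:
    """Resolve through the operating system; raises socket.gaierror on failure."""
    infos = socket.getaddrinfo(hostname, None, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})


class StaleTolerantResolver:
    """Hypothetical helper: serve the last good answer when DNS lookups fail."""

    def __init__(self, resolve: Resolver = system_resolver,
                 max_stale_seconds: float = 3600.0) -> None:
        self._resolve = resolve
        self._max_stale = max_stale_seconds
        # hostname -> (time of last successful lookup, addresses)
        self._cache: Dict[str, Tuple[float, List[str]]] = {}

    def lookup(self, hostname: str) -> Tuple[List[str], bool]:
        """Return (addresses, degraded); degraded is True for a stale answer."""
        try:
            addresses = self._resolve(hostname)
        except OSError:  # socket.gaierror is a subclass of OSError
            cached = self._cache.get(hostname)
            if cached and time.monotonic() - cached[0] <= self._max_stale:
                return cached[1], True  # degrade instead of going dark
            raise  # nothing usable cached: surface the failure
        self._cache[hostname] = (time.monotonic(), addresses)
        return addresses, False


if __name__ == "__main__":
    # Simulate a resolver that works once and then fails, as in an outage.
    state = {"calls": 0}

    def flaky(hostname: str) -> List[str]:
        state["calls"] += 1
        if state["calls"] > 1:
            raise socket.gaierror("simulated DNS failure")
        return ["192.0.2.10"]  # documentation-only address (RFC 5737)

    resolver = StaleTolerantResolver(resolve=flaky)
    print(resolver.lookup("api.example.test"))  # fresh answer
    print(resolver.lookup("api.example.test"))  # stale answer, degraded=True
```

The trade-off in a design like this is that a stale address may point at an endpoint that has since moved, so the staleness window should be bounded and the degraded flag surfaced to monitoring.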

The outage reignited discussion about digital visibility, a persistent challenge for enterprises operating across complex, distributed ecosystems.

“Global incidents like this are a clear reminder of how dependent our world has become on software and digital systems operating as expected,” said Rob van Lubek, EMEA Vice President at Dynatrace. “When an outage occurs, the ripple effects can quickly spread across industries and into people’s daily lives. The difference between disruption and recovery often comes down to how fast an organisation can pinpoint what’s gone wrong, understand why, and act to restore service continuity.”

He added that as AI continues to reshape operations, maintaining visibility across dynamic cloud environments will be critical. The most resilient organisations, he said, will be those that can “see across their environment, anticipate risks, and adapt quickly when the unexpected happens.”

In its latest update, Amazon Web Services said it had fixed the underlying problem and that all services had “returned to normal operations.”

Still, the event leaves a lasting imprint on the conversation around cloud resilience. As enterprises and governments worldwide accelerate their migration and digital transformation strategies, this outage underscores a fundamental truth: the global economy increasingly depends on a small number of cloud regions and routing layers. A single DNS failure, buried deep within one provider’s infrastructure, was sufficient to disrupt over a thousand companies for hours.

For leaders, the lesson is both technical and strategic. Resilience must evolve as fast as innovation. Building redundant architectures, adopting multi-cloud approaches, and ensuring proactive visibility into performance anomalies are no longer optional. They are the foundation for ensuring that when the next disruption comes, and it will, business continuity doesn’t depend on a single point of failure.
