Shawn D.’s Post

☁️ If It Can Happen to AWS, It Can Happen to Anyone

This week’s AWS outage reminded everyone that no cloud provider is immune to disruption. Whether your environment runs on AWS, Azure, or Google Cloud, the same rule applies: availability is a shared responsibility.

Many teams assume that moving to the cloud automatically means redundancy. It doesn’t. You have to design it into your architecture.

Here are a few takeaways for Azure environments:

✅ Use Availability Zones and paired regions. Azure offers zone-redundant services for a reason. If your workloads sit in one zone, you are one incident away from downtime.

✅ Plan for regional failover. Test how long it would take to shift from East US to Central US if one region fails. Document it, automate it, and test it regularly (a small probe sketch follows this post).

✅ Design for isolation. Separate production, DR, and critical services across regions so that one outage cannot take down your entire environment.

✅ Monitor your dependencies. Even if your app stays online, a third-party service or Azure dependency in another region can still cause disruption.

Cloud outages are not rare events anymore. They are reminders that resilience belongs to the architect, not the provider.

#Azure #CloudComputing #Infrastructure #DisasterRecovery #Resilience #TechLeadership #BusinessContinuity #ReliabilityEngineering #CloudArchitecture
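To make the failover-testing point concrete, here is a minimal cross-region health probe, written as a sketch rather than a production tool. The endpoint URLs, the /health route, and the region names are assumptions for illustration; a real Azure setup would normally lean on Front Door or Traffic Manager probes for this.

```python
# Minimal sketch of a cross-region health probe (hypothetical endpoints).
import urllib.request

# Assumed: one deployment per region, each exposing a /health route.
ENDPOINTS = {
    "eastus": "https://app-eastus.example.com/health",         # hypothetical
    "centralus": "https://app-centralus.example.com/health",   # hypothetical
}

def is_healthy(url: str, timeout: float = 3.0) -> bool:
    """Return True if the endpoint answers HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # covers URLError, HTTPError, timeouts, TLS failures
        return False

def pick_active_region() -> str | None:
    """Prefer the first (primary) region; fall back to the next healthy one."""
    for region, url in ENDPOINTS.items():
        if is_healthy(url):
            return region
    return None  # no healthy region: page a human

if __name__ == "__main__":
    print("Serving from:", pick_active_region() or "no healthy region")
```

Run on a schedule, the same probe doubles as a record of how long a regional failover actually takes in practice.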
More Relevant Posts
AWS went down. 𝗧𝗵𝗲 𝗿𝗲𝗮𝗹 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻 𝗶𝘀 𝘄𝗵𝘆 𝘀𝗼 𝗺𝗮𝗻𝘆 𝘄𝗲𝗻𝘁 𝗱𝗼𝘄𝗻 𝘄𝗶𝘁𝗵 𝗶𝘁.

Here are the facts: yesterday’s AWS outage hit the US-EAST-1 region, disrupting millions of users across industries. It was not global, but it exposed a bigger issue: too many systems rely on a single region with no redundancy.

Every provider has outages. 𝗗𝗼𝘄𝗻𝘁𝗶𝗺𝗲 𝗶𝘀 𝗶𝗻𝗲𝘃𝗶𝘁𝗮𝗯𝗹𝗲 𝘁𝗼𝗼. The difference is how well you prepare for it.

At EIS, we help clients define what acceptable unplanned downtime looks like based on their risk tolerance, then build the strategies and architecture to meet that goal. On Azure, that means multi-region, hybrid, and failover setups that keep operations running when others go dark.

𝗛𝗼𝘄 𝗰𝗼𝗻𝗳𝗶𝗱𝗲𝗻𝘁 𝗮𝗿𝗲 𝘆𝗼𝘂 𝘁𝗵𝗮𝘁 𝘆𝗼𝘂𝗿 𝘀𝘆𝘀𝘁𝗲𝗺𝘀 𝗰𝗼𝘂𝗹𝗱 𝘀𝘁𝗮𝘆 𝗼𝗻𝗹𝗶𝗻𝗲 𝗶𝗳 𝘆𝗼𝘂𝗿 𝗽𝗿𝗶𝗺𝗮𝗿𝘆 𝗿𝗲𝗴𝗶𝗼𝗻 𝘄𝗲𝗻𝘁 𝗱𝗼𝘄𝗻 𝘁𝗼𝗺𝗼𝗿𝗿𝗼𝘄?

Reliability is not promised by a platform. It is engineered by design.

#Azure #Cloud #Resilience #DigitalTransformation #AWS
🌧️ It’s a rainy cloud today.

This morning, AWS services in the US-EAST-1 region went down. It took about three hours to get the main DNS issue under control, with some services still showing impact closer to the eight-hour mark.

Let’s put that in perspective:
- 99% uptime = about 3.65 days of downtime per year
- 99.9% uptime = about 8.76 hours
- 99.99% uptime = about 52 minutes, 36 seconds

(The short sketch after this post shows where these figures come from.)

If you’re paying for services with 99% or 99.9% availability, this level of downtime is technically within the acceptable range. Some businesses were hit hard, others barely affected. But if your business can’t afford a 3–9 hour outage, you should be asking:

1. Are you relying on a single region?
2. Do you have multi-AZ or multi-region failover configured?
3. Are your SLAs and architecture aligned with your actual business risk?

These are some of the questions I always pose to my clients when helping them prepare for outages like this. If you’d like to discuss how to build resilience into your cloud setup, especially for high availability in AWS or Azure, let’s connect; I’d be happy to help.

#AWS #CloudArchitecture #HighAvailability #BusinessContinuity #ITLeadership #RiskManagement
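For anyone who wants to sanity-check those availability figures, here is a tiny calculation sketch; the 365.25-day year is the only assumption.

```python
# Convert an availability percentage into allowed downtime per year.
def downtime_per_year(availability_pct: float) -> str:
    minutes = (1 - availability_pct / 100) * 365.25 * 24 * 60
    hours, rem = divmod(minutes, 60)
    return f"{availability_pct}% uptime -> about {int(hours)} h {rem:.0f} min of downtime per year"

for sla in (99.0, 99.9, 99.99):
    print(downtime_per_year(sla))
# 99.0%  -> about 87 h 40 min (~3.65 days)
# 99.9%  -> about 8 h 46 min
# 99.99% -> about 0 h 53 min
```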
Major AWS Outage Resolved: A Reminder on Cloud Resilience ⚠️☁️

A significant AWS outage impacted major services and websites yesterday, causing widespread connectivity issues. Here’s a quick breakdown of what happened:

🛑 The problem originated in the US-EAST-1 region, a core part of AWS's global infrastructure.
🔧 It was linked to AWS's Lambda service and its API, which affected a vast number of dependent applications and services.
🌐 Many popular online platforms experienced errors, demonstrating how interconnected our digital ecosystem has become.
✅ AWS engineering teams identified the root cause and restored full service after several hours.

This event serves as a powerful reminder that even the most robust cloud infrastructures are not infallible. How is your organization building resilience to mitigate the impact of a single cloud provider's outage? 🤔

#CloudComputing #AWS #Outage #DigitalTransformation #BusinessContinuity #TechNews

Link: https://lnkd.in/dfUagN9d
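One provider-agnostic way to blunt that interconnectedness is to wrap every external dependency in a bounded retry plus a degraded fallback, so an upstream outage degrades one feature instead of the whole request. A rough sketch, with hypothetical helper names (fetch_live_rates, cached_rates):

```python
# Sketch: bounded retries with jittered backoff, then a graceful fallback.
import random
import time

def call_with_fallback(primary, fallback, retries: int = 2, base_delay: float = 0.2):
    for attempt in range(retries + 1):
        try:
            return primary()
        except Exception:
            if attempt < retries:
                # exponential backoff plus jitter so retries don't stampede the dependency
                time.sleep(base_delay * 2 ** attempt + random.uniform(0, 0.1))
    return fallback()  # cached data, reduced functionality, or a friendly error

# usage (hypothetical helpers):
# rates = call_with_fallback(fetch_live_rates, cached_rates)
```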
Two Unrelated Outages, One Clear Message: Build for Resilience

In the past two weeks, both AWS and Microsoft Azure, the two largest players in cloud infrastructure, experienced significant outages:

* October 20: AWS’s US-East-1 region went down for several hours due to an internal network monitoring malfunction.
* October 29: Microsoft Azure’s “Front Door” service was disrupted by a configuration change that affected portal access and latency across multiple regions.

These events were unrelated, with separate causes and separate systems, but the takeaway is the same: our digital world is highly dependent on a small number of infrastructure providers. When either experiences issues, the effects ripple across industries.

This is a reminder to prioritize operational resilience as much as innovation:

* Architect for redundancy and failover, not just scalability.
* Understand your dependency map, both direct and indirect (a toy example follows this post).
* Test recovery processes regularly, not just document them.

Cloud remains the foundation of digital transformation, but resilience is its cornerstone.

Question for peers: How are you strengthening resilience in your cloud strategy? Multi-region, multi-cloud, or otherwise?

#Leadership #CloudStrategy #Resilience #AWS #Azure #DigitalTransformation #Infrastructure #EnterpriseIT #BusinessContinuity
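On the dependency-map point: even a toy version makes indirect exposure visible. A minimal sketch, with made-up service names; the structure, not the data, is the point.

```python
# Toy dependency map: walking it shows that "checkout" is exposed to
# us-east-1 even though it never calls AWS directly.
DEPENDS_ON = {
    "checkout": ["payments-api"],
    "payments-api": ["aws-us-east-1"],
    "reporting": ["azure-front-door"],
}

def transitive_deps(service: str, seen: set[str] | None = None) -> set[str]:
    """Return every direct and indirect dependency of a service."""
    seen = set() if seen is None else seen
    for dep in DEPENDS_ON.get(service, []):
        if dep not in seen:
            seen.add(dep)
            transitive_deps(dep, seen)
    return seen

print(transitive_deps("checkout"))  # {'payments-api', 'aws-us-east-1'} (order may vary)
```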
AWS us-east-1 was having issues again today, and it’s another reminder of why centralizing the entire internet in one region is a terrible idea.

Too many organizations still treat us-east-1 as “the cloud,” when in reality it’s just a region. When it sneezes, half the internet catches a cold. This isn’t new: outages in that region have taken down major services time and time again.

The lesson is simple:
☁️ Redundancy isn’t optional.
🌎 Regions exist for a reason.

Personally, I deploy workloads to other AWS regions, and in some cases, across multiple providers entirely. At the very least, have redundancy outside of us-east-1 so your infrastructure doesn’t go dark when everyone else’s does (a small sketch follows this post).

Cloud reliability isn’t just about uptime, it’s about resilience.

CNN article here: https://lnkd.in/g_9_ei7H

#AWS #CloudComputing #DevOps #Infrastructure #HighAvailability #Resilience #Engineering #CloudArchitecture
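For the "redundancy outside of us-east-1" point, here is a deliberately small boto3 sketch that keeps an out-of-region copy of a critical object. The bucket names and key are hypothetical, and in practice S3 Cross-Region Replication rules would usually do this continuously instead of by hand.

```python
# Copy one critical object from a us-east-1 bucket into a us-west-2 replica.
import boto3

SOURCE_BUCKET = "myapp-artifacts-use1"   # hypothetical, lives in us-east-1
REPLICA_BUCKET = "myapp-artifacts-usw2"  # hypothetical, lives in us-west-2

def copy_to_replica(key: str) -> None:
    """Copy one object into the out-of-region replica bucket."""
    s3_west = boto3.client("s3", region_name="us-west-2")
    s3_west.copy_object(
        Bucket=REPLICA_BUCKET,
        Key=key,
        CopySource={"Bucket": SOURCE_BUCKET, "Key": key},
    )

if __name__ == "__main__":
    copy_to_replica("deploy/config.json")  # hypothetical key
```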
The Real Risk Behind the AWS Outage

When AWS goes down, the world stops — and that’s the problem.

This outage wasn’t just a DNS failure. It was risk concentration. Too many businesses assume cloud redundancy within one provider equals resilience. It doesn’t. AWS, Azure, and Google Cloud are incredible platforms, but they’re still single ecosystems. If your entire stack lives under one cloud’s control plane, you’ve accepted a shared risk that’s not shared — it’s inherited.

Yes, multi-cloud or cross-region design costs more. Yes, it adds complexity. But so does downtime when customers can’t transact and operations halt.

True resilience isn’t about who hosts your systems — it’s about how you design for failure. Spread workloads across providers or regions so one outage doesn’t become your company’s headline.

At TDY IT, we help organizations understand that difference. Because understanding your IT means understanding where your real risks live.

#AWS #Azure #GoogleCloud #CloudResilience #RiskManagement #TDYIT #UnderstandingIT
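One concrete shape "spread workloads across providers or regions" can take: write critical state through a small abstraction that fans out to two backends, so losing one provider's control plane does not lose the data path. A minimal in-memory sketch; MemoryStore stands in for real provider SDK clients and every name here is illustrative.

```python
# Dual-homed writes: put to both backends, read from the first that answers.
class MemoryStore:
    """Stand-in for a provider- or region-specific blob client (hypothetical)."""
    def __init__(self, name: str):
        self.name = name
        self._data: dict[str, bytes] = {}
    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value
    def get(self, key: str) -> bytes:
        return self._data[key]

class DualStore:
    def __init__(self, primary: MemoryStore, secondary: MemoryStore):
        self.backends = [primary, secondary]
    def put(self, key: str, value: bytes) -> None:
        for backend in self.backends:
            backend.put(key, value)          # fan out every write
    def get(self, key: str) -> bytes:
        for backend in self.backends:
            try:
                return backend.get(key)      # first backend that answers wins
            except Exception:
                continue
        raise RuntimeError("all backends unavailable")

store = DualStore(MemoryStore("provider-a"), MemoryStore("provider-b"))
store.put("orders/42", b"{}")
print(store.get("orders/42"))
```

In real systems the fan-out needs reconciliation for partial failures, which is exactly the extra complexity the post acknowledges paying for.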
Before the cloud, outages came from backbone routing failures, bad BGP updates, or software patches that didn’t go as planned. Ask anyone who remembers the 1996 Sprint BGP meltdown — one bad routing table and half the Internet went dark.

Different decade, same problem. Azure authentication failures, Google Cloud DNS issues, the CrowdStrike patch event, and now AWS. The faces change, but the fragility doesn’t.

This isn’t about blaming providers. It’s about understanding how much control — and risk — we’ve centralized. When redundancy and responsibility live in the same place, it’s only a matter of time.

TDY IT captured it perfectly: the risk isn’t downtime — it’s dependency.

#AWS #Azure #GoogleCloud #CrowdStrike #RiskManagement #TDYIT #UnderstandingIT #Networking #BGP
📣 AWS Outage: It's time to rethink cloud resilience

The recent outage in the AWS US-East region wasn’t just another downtime event — it was a wake-up call for every tech leader who builds on the cloud. Our latest blog post, “AWS US-East Outage: A Wake-Up Call for Cloud Resilience”, dives into the implications and what must change now.

👉 What technology leaders should be doing now:

🔹 Raise your infrastructure observability game. Don’t wait until you’re reading a vendor blog post about the outage — see it coming.
🔹 Automate responses. When your cloud provider goes dark, you need fallback paths already in motion (a minimal circuit-breaker sketch follows this post).
🔹 Don’t build all your apps in one cloud region or with one provider — architect for portability and multi-region/multi-cloud.
🔹 Map out all your dependencies — SaaS, cloud, undetected service chains. Hidden concentration = hidden risk.

Outages like this one aren’t just an operational headache — they’re strategic inflection points. Get ready to move from “I hope the cloud stays up” to “I know the cloud will keep us up.”

🔗 Read the blog for more tips on what you can, and should, do now: https://forr.com/4njlHZD

#CloudResilience #AWS
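On the "automate responses" bullet: the simplest building block is a circuit breaker that stops hammering a dark provider and routes to the fallback path on its own. A minimal sketch; the thresholds and the primary/fallback callables are illustrative assumptions.

```python
# Minimal circuit breaker: after enough consecutive failures, send traffic to
# the fallback path for a cool-down period instead of retrying the primary.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold: int = 5, reset_after: float = 60.0):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, primary, fallback):
        # circuit open and still cooling down: don't even try the primary
        if self.opened_at is not None and time.monotonic() - self.opened_at < self.reset_after:
            return fallback()
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()  # trip the breaker
            return fallback()
        self.failures, self.opened_at = 0, None    # a healthy call resets state
        return result
```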
The AWS outage is a perfect example of why you never put all your eggs in one basket. Too many companies rely on one cloud, one region, one provider, and then act surprised when everything goes down.

I’ve been saying this for years: resilience isn’t about uptime promises, it’s about smart design. I’ve worked across Microsoft 365, Azure, and hybrid environments, and I’ve seen what happens when there’s no backup plan. One DNS issue or regional failure can knock out entire systems if there’s no redundancy.

That’s why I always push for:
• Multi-cloud or hybrid setups
• Cross-region failover
• Automation and regular testing instead of blind trust in SLAs (see the sketch after this post for one way to make that testing routine)

The cloud is great, but it’s not magic. If you don’t build for failure, you’re eventually going to experience it.

#AWS #Azure #Cloud #Intune #Automation #EndpointManagement #Architecture #IT
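One way to replace blind trust in SLAs with routine testing is a scheduled check that the standby environment is genuinely usable, not just assumed to be. A sketch with a hypothetical status endpoint, an assumed JSON response shape, and a 15-minute RPO target:

```python
# Scheduled drill: fail if the standby is unreachable or its data is too stale.
import json
import urllib.request
from datetime import datetime, timezone

STANDBY_STATUS_URL = "https://standby.example.com/replication-status"  # hypothetical
RPO_TARGET_SECONDS = 15 * 60  # assumed recovery point objective

def standby_is_ready() -> bool:
    with urllib.request.urlopen(STANDBY_STATUS_URL, timeout=5) as resp:
        status = json.load(resp)
    # assumed response shape: {"last_sync": "2025-10-20T12:00:00+00:00"}
    last_sync = datetime.fromisoformat(status["last_sync"])
    lag = (datetime.now(timezone.utc) - last_sync).total_seconds()
    return lag <= RPO_TARGET_SECONDS

if __name__ == "__main__":
    print("standby ready:", standby_is_ready())
```

Wired into a daily job that alerts on failure, this turns "we have a DR region" from an assumption into something measured.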
Director of Sales | GTS Technology Solutions · 1mo
Well said, Shawn