Challenges in Disaster Recovery Planning


Summary

Disaster recovery planning involves preparing strategies to restore critical systems, data, and business functions after disruptions like natural disasters, cyberattacks, or technical failures. While essential for business continuity, organizations often face challenges in ensuring readiness and resilience when disaster strikes.

  • Test your recovery plan: Regularly simulate incidents, including complete failures, to identify gaps and improve your disaster recovery strategies before an actual crisis occurs.
  • Prioritize data redundancy: Protect your most critical data by creating backups across multiple locations and test restoration processes to guarantee accessibility during downtimes.
  • Prepare for partial disruptions: Plan for scenarios where systems may only operate at reduced capacity and develop operational strategies to maintain essential functions under degraded conditions.
  • Hiren Dhaduk

    I empower Engineering Leaders with Cloud, Gen AI, & Product Engineering.

    8,893 followers

Your cloud provider just went dark. What's your next move?

If you're scrambling for answers, you need to read this:

Reflecting on the AWS outage in the winter of 2021, it's clear that no cloud provider is immune to downtime. A single power loss took down a data center, leading to widespread disruption and delayed recovery due to network issues. If your business wasn't impacted, consider yourself fortunate. But luck isn't a strategy. The question is: do you have a robust contingency plan for when your cloud services fail?

Here's my proven strategy to safeguard your business against cloud disruptions: ⬇️

1. Architect for resilience
- Conduct a comprehensive infrastructure assessment
- Identify cloud-ready applications
- Design a multi-regional, high-availability architecture
This approach minimizes single points of failure, ensuring business continuity even during regional outages.

2. Implement robust disaster recovery
- Develop a detailed crisis response plan
- Establish clear communication protocols
- Conduct regular disaster recovery drills
As the saying goes, "Hope for the best, prepare for the worst." Your disaster recovery plan is your business's lifeline during cloud crises.

3. Prioritize data redundancy
- Implement systematic, frequent backups
- Utilize multi-region data replication
- Regularly test data restoration processes
Remember: your data is your most valuable asset. Protect it vigilantly. As Melissa Palmer, Independent Technology Analyst & Ransomware Resiliency Architect, emphasizes, "Proper setup, including having backups in the cloud and testing recovery processes, is crucial to ensure quick and successful recovery during a disaster."

4. Leverage multi-cloud strategies
- Distribute workloads across multiple cloud providers
- Implement cloud-agnostic architectures
- Utilize containerization for portability
This approach not only mitigates provider-specific risks but also optimizes performance and cost-efficiency.

5. Continuously monitor and optimize
- Implement real-time performance monitoring
- Utilize predictive analytics for proactive issue resolution
- Regularly review and optimize your cloud infrastructure

Remember, in the world of cloud computing, complacency is the enemy of resilience. Stay vigilant, stay prepared.

P.S. How are you preparing your organization to handle cloud outages? I would love to read your responses.

#cloud #cloudmigration #cloudstrategy #simform

P.P.S. Visit my profile, Hiren, & subscribe to my weekly newsletter:
- Get product engineering insights.
- Catch up on the latest software trends.
- Discover successful development strategies.
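To make the "multi-regional, high-availability" idea concrete, here is a minimal sketch of health-check-driven region selection. The endpoints, paths, and timeout are hypothetical placeholders, and in production this logic usually lives in DNS or load-balancer health checks rather than application code; it assumes the `requests` library is installed.

```python
"""Pick the first healthy region endpoint, falling back in priority order.

Sketch only: endpoints and thresholds are hypothetical; real deployments
typically push this down to DNS or load-balancer health checks.
"""
import requests

# Priority-ordered, hypothetical regional endpoints for the same service.
REGION_ENDPOINTS = [
    ("us-east-1", "https://api-us-east-1.example.com/healthz"),
    ("us-west-2", "https://api-us-west-2.example.com/healthz"),
    ("eu-west-1", "https://api-eu-west-1.example.com/healthz"),
]

def pick_healthy_region(timeout_s: float = 2.0) -> str | None:
    """Return the first region whose health endpoint answers HTTP 200."""
    for region, url in REGION_ENDPOINTS:
        try:
            if requests.get(url, timeout=timeout_s).status_code == 200:
                return region
        except requests.RequestException:
            continue  # Treat timeouts/connection errors as an unhealthy region.
    return None  # Every region failed its health check.

if __name__ == "__main__":
    region = pick_healthy_region()
    print(f"Routing traffic to: {region or 'NO HEALTHY REGION - page the on-call'}")
```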

  • Tom Le

    Unconventional Security Thinking | Follow me. It’s cheaper than therapy and twice as amusing.

    9,825 followers

I think 90% of companies couldn't pull their own plug. Here are some ideas...

ICYMI, Co-op avoided a more severe cyber attack by disconnecting its own network and choosing a self-imposed short-term disruption to prevent a longer-term one caused by criminals.

We've all read stories about that "critical moment at 2 AM" when some security leader has to make the call to take the entire company offline to apply a digital tourniquet. But how many companies could "pull the plug" even if they wanted to? The interconnected "plugs" are all virtual in today's IT landscape. And what else do you need to do quickly when faced with impending cyber doom?

Here are some quick tips to ponder:

1⃣ Practice "pulling the plug" as a part of your BCDR preparedness.
• What is the business disruption impact?
• How do you notify users?
• Can you still log in?
• How are customers affected?
• What middleware comms will function?
• Do you need out-of-band comms?

2⃣ Consider using access control instead of a full disconnect.
• Can you block all egress or ingress with a few firewall or router rules?
• What about SaaS and cloud?
• Could you push some ready-to-go emergency endpoint hardening rules instantly? (Assume your endpoint management/orchestration platform is not compromised; if it were, you could switch to a backup method, such as using EDR command & control.)

3⃣ Think about identity - there are lots of ways to slow an attacker or prevent new login sessions using identity controls.
• Would blocking all user logins except a few designated, safe logins allow for a more limited disconnect?
• Maybe you only need to block egress, or some egress.
• Maybe you only need to block RDP and NetBIOS internally.
• Do you have a trusted business-critical allowlist that could take precedence over an all-block rule?
• If yes, is the allowlist translated into discrete source/destination/protocol access policies that could be deployed quickly?

4⃣ Can you reset all privileged credentials quickly?
• Most companies do this manually, but you need to be able to do it with push-button automation.
• What if access was obtained via API keys? Can you reset API keys quickly?
• What about currently active sessions?
• What about SaaS and cloud?

"Pulling the plug" is a lot more complicated than most realize until you start planning and practicing for scenarios that may require it. My message to all is not only to practice pulling the plug, but to define the different scenarios and degrees of emergency access changes to deploy so you can be more surgical and limit business impact.

This list is just the tip of the iceberg. What am I missing?
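A hedged sketch of the "allowlist that takes precedence over an all-block rule" idea: translating a short business-critical allowlist into host firewall rules that drop all other egress. It assumes a Linux host with iptables and root privileges; the destinations are hypothetical, and in practice you would push equivalent policies through central firewall or EDR tooling rather than per-host scripts.

```python
"""Emergency egress lockdown: permit a short business-critical list, drop the rest.

Sketch only: assumes a Linux host with iptables and root privileges. The
allowlist entries are hypothetical examples.
"""
import subprocess

# Hypothetical business-critical destinations: (destination CIDR, protocol, port).
ALLOWLIST = [
    ("203.0.113.10/32", "tcp", 443),   # e.g. payments API
    ("198.51.100.0/24", "udp", 53),    # e.g. internal DNS resolvers
]

def run(args: list[str]) -> None:
    print("+", " ".join(args))
    subprocess.run(args, check=True)

def emergency_egress_block() -> None:
    # Permit the allowlisted destination/protocol/port tuples first...
    for cidr, proto, port in ALLOWLIST:
        run(["iptables", "-A", "OUTPUT", "-d", cidr,
             "-p", proto, "--dport", str(port), "-j", "ACCEPT"])
    # ...then drop everything else leaving the host.
    run(["iptables", "-A", "OUTPUT", "-j", "DROP"])

if __name__ == "__main__":
    emergency_egress_block()
```

The point of pre-translating the allowlist into discrete rules is exactly what the post describes: the decision is made in planning, so deployment at 2 AM is push-button rather than improvised.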

  • Paula Fontana

    Board Director | CMO | Founder

    6,925 followers

The Bank of England's latest cyber stress test exposed interesting fault lines in how firms think about resilience. Here are 8 key insights:

1. Resilience is as much about confidence as capability
Many firms underestimate the equally critical role of confidence among customers, partners, and the market.
Implication: Resilience is storytelling under stress - how confident others feel about your recovery matters as much as the actual recovery.

2. Misaligned incentives between fairness and financial stability
There's a fundamental tension between procedural fairness and systemic triage.
Implication: Regulations may need reframing to empower firms to act decisively in crises.

3. Capital does not equal resilience
Participants relied on capital buffers as a safety net but didn't anticipate that capital does not mitigate operational disruption.
Implication: Traditional financial health metrics don't translate to resilience. Firms may need new performance indicators tied to operational continuity and response capacity.

4. Disconnection decisions are a strategic blind spot
Decisions to disconnect from financial market infrastructure were often made in technical silos, without understanding the business or systemic consequences.
Implication: Disconnection is a macroeconomic event. Firms need joint business-IT rehearsals for disconnection thresholds, impacts, and contingencies.

5. Liquidity chains break faster than expected
Even short disruptions to transaction settlement caused liquidity shortfalls across customer firms, triggering unintended cascading effects.
Implication: Most resilience planning overlooks second-order liquidity needs. Firms should simulate what happens when settlement failures ripple into other business lines.

6. Scenario testing underfocuses on persistence
Most firms model how to recover, but not how to persist under degraded conditions. Few had tested processing critical transactions manually in disconnected states.
Implication: Firms must build "graceful degradation" plans - how to operate at reduced capacity with integrity, not just how to bounce back.

7. Reconnection is riskier than disconnection
Firms assumed reconnection would be straightforward post-incident. But reconnection required third-party assurances that could take longer than Impact Tolerance allows.
Implication: Resilience planning must include reconnection latency modeling and stakeholder communications for prolonged isolation scenarios.

8. Failure patterns need to be shared, not hidden
Firms are encouraged to share lessons. But in practice, few firms have mechanisms or a culture to share near-misses and failures constructively.
Implication: Industry-wide resilience depends on a psychological safety net for sharing mistakes and uncertainty.

Thanks for sharing, Orlando Fernández Ruiz. Check out the full set of findings: https://lnkd.in/g9nW-PiT
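Insight 6's "graceful degradation" is easiest to see in code: when the settlement link is down, keep accepting critical transactions by spooling them to a durable local queue for later or manual processing instead of failing outright. Everything below (the submit function, the spool file) is a hypothetical illustration under that assumption, not anything prescribed by the stress test.

```python
"""Graceful degradation sketch: persist through a settlement outage instead of failing.

Hypothetical illustration: `submit_to_settlement` and the spool file stand in
for a real settlement interface and a durable store (database, message queue).
"""
import json
import time
from pathlib import Path

SPOOL = Path("degraded_mode_spool.jsonl")  # durable local queue for offline mode

def submit_to_settlement(txn: dict) -> None:
    """Placeholder for the real settlement interface; hard-coded here to simulate an outage."""
    raise ConnectionError("settlement link unavailable")

def process_transaction(txn: dict) -> str:
    try:
        submit_to_settlement(txn)
        return "settled"
    except ConnectionError:
        # Degraded mode: record the transaction durably so the business keeps
        # operating with integrity, and the backlog can be replayed or handled
        # manually once the link is restored.
        with SPOOL.open("a") as f:
            f.write(json.dumps({"queued_at": time.time(), **txn}) + "\n")
        return "queued_for_manual_processing"

if __name__ == "__main__":
    print(process_transaction({"id": "TXN-001", "amount": 125_000, "currency": "GBP"}))
```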

  • Brian Levine

    Cybersecurity & Data Privacy Leader • Founder & Executive Director of Former Gov • Speaker • Former DOJ Cybercrime Prosecutor • NYAG Regulator • Civil Litigator • Posts reflect my own views.

    14,738 followers

Waiting until you have an incident to understand which of your systems are critical can have serious consequences, sometimes even life-or-death consequences. Here is an unusual example:

It was recently reported that hackers launched a ransomware attack on a Swiss farmer's computer system, disrupting the flow of vital data from a milking robot. See https://lnkd.in/eVhzu429.

The farmer apparently did not want to pay a $10K ransom and thought he didn't really need data on the amount of milk produced in the short term. In addition, the milking robot also worked without a computer or network connection, so the cows could continue to be milked. The farmer, however, apparently didn't account for the fact that the data at issue was particularly important for pregnant animals. As a result of the attack, the farmer was unable to recognize that one calf was dying in the womb, and in the end, this lack of data may have prevented the farmer from saving the calf.

While most of us will hopefully never find ourselves in this exact situation, the takeaways are the same for all of us:

1. CONDUCT A BIA: Consider conducting a business impact assessment (BIA) to understand the criticality and maximum tolerable downtime (MTD) of all your systems, processes, and activities from a business or commercial standpoint. Such analysis should, of course, include the health and safety impact of downtime.

2. VENDORS: As part of the BIA, consider assessing the MTD for each vendor as well. This will help you decide which primary vendors require a secondary, as well as define the terms of your contract with the secondary vendors. More details on backup vendors can be found here: https://lnkd.in/e-eVNvQz.

3. UPDATE YOUR BC/DR PLAN: Once you have conducted a BIA, update your business continuity and disaster recovery (BC/DR) plan to ensure that your recovery time objective (RTO) and recovery point objective (RPO) are consistent with the MTD determined through your BIA.

4. PRACTICE: Conduct regular incident response (IR) and BC/DR tabletop exercises, as well as full failover exercises, to test and improve your ability to respond to a real event. Advice on conducting successful tabletop exercises can be found here: https://lnkd.in/eKrgV9Cg.

Stay safe out there!
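Point 3 (keeping RTO and RPO consistent with the BIA) lends itself to a simple automated check. The system names and hour values below are invented for illustration; the encoded rule is just that planned recovery time must not exceed the maximum tolerable downtime, and the planned recovery point must not exceed the tolerable data-loss window recorded in the BIA.

```python
"""Sanity-check a BC/DR plan against BIA findings: RTO vs. MTD, RPO vs. tolerable data loss.

All systems and hour values are invented for illustration only.
"""
from dataclasses import dataclass

@dataclass
class BiaEntry:
    system: str
    mtd_hours: float            # maximum tolerable downtime from the BIA
    max_data_loss_hours: float  # tolerable data-loss window from the BIA

@dataclass
class DrPlanEntry:
    system: str
    rto_hours: float  # recovery time objective in the BC/DR plan
    rpo_hours: float  # recovery point objective in the BC/DR plan

def find_gaps(bia: list[BiaEntry], plan: dict[str, DrPlanEntry]) -> list[str]:
    gaps = []
    for entry in bia:
        dr = plan.get(entry.system)
        if dr is None:
            gaps.append(f"{entry.system}: no BC/DR plan entry at all")
            continue
        if dr.rto_hours > entry.mtd_hours:
            gaps.append(f"{entry.system}: RTO {dr.rto_hours}h exceeds MTD {entry.mtd_hours}h")
        if dr.rpo_hours > entry.max_data_loss_hours:
            gaps.append(f"{entry.system}: RPO {dr.rpo_hours}h exceeds tolerable data loss "
                        f"{entry.max_data_loss_hours}h")
    return gaps

if __name__ == "__main__":
    bia = [BiaEntry("milking-robot-telemetry", mtd_hours=24, max_data_loss_hours=12)]
    plan = {"milking-robot-telemetry":
            DrPlanEntry("milking-robot-telemetry", rto_hours=72, rpo_hours=24)}
    for gap in find_gaps(bia, plan) or ["No gaps found"]:
        print(gap)
```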

  • Christian Hyatt

    CEO & Co-Founder @ risk3sixty | Compliance, Cybersecurity, and Agentic AI for GRC Teams

    46,925 followers

We have found 1,750 gaps related to business continuity. Here are four important opportunities I'm seeing:

𝟭. 𝗣𝗼𝗹𝗶𝗰𝗶𝗲𝘀
𝗣𝗿𝗼𝗯𝗹𝗲𝗺: Either companies lack policies, the policies are completely out of date, or they are so complex they are not useful.
𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻: Policies are an opportunity to clearly state intent and hold people accountable. Make your BCP policy as clear as possible, and avoid conflating your policy with detailed procedures or event plans. If you are looking for framework guidance, consider ISO 22301.

𝟮. 𝗘𝘃𝗲𝗻𝘁 𝗣𝗹𝗮𝗻𝘀
𝗣𝗿𝗼𝗯𝗹𝗲𝗺: As a rule, most companies have not considered and documented likely scenarios that could impact their business.
𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻: Spend some time considering the most likely scenarios that may impact your business and document how the company would respond. Almost every company deals with common events like ransomware, business email compromise, accidental data disclosure, and stolen laptops.

𝟯. 𝗧𝗮𝗯𝗹𝗲𝘁𝗼𝗽 𝗘𝘅𝗲𝗿𝗰𝗶𝘀𝗲𝘀
𝗣𝗿𝗼𝗯𝗹𝗲𝗺: Organizations aren't running good tabletop exercises and are missing an opportunity to drive organizational change and awareness.
𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻: A solid business continuity tabletop exercise is a golden opportunity to put cybersecurity at the top of leadership's mind. Get everyone in the same room thinking about risks. It will fast-track your team's understanding that there are important risks at play, and suddenly you have their ear. They understand why cybersecurity is critical to building enterprise value for your organization. Don't miss your chance to get leadership buy-in.

𝟰. 𝗕𝗮𝗰𝗸𝘂𝗽𝘀 𝗮𝗻𝗱 𝗥𝗲𝗰𝗼𝘃𝗲𝗿𝘆
𝗣𝗿𝗼𝗯𝗹𝗲𝗺: Companies aren't doing full backups, or they aren't testing their ability to recover from backups.
𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻: I really don't want to pretend this is easy; it is a multistep process. First, consider what you need to back up (e.g., critical data). Second, consider how you should back it up (e.g., different cloud regions vs. off-site backups vs. endpoint restoration vs. something else). Third, do the hard work of testing your ability to fully recover from backups.

𝗔𝗰𝗸𝗻𝗼𝘄𝗹𝗲𝗱𝗴𝗶𝗻𝗴 𝗧𝗵𝗶𝘀 𝗶𝘀 𝗮 𝗟𝗼𝘁 𝗼𝗳 𝗪𝗼𝗿𝗸
I just threw a few recommendations at you, but I need to acknowledge that this is a lot of work and you will need to right-size it for your organization. A start-up will probably have a lighter, more agile program, while an enterprise will likely have a team dedicated to business continuity and resilience. Let me offer this encouragement: the effort put into resilience returns its investment by protecting and building enterprise value.

𝗢𝘂𝗿 𝗢𝘄𝗻 𝗕𝗖𝗣 𝗣𝗿𝗼𝗴𝗿𝗮𝗺 𝗮𝘁 𝗿𝗶𝘀𝗸𝟯𝘀𝗶𝘅𝘁𝘆
Risk3sixty is ISO 22301 (business continuity) certified, and we have helped dozens of companies build their programs. Happy to answer questions.

#cybersecurity #business #technology
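The third step under Backups and Recovery (testing that you can actually recover) can be partially automated. Below is a minimal, file-level sketch: restore the most recent backup into a scratch directory and verify it against checksums recorded at backup time. The paths and manifest format are hypothetical, and a real recovery test would also start the application against the restored data, not just compare hashes.

```python
"""Minimal restore-test sketch: restore the newest backup and verify checksums.

Paths and the manifest.json format are hypothetical; a real program would also
exercise application-level recovery against the restored data.
"""
import hashlib
import json
import shutil
from pathlib import Path

BACKUP_ROOT = Path("/backups/nightly")   # hypothetical backup location
RESTORE_DIR = Path("/tmp/restore-test")  # scratch area for the test restore

def sha256(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def latest_backup() -> Path:
    """Newest backup directory by modification time."""
    return max(BACKUP_ROOT.iterdir(), key=lambda p: p.stat().st_mtime)

def restore_and_verify() -> bool:
    shutil.rmtree(RESTORE_DIR, ignore_errors=True)
    shutil.copytree(latest_backup(), RESTORE_DIR)  # the "restore" step
    # manifest.json maps relative file paths to checksums recorded at backup time.
    manifest = json.loads((RESTORE_DIR / "manifest.json").read_text())
    ok = True
    for name, expected in manifest.items():
        restored = RESTORE_DIR / name
        if not restored.exists() or sha256(restored) != expected:
            print(f"FAILED verification: {name}")
            ok = False
    return ok

if __name__ == "__main__":
    print("Restore test passed" if restore_and_verify() else "Restore test FAILED")
```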

  • Nicole Hoyle

    AJUVO | ServiceNow CRM for Retail 💚 | Closing the Gaps Between Stores, HQ & Field Ops | Enterprise Success at Scale

    8,612 followers

    11 threats to review for business resilience in 2025 😱 (most people don’t have the basics down) As seasoned risk & resilience professionals, our road is bumpy, but we're used to that. From… ⚠️ Slashed funding to cyber resilience gaps ⚠️ Massive data gaps to geopolitical disruptions ⚠️ Regulatory findings to cheezy DR tests We’ve seen it all. Here’s what the experts say are the top 11 challenges with simple tips to improve: ➜ 1. Weak Executive Sponsorship ↳ Misaligned overarching organizational objectives ↳ Inadequate funding 💡 Present data on prior impacts to companies & conduct regular status reports & risk mitigation plans ➜ 2. Regulatory Pressure ↳ New operational resilience requirements ↳ Complex self-attestation demands 💡 Build a regulatory change monitoring system with quarterly review cycles ➜ 3. Cyber Resilience Gap ↳ Sophisticated attacks outpacing defense capabilities ↳ Ransomware recovery more challenging than ever 💡 Implement automated backup verification checks & quarterly recovery testing ➜ 4. Data Overwhelm ↳ Manual processes can't keep up ↳ Need for AI-driven solutions is urgent 💡 Identify top 3 manual processes consuming most time & automate one within 90 days ➜ 5. BCM Integration Challenges ↳ Siloed risk & continuity approaches ↳ Lack of unified response strategy 💡 Start monthly cross-functional resilience meetings with key stakeholders ➜ 6. Outdated Dependency Maps Analysis ↳ Static maps fail in dynamic environments ↳ Real-time assessment needed 💡 Implement quarterly review triggers based on change management data ➜ 7. Information Silos ↳ Disconnected systems ↳ Lack of complete risk visibility 💡 Create a simple dashboard combining top 5 risk metrics from different systems ➜ 8. Test & Exercise Inadequacies ↳ Limited testing scenarios & frequency ↳ Lack of meaningful improvement 💡 Add “surprise element” scenarios to your next two exercises ➜ 9. Geopolitical Disruptions ↳ Supply chain instability ↳ Rising nation-state cyber threats 💡 Map critical suppliers' dependencies & establish backup vendors for top 5 critical services ➜ 10. Climate Impact Reality ↳ More frequent natural disasters ↳ Traditional DR plans becoming obsolete 💡 Review & update emergency notification procedures with geo-diverse backup communication channels ➜ 11. Remote Work Vulnerabilities ↳ Complex security perimeters ↳ Communication breakdowns during incidents 💡 Create role-based incident response playbooks specifically for remote teams Though these challenges aren’t going away, small consistent steps lead to big improvements. Start with one area this week & keep chipping away at progress. What’s your biggest resiliency program challenge & how are you addressing it?👇 ♻️ Share this to help your network strengthen their resilience program. ➕ Follow me Nicole Hoyle for more operational resilience insights.
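For tip 7 ("a simple dashboard combining the top 5 risk metrics from different systems"), even a small aggregation script is a start. The fetch function and thresholds below are hypothetical stand-ins for whatever APIs or exports your backup, vulnerability, BCM, and ticketing tools actually expose.

```python
"""Tiny resilience dashboard sketch: pull a handful of metrics into one view.

The metric values are hypothetical stand-ins; in practice each line of
`fetch_metrics` would call a real system's API or read its export.
"""
from datetime import datetime, timezone

def fetch_metrics() -> dict[str, int]:
    # Hypothetical values - replace each with a real system call.
    return {
        "backup_jobs_failed_7d": 2,
        "last_successful_restore_test_days_ago": 41,
        "critical_vulns_open": 13,
        "dr_plan_last_reviewed_days_ago": 388,
        "unresolved_p1_incidents": 1,
    }

THRESHOLDS = {  # crude red/green limits for each metric
    "backup_jobs_failed_7d": 0,
    "last_successful_restore_test_days_ago": 90,
    "critical_vulns_open": 5,
    "dr_plan_last_reviewed_days_ago": 365,
    "unresolved_p1_incidents": 0,
}

if __name__ == "__main__":
    print(f"Resilience dashboard - {datetime.now(timezone.utc):%Y-%m-%d %H:%M UTC}")
    for name, value in fetch_metrics().items():
        status = "OK " if value <= THRESHOLDS[name] else "RED"
        print(f"[{status}] {name}: {value} (limit {THRESHOLDS[name]})")
```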

  • Deepak Gupta

    Building the world’s first AI-powered GTM Engineer for B2B SaaS (Cybersecurity, IAM) | Co-founder/CEO | SaaS, AI, B2B Product-Led SEO for PLG

    5,705 followers

Today's Google Cloud outage affecting Cloudflare, Spotify, Discord, Snapchat, Shopify, and countless other services is a stark reminder of a fundamental truth in our interconnected digital world: no single point of failure is acceptable in enterprise architecture.

Having built SaaS to serve millions of users globally, I've learned that resilience isn't just about choosing the "best" cloud provider; it's about designing systems that can gracefully handle the unexpected.

Three critical takeaways for B2B SaaS leaders:

🔄 Multi-cloud isn't paranoia, it's prudence. The most sophisticated companies aren't just backing up data; they're architecting for true redundancy across providers.

⚡ Your disaster recovery plan is only as good as your last test. When did you last simulate a complete cloud provider failure? If you can't answer that immediately, you know what your next sprint planning session needs to include.

🎯 Customer communication during outages defines your brand. Notice how quickly companies like Cloudflare and GitHub communicated? That's not an accident; it's preparation.

The reality is that even Google's enterprise-grade infrastructure can experience disruptions. The question isn't whether outages will happen; it's whether your architecture and incident response can maintain customer trust when they do.

As we continue advancing AI integration in cybersecurity and beyond, building resilient systems becomes even more critical. The cost of downtime isn't just revenue; it's the competitive advantage you lose while your systems are dark.

Read about the outage: https://lnkd.in/gtt4RDj5

#CloudResilience #DisasterRecovery #B2BSaaS #Cybersecurity #EnterpriseArchitecture
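"When did you last simulate a complete cloud provider failure?" can be answered in CI as well as in game days. Here is a hedged pytest-style sketch: the primary and secondary fetch functions are hypothetical placeholders for real provider clients, and the test simply asserts that the fallback path engages when the primary is knocked over.

```python
"""Pytest-style sketch: assert the fallback path works when the primary provider "fails".

`fetch_from_primary` / `fetch_from_secondary` are hypothetical placeholders for
real provider clients (e.g., object storage reads in two different clouds).
Run with pytest; the monkeypatch fixture is built in.
"""
import sys

def fetch_from_primary(key: str) -> bytes:
    """Placeholder for provider A."""
    return b"primary:" + key.encode()

def fetch_from_secondary(key: str) -> bytes:
    """Placeholder for provider B (the replica)."""
    return b"secondary:" + key.encode()

def fetch_with_failover(key: str) -> bytes:
    try:
        return fetch_from_primary(key)
    except ConnectionError:
        return fetch_from_secondary(key)  # degrade to the secondary provider

def test_failover_when_primary_is_down(monkeypatch):
    def simulated_outage(key: str) -> bytes:
        raise ConnectionError("simulated complete provider outage")
    # Knock the primary provider over for the duration of this test...
    monkeypatch.setattr(sys.modules[__name__], "fetch_from_primary", simulated_outage)
    # ...and assert the service still answers from the secondary.
    assert fetch_with_failover("invoice-42") == b"secondary:invoice-42"
```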

  • Vasu Maganti

    𝗖𝗘𝗢 @ Zelarsoft | Driving Profitability and Innovation Through Technology | Cloud Native Infrastructure and Product Development Expert | Proven Track Record in Tech Transformation and Growth

    23,309 followers

Lived through enough disasters to know this truth: production is where optimism goes to die.

Deployments WILL break. Systems WILL crash. You NEED to have a Disaster Recovery plan prepped.

Most organizations spend $$ on fancy tech stacks but don't realize how critical DR really is until something goes wrong. And that's where the trouble starts.

Here are a few pain points I see decision-makers miss:

👉 𝗕𝗮𝗰𝗸𝘂𝗽𝘀 ≠ 𝗗𝗶𝘀𝗮𝘀𝘁𝗲𝗿 𝗥𝗲𝗰𝗼𝘃𝗲𝗿𝘆. Sure, you've got backups, but what about your Recovery Point Objective (RPO): how much data are you actually okay losing? Or your Recovery Time Objective (RTO): how long can you afford to be down?

👉 "𝗦𝗲𝘁 𝗜𝘁 𝗮𝗻𝗱 𝗙𝗼𝗿𝗴𝗲𝘁 𝗜𝘁" 𝗗𝗥 𝗣𝗹𝗮𝗻𝘀. The app changes and the infrastructure evolves, but you're running on a DR plan you wrote two years ago?

👉 𝗜𝗱𝗹𝗲 𝗕𝗮𝗰𝗸𝘂𝗽 𝗘𝗻𝘃𝗶𝗿𝗼𝗻𝗺𝗲𝗻𝘁𝘀. Most teams have "hot spares" (idle infrastructure) sitting around waiting for the next big disaster.

Disasters aren't IF, they're WHEN.

Build DR testing into your CI/CD pipeline. If you're shipping code daily, your recovery strategy should be just as active. Turn those idle backups into active DevOps workspaces: load test them, stress test them, break them before production does.

Stop relying on manual backups or failovers. Tools like AWS Backup, Route 53, and Elastic Load Balancers exist for a reason. Automate your snapshots, automate your failovers, automate 𝗲𝘃𝗲𝗿𝘆𝘁𝗵𝗶𝗻𝗴.

Don't wait for a disaster to test your DR strategy. Test it now, fail fast, and fix faster.

What about you: what's your top DR strategy tip? 💬

#DisasterRecovery #CloudComputing #DevOps #Infrastructure

Zelar - Secure and innovate your cloud-native journey.
Follow me for insights on DevOps and tech innovation.
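In the spirit of "automate your snapshots and build DR testing into your CI/CD pipeline", here is a small check that could run as a scheduled or pipeline job: it asks AWS for the newest EBS snapshot you own and fails if its age exceeds the RPO you claim. It assumes boto3 is installed and AWS credentials/region are configured; the 24-hour RPO is an example value, not a recommendation.

```python
"""Fail a pipeline step if the newest EBS snapshot is older than the stated RPO.

Assumes boto3 and AWS credentials/region are configured in the environment.
The 24-hour RPO is an example value, not a recommendation.
"""
import sys
from datetime import datetime, timedelta, timezone

import boto3

RPO = timedelta(hours=24)  # example: we claim we can lose at most 24h of data

def newest_snapshot_age() -> timedelta:
    ec2 = boto3.client("ec2")
    snapshots = ec2.describe_snapshots(OwnerIds=["self"])["Snapshots"]
    if not snapshots:
        raise SystemExit("No snapshots found at all - RPO cannot be met.")
    newest = max(s["StartTime"] for s in snapshots)  # StartTime is timezone-aware
    return datetime.now(timezone.utc) - newest

if __name__ == "__main__":
    age = newest_snapshot_age()
    if age > RPO:
        print(f"FAIL: newest snapshot is {age} old, which exceeds the {RPO} RPO")
        sys.exit(1)  # non-zero exit fails the CI/CD job
    print(f"OK: newest snapshot is {age} old (RPO {RPO})")
```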
