Beyond availability: Building cloud resilience by design

By Nicolas Sekkaki, Global Practice Leader, Cloud at Kyndryl, and Kris Lovejoy, Global Practice Leader, Cyber Resilience at Kyndryl

In the last week, the acknowledgment of the breach of F5’s source code and the 15-hour AWS outage have reminded enterprises just how dependent modern business is on the invisible architecture of our global economy. When the systems that power collaboration, commerce, and communication falter, the ripple effects are immediate and global.

And yet, enterprise leaders across the globe are grappling with a paradox, according to the recently published Kyndryl Readiness Report. While 90% of business leaders believe their IT infrastructure is best in class, only 39% feel it’s prepared for future disruption, a gap that underscores how even the most advanced systems can be both indispensable and fragile.

In the case of AWS, does this mean the cloud model is fundamentally flawed? Of course not. Hyperscalers deliver exceptional and consistent systems availability at global scale. What the AWS and past (and future) disruptions highlight is that the continuity of any system is a shared responsibility. These obligations begin at the co-creation and design stages and must include provisions for resilience and flexibility.

In short, the organizations that recovered from the AWS outage did so by design, not by chance. This is a defining trait of true cloud maturity. Several design approaches can help:

Develop multi-region architectures

Running applications across multiple physical data centers in different regions is the most direct defense against catastrophic failure. There are several ways to do this, each with its own considerations. The Active-Active approach helps ensure near-zero downtime by running applications simultaneously in two or more regions, albeit at higher cost and architectural complexity. In an Active-Passive configuration, a primary region handles all system traffic with a replica standing by as backup, so the design is simpler but still requires investment. Finally, an Active-Selective configuration, the more pragmatic approach, focuses resilience efforts on an organization’s most critical systems and functions, with the understanding that less-critical systems and workloads can be rebuilt easily. This approach offers continuity where it matters most, while minimizing duplication.
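To make the difference between the three configurations concrete, here is a minimal Python sketch of the routing decision each one implies when a region becomes unhealthy. Everything in it is an illustrative assumption rather than a product API: the names Mode, Workload and serving_regions, the region labels, and the "critical" flag are hypothetical. It also assumes that critical workloads in an Active-Selective setup fail over to a standby region, which is only one possible reading of that approach.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ACTIVE_ACTIVE = "active-active"
    ACTIVE_PASSIVE = "active-passive"
    ACTIVE_SELECTIVE = "active-selective"


@dataclass(frozen=True)
class Workload:
    name: str
    critical: bool  # hypothetical flag marking a business-critical system


def serving_regions(workload, mode, primary, secondary, healthy):
    """Return the regions that should serve traffic for this workload."""
    primary_up = primary in healthy
    secondary_up = secondary in healthy

    if mode is Mode.ACTIVE_ACTIVE:
        # Both regions serve at once; an unhealthy region simply drops out.
        return [r for r in (primary, secondary) if r in healthy]

    if mode is Mode.ACTIVE_PASSIVE:
        # The primary takes all traffic; the standby takes over on failure.
        if primary_up:
            return [primary]
        return [secondary] if secondary_up else []

    # ACTIVE_SELECTIVE (assumed reading): only critical workloads have a
    # standby region; everything else waits to be rebuilt.
    if primary_up:
        return [primary]
    if workload.critical and secondary_up:
        return [secondary]
    return []


# Example: the primary region is down and only the secondary is healthy.
payments = Workload("payments", critical=True)
reports = Workload("reports", critical=False)
healthy = {"region-b"}
print(serving_regions(payments, Mode.ACTIVE_SELECTIVE, "region-a", "region-b", healthy))
print(serving_regions(reports, Mode.ACTIVE_SELECTIVE, "region-a", "region-b", healthy))
```

In this sketch the critical workload keeps running from the secondary region while the less-critical one returns an empty list, meaning it stays offline until it is rebuilt. That is the trade-off the Active-Selective approach accepts in exchange for less duplication.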

Employ multi-cloud strategies

Kyndryl’s 2025 Cloud Innovation Survey reveals that the most advanced organizations are multicloud by choice. By distributing workloads across multiple hyperscaler systems, these operators reduce their exposure to any single provider’s systemic dependencies. Other strategic advantages to this approach include being able to innovate faster by using best-of-breed services and simplifying operations via a platform-engineering approach that unifies system observability, security and operations across multiple clouds.

Use hybrid cloud to control continuity

Running a hybrid cloud environment can help ensure that organizations deploy the right workloads to the right platforms. By combining the agility of public clouds with the data governance capabilities of modern private clouds, operators can maintain full control over how to protect and recover their data. In addition, software-defined, API-driven modern private clouds can provide a “public-cloud experience”, where capacity and workloads flow seamlessly between environments. The elasticity of this approach gives operators the best of both worlds, as highly dynamic, scalable services thrive in public cloud, while regulated, latency-sensitive, or mission-critical workloads remain anchored in private, secure environments. Together, these models allow enterprises to scale innovation without sacrificing control.
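The placement principle described above can be written down as a simple rule. The following Python sketch is illustrative only: WorkloadProfile, its three flags, and the place function are hypothetical names, and a real placement decision would weigh many more factors (cost, data residency, existing contracts) than this rule does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    name: str
    regulated: bool = False          # subject to regulatory constraints
    latency_sensitive: bool = False  # needs to sit close to users or data
    mission_critical: bool = False   # the business cannot run without it


def place(profile):
    """Return "private" or "public" for a workload (illustrative rule only)."""
    # Regulated, latency-sensitive or mission-critical workloads stay anchored
    # in the private environment; everything else goes to the public cloud.
    if profile.regulated or profile.latency_sensitive or profile.mission_critical:
        return "private"
    return "public"


# Example: a regulated ledger versus an elastic web storefront.
print(place(WorkloadProfile("ledger", regulated=True)))
print(place(WorkloadProfile("storefront")))
```

Keeping the rule explicit, even in this simplified form, makes it possible to review placement decisions and revisit them as workloads or regulations change.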

Don’t overlook SaaS

Most enterprises rely heavily on SaaS (Software as a Service) applications that function outside their direct operational control. So when a SaaS provider experiences an incident, the business continuity of its customers may be jeopardized. Managing this eventuality begins with transparent cooperation and shared accountability. All partners must understand and integrate the recovery models, data retention policies and export capabilities of the other players. This means establishing independent data backups, alternate access channels, or contingency workflows to minimize disruption when critical systems fail.

Accepting the inevitability of system failure is the first step to building a resilience mindset. Only then can partners collaborate effectively to govern dependency and help ensure end-to-end operational continuity.

Each approach to system resiliency has benefits and challenges that require technical expertise, business insights and cross-industry data to guide enterprises to the best decisions. “Resilience by Design” is the new mantra for the cloud era. By choosing systems and solutions wisely, managing expectations and preparing for the inevitability of disruption in a challenging world, enterprises can turn that mindset into practice, protecting their data and their bottom lines in ways that still allow for innovation and growth.

