Azure Disaster Recovery for Financial Institutions: FFIEC BCM Guide

Written by Justin Kirsch | Fri, Jun 26, 2026


Every examiner who walks into a bank, credit union, or mortgage company asks a version of the same question: if your core systems went dark right now, how long until you are back, and how much data would you lose getting there? The answer is supposed to be written down, tied to a business impact analysis, and proven by a test you ran recently. For a lot of institutions, the honest answer is a shrug and a backup job nobody has restored from in eighteen months.

Azure disaster recovery for financial institutions is the part of that answer that lives in cloud infrastructure rather than in a server closet. Microsoft operates a resilient platform, but the platform does not design your recovery objectives, configure your failover, or test your runbooks for you. That work belongs to the institution and to whoever operates its Azure environment as partner of record. This guide covers what supervisory guidance expects, why an untested recovery plan is the most expensive gap a regulated institution can carry, and how Azure capabilities map to recovery time and recovery point objectives when they are designed, configured, and tested correctly.

The framing matters because the words get muddled in vendor pitches. Backup is not disaster recovery. Storage durability is not availability. Retention is not a restore plan. We will keep those concepts distinct throughout, because examiners do too, and because the difference between them is exactly where institutions get caught.

What Examiners and Your Board Expect for Continuity

Start with the supervisory expectation, because it sets the bar everything else has to clear. The Federal Financial Institutions Examination Council publishes the "Business Continuity Management" booklet of its IT Examination Handbook; the current edition was issued in November 2019. The booklet is supervisory guidance, not a statute, and FFIEC member agencies use it to evaluate how an institution prepares for and recovers from disruption.

The booklet asks institutions to derive recovery objectives from a business impact analysis rather than picking round numbers that sound reassuring. For each prioritized business process, you define a recovery time objective, the maximum tolerable duration of an outage, and a recovery point objective, the maximum amount of data you can afford to lose measured in time. Those two numbers drive every architectural decision that follows. Then the booklet expects you to test the plan on a regular cadence and to track and remediate the issues those tests surface, the same documented, evidence-first discipline we cover in our guide to preparing for an FFIEC IT examination.

FFIEC IT Examination Handbook, Business Continuity Management (November 2019)

Business continuity management includes the continued maintenance of systems and controls for the resilience and continuity of operations, with a focus on technology, business operations, testing, and communication strategies.

Examination focus: recovery objectives derived from the business impact analysis, with documented testing and issue remediation
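To make the idea of objectives plus test evidence concrete, here is a minimal sketch of a recovery objective matrix compared against failover test results. The class names, process names, and every number are hypothetical illustrations, not values from the FFIEC booklet; real figures come from your own business impact analysis and your own tests.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjective:
    """One row of a hypothetical RTO/RPO matrix derived from the BIA."""
    process: str
    rto: timedelta  # maximum tolerable outage duration
    rpo: timedelta  # maximum tolerable data loss, measured in time


@dataclass(frozen=True)
class DrillResult:
    """What a failover test actually measured for one process."""
    process: str
    measured_recovery_time: timedelta
    measured_data_loss: timedelta


def find_gaps(objectives, results):
    """Return gaps: processes with no test evidence and missed objectives."""
    by_process = {r.process: r for r in results}
    gaps = []
    for obj in objectives:
        result = by_process.get(obj.process)
        if result is None:
            gaps.append(f"{obj.process}: no test evidence")
            continue
        if result.measured_recovery_time > obj.rto:
            gaps.append(f"{obj.process}: RTO missed")
        if result.measured_data_loss > obj.rpo:
            gaps.append(f"{obj.process}: RPO missed")
    return gaps


# Illustrative values only (assumptions, not guidance).
objectives = [
    RecoveryObjective("online banking", timedelta(hours=4), timedelta(minutes=15)),
    RecoveryObjective("loan origination", timedelta(hours=24), timedelta(hours=1)),
    RecoveryObjective("document imaging", timedelta(hours=48), timedelta(hours=4)),
]
results = [
    DrillResult("online banking", timedelta(hours=6), timedelta(minutes=10)),
    DrillResult("loan origination", timedelta(hours=8), timedelta(minutes=30)),
]

gaps = find_gaps(objectives, results)
for gap in gaps:
    print(gap)
```

In this made-up data set, one process missed its recovery time objective and one was never tested at all. Both are the kind of finding a tracked remediation list exists to capture.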

This is not a banking-only conversation. The National Credit Union Administration implemented its Information Security Examination procedures in 2023 to standardize how examiners evaluate a credit union's information security and resilience, aligning with FFIEC guidance and broader examination expectations. The Federal Deposit Insurance Corporation, also an FFIEC member, examines banks' business continuity using the same handbook. Whatever charter you hold, the shared backbone is the FFIEC booklet, and the shared expectation is documented recovery objectives plus evidence that you test against them.

Why Examiners Care About the Test

A written continuity plan with untested recovery objectives is a document, not a capability. The gap between "we have a plan" and "we proved the plan works" is where institutions fail during a real event. The booklet's emphasis on regular testing and issue remediation exists because a recovery objective you have never validated is a guess, and a guess is what turns a contained outage into a reportable incident.

For the board, the obligation is oversight. Directors are expected to understand the recovery posture in plain terms: how long until critical member or borrower services come back, how much data exposure that implies, when the plan was last tested, and what the test found. Translating those technical answers into the metrics a board actually tracks is a discipline of its own, one we walk through in our guide to reporting IT and security posture to your board and examiners. When those answers are vague, the board is carrying a risk it cannot quantify, which is exactly the situation supervisory guidance is designed to prevent.

The Real Cost of an Outage You Cannot Recover From

Examiner pressure is the floor. The business case sits well above it, because the cost of a recovery that fails has grown faster than most continuity budgets. Downtime is no longer a nuisance measured in inconvenience. For a meaningful share of organizations, it is measured in seven figures per hour.

41% of enterprises surveyed reported that a single hour of downtime now costs their business from $1 million to more than $5 million.

Source: ITIC 2024 Hourly Cost of Downtime survey, a self-reported online survey of more than 1,000 organizations

Those figures come from a commissioned, self-reported industry survey rather than from a regulator, so treat them as directional. The direction is unambiguous: for mid-size and large organizations, the per-hour cost of an outage has climbed past thresholds that make a slow or failed recovery a balance-sheet event, not a help-desk ticket. For a financial institution, add the regulatory reporting, member or borrower trust damage, and examiner scrutiny that follow a prolonged disruption, and the number gets worse. How fast you come back often hinges on a question most institutions never ask until the worst day, which is why it is worth understanding how much your recovery really depends on who sold you Microsoft 365 and whether they can actually help you recover.

One of the threats that can turn a recoverable outage into a catastrophe is ransomware, and the financial sector is squarely in its sights. What separates institutions that recover in days from those that recover in weeks is rarely the ransom decision. It is whether their backups survived the attack.

The Situation

Ransomware reaches your environment over a weekend. Before encrypting production, the attacker spends hours hunting for and corrupting your backup repositories, because they know intact backups are the one thing that lets you refuse the demand.

The Consequence

If the backups were online, flat, and reachable from the same identity plane as production, they are gone too. Recovery now means rebuilding from whatever survived, on a timeline measured in weeks, at a cost that industry data puts at roughly eight times the cost of recovering with backups intact.

That eight-times figure is not hyperbole. It comes from a 2024 Sophos survey: vendor-published, self-reported research conducted by the independent firm Vanson Bourne and covering the prior twelve months. Organizations whose backups were compromised reported a median recovery cost, excluding ransom payments, of $3 million, eight times the $375,000 median for organizations whose backups stayed intact. Just 26 percent of the compromised-backup group recovered within a week, versus 46 percent of those with usable backups. The same research found that 94 percent of ransomware victims said attackers attempted to compromise their backups during the attack, which is why a recovery layer that cannot survive that attempt is the gap that matters.
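The arithmetic behind those survey figures is simple enough to check directly. The short sketch below uses only the numbers quoted above; the variable names are ours.

```python
# Median recovery costs from the 2024 Sophos survey, excluding ransom payments.
compromised_backup_median = 3_000_000  # USD, backups compromised
intact_backup_median = 375_000         # USD, backups intact

# How many times more expensive recovery was with compromised backups.
cost_multiple = compromised_backup_median / intact_backup_median

# The absolute difference between the two medians.
extra_cost = compromised_backup_median - intact_backup_median

# Share recovering within a week: 46 percent (intact) vs 26 percent (compromised).
week_recovery_gap_points = 46 - 26

print(f"Cost multiple: {cost_multiple:.0f}x")
print(f"Extra median cost: ${extra_cost:,}")
print(f"Gap in one-week recovery: {week_recovery_gap_points} percentage points")
```

These are medians from a self-reported survey, so the multiple describes the surveyed population, not a prediction for any single institution.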

Key Takeaway

The decisive factor in ransomware recovery is not whether you were hit. It is whether your recovery layer was designed to survive the same attack that took down production. Tested, isolated, validated backups paired with a rehearsed failover plan are what turn a bad week into a contained event, and they are exactly what the FFIEC testing expectation is pointing at.

The Microsoft Shared-Responsibility Reality

Here is where the architecture conversation usually goes sideways. Institutions that run on Microsoft assume that because the platform is resilient, recovery is handled. It is a reasonable assumption and a dangerous one, because it confuses Microsoft's responsibility for the platform with the institution's responsibility for its own data and recovery design.

Microsoft is explicit about the split. Under the shared responsibility model documented on Microsoft Learn, the customer always owns their data and identities and is responsible for protecting them, across every cloud deployment type. Microsoft keeps the underlying services running and durable. Deciding how your data is recovered, how fast, and from what point in time is your obligation, not the platform's.

The Gap Most Institutions Miss: Retention Is Not Backup

Microsoft 365's native retention policies and recycle bins are content-lifecycle and short-term recovery features. They govern how long an item persists before deletion and give you a window to undelete. That is genuinely useful, but it is not infrastructure-level backup and it is not disaster recovery. A short recoverable-items window will not rebuild a tenant after a destructive event, and it was never designed to. Treating retention as your recovery strategy is a consequence of misreading the shared-responsibility model, and it is a gap examiners assess.

This distinction is the pivot of the whole article. Microsoft hosts the Microsoft 365 infrastructure, and ABT manages your tenant through delegated administration. The actual disaster recovery architecture (the replication, the failover, the geo-redundant copies) lives in Azure, where the subscription is customer-controlled cloud infrastructure that a partner operates on your behalf. Microsoft 365 retention recovering a deleted email is not the same capability as an Azure recovery design bringing a workload back in a different region after a regional failure. Different layer, different tool, different responsibility.

Once you accept that the platform's resilience does not remove your duty to design, configure, test, and document recovery, the next question is the productive one: what does Azure actually give you to build that recovery posture, and what does each capability really promise?

What Azure Provides for Disaster Recovery

Azure offers a set of capabilities that, when properly designed and tested, support defined recovery time and recovery point objectives. The important word is "support." None of these is a magic switch. Each one is a building block that has to be configured for your workloads and proven by testing before you can put its number in front of a board or an examiner.

The primary failover service is Azure Site Recovery, which replicates workloads running on Azure virtual machines and on-premises physical and virtual machines to a secondary location for failover during an outage. It is not a backup of Microsoft 365 SaaS data such as Exchange Online, SharePoint, OneDrive, and Teams; protecting that data is a separate concern. Site Recovery supports the recovery objectives you configure, but the actual recovery time depends on the workload, the recovery design, and how regularly you test. Azure-to-Azure virtual machine failover can complete quickly when the design is sound, but only a test shows how quickly for your workloads. Site Recovery creates crash-consistent recovery points as frequently as every five minutes, with application-consistent points on a configurable cadence. Read that as a configurable capability you tune to your recovery point objective, not as a universal guarantee of five-minute data loss for every workload.
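One way to reason about tuning recovery points to a recovery point objective is a simple upper-bound check. The sketch below is a deliberately simplified model of our own, not Azure Site Recovery's behavior or API: it assumes the worst-case data loss is the interval between usable recovery points plus any replication lag, and the function names and the 15-minute objective are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(point_interval, replication_lag=timedelta(0)):
    """Simplified upper bound: time between recovery points plus any lag."""
    return point_interval + replication_lag


def meets_rpo(rpo, point_interval, replication_lag=timedelta(0)):
    """True when the simplified worst case fits inside the RPO."""
    return worst_case_data_loss(point_interval, replication_lag) <= rpo


rpo = timedelta(minutes=15)  # illustrative objective taken from a BIA

# Crash-consistent points every five minutes, no lag assumed.
crash_consistent_ok = meets_rpo(rpo, timedelta(minutes=5))

# Same interval, but replication is running twelve minutes behind.
lagging_ok = meets_rpo(rpo, timedelta(minutes=5),
                       replication_lag=timedelta(minutes=12))

# A workload that can only be restored from hourly application-consistent points.
app_consistent_ok = meets_rpo(rpo, timedelta(hours=1))

print(crash_consistent_ok, lagging_ok, app_consistent_ok)
```

The point of the model is the second and third cases: the same service can satisfy or miss the same objective depending on how it is configured and how the workload behaves, which is why the measured number from a test matters more than the headline interval.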

Underneath the failover service sit two reliability primitives that often get conflated. The first is physical separation within a region.

Availability Zones (in-region resilience)

Physically separate datacenters within a single Azure region
Independent power, cooling, and networking per zone
Protects against a datacenter-level failure inside a region
This is about availability, keeping a workload running, not about backup

Geo-Redundant Storage (cross-region durability)

Asynchronously copies data to a paired region hundreds of miles away
Microsoft states at least 99.99999999999999 percent (16 nines) durability of objects over a given year
Because replication is asynchronous, recent writes can be lost if the primary region is unrecoverable, so it implies a small, non-zero recovery point
This is about durability of stored objects, distinct from availability and from a managed backup

Microsoft's own phrasing on storage redundancy is worth quoting carefully: geo-redundant storage "offers at least 99.99999999999999% (16 9's) durability of objects over a given year." Durability is the probability that an object survives intact. It is not availability, which is whether you can reach the object right now, and it is not a backup you can restore a clean copy from after a corruption or ransomware event. An institution that hears "sixteen nines" and concludes "so we are covered for disaster recovery" has collapsed three distinct concepts, durability, availability, and backup, into one, and that conflation is the kind of imprecision an examiner will pull on.
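To see what sixteen nines does and does not tell you, it helps to turn the quoted figure into an expected loss count. This is a back-of-the-envelope sketch that treats objects as independent, which is our simplifying assumption, and the object count is made up. It uses `Decimal` because ordinary floating point cannot represent a number this close to 1 exactly.

```python
from decimal import Decimal

# Sixteen nines: 99.99999999999999 percent expressed as a probability.
DURABILITY = Decimal("0." + "9" * 16)

# Probability that a given object is lost in a given year.
annual_loss_probability = Decimal(1) - DURABILITY

# Hypothetical number of stored objects (an assumption for illustration).
stored_objects = 10_000_000

# Expected number of objects lost per year under the independence assumption.
expected_losses_per_year = stored_objects * annual_loss_probability

print(f"Annual loss probability per object: {annual_loss_probability}")
print(f"Expected objects lost per year: {expected_losses_per_year}")
```

The expected loss is vanishingly small, and that is all the number says. An object encrypted by ransomware or corrupted by an application is stored just as durably as a clean one, so this figure gives no information about whether a known-good copy exists to restore from.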

Keeping the Concepts Distinct

Durability answers "will my stored object survive?" Availability answers "can I reach my system right now?" Backup answers "can I restore a known-good copy from a point before something went wrong?" Disaster recovery is the orchestrated plan that uses all three to bring prioritized workloads back inside your defined recovery objectives. Azure provides strong primitives for each, but the plan that ties them together, and the testing that proves it, is yours to build.

So Azure gives you replication and failover through Site Recovery, in-region resilience through availability zones, and cross-region durability through geo-redundant storage. Assembled and tested correctly, those map to real recovery objectives. Left unconfigured, or configured once and never tested, they map to a false sense of security. For institutions that still run core workloads on-premises while extending into the cloud, the same primitives anchor a deliberate hybrid cloud architecture for financial institutions rather than a one-off failover bolt-on. The work of assembling and testing them is where a partner of record earns its place.

The three Azure recovery primitives and what each actually protects. Durability, availability, backup, and disaster recovery are distinct; the plan that ties them together is yours to build and test.

How ABT Operates Your Azure Recovery Environment

Access Business Technologies is a Tier-1 Microsoft Cloud Solution Provider that has served more than 750 financial institutions since 1999. In the Azure context, that means ABT operates your dedicated Azure environment as your Microsoft partner of record: Microsoft provides the underlying platform, and ABT operates the workloads inside your own Azure subscription. ABT designs, implements, monitors, documents, and tests the recovery posture inside that subscription, while the subscription itself stays customer-controlled.

The mechanics of how a CSP partner operates a customer's Azure environment are Microsoft-defined and built around least privilege, and they involve two separate mechanisms. Access to the Azure resources that make up your recovery posture (the subscription, the resource groups, the virtual machines, Azure Site Recovery, and the storage accounts) is governed by Azure role-based access control (Azure RBAC) on your Azure subscription under the Azure plan. That lets the partner maintain the access needed to operate subscriptions and resource groups while the customer retains the ability to manage their own access. Delegated administrative access at the Microsoft 365 and tenant level is governed by Granular Delegated Admin Privileges (GDAP), which support least-privileged, time-bound partner access aligned with Zero Trust principles. The time-bound property differs between the two: GDAP access is least-privileged and time-bound when configured appropriately, while Azure RBAC assignments are least-privileged and scoped but not time-bound by default, and they can be made time-bound with Microsoft Entra Privileged Identity Management (PIM). In both, the customer explicitly grants the access and it is scoped to what operating the environment requires.
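The practical question this raises for an access review is which partner assignments are standing and which are time-bound. The sketch below models that review with a hypothetical record type of our own. It does not call any Azure or Microsoft Entra API, and the principals, scopes, and dates are invented for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class AccessAssignment:
    """Hypothetical record of one partner access grant under review."""
    principal: str
    mechanism: str                  # "azure_rbac" or "gdap"
    role: str
    scope: str
    expires_at: Optional[datetime]  # None means standing (not time-bound)


def standing_assignments(assignments):
    """Assignments with no expiry, which a review should question."""
    return [a for a in assignments if a.expires_at is None]


def expired_assignments(assignments, now):
    """Time-bound assignments whose expiry has already passed."""
    return [a for a in assignments
            if a.expires_at is not None and a.expires_at <= now]


now = datetime(2026, 6, 26, tzinfo=timezone.utc)

# Illustrative data only (assumptions, not a real environment).
assignments = [
    AccessAssignment("partner-ops", "azure_rbac", "Site Recovery Contributor",
                     "resource-group/dr", None),
    AccessAssignment("partner-ops", "azure_rbac", "Backup Operator",
                     "resource-group/dr",
                     datetime(2026, 7, 1, tzinfo=timezone.utc)),
    AccessAssignment("partner-admins", "gdap", "Exchange Administrator",
                     "tenant", datetime(2026, 6, 1, tzinfo=timezone.utc)),
]

standing = standing_assignments(assignments)
expired = expired_assignments(assignments, now)

print([a.role for a in standing])
print([a.role for a in expired])
```

In this invented data, the standing assignment is an Azure RBAC grant, which matches the point above: RBAC is not time-bound unless something such as PIM makes it so.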

ABT Partner Insight: Tier-1 Cloud Solution Provider (CSP)

As partner of record, ABT configures Azure Site Recovery replication policies and recovery-point retention to your recovery objectives, provisions failover targets across availability zones or a paired region, sets storage redundancy to match your durability and recovery-point needs, and operates it under least-privileged, scoped access rather than standing administrative rights. The recovery design is built to your business impact analysis, documented for examiners, and validated through scheduled failover testing.

Source: Microsoft Learn, Azure role-based access control and Granular Delegated Admin Privileges for CSP partners

What that buys a financial institution is the difference between owning Azure capabilities and operating them well. Plenty of institutions have an Azure subscription with some replication switched on. Far fewer have recovery objectives derived from a current business impact analysis, a failover design that matches those objectives, a documented runbook, a scheduled test, and a tracked list of issues the last test produced. The capability list is the easy part. The operating discipline is what holds up when an examiner asks for evidence or when a real event arrives.

ABT positions its value around exactly that discipline: governance, least-privileged access, design rigor, regular testing, and managed monitoring. ABT does not present itself as a replacement for backups or for the underlying failover architecture. It presents itself as the partner that designs, operates, tests, and documents that architecture so the recovery objectives you put in front of your board are real.

Connecting Recovery to Monitoring and Examiner Evidence

A recovery design that is never watched and never evidenced is only half a control. The other half is continuous monitoring and the examiner-ready paper trail that proves the posture held. This is where the Azure recovery layer connects to the broader security and governance work ABT performs through the Guardian operating model and Guardian MxDR.

The FFIEC Business Continuity Management examiner-readiness checklist for Azure disaster recovery. A defensible posture is documented, tested, and evidenced, not aspirational.

Identity is the first connection. The attack that destroys backups usually starts with a compromised identity that has too much standing access. ABT's operating model leans hard on least-privileged, time-bound access, both for the partner operating your environment and for the privileged accounts inside it, because reducing standing privilege is one of the most direct ways to keep an attacker from reaching the recovery layer. Detecting and responding to suspicious sign-ins, and watching for the behaviors that precede a backup-targeting attack, is monitoring work, not a one-time configuration, which is why we treat continuous security monitoring for financial institutions as an operating discipline rather than a dashboard you check after an incident.

The Evidence an Examiner Actually Wants

A defensible continuity posture produces artifacts: a recovery time and recovery point objective matrix tied to the business impact analysis, failover test reports with dates and outcomes, restore evidence, privileged-access logs, runbooks, monitoring reports, and a tracked list of corrective actions from the last test. ABT's managed monitoring and documentation discipline is built to generate that evidence as a byproduct of operating the environment, so examiner requests are a retrieval task rather than a fire drill.
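An evidence inventory like this lends itself to a simple completeness and freshness check. The sketch below is one hypothetical way to do it: the artifact keys mirror the list above, while the one-year freshness threshold, the dates, and the function name are our assumptions. The FFIEC booklet asks for regular testing and does not prescribe this number.

```python
from datetime import date, timedelta

# Artifact names mirror the evidence list in the text.
REQUIRED_ARTIFACTS = [
    "rto_rpo_matrix",
    "failover_test_report",
    "restore_evidence",
    "privileged_access_logs",
    "runbooks",
    "monitoring_reports",
    "corrective_action_tracker",
]


def evidence_gaps(inventory, today, max_age=timedelta(days=365)):
    """Return (missing, stale) artifact names.

    inventory maps artifact name to the date it was last updated.
    max_age is an assumed freshness threshold, not a regulatory value.
    """
    missing = [name for name in REQUIRED_ARTIFACTS if name not in inventory]
    stale = [name for name in REQUIRED_ARTIFACTS
             if name in inventory and today - inventory[name] > max_age]
    return missing, stale


# Illustrative inventory (all dates invented).
inventory = {
    "rto_rpo_matrix": date(2026, 1, 15),
    "failover_test_report": date(2024, 12, 1),
    "restore_evidence": date(2026, 3, 2),
    "privileged_access_logs": date(2026, 6, 20),
    "runbooks": date(2025, 11, 10),
    "monitoring_reports": date(2026, 6, 1),
}

missing, stale = evidence_gaps(inventory, today=date(2026, 6, 26))
print("Missing:", missing)
print("Stale:", stale)
```

Here the inventory has no corrective action tracker and the last failover test report is more than a year old, which are two of the gaps described earlier in this guide.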

The throughline is simple. FFIEC guidance expects recovery objectives, testing, and remediation. Azure provides the capabilities that, properly designed and tested, support those objectives. ABT operates that Azure environment as partner of record under least-privileged access, then monitors, tests, and documents it so the institution can show its work. A capability without a tested design is a guess, a tested design without monitoring is a snapshot, and monitoring without documentation is invisible to the examiner who needs to see it.

For a bank, credit union, or mortgage company weighing all of this, the practical next step is not to buy more technology. It is to find out whether the recovery objectives you would put in front of a board today are designed, configured, tested, and documented, or whether they are aspirational. That answer is usually clarifying.

Find out whether your recovery objectives are real or aspirational

ABT runs an Azure resilience and recovery review for financial institutions: we map your recovery time and recovery point objectives to your business impact analysis, assess how your current Azure and Microsoft 365 environment supports them, and identify the gaps an examiner would flag. You get a clear picture of where your continuity posture actually stands.


Frequently Asked Questions

The recovery time objective, or RTO, is the maximum tolerable length of an outage, how long a system can be down before the impact becomes unacceptable. The recovery point objective, or RPO, is the maximum amount of data you can afford to lose, measured in time, which determines how recent your last usable recovery point must be. The FFIEC Business Continuity Management booklet expects financial institutions to define both for each prioritized business process, derived from a business impact analysis rather than chosen arbitrarily.

Microsoft 365 does not back up your data in the disaster-recovery sense. It provides native retention policies and recycle bins, which are content-lifecycle and short-term recovery features that govern how long items persist and give you a window to undelete them. Under Microsoft's shared responsibility model, the customer owns and is responsible for protecting their own data. Retention is useful, but it is not infrastructure-level backup and it will not rebuild a tenant after a destructive event. A complete recovery strategy treats backup, retention, and disaster recovery as distinct requirements.

The FFIEC IT Examination Handbook Business Continuity Management booklet, current edition issued November 2019, is supervisory examination guidance rather than a statute. It expects institutions to perform a business impact analysis, define recovery time and recovery point objectives for prioritized business processes, maintain systems and controls for resilience, test the continuity plan on a regular cadence, and track and remediate the issues that testing surfaces. FFIEC member agencies use this framework to evaluate an institution's recovery readiness: the FDIC when it examines banks, and the NCUA, through its 2023 Information Security Examination procedures, when it examines credit unions.

Azure Site Recovery replicates workloads running on Azure virtual machines and on-premises physical and virtual machines to a secondary location, so they can be failed over during an outage. It supports configured recovery objectives, with the actual recovery time depending on the workload, the recovery design, and regular testing; Azure-to-Azure virtual machine failover is often completed quickly. It creates crash-consistent recovery points as frequently as every five minutes, with application-consistent points on a configurable cadence, which you tune to your recovery point objective. It is a configurable capability that has to be designed and tested for your specific workloads, not an automatic guarantee.

There is no single universal recovery time for a credit union on Azure, because recovery time depends on which workloads are in scope, how the failover is designed, and how regularly the plan is tested. Azure Site Recovery supports configured recovery objectives, and Azure-to-Azure virtual machine failover is often completed quickly when the design is sound. The right approach is to derive recovery objectives from a business impact analysis, configure Azure capabilities to support those objectives, and prove the actual recovery time through scheduled failover testing, so the number you report to your board and examiners is validated rather than assumed.

Intact backups are what let an organization restore its own systems instead of paying a ransom, so attackers actively hunt for and corrupt backups before encrypting production. In a 2024 Sophos survey, organizations whose backups were compromised reported a median recovery cost, excluding ransom payments, of $3 million, about eight times the $375,000 median for organizations with intact backups, and only 26 percent recovered within a week versus 46 percent of those with usable backups. The lesson for financial institutions is that backups must be isolated, validated, and tested, so they survive the same attack that hits production, which is also exactly what the FFIEC testing expectation is driving at.

Justin Kirsch

CEO, Access Business Technologies

Justin Kirsch has helped financial institutions design and operate resilient Microsoft cloud environments since 1999. As CEO of Access Business Technologies, a Tier-1 Microsoft Cloud Solution Provider serving more than 750 banks, credit unions, and mortgage companies, he leads the team that operates clients' dedicated Azure environments as their Microsoft partner of record, designing, testing, and documenting the disaster recovery posture that examiners and boards expect.
