What data do mortgage lenders need for predictive analytics?

Mortgage lenders need loan origination data, borrower credit and employment history, payment behavior records, property valuation data, local economic indicators, and secondary market performance data. Data quality and standardization are prerequisites for accurate models. Most lenders start by connecting their loan origination system data through APIs before adding external data feeds for market conditions and economic indicators.


The Role of Predictive Analytics in Mortgage Risk Assessment

Justin Kirsch

Updated July 02, 2026 | Originally published March 2026

Predictive analytics dashboard for mortgage risk assessment showing Microsoft AI-powered default prediction, fraud detection, and LLM document analysis


What You'll Learn

How Predictive Analytics Works in Mortgage Lending
Default Prediction and Early Warning Models
LLM-Powered Risk Models: The 2025-2026 Shift
AI-Powered Fraud Detection
Speed and Accuracy Gains for Underwriting
Regulatory Compliance for AI Risk Models
Prepayment and Refinance Risk Modeling
Real-Time Market Data Integration
How Mortgage BI and MortgageExchange Power the Risk Stack
M365 Guardian: The Operating Model for AI Risk Governance
Building a Predictive Analytics Strategy
Frequently Asked Questions

A February 2025 study published on arXiv demonstrated that machine learning models now predict mortgage defaults with over 90% accuracy when trained on comprehensive borrower datasets. That is a dramatic improvement over traditional underwriting models, which rely on a handful of variables and miss patterns that algorithms catch instantly.

Predictive analytics is reshaping how mortgage lenders assess risk. Not by replacing human judgment, but by giving underwriters and risk managers data-driven confidence in every decision. The accuracy gains are only half the story for community banks, credit unions, and independent mortgage banks. The other half is the plumbing: where the data lives, who governs the models, and whether the dashboards a loan officer uses in front of a borrower hold up under examination. This guide walks through both halves, then shows how a Tier-1 Microsoft Cloud Solution Provider for 750+ financial institutions pulls the data plumbing, the BI surface, and the AI governance layer into a single operating model.

90%+

Accuracy rate for machine learning mortgage default prediction models trained on comprehensive borrower datasets

Source: arXiv Research Study, February 2025

Tier-1 Cloud Solution Provider (CSP) ABT Partner Insight

A predictive analytics program for mortgage risk has four moving parts that have to talk to each other. MortgageExchange unifies loan-origination feeds, core banking servicing data, escrow disbursements, and external market data into a single queryable layer. Mortgage BI turns that layer into the delinquency, prepayment, default-risk, and fraud dashboards that risk managers and loan officers actually use. Microsoft Azure AI Foundry and Microsoft Fabric host the model training and real-time inference. M365 Guardian is ABT's operating model on top of Microsoft Entra ID, Microsoft Purview, Microsoft Defender, and Microsoft Sentinel that holds the whole stack to SR 11-7, CFPB Circular 2023-03, and FFIEC IT Examination expectations. ABT manages the Microsoft 365 tenants that the BI surface and model governance run inside.

Source: Access Business Technologies, Mortgage AI and Risk Analytics operating model, 2026.

Regulatory Landscape Shift: CFPB and OCC AI Decisioning Guidance

Since this article was originally published, regulators have sharpened their focus on AI-driven lending decisions. The CFPB issued guidance requiring that creditors using AI or complex algorithms provide specific and accurate reasons for adverse actions, not broad categories. The OCC, jointly with the other federal financial regulators, adopted Quality Control Standards for Automated Valuation Models, requiring institutions that rely on AVMs in mortgage decisions to maintain controls addressing five quality control factors. The Federal Reserve confirmed SR 11-7 model risk management guidance applies to all AI and machine learning models, requiring governance, validation, and effective challenge. Every predictive model in your mortgage operation now falls under these requirements.

How Predictive Analytics Works in Mortgage Lending

Predictive analytics uses historical data, statistical algorithms, and machine learning to forecast future outcomes. In mortgage lending, that means analyzing thousands of variables per loan to estimate probability of default, prepayment risk, and fraud likelihood.

Modern models go far beyond FICO scores and LTV ratios. They incorporate employment stability trends, geographic economic indicators, payment behavior patterns, and market condition data. The models learn from millions of historical loans and improve as they process more data. See also our breakdown of Visualizing Combined Tax and Mortgage Payment Trends for Financial Insti.

Fannie Mae's 2025 lender sentiment survey found that 55% of mortgage lenders plan to pilot or expand AI and machine learning tools this year. The majority target underwriting and risk assessment as their first use case. That is not a coincidence. Risk is where predictive analytics delivers the clearest ROI.

Current leading models use XGBoost, LightGBM, Random Forest, and deep learning neural networks. The choice between them depends on your explainability requirements. Gradient boosting models (XGBoost, LightGBM) offer strong accuracy with reasonable interpretability through SHAP values. Deep learning models can match or exceed that accuracy on some datasets, particularly very large or unstructured ones, but are harder to explain to regulators.


Default Prediction and Early Warning Models

The core application of predictive analytics in mortgage risk is default prediction. The MBA reported that mortgage delinquency rates reached 3.99% of all outstanding loans in Q3 2025, with the FHA delinquency rate climbing to 10.78%. FHA seriously delinquent loans increased nearly 50 basis points year-over-year. For servicers, catching early signs of distress can mean the difference between a workout and a foreclosure.

Predictive models identify borrowers at elevated risk by analyzing:

Payment behavior trends: Not just whether payments are current, but whether the pattern is deteriorating
Employment and income stability: Job changes, industry risk factors, and income volatility signals
Local market conditions: Property values, unemployment rates, and economic indicators in the borrower's MSA
Credit utilization changes: Rising credit card balances or new account openings that suggest financial stress
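
As a minimal sketch of the first signal in the list above, a deteriorating payment pattern can be expressed as the slope of days-late over recent months. The function name and the sample histories are hypothetical, and a production early-warning model would combine this feature with many others rather than use it alone.

```python
from statistics import mean

def days_late_trend(days_late_by_month):
    """Least-squares slope of days-late per month (positive = worsening)."""
    n = len(days_late_by_month)
    if n < 2:
        raise ValueError("need at least two months of history")
    xs = list(range(n))
    x_bar = mean(xs)
    y_bar = mean(days_late_by_month)
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, days_late_by_month))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

# Hypothetical histories: both borrowers stay under 30 days late every month,
# so neither is reported delinquent, but only the second is drifting later.
steady = [0, 2, 0, 1, 0, 2]
slipping = [0, 3, 6, 9, 12, 15]

steady_slope = days_late_trend(steady)
slipping_slope = days_late_trend(slipping)
```

A servicer could rank current loans by this slope to decide where outreach starts, which is the kind of signal a simple current-or-not flag hides.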

Early warning models give servicers time to offer loss mitigation options before loans become seriously delinquent. That is better for borrowers, better for investors, and better for your default rates. The harder problem for most community banks and credit unions is not picking a model. It is feeding the model the data. Servicing tapes, loss-mitigation case files, and core banking deposit histories usually live in three systems that do not natively talk to each other. MortgageExchange is the integration layer that turns those silos into one feed the risk model can score against.

"Lenders who integrate AI-driven predictive analytics into their workflows gain decisive competitive advantages through superior risk assessment, faster approvals, and better portfolio performance."

Finsolutia, Predictive Analytics in Mortgages Report, 2025

LLM-Powered Risk Models: The 2025-2026 Shift

Traditional predictive models process structured data: credit scores, income numbers, LTV ratios. Large language models change that equation by analyzing unstructured data that traditional models cannot touch.

LLM-powered risk assessment adds new data dimensions to mortgage risk models:

Document analysis at scale: LLMs read and interpret complex legal documents, title commitments, and appraisal narratives, flagging inconsistencies that structured models miss
Borrower communication patterns: Analyzing the content and tone of borrower correspondence to detect early distress signals before they appear in payment data
Market narrative processing: Ingesting regional economic reports, housing market commentary, and employment trend narratives to inform geographic risk adjustments
Regulatory change tracking: Monitoring GSE bulletins, CFPB guidance, and state regulatory updates to flag compliance implications for existing portfolio positions

The combination of structured prediction models (XGBoost, Random Forest) with LLM-driven unstructured analysis creates risk assessments that capture both the quantitative and qualitative dimensions of mortgage default probability. Lenders implementing these hybrid approaches report more accurate early-warning detection, particularly for borrowers who maintain current payments while showing stress signals in other data.

AI-Powered Fraud Detection

Machine learning models excel at fraud detection because they process thousands of data points simultaneously. Human underwriters reviewing an application might catch obvious red flags. Algorithms catch subtle ones.

Current AI fraud detection capabilities include:

Document anomaly detection: Identifying altered pay stubs, tax returns, and bank statements based on formatting patterns, font inconsistencies, and metadata analysis
Identity verification: Cross-referencing application data against multiple databases to detect synthetic identities
Collusion pattern recognition: Identifying networks of related applications that suggest organized fraud rings
Occupancy fraud signals: Analyzing data patterns that indicate a property will be used as an investment rather than a primary residence
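
The collusion pattern idea in the list above can be sketched with a union-find structure that links applications sharing an identifier, directly or through a chain of other applications. The field names (phone, employer, bank_account) and the sample records are assumptions for illustration. A shared employer alone is a weak signal, so a real system would weight or filter the links before raising a case.

```python
from collections import defaultdict

def linked_application_groups(applications, keys=("phone", "employer", "bank_account")):
    """Group application IDs that share any identifier value, directly or transitively."""
    parent = {app["id"]: app["id"] for app in applications}

    def find(a):
        # Walk to the root of the set, shortening the path as we go.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}  # (field, value) -> first application ID that used it
    for app in applications:
        for key in keys:
            value = app.get(key)
            if value is None:
                continue
            token = (key, value)
            if token in first_seen:
                union(first_seen[token], app["id"])
            else:
                first_seen[token] = app["id"]

    groups = defaultdict(set)
    for app_id in parent:
        groups[find(app_id)].add(app_id)
    # Only groups with more than one application are interesting.
    return [group for group in groups.values() if len(group) > 1]

# Hypothetical applications: A1 and A2 share an employer, A2 and A3 share a phone.
applications = [
    {"id": "A1", "phone": "555-0101", "employer": "Acme LLC"},
    {"id": "A2", "phone": "555-0199", "employer": "Acme LLC"},
    {"id": "A3", "phone": "555-0199", "employer": "Birch Co"},
    {"id": "A4", "phone": "555-0142", "employer": "Cedar Inc"},
]
groups = linked_application_groups(applications)
```

Here A1 and A3 never share a value directly, yet they land in the same group through A2, which is the transitive pattern a file-by-file manual review tends to miss.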

The ROI on AI fraud detection is straightforward. One prevented fraudulent loan can save $100,000 or more. Depending on licensing and integration costs, the technology can pay for itself after catching only a few cases.

Figure: Microsoft AI mortgage risk assessment pipeline: data ingestion through Microsoft Fabric, feature engineering, model training on Azure Machine Learning, real-time inference via Azure AI Foundry, and Power BI risk dashboards. Source: ABT, 2026.

Speed and Accuracy Gains for Underwriting

Predictive analytics does not just improve accuracy. It makes the entire underwriting process faster. AI-powered risk assessment tools can pre-screen applications in seconds, routing low-risk loans to streamlined processing and flagging complex cases for experienced underwriters.
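
The routing step described above might look like the following sketch. The cutoffs, queue names, and the PreScreen fields are hypothetical. Real thresholds would come from a validated credit policy, not from this example.

```python
from dataclasses import dataclass

# Hypothetical cutoffs for illustration only.
STREAMLINE_MAX_PD = 0.02     # at or below this default probability: streamlined queue
MANUAL_REVIEW_MIN_PD = 0.10  # at or above this: experienced underwriter

@dataclass
class PreScreen:
    loan_id: str
    default_probability: float  # model output in [0, 1]
    fraud_flag: bool
    docs_complete: bool

def route(p: PreScreen) -> str:
    """Pick a processing queue from the pre-screen result."""
    if p.fraud_flag:
        return "fraud_review"
    if not p.docs_complete or p.default_probability >= MANUAL_REVIEW_MIN_PD:
        return "senior_underwriter"
    if p.default_probability <= STREAMLINE_MAX_PD:
        return "streamlined"
    return "standard"

clean_file = PreScreen("L-001", 0.01, fraud_flag=False, docs_complete=True)
complex_file = PreScreen("L-002", 0.14, fraud_flag=False, docs_complete=True)
```

Checking the fraud flag first means a low default score never sends a suspicious file down the streamlined path.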

29%

Reduction in total time underwriters spend per file when using AI-powered pre-screening and risk assessment tools

Source: Ocrolus, AI in Mortgage Lending, 2025

The speed advantage matters in competitive markets. When borrowers are shopping rates and lenders are competing on turn times, the ability to provide a preliminary risk assessment within minutes rather than days changes outcomes. Lenders implementing AI report operational expense reductions of 30-50%, with some achieving loan closures 2.5 times faster than industry averages.

For borrowers, faster assessments mean quicker approvals. They can lock favorable rates and close on properties before competing offers beat them. For lenders, faster processing means higher pull-through rates and lower cost per loan.

The percentage of fully automated loan decisions is expected to increase from today's single digits to 30-40% of volume as models mature and regulatory frameworks catch up. Low-risk conforming loans with clean documentation are the first candidates for full automation. Exception-heavy files will continue to require experienced human underwriters for the foreseeable future.

Regulatory Compliance for AI Risk Models

Deploying predictive analytics in mortgage risk assessment creates regulatory obligations that did not exist when you were using traditional scorecards. Every model that influences a lending decision falls under regulatory scrutiny.

CFPB Adverse Action Requirements

The CFPB's Circular 2023-03 made clear that creditors using AI cannot use "black-box" models when doing so prevents them from providing specific and accurate reasons for adverse actions. If your model declines a borrower, you need to explain exactly why in terms the borrower can understand. Broad categories like "credit risk score" are not sufficient. The CFPB will hold lenders accountable under ECOA regardless of how complex their technology is.
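
To make the specificity requirement concrete, here is a minimal sketch that turns per-feature contributions into ranked reasons. It assumes a linear log-odds model, where each feature's contribution relative to a reference applicant is simply weight times difference. For gradient boosting models, SHAP values would play the same role. The weights, baseline values, and reason wording are all hypothetical and are not taken from Regulation B sample forms.

```python
# Hypothetical coefficients of a log-odds default model.
WEIGHTS = {"dti": 4.0, "ltv": 2.5, "recent_delinquencies": 0.8, "months_employed": -0.01}
# Hypothetical reference applicant that contributions are measured against.
BASELINE = {"dti": 0.30, "ltv": 0.75, "recent_delinquencies": 0.0, "months_employed": 60.0}
# Hypothetical plain-language reason text per feature.
REASONS = {
    "dti": "Debt-to-income ratio is too high",
    "ltv": "Loan amount is too high relative to property value",
    "recent_delinquencies": "Recent delinquent payments on existing credit",
    "months_employed": "Length of current employment is too short",
}

def adverse_action_reasons(applicant, top_n=2):
    """Return the top_n reasons that pushed this applicant's risk above the baseline."""
    contributions = {
        feature: WEIGHTS[feature] * (applicant[feature] - BASELINE[feature])
        for feature in WEIGHTS
    }
    # Keep only features that increased risk, largest contribution first.
    risk_drivers = sorted(
        (f for f, c in contributions.items() if c > 0),
        key=lambda f: contributions[f],
        reverse=True,
    )
    return [REASONS[f] for f in risk_drivers[:top_n]]

applicant = {"dti": 0.48, "ltv": 0.80, "recent_delinquencies": 2, "months_employed": 8}
reasons = adverse_action_reasons(applicant)
```

Each reason traces back to a specific variable and its measured contribution, which is the difference between a specific reason and a broad category like "credit risk score".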

SR 11-7 Model Risk Management

The Federal Reserve's SR 11-7 guidance, jointly issued with the OCC, applies to all AI and machine learning models used in lending decisions. The guidance requires model governance, independent validation, and effective challenge. For community banks and mortgage companies, the OCC's 2025 bulletin clarified that institutions can tailor their model risk management practices to their size, but the core requirements remain. Every predictive model needs documentation, periodic validation, and a clear escalation path when model performance degrades.

Fair Lending and Explainability

A January 2025 CFPB supervisory highlights report found disproportionately high adverse outcomes from AI models using more than a thousand variables. Models that overfit on large variable sets can create fair lending risk even when protected classes are excluded from inputs. The remedy: pair models with explainability methods such as SHAP or LIME that can demonstrate which variables drive each decision, and regularly test for disparate impact across protected classes.
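
One common screening calculation for disparate impact is the adverse impact ratio: the approval rate of a protected group divided by the approval rate of a reference group. The sketch below uses made-up decision data, and the 0.8 cutoff is the four-fifths rule of thumb borrowed from employment-selection guidelines. It is an assumption used here as a screening heuristic, not a regulatory threshold for lending, and a real fair lending program would follow up with statistical testing.

```python
def approval_rate(decisions):
    """Share of approvals in a list of 1 (approved) / 0 (declined) decisions."""
    if not decisions:
        raise ValueError("no decisions supplied")
    return sum(decisions) / len(decisions)

def adverse_impact_ratio(protected_decisions, reference_decisions):
    """Protected-group approval rate divided by reference-group approval rate."""
    return approval_rate(protected_decisions) / approval_rate(reference_decisions)

# Hypothetical outcomes: 56 of 100 approved versus 80 of 100 approved.
protected = [1] * 56 + [0] * 44
reference = [1] * 80 + [0] * 20

air = adverse_impact_ratio(protected, reference)
needs_review = air < 0.8  # heuristic screen only, see the note above
```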

For guidance on managing AI vendor risk in regulated mortgage environments, see FHFA Drops Anthropic: What AI Vendor Risk Means for Mortgage Lenders.

Prepayment and Refinance Risk Modeling

For lenders and servicers who hold or service mortgage-backed securities, prepayment risk directly affects portfolio performance. Predictive models forecast which borrowers are likely to refinance based on rate differentials, remaining term, and borrower characteristics.

This modeling helps with:

Hedging decisions: More accurate prepayment forecasts improve hedge performance
Portfolio valuation: Better prepayment models lead to more accurate mark-to-market pricing
Retention strategies: Identifying borrowers at high refinance risk lets servicers proactively offer competitive retention options
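
Prepayment models often describe refinance behavior as an S-curve: refinance probability stays low until the borrower's rate incentive passes some threshold, rises quickly, then levels off. The sketch below uses a logistic function of the rate differential. The steepness, midpoint, and ceiling values are hypothetical placeholders that a real model would fit to historical prepayment data, alongside remaining term and borrower characteristics.

```python
import math

def refinance_probability(note_rate, market_rate, steepness=2.5, midpoint=0.75, ceiling=0.40):
    """Annual refinance probability as a logistic function of rate incentive.

    Rates are in percentage points (6.75 means 6.75%). All three shape
    parameters are illustrative assumptions, not fitted values.
    """
    incentive = note_rate - market_rate
    return ceiling / (1.0 + math.exp(-steepness * (incentive - midpoint)))

# Hypothetical portfolio slices against a 6.00% market rate.
no_incentive = refinance_probability(6.00, 6.00)
at_midpoint = refinance_probability(6.75, 6.00)
strong_incentive = refinance_probability(7.50, 6.00)
```

Ranking loans by this probability gives servicers a starting list for the retention outreach described above.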

Analysts expect 2026 to bring modest recovery in refinancing volume as rates stabilize. Lenders with strong prepayment models will navigate that shift more profitably than those relying on broad assumptions.

Real-Time Market Data Integration

Static risk models that recalculate quarterly are becoming obsolete. The shift toward real-time data integration means predictive models now ingest live market feeds and adjust risk scores continuously.

Real-time data sources changing mortgage risk assessment include:

Live property value feeds: Automated valuation models pull comparable sales data daily rather than relying on appraisals that are 30-60 days old at closing
Employment verification APIs: Direct connections to payroll providers verify employment status in real-time rather than relying on static VOE letters
Economic indicator streams: Regional unemployment data, consumer spending patterns, and housing starts feed directly into risk models
Rate environment monitoring: Prepayment models adjust in real-time as rate markets move, improving hedge accuracy

The Automated Valuation Model quality control standards, issued by the OCC jointly with the other federal financial regulators, require institutions to adopt policies and controls designed to meet five factors: a high level of confidence in the estimates produced, protection against the manipulation of data, avoidance of conflicts of interest, random sample testing and reviews, and compliance with applicable nondiscrimination laws. Lenders using real-time AVM feeds need to ensure their data pipelines meet these standards.

3.99%

Overall mortgage delinquency rate in Q3 2025, with FHA loans at 10.78%, underscoring the need for better predictive risk models

Source: MBA National Delinquency Survey, Q3 2025

How Mortgage BI and MortgageExchange Power the Risk Stack

Most community banks, credit unions, and independent mortgage banks discover the same thing when they try to stand up a predictive analytics program. The hard part is not selecting XGBoost over LightGBM. The hard part is feeding the model. A risk score that aggregates loan-origination data, servicing tape, escrow disbursements, county tax data, employment verification feeds, and consumer credit signals has to read from six or seven systems whose schemas were never designed to be joined. Most lenders stall here. The model team builds a notebook that works on a 6-month-old extract and never gets to production because production needs a live data pipeline nobody owns.

MortgageExchange is ABT's purpose-built integration layer that solves the plumbing problem. It connects the loan-origination platform (Encompass, Calyx, Mortgagebot, or a proprietary LOS) to the institution's core banking system (Fiserv DNA, Symitar Episys, Jack Henry SilverLake, Corelation KeyStone), to servicing platforms, to escrow disbursement records, and to the external market data the risk model depends on. The output is a single queryable layer that does not require a borrower's data to be re-keyed or batch-exported. Mortgage BI reads from that layer to produce the dashboards the risk team and the C-suite actually consume.
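
To illustrate why the join is harder than it sounds, here is a minimal sketch of unifying origination and servicing records whose loan identifiers are formatted differently. It is not MortgageExchange's implementation. The field names (loan_id, LoanNum, DaysPastDue) and the normalization rule are assumptions chosen for the example.

```python
def normalize_loan_id(raw):
    """Canonical loan key: trimmed, upper-cased, leading zeros removed."""
    return str(raw).strip().upper().lstrip("0")

def unify(origination_rows, servicing_rows):
    """Left-join servicing status onto origination records by normalized loan ID."""
    servicing_by_loan = {normalize_loan_id(r["LoanNum"]): r for r in servicing_rows}
    unified = []
    for row in origination_rows:
        key = normalize_loan_id(row["loan_id"])
        servicing = servicing_by_loan.get(key)
        unified.append({
            "loan_id": key,
            "ltv": row["ltv"],
            # None marks a loan the servicing system does not know about.
            "days_past_due": servicing["DaysPastDue"] if servicing else None,
        })
    return unified

# Hypothetical extracts from two systems with different key formats.
origination = [{"loan_id": " 000123a ", "ltv": 0.80}, {"loan_id": "456B", "ltv": 0.92}]
servicing = [{"LoanNum": "123A", "DaysPastDue": 31}]
unified = unify(origination, servicing)
```

Keeping unmatched loans with an explicit None, rather than dropping them, makes gaps between systems visible to the risk team instead of silently shrinking the scored population.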

Mortgage BI is ABT's business intelligence layer for mortgage portfolios, built on Microsoft Power BI inside the institution's governed Microsoft 365 tenant. For a predictive analytics program, the dashboards that matter day to day include:

Delinquency heat maps: A geographic and time-series view of current, 30-day, 60-day, and 90+ day delinquency rates broken down by MSA, loan vintage, product type, and origination channel. The risk team can drill from the national rate down to a specific branch's portfolio inside two clicks.
Default-risk scoring distribution: The portfolio plotted across the risk-score distribution that the predictive model produces, with overlays that show concentration by LTV band, vintage, and channel. New loans that move the portfolio's risk profile show up on the dashboard before the next loan committee meeting, not three quarters later.
Prepayment and refinance-risk dashboards: Live rate-differential exposure across the portfolio, with retention-flag overlays for borrowers most likely to refinance away. Servicers use the same view to prioritize proactive retention outreach.
Fraud-signal cohorts: Cohort views of applications that triggered AI fraud signals, with audit trail back to the original data points so a fraud team can validate without re-running the model from scratch.
Model-performance monitoring: A dashboard the risk team uses to satisfy SR 11-7 periodic validation, with accuracy decay alerts and quarterly disparate-impact testing built in.

Those dashboards are not generic Power BI reports. They are configured against the specific data shape MortgageExchange produces for ABT's 750+ financial-institution customers, which means a new bank or credit union does not start from a blank canvas. The first conversation is which dashboards your risk team actually needs, and the data plumbing is already mapped. Our guide to FHFA Drops Anthropic goes deeper on this.

M365 Guardian: The Operating Model for AI Risk Governance

The technology stack solves the data and visualization problem. It does not, on its own, satisfy SR 11-7, CFPB Circular 2023-03, or the OCC AVM Quality Control Standards. The governance layer is the operating model on top of the Microsoft and ABT products that produces the documentation, audit trails, access controls, and validation evidence an examiner expects to see.

That layer has a name: M365 Guardian. It is ABT's operating model that runs on top of Microsoft Entra ID, Microsoft Defender, Microsoft Purview, Microsoft Sentinel, and Microsoft Intune for regulated financial institutions. For a mortgage lender running predictive analytics, the Guardian layer covers:

Model access governance: Microsoft Entra ID Conditional Access policies that scope who can train, tune, validate, or push a risk model into production. Privileged Identity Management (PIM) for time-bound elevation when a data scientist needs to retrain. The audit trail of every elevation is the evidence SR 11-7 effective challenge expects.
Data residency and lineage: Microsoft Purview Information Protection labels applied to the borrower data MortgageExchange ingests, so NPI never lands in a dev environment, never leaves the institution's Microsoft 365 tenant, and is traceable from the source system to the dashboard the loan officer sees.
Model lifecycle audit: Microsoft Purview Audit Premium captures every create, modify, and delete on the model artifacts, the Power BI datasets that feed Mortgage BI, and the configurations of the inference endpoints. The same audit log that supports SEA Rule 17a-4 recordkeeping for broker-dealers supplies the evidence trail that SR 11-7 model risk documentation calls for at mortgage lenders.
Adverse action explainability evidence: The CFPB Circular 2023-03 requirement to provide specific reasons is operationally a workflow problem, not a model problem. Guardian wires the SHAP or LIME output for each declined application into the loan-decision packet, so the documentation an ECOA auditor asks for is the documentation the loan officer already produced.
Fair-lending testing schedule: Quarterly disparate-impact testing built into the BI surface, with the test outputs preserved under Purview retention and the results escalated to the chief compliance officer through a Microsoft Teams approval flow. The same cadence can be applied to the random sample testing and reviews that the AVM quality control standards call for.
Threat detection on the AI surface: Microsoft Defender for Cloud Apps monitors the model endpoints, training data stores, and BI workspaces for anomalous access. Microsoft Sentinel correlates those signals across the institution's Microsoft 365 footprint and ABT's 24x7 security operations center watches the queue.

The Guardian layer is what turns a working predictive analytics program from a model team experiment into a production capability that survives an examination. It is also the difference between hiring three more data scientists to build governance plumbing and using one ABT has already built across 750+ FI customers.

Building a Predictive Analytics Strategy

Implementing predictive analytics for risk assessment requires three things: clean data, the right models, and people who know how to act on the results.

Start with data quality. Predictive models are only as good as the data they consume. Invest in data standardization and cleansing before building models. MortgageExchange covers most of this work for the loan-origination-to-core-banking-to-servicing seams that consume the most engineering time at most institutions.
Choose models that fit your use case. Default prediction, fraud detection, and prepayment modeling each require different approaches. XGBoost and LightGBM offer the best balance of accuracy and explainability for most mortgage applications.
Build explainable models. Regulators require that lending decisions be explainable. Black-box models that cannot articulate why they flagged a loan create compliance risk. Use SHAP or LIME for model interpretability, and wire the output into the loan-decision packet through the Guardian adverse-action workflow.
Train your team. The best model in the world is worthless if your underwriters do not trust or understand its output. Mortgage BI dashboards designed to surface the variables the model weighted most heavily help here.
Validate continuously. SR 11-7 requires periodic model validation. Set up automated model performance monitoring that flags accuracy degradation before it becomes a compliance issue. Mortgage BI ships with model-performance views; Guardian routes the validation evidence into the regulatory file.
Test for fair lending impact. Run disparate impact analysis across protected classes before deployment and on a regular schedule after. Document everything for regulatory examination.
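
For the "validate continuously" step above, one widely used drift check is the population stability index (PSI), which compares the distribution of model scores at validation time with the distribution seen in production. The sketch below is a minimal version. The score-band counts are made up, and the 0.1 and 0.25 cutoffs are conventional rules of thumb rather than regulatory requirements.

```python
import math

def population_stability_index(expected_counts, actual_counts, floor=1e-6):
    """PSI across matching score bands: sum of (a - e) * ln(a / e) over band shares."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both distributions must use the same score bands")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    psi = 0.0
    for expected, actual in zip(expected_counts, actual_counts):
        # Floor the shares so an empty band does not produce log(0).
        e_share = max(expected / expected_total, floor)
        a_share = max(actual / actual_total, floor)
        psi += (a_share - e_share) * math.log(a_share / e_share)
    return psi

def drift_status(psi):
    """Rule-of-thumb interpretation; the cutoffs are assumptions, not requirements."""
    if psi < 0.10:
        return "stable"
    if psi < 0.25:
        return "monitor"
    return "revalidate"

# Hypothetical loan counts per risk-score band, lowest risk first.
baseline = [400, 300, 200, 100]   # distribution at model validation
current = [250, 250, 250, 250]    # distribution in the latest quarter

psi = population_stability_index(baseline, current)
status = drift_status(psi)
```

A scheduled job could compute this each quarter and route any "monitor" or "revalidate" result into the validation file, so degradation is documented before an examiner asks about it.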

Get a Mortgage Risk Analytics Readiness Review

ABT runs predictive analytics, Mortgage BI dashboards, MortgageExchange data plumbing, and the M365 Guardian operating model for 750+ financial institutions. A 30-minute conversation maps your current risk-analytics surface, surfaces the SR 11-7 and CFPB gaps your next examiner is most likely to find, and outlines what an ABT-managed deployment would cover. No commitment, no quote, no obligation.


Frequently Asked Questions

How does predictive analytics improve mortgage risk assessment?

Predictive analytics improves mortgage risk assessment by analyzing thousands of variables per loan application using machine learning algorithms. These models incorporate payment behavior trends, employment stability data, local market conditions, and credit utilization patterns to produce default probability scores that are significantly more accurate than traditional underwriting methods relying on a few standard metrics. For community banks, credit unions, and independent mortgage banks, the operational lift is rarely the model selection itself. It is the data plumbing that feeds the model, which is what integration layers like ABT's MortgageExchange are built to solve.

What machine learning models are used for mortgage default prediction?

Common machine learning models for mortgage default prediction include XGBoost, LightGBM, Random Forest, deep learning neural networks, and logistic regression ensembles. A February 2025 study demonstrated these models achieve over 90% accuracy on comprehensive borrower datasets. Model selection depends on explainability requirements, since the CFPB requires lenders to provide specific reasons for adverse lending decisions. XGBoost and LightGBM are the most common production choices because they balance accuracy with SHAP-based interpretability.

Can AI detect mortgage fraud during underwriting?

AI detects mortgage fraud during underwriting by analyzing document metadata for alterations, cross-referencing application data against identity databases, recognizing collusion patterns across related applications, and identifying occupancy fraud signals. Machine learning processes thousands of data points simultaneously, catching subtle inconsistencies that human reviewers typically miss in manual document review.

What does Mortgage BI actually show a risk team day to day?

Mortgage BI is ABT's business intelligence layer built on Microsoft Power BI inside the institution's Microsoft 365 tenant. A typical risk team uses delinquency heat maps broken down by MSA, loan vintage, and origination channel; default-risk score distribution views with LTV-band and vintage overlays; prepayment-risk dashboards with retention flags; fraud-signal cohort views with audit trail back to the source data; and a model-performance monitor that supports SR 11-7 periodic validation. The dashboards are pre-configured against the data shape MortgageExchange produces for ABT's 750+ financial-institution customers, so a new bank or credit union does not start from a blank canvas.

How does M365 Guardian satisfy SR 11-7 and CFPB Circular 2023-03 for AI risk models?

M365 Guardian is the operating model ABT layers on top of Microsoft Entra ID, Microsoft Purview, Microsoft Defender, and Microsoft Sentinel for financial institutions. For a predictive analytics program, Guardian uses Microsoft Entra ID Conditional Access and Privileged Identity Management to control who can train or tune risk models, and Microsoft Purview Audit Premium to produce the model-lifecycle audit trail that SR 11-7 documentation expects. Microsoft Purview Information Protection labels keep NPI out of dev environments. SHAP or LIME explainability output is piped into the loan-decision packet to support CFPB Circular 2023-03 adverse-action specificity, and quarterly disparate-impact testing is scheduled and preserved under Purview retention. Microsoft Defender for Cloud Apps and Microsoft Sentinel watch the model endpoints and BI workspaces, and ABT's 24x7 security operations center owns the alerts.

What regulatory requirements apply to AI risk models in mortgage lending?

AI risk models in mortgage lending must comply with the Federal Reserve's SR 11-7 model risk management guidance, which requires model governance, independent validation, and effective challenge. The CFPB requires specific and accurate adverse action reasons under ECOA and prohibits black-box models that cannot explain decisions. The OCC and the other federal financial regulators mandate quality control standards for automated valuation models. Lenders must also conduct regular fair lending testing to ensure AI models do not produce disparate impact across protected classes.

How do LLMs change mortgage risk assessment compared to traditional models?

Traditional predictive models process structured data like credit scores and LTV ratios. Large language models analyze unstructured data that traditional models cannot process, including legal documents, appraisal narratives, borrower correspondence, and regional economic reports. The combination of structured prediction models with LLM-driven unstructured analysis creates risk assessments that capture both quantitative metrics and qualitative signals, improving early-warning detection for borrower distress.

Justin Kirsch

Co-Founder & CEO, Access Business Technologies

Justin Kirsch leads ABT's data infrastructure and AI strategy for the 750+ financial institutions on the firm's Microsoft 365 footprint. He oversees Mortgage BI, MortgageExchange, and the M365 Guardian operating model that mortgage companies, credit unions, and community banks use to run predictive analytics under SR 11-7, CFPB, and OCC expectations.
