Model Risk Management Frameworks (SR 11-7 and Beyond)

A regional bank hired an AI agency to build a commercial real estate loan pricing model. The agency's data scientists built an excellent model: strong predictive performance, clean code, and documentation that was thorough by engineering standards. When the bank submitted the model for internal review before deployment, the model risk management team rejected it. Not because the model was bad, but because the documentation did not meet the bank's SR 11-7 requirements. The model development documentation lacked a conceptual soundness assessment. The validation was performed by the same team that built the model (not independent). There was no ongoing monitoring plan. The limitations section was a single paragraph instead of the detailed analysis the bank's regulators expected.

The agency spent eight additional weeks ($95,000 in unbilled work) rewriting documentation, conducting independent validation, and building a monitoring plan. The model itself did not change. Only the governance around it changed. That experience taught the agency a fundamental lesson: in regulated industries, the governance around a model is as important as the model itself.

SR 11-7 is the Federal Reserve's "Supervisory Guidance on Model Risk Management," issued in April 2011 together with the OCC, which published the same guidance as OCC Bulletin 2011-12. Despite being over a decade old, it remains the reference framework for model risk management in U.S. banking, and its principles apply far beyond banking. Understanding SR 11-7 is essential for any AI agency serving regulated industries, and the framework's concepts are increasingly relevant for AI governance in all sectors.

What SR 11-7 Requires

SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." This definition clearly encompasses AI and machine learning models.

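This article organizes the guidance into three pillars of model risk management. Note that SR 11-7 itself treats ongoing monitoring as a core element of validation and devotes its third major section to governance, policies, and controls; the grouping used here is a practical simplification.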

Pillar 1: Model Development, Implementation, and Use

Conceptual soundness. The model must be built on a sound theoretical and empirical basis. For AI models, this means:

The choice of modeling approach must be justified. Why was a gradient boosted tree used instead of a neural network, or vice versa? What are the trade-offs?
The model's assumptions must be documented and their validity assessed
The features used must be justified. Why was each feature included? Are there features that were excluded, and why?
The model's limitations must be documented—what the model cannot do, where it is likely to fail, and what conditions would invalidate its use

Data quality. The data used to develop and run the model must be accurate, complete, and appropriate:

Data sources must be documented, including their reliability and any known quality issues
Data preparation and transformation steps must be documented
Data quality assessments must be performed and documented
Ongoing data quality monitoring must be in place

Testing. The model must be thoroughly tested before deployment:

In-sample testing (performance on training data)
Out-of-sample testing (performance on holdout data not used in training)
Out-of-time testing (performance on data from a different time period)
Sensitivity analysis (how do changes in inputs affect outputs?)
Stress testing (how does the model perform under extreme conditions?)
Benchmarking (how does the model perform compared to simpler alternatives?)
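Out-of-sample and out-of-time testing are easy to blur in practice. The following is a minimal sketch of an out-of-time split, assuming each record carries an observation date in a field called `as_of` (a hypothetical name, as are the sample records):

```python
from datetime import date

def out_of_time_split(records, cutoff):
    """Split records into (development, out_of_time) by observation date.

    Records dated on or before `cutoff` go to development; later records
    are held back so the model is tested on a period it never saw.
    """
    development = [r for r in records if r["as_of"] <= cutoff]
    out_of_time = [r for r in records if r["as_of"] > cutoff]
    return development, out_of_time

# Illustrative records; field names and values are made up for the example.
loans = [
    {"loan_id": 1, "as_of": date(2021, 3, 31)},
    {"loan_id": 2, "as_of": date(2021, 12, 31)},
    {"loan_id": 3, "as_of": date(2022, 6, 30)},
]
dev, oot = out_of_time_split(loans, cutoff=date(2021, 12, 31))
```

A random holdout drawn from `dev` would then serve as the out-of-sample set, while `oot` stays untouched until final testing.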

Documentation. All aspects of model development must be documented at a level sufficient for a knowledgeable third party to understand, replicate, and evaluate the model:

Model purpose and intended use
Data description and preparation
Methodology description and justification
Testing results and analysis
Limitations and conditions for use
Implementation details

Pillar 2: Model Validation

Model validation is the independent assessment of model quality and appropriateness. SR 11-7 requires validation to be performed by parties independent of the development team.

Independence. Validators must be independent of the model development team. They should have no vested interest in the model's approval. For AI agencies, this means:

The team that validates the model should not be the team that built it
If the agency is too small for separate teams, use external validators
Independence must be both actual and perceived

Validation scope. Validation must cover:

Conceptual soundness review: Is the modeling approach appropriate? Are the assumptions reasonable? Are the features justified?
Process verification: Was the model developed according to documented procedures? Were development standards followed?
Outcomes analysis: Does the model perform as expected? Are performance metrics within acceptable ranges? How does the model perform across different segments?
Benchmarking: How does the model compare to alternative approaches? Would a simpler model perform comparably?
Sensitivity analysis: How sensitive is the model to changes in inputs, parameters, and assumptions?
Stability analysis: Is the model's performance stable over time? Does it degrade under different conditions?
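One simple form of the sensitivity analysis listed above is a one-at-a-time shock: bump each input by a small relative amount and record how the output moves. The sketch below assumes the model can be called as a plain function on a dictionary of inputs; `toy_spread` and its coefficients are invented stand-ins, not a real pricing model.

```python
def one_at_a_time_sensitivity(model, baseline, bump=0.01):
    """Return {input_name: change in model output} for a relative bump to each input."""
    base_output = model(baseline)
    changes = {}
    for name, value in baseline.items():
        shocked = dict(baseline)          # copy so the baseline is not mutated
        shocked[name] = value * (1 + bump)
        changes[name] = model(shocked) - base_output
    return changes

def toy_spread(x):
    """Hypothetical stand-in for a pricing model: spread rises with LTV, falls with DSCR."""
    return 2.0 + 3.0 * x["ltv"] - 0.5 * x["dscr"]

sensitivities = one_at_a_time_sensitivity(toy_spread, {"ltv": 0.70, "dscr": 1.20})
```

One-at-a-time shocks miss interactions between inputs, so a validator would usually pair them with joint or scenario-based shocks.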

Validation report. The validation must produce a written report that documents:

The scope and approach of the validation
Findings, both positive and negative
An overall assessment of model fitness for use
Conditions or restrictions on use (if applicable)
Required remediations before deployment (if applicable)
Recommended monitoring and ongoing validation activities

Pillar 3: Ongoing Monitoring

Models are not static. Their performance changes over time as the world changes around them. SR 11-7 requires ongoing monitoring to detect and address model degradation.

Performance monitoring. Track model accuracy, error rates, and other performance metrics in production:

Compare production performance to development and validation benchmarks
Track performance over time to identify trends and degradation
Break performance down by relevant segments (geography, customer type, product line)
Set thresholds that trigger action when performance degrades
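A threshold rule of the kind described above can be very small. This sketch compares a higher-is-better production metric with its validation benchmark, segment by segment. The warning and action thresholds and the AUC figures are placeholders chosen for illustration, not values taken from SR 11-7.

```python
def performance_status(production, benchmark, warn_drop=0.02, action_drop=0.05):
    """Classify a higher-is-better metric (such as AUC) against its benchmark.

    warn_drop and action_drop are absolute drops from the benchmark.
    """
    drop = benchmark - production
    if drop >= action_drop:
        return "action"
    if drop >= warn_drop:
        return "warning"
    return "ok"

# Hypothetical segment-level AUCs from one monitoring run.
validation_auc = 0.80
production_auc = {"office": 0.79, "retail": 0.77, "multifamily": 0.72}
statuses = {
    segment: performance_status(auc, validation_auc)
    for segment, auc in production_auc.items()
}
```

What each status obliges the model owner to do (investigate, recalibrate, escalate) belongs in the monitoring procedure, not in the code.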

Outcomes analysis. Compare model predictions to actual outcomes:

Back-test predictions against realized results
Track prediction accuracy by segment and over time
Investigate systematic over- or under-prediction
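Systematic over- or under-prediction shows up as a mean signed error that stays away from zero. A minimal sketch, assuming back-test rows arrive as `(segment, predicted, actual)` tuples (an assumed layout with made-up numbers):

```python
from collections import defaultdict
from statistics import mean

def mean_signed_error_by_segment(rows):
    """Return {segment: mean(predicted - actual)}.

    Positive values mean the model over-predicts for that segment on
    average; negative values mean it under-predicts.
    """
    errors = defaultdict(list)
    for segment, predicted, actual in rows:
        errors[segment].append(predicted - actual)
    return {segment: mean(values) for segment, values in errors.items()}

# Illustrative predicted vs. realized loss rates.
rows = [
    ("office", 0.10, 0.05),
    ("office", 0.20, 0.15),
    ("retail", 0.10, 0.20),
    ("retail", 0.30, 0.40),
]
bias = mean_signed_error_by_segment(rows)
```

Here the model over-predicts for office and under-predicts for retail, a pattern an aggregate error figure could hide.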

Stability monitoring. Monitor for changes in model inputs and behavior:

Track input data distributions for drift
Monitor feature importance stability
Track output distribution changes
Identify changes in the model's operating environment that may affect validity
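Input drift is often summarized with the population stability index (PSI), which compares the share of observations in each bin between a baseline sample and a recent one. SR 11-7 does not prescribe PSI; it is used here as one common choice. The bin edges and sample values below are illustrative assumptions.

```python
import math

def population_stability_index(expected, actual, edges, eps=1e-6):
    """PSI for one numeric feature, using fixed interior bin edges.

    Values fall into len(edges) + 1 buckets; eps replaces empty-bucket
    shares so the logarithm stays defined.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            bucket = sum(v > e for e in edges)  # number of edges below v
            counts[bucket] += 1
        return [max(c / len(values), eps) for c in counts]

    e, a = shares(expected), shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [1, 2, 3, 4, 5, 6, 7, 8]
recent_same = list(baseline)
recent_shifted = [6, 7, 8, 9, 6, 7, 8, 9]
edges = [2.5, 5.5]

psi_same = population_stability_index(baseline, recent_same, edges)
psi_shifted = population_stability_index(baseline, recent_shifted, edges)
```

A commonly cited rule of thumb treats a PSI above roughly 0.25 as a significant shift, but that convention is industry practice rather than regulatory text, and the cutoff should be set and justified per model.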

Periodic revalidation. Conduct full revalidation on a regular schedule:

Annual revalidation for high-risk models
Revalidation triggered by material changes (new data, model updates, use case changes)
Revalidation triggered by monitoring alerts (significant performance degradation, drift)
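The three revalidation triggers above can be checked mechanically as part of a model inventory review. This is a sketch under stated assumptions: the function name, the 365-day default, and the reason labels are all illustrative.

```python
from datetime import date

def revalidation_reasons(last_validated, today, material_change=False,
                         monitoring_alert=False, max_age_days=365):
    """Return the list of reasons a model is due for revalidation (empty if none)."""
    reasons = []
    if (today - last_validated).days >= max_age_days:
        reasons.append("scheduled")
    if material_change:
        reasons.append("material change")
    if monitoring_alert:
        reasons.append("monitoring alert")
    return reasons

overdue = revalidation_reasons(date(2023, 1, 1), date(2024, 2, 1))
alerted = revalidation_reasons(date(2024, 1, 1), date(2024, 2, 1), monitoring_alert=True)
current = revalidation_reasons(date(2024, 1, 1), date(2024, 2, 1))
```

Returning reasons rather than a single boolean keeps the audit trail: the inventory can record why a revalidation was opened.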

Applying SR 11-7 to AI Models

SR 11-7 was written before modern AI and machine learning were widely used in financial services. Applying its principles to AI models requires interpretation and extension.

Explainability Challenge

Traditional statistical models (linear regression, logistic regression) are inherently interpretable. You can look at the coefficients and understand how each input affects the output. Complex AI models (deep neural networks, large ensemble models) are not inherently interpretable. This creates tension with SR 11-7's requirements for conceptual soundness and documentation.

Practical approaches:

Use inherently interpretable models where regulatory requirements demand it
Where complex models are used, implement post-hoc explainability methods (SHAP, LIME, integrated gradients)
Document the trade-off between model complexity and interpretability, and justify the choice
Provide both global explanations (how the model generally works) and local explanations (why the model made a specific decision)
Acknowledge explainability limitations honestly in documentation

Testing AI Models

AI models require additional testing beyond what SR 11-7 originally contemplated.

Bias testing. Test for disparate impact across protected demographic categories. This is not explicitly required by SR 11-7 but is required by fair lending regulations and is expected by examiners.
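A first-pass screen for disparate impact compares each group's approval rate with a reference group's. The sketch below computes that ratio. The 0.8 cutoff is the four-fifths rule of thumb borrowed from employment-selection practice and is an assumption here, not a fair lending standard; group names and counts are invented.

```python
def adverse_impact_ratio(outcomes, reference_group):
    """Return {group: approval rate / reference group's approval rate}.

    outcomes maps each group to an (approved, total) pair.
    """
    ref_approved, ref_total = outcomes[reference_group]
    ref_rate = ref_approved / ref_total
    return {
        group: (approved / total) / ref_rate
        for group, (approved, total) in outcomes.items()
    }

# Hypothetical approval counts by group.
outcomes = {"group_a": (80, 100), "group_b": (60, 100)}
ratios = adverse_impact_ratio(outcomes, reference_group="group_a")
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
```

A flagged ratio is a prompt for deeper analysis (statistical significance, legitimate credit factors, less discriminatory alternatives), not a conclusion in itself.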

Adversarial testing. Test the model's robustness to adversarial inputs—deliberately crafted inputs designed to fool the model. This is particularly important for models that process external data.

Edge case testing. AI models can behave unpredictably at the boundaries of their training data. Test extensively with edge cases, extreme values, and unusual combinations.

Stability testing. Test how the model's behavior changes across time periods, market conditions, and population segments. AI models can be more sensitive to distribution shift than traditional models.

Validation of AI Models

Validating AI models requires specialized skills and approaches.

Replication challenges. AI model training can be non-deterministic (producing slightly different results each time). Validation must account for this by assessing whether differences from replication are within acceptable bounds.
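One way to make "within acceptable bounds" concrete is to retrain several times with different random seeds and compare each replicated metric with the value the developers reported. The tolerance and AUC figures below are illustrative assumptions; the appropriate bound depends on the model and its use.

```python
def replication_within_tolerance(reported, replicated_runs, tolerance):
    """True if every replicated metric is within `tolerance` of the reported value."""
    return all(abs(metric - reported) <= tolerance for metric in replicated_runs)

# Hypothetical AUCs from retraining with three different seeds.
runs = [0.812, 0.809, 0.815]
replicates = replication_within_tolerance(0.811, runs, tolerance=0.01)
spread = max(runs) - min(runs)
```

Recording the spread across seeds alongside the pass/fail result gives the validation report a direct measure of training variability.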

Data leakage assessment. Validate that the model does not benefit from data leakage—information that would not be available at the time of prediction. Data leakage is a common issue in AI model development and can dramatically inflate apparent performance.
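A basic leakage check asks, for each feature, when its value would actually be known. The sketch below assumes the validator has a mapping from feature name to the date the value becomes available; the feature names and dates are hypothetical.

```python
from datetime import date

def leaking_features(feature_available_on, prediction_date):
    """Return, sorted, the features whose values are only known after the prediction date."""
    return sorted(
        name
        for name, available_on in feature_available_on.items()
        if available_on > prediction_date
    )

# Hypothetical availability dates for one loan being priced on 2023-02-01.
availability = {
    "appraised_value": date(2023, 1, 10),
    "debt_service_coverage": date(2023, 1, 15),
    "days_past_due_next_quarter": date(2023, 6, 30),
}
leaks = leaking_features(availability, prediction_date=date(2023, 2, 1))
```

Timestamp checks catch only temporal leakage. Leakage through target-derived features or through preprocessing fitted on the full dataset needs separate review.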

Overfitting assessment. Evaluate whether the model is overfitting to training data. Compare training performance to validation performance, use cross-validation, and assess performance on truly out-of-sample data.
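The training-versus-validation comparison reduces to a single number: the gap between the training score and the mean cross-validation score. In this sketch the scores and the 0.05 flag level are made-up illustrations, and the metric is assumed to be higher-is-better.

```python
def generalization_gap(train_score, fold_scores):
    """Training score minus the mean cross-validation score."""
    return train_score - sum(fold_scores) / len(fold_scores)

# Hypothetical AUCs: one training score and five cross-validation folds.
fold_aucs = [0.78, 0.80, 0.79, 0.77, 0.81]
gap = generalization_gap(0.95, fold_aucs)
overfit_suspected = gap > 0.05
```

A large gap suggests the model has memorized training data; the out-of-time holdout remains the stronger test because cross-validation folds share the training period.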

Feature importance validation. Validate that the features the model relies on are conceptually sound and that the model's feature importance aligns with domain expertise. A model that produces good predictions for the wrong reasons will fail when conditions change.

Extending Beyond SR 11-7

While SR 11-7 provides an excellent foundation, AI governance requires additional dimensions that the guidance does not fully address.

Fairness and Ethics

SR 11-7 focuses on model accuracy and reliability but does not explicitly address fairness and ethics. Modern AI governance requires:

Systematic bias testing across protected categories
Fairness metrics integrated into model validation
Ethical review of model use cases
Ongoing fairness monitoring in production

Data Lineage and Provenance

SR 11-7 addresses data quality but does not specifically require comprehensive data lineage. For AI models, especially those using large and complex datasets, data lineage is essential for:

Regulatory compliance (demonstrating data provenance)
Debugging (tracing problems to their data source)
Impact analysis (understanding what changes when data changes)

Third-Party Model Governance

SR 11-7 addresses vendor model risk but was written before the era of third-party AI APIs and foundation models. Modern governance must additionally address:

Governance of models you do not own or control
Model behavior changes from provider updates
Data privacy in third-party model interactions
License and intellectual property considerations

Automated Decision-Making

SR 11-7 focuses on models as decision-support tools but does not fully address the governance of automated decision-making. When AI models make decisions without human intervention, additional governance is needed:

Clear criteria for when automated decisions are appropriate
Human override mechanisms
Appeal processes for affected individuals
Enhanced monitoring for automated systems

Implementing Model Risk Management in Your Agency

For Agencies Serving Financial Services

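If your clients are banks supervised by the Federal Reserve or the OCC, meeting SR 11-7 expectations is not optional in practice. Insurance companies and other regulated financial institutions are not directly covered by the guidance, but many apply comparable model risk standards. Either way, build it into your standard delivery.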

Development standards. Create development standards that produce SR 11-7-compliant documentation as a natural byproduct of the development process. If your engineers document as they go, using standard templates, the final documentation package comes together with minimal additional effort.

Validation capability. Build or access independent validation capability. This might mean:

A separate validation team within your agency (for larger agencies)
Partnerships with independent validation firms
Clear processes that ensure development and validation independence

Monitoring packages. Include ongoing monitoring as a standard deliverable, not an optional add-on. Design monitoring dashboards, define thresholds, and document monitoring procedures as part of every model delivery.

For Agencies Serving Non-Financial Clients

Even if your clients are not in financial services, the SR 11-7 framework is valuable. Adapt it by:

Scaling proportionally. Apply the full framework to high-risk AI systems and a simplified version to lower-risk systems. Every AI system benefits from documentation, testing, and monitoring—the rigor should match the risk.

Focusing on what matters. Not every element of SR 11-7 applies to every AI system. Focus on the elements that create the most value: documentation, independent review, and ongoing monitoring.

Using it as a differentiator. If your non-financial clients are not expecting SR 11-7-level governance, delivering it anyway differentiates your agency and prepares clients for future regulatory requirements.

Your Next Step

Take the model documentation for your most recent AI project and evaluate it against SR 11-7's three pillars. Does the development documentation include conceptual soundness justification, data quality assessment, and comprehensive testing? Was the model validated independently? Is there an ongoing monitoring plan? Score each dimension as green (meets the standard), yellow (partially meets), or red (does not meet). The reds are your immediate priorities. The yellows are your near-term priorities. Address them before your next delivery, and build the standards into your development process so that future projects meet the bar from the start.
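The scoring exercise can be captured in a few lines so that it is repeatable across projects. The dimension names and ratings below are examples only; use whatever checklist your own review produces.

```python
PRIORITY = {"red": "immediate", "yellow": "near-term"}

def remediation_plan(scores):
    """Group non-green dimensions by priority.

    scores maps each dimension to "green", "yellow", or "red".
    """
    plan = {"immediate": [], "near-term": []}
    for dimension, rating in scores.items():
        if rating in PRIORITY:
            plan[PRIORITY[rating]].append(dimension)
    return plan

# Example self-assessment for one project.
scores = {
    "conceptual soundness": "yellow",
    "data quality": "green",
    "testing": "green",
    "independent validation": "red",
    "ongoing monitoring": "red",
}
plan = remediation_plan(scores)
```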



Agency Script Editorial
