A regional bank hired an AI agency to build a commercial real estate loan pricing model. The agency's data scientists built an excellent model: strong predictive performance, clean code, and documentation that was thorough by the agency's own standards. When the bank submitted the model for internal review before deployment, the model risk management team rejected it. Not because the model was bad, but because the documentation did not meet the bank's SR 11-7 requirements. The model development documentation lacked a conceptual soundness assessment. The validation had been performed by the same team that built the model, so it was not independent. There was no ongoing monitoring plan. The limitations section was a single paragraph instead of the detailed analysis the bank's regulators expected. The agency spent eight additional weeks ($95,000 in unbilled work) rewriting documentation, conducting independent validation, and building a monitoring plan. The model itself did not change. Only the governance around it changed. That experience taught the agency a fundamental lesson: in regulated industries, the governance around a model is as important as the model itself.
SR 11-7 is the Federal Reserve's supervisory letter containing the "Supervisory Guidance on Model Risk Management," issued in April 2011 jointly with the OCC, which published the same text as OCC Bulletin 2011-12. Despite being over a decade old, it remains the reference framework for model risk management in U.S. banking, and its principles apply far beyond banking. Understanding SR 11-7 is essential for any AI agency serving regulated industries, and the framework's concepts are increasingly relevant to AI governance in all sectors.
What SR 11-7 Requires
SR 11-7 defines a model as "a quantitative method, system, or approach that applies statistical, economic, financial, or mathematical theories, techniques, and assumptions to process input data into quantitative estimates." This definition clearly encompasses AI and machine learning models.
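This article organizes the guidance's requirements into three pillars. (SR 11-7 itself treats ongoing monitoring as a core element of validation and has a separate section on governance, policies, and controls.)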
Pillar 1: Model Development, Implementation, and Use
Conceptual soundness. The model must be built on a sound theoretical and empirical basis. For AI models, this means:
- The choice of modeling approach must be justified. Why was a gradient boosted tree used instead of a neural network, or vice versa? What are the trade-offs?
- The model's assumptions must be documented and their validity assessed
- The features used must be justified. Why was each feature included? Are there features that were excluded, and why?
- The model's limitations must be documented—what the model cannot do, where it is likely to fail, and what conditions would invalidate its use
Data quality. The data used to develop and run the model must be accurate, complete, and appropriate:
- Data sources must be documented, including their reliability and any known quality issues
- Data preparation and transformation steps must be documented
- Data quality assessments must be performed and documented
- Ongoing data quality monitoring must be in place
Testing. The model must be thoroughly tested before deployment:
- In-sample testing (performance on training data)
- Out-of-sample testing (performance on holdout data not used in training)
- Out-of-time testing (performance on data from a different time period)
- Sensitivity analysis (how do changes in inputs affect outputs?)
- Stress testing (how does the model perform under extreme conditions?)
- Benchmarking (how does the model perform compared to simpler alternatives?)
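The out-of-time and benchmarking tests in this list can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method of any particular bank or agency: the loan records, the `candidate_model` rule, and the cutoff date are all invented for the example, and the benchmark is simply the development-period mean.

```python
from datetime import date
from statistics import mean


def out_of_time_split(records, cutoff):
    """Split records into a development set (before cutoff) and an out-of-time set."""
    development = [r for r in records if r["as_of"] < cutoff]
    out_of_time = [r for r in records if r["as_of"] >= cutoff]
    return development, out_of_time


def mean_absolute_error(actuals, predictions):
    return mean(abs(a - p) for a, p in zip(actuals, predictions))


def candidate_model(record):
    # Hypothetical stand-in for the real model: a hand-set linear rule
    return 400.0 * record["ltv"] - 10.0


# Invented loan records: observation date, loan-to-value, observed spread in bps
records = [
    {"as_of": date(2021, 3, 1), "ltv": 0.55, "spread": 210.0},
    {"as_of": date(2021, 9, 1), "ltv": 0.65, "spread": 250.0},
    {"as_of": date(2022, 2, 1), "ltv": 0.70, "spread": 270.0},
    {"as_of": date(2022, 8, 1), "ltv": 0.60, "spread": 230.0},
    {"as_of": date(2023, 1, 1), "ltv": 0.75, "spread": 300.0},
    {"as_of": date(2023, 6, 1), "ltv": 0.50, "spread": 205.0},
]

dev, oot = out_of_time_split(records, cutoff=date(2023, 1, 1))
oot_actuals = [r["spread"] for r in oot]

# Benchmark: always predict the mean spread seen in the development period
benchmark_value = mean(r["spread"] for r in dev)
benchmark_mae = mean_absolute_error(oot_actuals, [benchmark_value] * len(oot))

# Candidate model scored on the same out-of-time records
model_mae = mean_absolute_error(oot_actuals, [candidate_model(r) for r in oot])

print(f"benchmark MAE: {benchmark_mae}, model MAE: {model_mae}")
```

If the candidate cannot beat a benchmark this simple on data from a later period, that result belongs in the testing section of the documentation.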
Documentation. All aspects of model development must be documented at a level sufficient for a knowledgeable third party to understand, replicate, and evaluate the model:
- Model purpose and intended use
- Data description and preparation
- Methodology description and justification
- Testing results and analysis
- Limitations and conditions for use
- Implementation details
Pillar 2: Model Validation
Model validation is the independent assessment of model quality and appropriateness. SR 11-7 requires validation to be performed by parties independent of the development team.
Independence. Validators must be independent of the model development team. They should have no vested interest in the model's approval. For AI agencies, this means:
- The team that validates the model should not be the team that built it
- If the agency is too small for separate teams, use external validators
- Independence must be both actual and perceived
Validation scope. Validation must cover:
- Conceptual soundness review: Is the modeling approach appropriate? Are the assumptions reasonable? Are the features justified?
- Process verification: Was the model developed according to documented procedures? Were development standards followed?
- Outcomes analysis: Does the model perform as expected? Are performance metrics within acceptable ranges? How does the model perform across different segments?
- Benchmarking: How does the model compare to alternative approaches? Would a simpler model perform comparably?
- Sensitivity analysis: How sensitive is the model to changes in inputs, parameters, and assumptions?
- Stability analysis: Is the model's performance stable over time? Does it degrade under different conditions?
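One simple form of the sensitivity analysis in this list is a one-at-a-time bump: change each input by a fixed percentage, hold the others constant, and record how the output moves. The sketch below assumes a hypothetical `pricing_model` function and a 10% bump; both are placeholders, and the approach deliberately ignores interactions between inputs.

```python
def pricing_model(inputs):
    # Hypothetical stand-in for the model under validation (spread in bps)
    return 100.0 + 400.0 * inputs["ltv"] - 50.0 * inputs["dscr"]


def one_at_a_time_sensitivity(model, base_inputs, relative_bump=0.10):
    """Change in model output when each input is bumped on its own."""
    base_output = model(base_inputs)
    deltas = {}
    for name, value in base_inputs.items():
        bumped = dict(base_inputs)  # copy so the base case is left untouched
        bumped[name] = value * (1.0 + relative_bump)
        deltas[name] = model(bumped) - base_output
    return deltas


base_case = {"ltv": 0.60, "dscr": 1.50}
sensitivities = one_at_a_time_sensitivity(pricing_model, base_case)

for name, delta in sensitivities.items():
    print(f"+10% {name}: output changes by {delta:+.1f} bps")
```

A validator would typically repeat this at several base cases, since a nonlinear model can be insensitive in one region and highly sensitive in another.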
Validation report. The validation must produce a written report that documents:
- The scope and approach of the validation
- Findings, both positive and negative
- An overall assessment of model fitness for use
- Conditions or restrictions on use (if applicable)
- Required remediations before deployment (if applicable)
- Recommended monitoring and ongoing validation activities
Pillar 3: Ongoing Monitoring
Models are not static. Their performance changes over time as the world changes around them. SR 11-7 requires ongoing monitoring to detect and address model degradation.
Performance monitoring. Track model accuracy, error rates, and other performance metrics in production:
- Compare production performance to development and validation benchmarks
- Track performance over time to identify trends and degradation
- Break performance down by relevant segments (geography, customer type, product line)
- Set thresholds that trigger action when performance degrades
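A threshold check of this kind might look like the following sketch. The metric (AUC), the segment names, and the warning and breach thresholds are all assumptions chosen for illustration; in practice the thresholds would be agreed with the client's model risk team and written into the monitoring plan.

```python
def performance_status(validation_auc, production_auc, warn_drop=0.02, breach_drop=0.05):
    """Classify production performance relative to the validation benchmark."""
    drop = validation_auc - production_auc
    if drop >= breach_drop:
        return "red"    # escalate: candidate for revalidation
    if drop >= warn_drop:
        return "amber"  # investigate and watch the trend
    return "green"


# Hypothetical benchmark from the validation report and current production figures
VALIDATION_AUC = 0.80
production_auc_by_segment = {"office": 0.79, "retail": 0.77, "multifamily": 0.70}

statuses = {
    segment: performance_status(VALIDATION_AUC, auc)
    for segment, auc in production_auc_by_segment.items()
}

for segment, status in statuses.items():
    print(f"{segment}: {status}")
```

Reporting by segment matters here: a portfolio-level figure can stay green while one segment quietly degrades.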
Outcomes analysis. Compare model predictions to actual outcomes:
- Back-test predictions against realized results
- Track prediction accuracy by segment and over time
- Investigate systematic over- or under-prediction
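Systematic over- or under-prediction shows up as a mean signed error that stays away from zero. The sketch below back-tests hypothetical predictions against realized results by segment; the field names, the figures, and the 10 bps tolerance are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_signed_error_by_segment(observations):
    """Average of (predicted - actual) per segment; negative means under-prediction."""
    errors = defaultdict(list)
    for obs in observations:
        errors[obs["segment"]].append(obs["predicted_bps"] - obs["actual_bps"])
    return {segment: mean(values) for segment, values in errors.items()}


# Invented back-test data: predicted vs. realized loss rates in basis points
observations = [
    {"segment": "office", "predicted_bps": 40, "actual_bps": 60},
    {"segment": "office", "predicted_bps": 50, "actual_bps": 70},
    {"segment": "office", "predicted_bps": 60, "actual_bps": 80},
    {"segment": "retail", "predicted_bps": 30, "actual_bps": 32},
    {"segment": "retail", "predicted_bps": 45, "actual_bps": 43},
]

bias = mean_signed_error_by_segment(observations)

TOLERANCE_BPS = 10  # hypothetical tolerance for the example
needs_investigation = sorted(s for s, b in bias.items() if abs(b) > TOLERANCE_BPS)

print(bias)
print("investigate:", needs_investigation)
```

The retail errors cancel out while the office errors all point the same way, which is the pattern this check is meant to surface.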
Stability monitoring. Monitor for changes in model inputs and behavior:
- Track input data distributions for drift
- Monitor feature importance stability
- Track output distribution changes
- Identify changes in the model's operating environment that may affect validity
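Input drift is often summarized with the Population Stability Index (PSI), which compares the binned distribution of a feature in production against its distribution at development time. SR 11-7 does not prescribe PSI; it is used here as one common choice. The bin edges and sample values below are invented, and the cutoffs many practitioners quote (roughly 0.1 for "watch" and 0.25 for "act") are rules of thumb rather than regulatory requirements.

```python
import math
from bisect import bisect_right


def bin_shares(values, edges, floor=1e-6):
    """Share of values in each bin; `edges` are ascending interior cut points."""
    counts = [0] * (len(edges) + 1)
    for value in values:
        counts[bisect_right(edges, value)] += 1
    # Floor empty bins so the logarithm in the PSI formula is defined
    return [max(count / len(values), floor) for count in counts]


def population_stability_index(expected, actual, edges):
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    expected_shares = bin_shares(expected, edges)
    actual_shares = bin_shares(actual, edges)
    return sum(
        (a - e) * math.log(a / e)
        for e, a in zip(expected_shares, actual_shares)
    )


# Invented loan-to-value samples: development data vs. recent production data
development_ltv = [0.50, 0.55, 0.60, 0.65, 0.70, 0.75]
production_ltv = [0.65, 0.70, 0.72, 0.75, 0.78, 0.80]
ltv_edges = [0.60, 0.70]

psi_unchanged = population_stability_index(development_ltv, development_ltv, ltv_edges)
psi_shifted = population_stability_index(development_ltv, production_ltv, ltv_edges)

print(f"PSI, same data: {psi_unchanged:.3f}")
print(f"PSI, shifted production data: {psi_shifted:.3f}")
```

The same function can be applied to the model's output scores to track changes in the output distribution.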
Periodic revalidation. Conduct full revalidation on a regular schedule:
- Annual revalidation for high-risk models
- Revalidation triggered by material changes (new data, model updates, use case changes)
- Revalidation triggered by monitoring alerts (significant performance degradation, drift)
Applying SR 11-7 to AI Models
SR 11-7 was written before modern AI and machine learning were widely used in financial services. Applying its principles to AI models requires interpretation and extension.
Explainability Challenge
Traditional statistical models (linear regression, logistic regression) are inherently interpretable. You can look at the coefficients and understand how each input affects the output. Complex AI models (deep neural networks, large ensemble models) are not inherently interpretable. This creates tension with SR 11-7's requirements for conceptual soundness and documentation.
Practical approaches:
- Use inherently interpretable models where regulatory requirements demand it
- Where complex models are used, implement post-hoc explainability methods (SHAP, LIME, integrated gradients)
- Document the trade-off between model complexity and interpretability, and justify the choice
- Provide both global explanations (how the model generally works) and local explanations (why the model made a specific decision)
- Acknowledge explainability limitations honestly in documentation
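To make the global versus local distinction concrete, consider the interpretable case first. For a linear model, the coefficients are the global explanation, and a local explanation for one loan is each coefficient times that loan's deviation from a baseline. The coefficients, feature names, and portfolio means below are hypothetical; for non-linear models a post-hoc method such as SHAP or LIME would take the place of this arithmetic.

```python
# Hypothetical fitted linear model: spread in bps
COEFFICIENTS = {"ltv": 400.0, "dscr": -50.0}
INTERCEPT = 100.0

# Hypothetical baseline: portfolio-average inputs
PORTFOLIO_MEANS = {"ltv": 0.60, "dscr": 1.40}


def predict(inputs):
    return INTERCEPT + sum(COEFFICIENTS[name] * inputs[name] for name in COEFFICIENTS)


def local_explanation(inputs):
    """Per-feature contribution to this prediction relative to the baseline."""
    return {
        name: COEFFICIENTS[name] * (inputs[name] - PORTFOLIO_MEANS[name])
        for name in COEFFICIENTS
    }


loan = {"ltv": 0.75, "dscr": 1.20}
contributions = local_explanation(loan)
baseline_prediction = predict(PORTFOLIO_MEANS)

print("global explanation (coefficients):", COEFFICIENTS)
print(f"baseline prediction: {baseline_prediction:.1f} bps")
for name, value in contributions.items():
    print(f"  {name}: {value:+.1f} bps")
print(f"prediction for this loan: {predict(loan):.1f} bps")
```

The contributions add up to the gap between this loan's prediction and the baseline, which is the property a reviewer would want any local explanation method to approximate.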
Testing AI Models
AI models require additional testing beyond what SR 11-7 originally contemplated.
Bias testing. Test for disparate impact across protected demographic categories. SR 11-7 does not explicitly require this, but fair lending laws such as the Equal Credit Opportunity Act and the Fair Housing Act prohibit discriminatory lending outcomes, and examiners expect to see the testing.
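A first-pass disparate impact screen compares approval rates across groups. The sketch below computes each group's approval rate relative to a reference group. The group labels and decisions are invented, and the 0.8 cutoff borrows the "four-fifths" screening heuristic from employment-selection practice; it is an assumption for illustration, not a fair lending legal standard, and a real analysis would go on to control for legitimate credit factors.

```python
def approval_rates(decisions_by_group):
    """Share of approvals (1 = approved, 0 = declined) per group."""
    return {group: sum(d) / len(d) for group, d in decisions_by_group.items()}


def adverse_impact_ratios(decisions_by_group, reference_group):
    """Each group's approval rate divided by the reference group's rate."""
    rates = approval_rates(decisions_by_group)
    reference_rate = rates[reference_group]
    return {group: rate / reference_rate for group, rate in rates.items()}


# Invented model decisions for two hypothetical groups
decisions = {
    "group_a": [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    "group_b": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
}

ratios = adverse_impact_ratios(decisions, reference_group="group_a")

SCREENING_THRESHOLD = 0.8  # assumed heuristic, see the note above
flagged = sorted(g for g, r in ratios.items() if r < SCREENING_THRESHOLD)

print(ratios)
print("flagged for review:", flagged)
```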
Adversarial testing. Test the model's robustness to adversarial inputs—deliberately crafted inputs designed to fool the model. This is particularly important for models that process external data.
Edge case testing. AI models can behave unpredictably at the boundaries of their training data. Test extensively with edge cases, extreme values, and unusual combinations.
Stability testing. Test how the model's behavior changes across time periods, market conditions, and population segments. AI models can be more sensitive to distribution shift than traditional models.
Validation of AI Models
Validating AI models requires specialized skills and approaches.
Replication challenges. AI model training can be non-deterministic (producing slightly different results each time). Validation must account for this by assessing whether differences from replication are within acceptable bounds.
Data leakage assessment. Validate that the model does not benefit from data leakage—information that would not be available at the time of prediction. Data leakage is a common issue in AI model development and can dramatically inflate apparent performance.
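One mechanical check a validator can run is to compare when each feature value became known against the date the prediction would have been made. The sketch below assumes each record carries a hypothetical `decision_date` and a per-feature "known on" date; real pipelines rarely store this so neatly, so treat the structure as an assumption.

```python
from datetime import date


def leaky_features(feature_dates, decision_date):
    """Names of features whose values were only known after the decision date."""
    return sorted(
        name for name, known_on in feature_dates.items() if known_on > decision_date
    )


# Invented loan application record
application = {
    "decision_date": date(2023, 3, 15),
    "feature_dates": {
        "appraised_value": date(2023, 2, 20),
        "debt_service_coverage": date(2023, 3, 1),
        # Outcome information that could not have existed at decision time
        "days_delinquent_next_12m": date(2024, 3, 15),
    },
}

leaks = leaky_features(application["feature_dates"], application["decision_date"])
print("features unavailable at prediction time:", leaks)
```

A check like this catches only temporal leakage; leakage through duplicated records or target-derived features needs separate review.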
Overfitting assessment. Evaluate whether the model is overfitting to training data. Compare training performance to validation performance, use cross-validation, and assess performance on truly out-of-sample data.
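The gap between training and cross-validated error can be computed as in the sketch below. To keep the example self-contained, the "model" is a 1-nearest-neighbour predictor written by hand, chosen because it memorizes its training data and so shows the overfitting pattern clearly; the data and the fold count are invented.

```python
from statistics import mean


def k_fold_splits(n_rows, k):
    """Yield (train_indices, validation_indices) for k interleaved folds."""
    for fold in range(k):
        validation = list(range(fold, n_rows, k))
        held_out = set(validation)
        train = [i for i in range(n_rows) if i not in held_out]
        yield train, validation


def nearest_neighbour_predict(train_x, train_y, x):
    """1-nearest-neighbour: return the target of the closest training point."""
    closest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[closest]


def fold_mae(train_x, train_y, eval_x, eval_y):
    return mean(
        abs(nearest_neighbour_predict(train_x, train_y, x) - y)
        for x, y in zip(eval_x, eval_y)
    )


# Invented one-feature dataset with a noisy upward trend
xs = list(range(10))
ys = [1, 4, 2, 8, 5, 12, 9, 15, 13, 20]

train_scores, validation_scores = [], []
for train_idx, val_idx in k_fold_splits(len(xs), k=5):
    tx, ty = [xs[i] for i in train_idx], [ys[i] for i in train_idx]
    vx, vy = [xs[i] for i in val_idx], [ys[i] for i in val_idx]
    train_scores.append(fold_mae(tx, ty, tx, ty))       # error on data the model has seen
    validation_scores.append(fold_mae(tx, ty, vx, vy))  # error on held-out data

train_mae = mean(train_scores)
validation_mae = mean(validation_scores)
overfit_gap = validation_mae - train_mae

print(f"train MAE: {train_mae}, validation MAE: {validation_mae}, gap: {overfit_gap}")
```

A training error of zero alongside a clearly positive validation error is the signature of memorization rather than learning.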
Feature importance validation. Validate that the features the model relies on are conceptually sound and that the model's feature importance aligns with domain expertise. A model that produces good predictions for the wrong reasons will fail when conditions change.
Extending Beyond SR 11-7
While SR 11-7 provides an excellent foundation, AI governance requires additional dimensions that the guidance does not fully address.
Fairness and Ethics
SR 11-7 focuses on model accuracy and reliability but does not explicitly address fairness and ethics. Modern AI governance requires:
- Systematic bias testing across protected categories
- Fairness metrics integrated into model validation
- Ethical review of model use cases
- Ongoing fairness monitoring in production
Data Lineage and Provenance
SR 11-7 addresses data quality but does not specifically require comprehensive data lineage. For AI models, especially those using large and complex datasets, data lineage is essential for:
- Regulatory compliance (demonstrating data provenance)
- Debugging (tracing problems to their data source)
- Impact analysis (understanding what changes when data changes)
Third-Party Model Governance
SR 11-7 addresses vendor model risk but was written before the era of third-party AI APIs and foundation models. Modern governance must additionally address:
- Governance of models you do not own or control
- Model behavior changes from provider updates
- Data privacy in third-party model interactions
- License and intellectual property considerations
Automated Decision-Making
SR 11-7 focuses on models as decision-support tools but does not fully address the governance of automated decision-making. When AI models make decisions without human intervention, additional governance is needed:
- Clear criteria for when automated decisions are appropriate
- Human override mechanisms
- Appeal processes for affected individuals
- Enhanced monitoring for automated systems
Implementing Model Risk Management in Your Agency
For Agencies Serving Financial Services
If your clients are banks, SR 11-7 compliance is not optional, and insurers and other regulated financial institutions often hold their vendors to a comparable standard. Build it into your standard delivery.
Development standards. Create development standards that produce SR 11-7-compliant documentation as a natural byproduct of the development process. If your engineers document as they go, using standard templates, the final documentation package comes together with minimal additional effort.
Validation capability. Build or access independent validation capability. This might mean:
- A separate validation team within your agency (for larger agencies)
- Partnerships with independent validation firms
- Clear processes that ensure development and validation independence
Monitoring packages. Include ongoing monitoring as a standard deliverable, not an optional add-on. Design monitoring dashboards, define thresholds, and document monitoring procedures as part of every model delivery.
For Agencies Serving Non-Financial Clients
Even if your clients are not in financial services, the SR 11-7 framework is valuable. Adapt it by:
Scaling proportionally. Apply the full framework to high-risk AI systems and a simplified version to lower-risk systems. Every AI system benefits from documentation, testing, and monitoring—the rigor should match the risk.
Focusing on what matters. Not every element of SR 11-7 applies to every AI system. Focus on the elements that create the most value: documentation, independent review, and ongoing monitoring.
Using it as a differentiator. If your non-financial clients are not expecting SR 11-7-level governance, delivering it anyway differentiates your agency and prepares clients for future regulatory requirements.
Your Next Step
Take the model documentation for your most recent AI project and evaluate it against the three pillars described above. Does the development documentation include conceptual soundness justification, data quality assessment, and comprehensive testing? Was the model validated independently? Is there an ongoing monitoring plan? Score each dimension as green (meets the standard), yellow (partially meets), or red (does not meet). The reds are your immediate priorities. The yellows are your near-term priorities. Address them before your next delivery, and build the standards into your development process so that future projects meet the bar from the start.
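The scoring exercise can be captured in a small scorecard so the result is easy to track from project to project. The dimension names below mirror the questions in the paragraph above, and the scores are invented for illustration.

```python
VALID_STATUSES = {"green", "yellow", "red"}


def prioritize(scorecard):
    """Split scored dimensions into immediate (red) and near-term (yellow) priorities."""
    unknown = set(scorecard.values()) - VALID_STATUSES
    if unknown:
        raise ValueError(f"Unknown status values: {sorted(unknown)}")
    immediate = [dim for dim, status in scorecard.items() if status == "red"]
    near_term = [dim for dim, status in scorecard.items() if status == "yellow"]
    return immediate, near_term


# Hypothetical self-assessment of a recent project
scorecard = {
    "conceptual soundness justification": "yellow",
    "data quality assessment": "green",
    "comprehensive testing": "yellow",
    "independent validation": "red",
    "ongoing monitoring plan": "red",
}

immediate, near_term = prioritize(scorecard)
print("immediate priorities:", immediate)
print("near-term priorities:", near_term)
```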