
Setting Up and Managing ML Model Registries: Version Control for Production Intelligence

Agency Script Editorial

Editorial Team

March 20, 2026 · 11 min read
Tags: model registry, mlops, model management, production ml

An AI agency in Seattle managed 23 production models across 8 clients. Every model existed as a file on a shared drive with naming conventions like "fraud_model_v3_final_FINAL_fixed.pkl." When a client's recommendation model started producing wildly incorrect suggestions at 2 AM on a Tuesday, the on-call engineer needed to roll back to the previous model version. They spent 11 hours finding the right model file, figuring out which preprocessing pipeline it required, confirming which configuration it expected, and deploying it, all while the client's system served bad recommendations to 200,000 users. The post-mortem identified the root cause not as a bad model but as the absence of a model registry. There was no systematic way to track which model version was in production, what its dependencies were, or how to deploy a previous version. The agency spent the next month building a model registry that would have reduced that 11-hour incident to a 3-minute rollback.

A model registry is a centralized system for storing, versioning, and managing machine learning model artifacts throughout their lifecycle, from training through staging, production, and retirement. For AI agencies managing multiple models across multiple clients, a model registry is not a nice-to-have; it is the foundation of reliable ML operations.

Why Agencies Need Model Registries

The Model Management Problem

Without a registry, agencies accumulate model management debt that compounds with every new model and every new client.

Common symptoms of missing model management:

  • Nobody knows which model version is currently in production for a given client
  • Rolling back to a previous model version requires reverse-engineering the deployment
  • Training artifacts are scattered across laptops, shared drives, and cloud storage buckets
  • Reproducibility is impossible: the exact code, data, and configuration that produced a model are not recorded
  • Model dependencies (preprocessing code, feature engineering pipelines, configuration files) are not tracked alongside the model
  • Audit requests for regulated clients cannot be answered ("which model made this prediction and what data was it trained on?")

The Business Case

Quantifiable benefits of a model registry:

  • Incident response time: Reduce model rollback time from hours to minutes
  • Deployment frequency: Enable confident weekly or daily model updates instead of quarterly manual deployments
  • Compliance readiness: Provide immediate answers to audit questions about model lineage, training data, and performance history
  • Team efficiency: Eliminate the time engineers spend searching for model artifacts and reconstructing deployment procedures
  • Client confidence: Demonstrate professional model management practices that differentiate your agency from competitors

Model Registry Architecture

Core Components

Model artifact storage: The registry stores the serialized model file (pickle, ONNX, TorchScript, SavedModel) along with all artifacts needed to use the model: preprocessing code, configuration files, feature schemas, and dependency specifications.

Metadata store: A database that tracks metadata for every registered model version, including training metrics, training data reference, hyperparameters, model lineage (which code and data produced the model), deployment status, and custom tags.

Version management: A versioning scheme that uniquely identifies every model version and supports state transitions from "development" to "staging" to "production" to "archived."

Access control: Role-based access to models. Data scientists can register new versions, ML engineers can promote to staging, and only authorized operators can promote to production.

API layer: A programmatic interface (REST API or SDK) that allows training pipelines to register models, deployment pipelines to fetch models, and monitoring systems to query model metadata.

Model Versioning Strategy

Semantic versioning for models:

  • Major version (v2.0.0): New model architecture, new training data schema, or breaking changes to input/output format
  • Minor version (v1.1.0): Retrained model with updated data, new features added, or hyperparameter changes
  • Patch version (v1.0.1): Bug fixes, configuration changes, or dependency updates that do not change model behavior
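
The versioning rules above can be sketched as a small helper. This is a minimal illustration, not a prescribed implementation: the Change enum and the bump_version function are hypothetical names introduced here, and the sketch assumes version strings of the form "v1.2.3".

```python
from enum import Enum


class Change(Enum):
    """Kinds of model change, mapped to the version part they bump."""
    BREAKING = "major"     # new architecture, data schema, or input/output format
    RETRAIN = "minor"      # retrained on new data, new features, new hyperparameters
    MAINTENANCE = "patch"  # bug fix, configuration, or dependency update


def bump_version(version: str, change: Change) -> str:
    """Return the next version string for a given kind of change."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change is Change.BREAKING:
        return f"v{major + 1}.0.0"
    if change is Change.RETRAIN:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

Calling bump_version from the training pipeline, rather than typing version strings by hand, keeps the scheme consistent across clients.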

Stage-based lifecycle:

  • Development: Model is being trained and evaluated. Not ready for any form of production use.
  • Staging: Model has passed automated validation and is being tested in a staging environment with production-like data.
  • Production: Model is serving live traffic. Only one version per model can be in the "Production" stage at a time.
  • Archived: Model has been replaced by a newer version. Retained for rollback and audit purposes.
  • Deprecated: Model is scheduled for deletion. A grace period allows dependent systems to migrate.
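
One way to enforce this lifecycle is an explicit table of allowed transitions. The sketch below is an assumption-laden illustration: the forward path and the rollback-related moves come from this article, while the Staging-to-Development move and treating Deprecated as terminal are choices made here for the example.

```python
# Allowed stage transitions. Keys are current stages, values are legal targets.
ALLOWED_TRANSITIONS = {
    "Development": {"Staging"},
    "Staging": {"Production", "Development"},    # back to Development: assumed here
    "Production": {"Archived", "Development"},   # Development: demotion after a rollback
    "Archived": {"Production", "Deprecated"},    # Production: restored by a rollback
    "Deprecated": set(),                         # terminal: assumed here
}


def transition(current: str, target: str) -> str:
    """Return the new stage, or raise if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"Illegal stage transition: {current} -> {target}")
    return target
```

Rejecting illegal moves at the registry level means a model cannot skip staging, even if a pipeline is misconfigured.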

What to Store in the Registry

For every model version, store:

  • Model artifact: The serialized model file in a format suitable for deployment
  • Training code reference: Git commit hash and repository URL for the exact code that produced the model
  • Training data reference: Dataset version identifier (DVC hash, data warehouse query, or feature store snapshot ID)
  • Hyperparameters: Every hyperparameter used during training
  • Training metrics: All evaluation metrics computed during training and validation
  • Feature schema: The expected input features, their data types, and expected value ranges
  • Output schema: The model's output format, including prediction types and confidence score ranges
  • Dependencies: Python package versions, system dependencies, and runtime requirements
  • Preprocessing artifacts: Tokenizers, encoders, scalers, and any other preprocessing components needed at inference time
  • Model card: A human-readable document describing the model's purpose, training data, performance characteristics, known limitations, and ethical considerations
  • Deployment configuration: Infrastructure requirements (GPU type, memory, replicas) and serving configuration
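
The list above maps naturally onto a single record type. The following is a minimal sketch; ModelVersionRecord and its field names are hypothetical and chosen for this example, not taken from any particular registry product.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ModelVersionRecord:
    """Everything the registry stores for one model version."""
    name: str
    version: str
    artifact_uri: str
    git_commit: str
    repo_url: str
    training_data_ref: str
    hyperparameters: dict
    metrics: dict
    feature_schema: dict
    output_schema: dict
    dependencies: list
    preprocessing_uris: list = field(default_factory=list)
    model_card_uri: str = ""
    deployment_config: dict = field(default_factory=dict)
    stage: str = "Development"

    def missing_fields(self) -> list:
        """Names of fields left empty, usable as a registration gate."""
        return [key for key, value in asdict(self).items() if value in ("", [], {}, None)]
```

A training pipeline could refuse to promote any record whose missing_fields() list is not empty.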

Choosing a Registry Solution

MLflow Model Registry

MLflow is the most widely adopted open-source model registry and the default choice for most agencies.

Strengths:

  • Mature, well-documented, large community
  • Built-in experiment tracking alongside model registry
  • Supports all major ML frameworks (scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM)
  • REST API and Python SDK for programmatic access
  • Integrates with major cloud platforms (AWS, GCP, Azure)
  • Stage-based model lifecycle management (recent MLflow releases deprecate fixed stages in favor of model version aliases and tags)

Limitations:

  • Only basic built-in serving (the mlflow models serve command); production traffic typically needs separate serving infrastructure
  • Limited access control in the open-source version (use Databricks-managed MLflow for enterprise access control)
  • Artifact storage requires external configuration (S3, GCS, Azure Blob)

Best for: Agencies that want a proven, flexible registry with strong experiment tracking integration and are comfortable managing their own infrastructure.

Weights and Biases Model Registry

W&B provides a model registry as part of their broader ML development platform.

Strengths:

  • Excellent visualization and comparison tools
  • Strong artifact lineage tracking: trace any model back to its training run, data, and code
  • Built-in team collaboration features
  • Managed infrastructure with no servers to maintain
  • Integrates tightly with W&B experiment tracking

Limitations:

  • SaaS-first: the managed cloud is the primary offering, although a self-hosted W&B Server product is available
  • Pricing can be significant for large teams
  • Less flexible than MLflow for custom workflows

Best for: Agencies that want a managed solution with excellent visualization and are willing to pay for reduced operational overhead.

Cloud-Native Registries

Amazon SageMaker Model Registry: Built into the SageMaker ecosystem. Best for agencies already using SageMaker for training and deployment. Provides automatic model approval workflows and integration with SageMaker endpoints.

Google Vertex AI Model Registry: Integrated with Vertex AI training and serving. Provides model evaluation and comparison tools. Best for agencies on Google Cloud.

Azure ML Model Registry: Part of Azure Machine Learning. Provides model cataloging, versioning, and deployment integration. Best for agencies on Azure.

Best for: Agencies that are committed to a single cloud provider and want tight integration with that provider's ML services.

Custom Registries

For agencies with unique requirements (multi-cloud deployments, strict data residency requirements, or integration with proprietary systems), a custom registry built on a metadata database and object storage may be the right choice.

Custom registry components:

  • PostgreSQL or DynamoDB for metadata storage
  • S3 or GCS for artifact storage
  • A Python SDK for model registration and retrieval
  • A simple web UI for browsing and managing models
  • CI/CD integration for automated model promotion

Build custom only when existing solutions do not meet specific requirements. The maintenance burden of a custom registry is significant and rarely justified for agencies managing fewer than 50 models.
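
To give a sense of scale for the metadata side of a custom registry, here is a minimal sketch that uses SQLite from the Python standard library as a stand-in for PostgreSQL. The table layout and the function names (open_registry, register, set_stage, get_production) are assumptions made for illustration only.

```python
import json
import sqlite3


def open_registry(path: str = ":memory:") -> sqlite3.Connection:
    """Open the metadata store and create the table if it does not exist."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS model_versions (
               name TEXT NOT NULL,
               version TEXT NOT NULL,
               stage TEXT NOT NULL DEFAULT 'Development',
               artifact_uri TEXT NOT NULL,
               metadata TEXT NOT NULL,
               PRIMARY KEY (name, version)
           )"""
    )
    return conn


def register(conn, name, version, artifact_uri, metadata):
    """Insert a new version; a duplicate (name, version) raises IntegrityError."""
    conn.execute(
        "INSERT INTO model_versions (name, version, artifact_uri, metadata) "
        "VALUES (?, ?, ?, ?)",
        (name, version, artifact_uri, json.dumps(metadata)),
    )
    conn.commit()


def set_stage(conn, name, version, stage):
    """Move one version to a new lifecycle stage."""
    conn.execute(
        "UPDATE model_versions SET stage = ? WHERE name = ? AND version = ?",
        (stage, name, version),
    )
    conn.commit()


def get_production(conn, name):
    """Return (version, artifact_uri) for the production version, or None."""
    return conn.execute(
        "SELECT version, artifact_uri FROM model_versions "
        "WHERE name = ? AND stage = 'Production'",
        (name,),
    ).fetchone()
```

Even this small sketch leaves out access control, artifact upload, and the web UI, which is where most of the maintenance burden sits.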

Implementing Registry Workflows

Model Registration Workflow

Every model that could potentially be deployed to production should be registered automatically as part of the training pipeline.

Automated registration steps:

  1. Training pipeline completes and produces a model artifact
  2. Pipeline evaluates the model on the validation set and computes all required metrics
  3. Pipeline registers the model in the registry with all metadata (metrics, hyperparameters, data reference, code reference, dependencies)
  4. Model is assigned the "Development" stage
  5. If the model passes automated validation gates (metrics above minimum thresholds), it is automatically promoted to "Staging"
  6. Notification is sent to the ML engineer responsible for manual review and production promotion
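
Steps 4 and 5 above can be sketched as a threshold gate. This is an illustrative example only: the function names are hypothetical, and it assumes every gated metric is "higher is better".

```python
def passes_validation_gates(metrics: dict, thresholds: dict) -> bool:
    """True when every required metric is present and meets its minimum."""
    return all(
        metrics.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


def initial_stage(metrics: dict, thresholds: dict) -> str:
    """Stage assigned right after automated registration."""
    return "Staging" if passes_validation_gates(metrics, thresholds) else "Development"
```

A missing metric counts as a failure, so a pipeline that forgets to compute a required metric cannot slip a model into staging.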

Model Promotion Workflow

Promoting a model from staging to production is the highest-risk operation in the model lifecycle. It should be deliberate, validated, and reversible.

Pre-promotion checklist:

  • All evaluation metrics meet or exceed the current production model
  • No regression on any per-class or per-segment metric
  • Inference latency is within acceptable bounds
  • Model passes integration tests with the production serving infrastructure
  • Model passes fairness and bias checks
  • Model card is complete and reviewed
  • Rollback procedure is documented and tested
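
The measurable items on this checklist can be automated. The sketch below covers the first three (overall metrics, per-segment metrics, and latency). The dictionary keys and the promotion_blockers function are hypothetical, and the comparison assumes higher metric values are better.

```python
def promotion_blockers(candidate: dict, production: dict,
                       max_latency_ms: float, tolerance: float = 0.0) -> list:
    """Return the reasons a candidate must not be promoted (empty means clear)."""
    blockers = []
    # Overall metrics must meet or exceed the current production model.
    for metric, prod_value in production["metrics"].items():
        if candidate["metrics"].get(metric, float("-inf")) < prod_value - tolerance:
            blockers.append(f"metric regression: {metric}")
    # No regression is allowed on any per-class or per-segment metric.
    for segment, prod_value in production["segment_metrics"].items():
        if candidate["segment_metrics"].get(segment, float("-inf")) < prod_value - tolerance:
            blockers.append(f"segment regression: {segment}")
    # Inference latency must stay within the agreed bound.
    if candidate["p95_latency_ms"] > max_latency_ms:
        blockers.append("latency above bound")
    return blockers
```

The remaining checklist items (integration tests, fairness checks, model card review, rollback drill) still need their own gates or a human sign-off.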

Promotion process:

  1. ML engineer reviews the staging model and approves promotion
  2. The registry updates the model stage from "Staging" to "Production"
  3. The deployment pipeline detects the stage change and begins deployment
  4. The new model is deployed alongside the current model in a canary or blue-green configuration
  5. Production traffic is gradually shifted to the new model (5% then 20% then 50% then 100%)
  6. At each stage, metrics are compared to the previous model
  7. If metrics degrade at any stage, traffic is shifted back to the previous model and the promotion is rolled back
  8. Once 100% of traffic is on the new model and metrics are stable, the previous model is archived
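
Steps 5 through 7 amount to a loop over traffic percentages with an early exit. A minimal sketch follows, assuming a hypothetical metric_at callable that returns the new model's observed metric at a given traffic percentage; a real deployment would also wait for a soak period at each step.

```python
TRAFFIC_STEPS = (5, 20, 50, 100)  # percent of traffic on the new model


def run_canary(metric_at, baseline: float, max_drop: float = 0.01):
    """Walk the traffic steps; return (final_percent, status)."""
    for percent in TRAFFIC_STEPS:
        observed = metric_at(percent)
        if observed < baseline - max_drop:
            # Degradation detected: all traffic returns to the previous model.
            return 0, f"rolled back at {percent}%"
    return 100, "promoted"
```

Passing the metric source in as a function keeps the ramp logic testable without a live serving stack.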

Model Rollback Workflow

Rollback must be fast and reliable. When a production model is causing problems, every minute counts.

Rollback process:

  1. Operator triggers rollback via CLI, API, or dashboard
  2. Registry identifies the most recent archived model for this model name
  3. Deployment pipeline fetches the archived model and its dependencies
  4. The archived model is deployed to the serving infrastructure
  5. Traffic is shifted to the rolled-back model
  6. The problematic model is demoted from "Production" to "Development" for investigation
  7. An incident record is created, linking the model version, the trigger, and the resolution

Rollback time target: Under 5 minutes from trigger to full traffic shift. This requires pre-staged model artifacts and automated deployment pipelines. Manual rollbacks should be practiced during drills, not discovered during incidents.
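
The registry side of a rollback (steps 2 and 6) can be sketched as follows. The record layout is an assumption made for this example: each version is a dictionary with name, version, stage, and an archived_at timestamp stored as an ISO 8601 string, which sorts correctly as plain text.

```python
def pick_rollback_target(versions: list, model_name: str) -> dict:
    """Return the most recently archived version of the named model."""
    archived = [
        v for v in versions
        if v["name"] == model_name and v["stage"] == "Archived"
    ]
    if not archived:
        raise LookupError(f"No archived version to roll back to for {model_name}")
    return max(archived, key=lambda v: v["archived_at"])


def roll_back(versions: list, model_name: str) -> dict:
    """Demote the current production version and restore the rollback target."""
    target = pick_rollback_target(versions, model_name)
    for v in versions:
        if v["name"] == model_name and v["stage"] == "Production":
            v["stage"] = "Development"  # kept for investigation
    target["stage"] = "Production"
    return target
```

Failing loudly when no archived version exists is deliberate: it is better to discover that gap in a drill than during an incident.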

Multi-Client Registry Management

Organization Structure

Agencies serving multiple clients need a registry structure that isolates client data while enabling cross-client operational efficiency.

Recommended structure:

  • Workspace per client: Each client has a separate workspace or project within the registry, ensuring data isolation
  • Shared model templates: Common model architectures and training pipelines shared across workspaces as templates, not as shared models
  • Centralized operational dashboard: A single view across all client workspaces showing model health, deployment status, and alerting

Naming Conventions

Consistent naming conventions across clients prevent confusion and enable automation.

Model naming convention: {client}_{domain}_{task}_{variant}

Examples:

  • acme_retail_churn_prediction_v1
  • globex_finance_fraud_detection_ensemble
  • initech_support_ticket_classification_multilabel

Artifact naming convention: {model_name}_{version}_{format}

Examples:

  • acme_retail_churn_prediction_v1_2.1.0_onnx
  • globex_finance_fraud_detection_ensemble_1.0.3_torchscript
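
Conventions like these are easier to keep when names are generated rather than typed. The sketch below assumes lowercase parts joined with underscores. The helper names model_name and artifact_name are hypothetical.

```python
import re

# Each part: lowercase letters and digits, optionally separated by underscores.
_PART = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def model_name(client: str, domain: str, task: str, variant: str) -> str:
    """Build a model name from its four parts, rejecting malformed parts."""
    parts = (client, domain, task, variant)
    for part in parts:
        if not _PART.match(part):
            raise ValueError(f"Invalid name part: {part!r}")
    return "_".join(parts)


def artifact_name(model: str, version: str, fmt: str) -> str:
    """Build an artifact name from a model name, version, and format."""
    return f"{model}_{version}_{fmt}"
```

For example, model_name("acme", "retail", "churn_prediction", "v1") yields the first model name listed above.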

Access Control

Role-based access control for agencies:

  • Data scientist: Can register models, view all models in their client workspace, compare model versions
  • ML engineer: All data scientist permissions plus promote models to staging, configure deployment parameters
  • Operations lead: All ML engineer permissions plus promote models to production, trigger rollbacks
  • Client viewer: Can view model metadata, metrics, and deployment status for their models (read-only)
  • Agency admin: Full access across all client workspaces, manage users and permissions
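
The role hierarchy above can be expressed as permission sets that build on one another. This is a minimal sketch: the action names and the is_allowed function are hypothetical, and it assumes that only the agency admin may act outside their own workspace.

```python
ROLE_PERMISSIONS = {
    "data_scientist": {"register", "view", "compare"},
    "client_viewer": {"view"},
}
# Each higher role inherits everything from the role below it.
ROLE_PERMISSIONS["ml_engineer"] = ROLE_PERMISSIONS["data_scientist"] | {
    "promote_to_staging", "configure_deployment"}
ROLE_PERMISSIONS["operations_lead"] = ROLE_PERMISSIONS["ml_engineer"] | {
    "promote_to_production", "rollback"}
ROLE_PERMISSIONS["agency_admin"] = ROLE_PERMISSIONS["operations_lead"] | {
    "manage_users"}

CROSS_WORKSPACE_ROLES = {"agency_admin"}


def is_allowed(role: str, action: str, user_workspace: str, target_workspace: str) -> bool:
    """Check workspace isolation first, then the role's permission set."""
    if user_workspace != target_workspace and role not in CROSS_WORKSPACE_ROLES:
        return False
    return action in ROLE_PERMISSIONS.get(role, set())
```

Checking the workspace before the role keeps client isolation independent of how permissions evolve.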

Registry Maintenance and Operations

Artifact Lifecycle Management

Model artifacts consume storage. Without lifecycle management, storage costs grow unboundedly.

Retention policy:

  • Production models: Retained indefinitely (or until the client contract ends)
  • Archived models (previous production versions): Retained for 12 months or as required by the client's regulatory requirements
  • Staging models (never promoted to production): Retained for 3 months
  • Development models: Retained for 30 days, then automatically deleted
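
A retention policy like this one is straightforward to encode. The sketch below approximates 12 months as 365 days and 3 months as 90 days, which is an assumption of this example. The is_expired function and the RETENTION table are hypothetical names.

```python
from datetime import datetime, timedelta

# Retention window per stage; None means the artifact is kept indefinitely.
RETENTION = {
    "Production": None,
    "Archived": timedelta(days=365),   # roughly 12 months
    "Staging": timedelta(days=90),     # roughly 3 months
    "Development": timedelta(days=30),
}


def is_expired(stage: str, last_stage_change: datetime, now: datetime) -> bool:
    """True when an artifact has outlived the retention window for its stage."""
    limit = RETENTION[stage]
    return limit is not None and now - last_stage_change > limit
```

Client-specific regulatory requirements would override these defaults, for instance through a per-workspace copy of the table.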

Storage optimization:

  • Compress model artifacts before storage
  • Use tiered storage: keep recent artifacts in hot storage (S3 Standard) and older artifacts in cold storage (S3 Glacier)
  • Deduplicate shared components: if multiple model versions use the same preprocessing artifacts, store them once

Registry Monitoring

The registry itself is a critical system that needs monitoring.

Monitor:

  • Registry API availability and latency
  • Artifact storage capacity and growth rate
  • Registration and retrieval success rates
  • Deployment pipeline success rates triggered by registry events
  • User activity: who is promoting models, how frequently, and at what times

Disaster Recovery

Registry backup strategy:

  • Metadata database: Daily backups with point-in-time recovery
  • Artifact storage: Cross-region replication for production model artifacts
  • Configuration: Infrastructure-as-code for the registry itself, stored in version control

Recovery procedure:

  • Restore metadata database from backup
  • Verify artifact storage integrity
  • Re-establish connections between the registry and deployment pipelines
  • Verify that production model serving is unaffected
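
The "verify artifact storage integrity" step can be sketched with checksums, assuming the registry recorded a SHA-256 digest for each artifact at registration time. That recorded digest, the fetch callable, and the function names below are assumptions of this example.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Hex digest used as the artifact's fingerprint."""
    return hashlib.sha256(data).hexdigest()


def find_corrupt_artifacts(expected: dict, fetch) -> list:
    """Compare stored artifacts against digests recorded in the metadata store.

    expected maps artifact URI to recorded digest; fetch(uri) returns bytes.
    """
    corrupt = []
    for uri, digest in expected.items():
        if sha256_of(fetch(uri)) != digest:
            corrupt.append(uri)
    return corrupt
```

Running this check after a restore, and periodically in normal operation, catches truncated or replaced artifacts before a rollback depends on them.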

Your Next Step

If you do not have a model registry today, start with MLflow. Install it, register every model you currently have in production (with whatever metadata you can reconstruct), and set up automated registration in one training pipeline. For a small model portfolio, this typically takes about a day. Then add the pre-promotion checklist as a gate before any model goes to production. Within a week, you will have a functioning registry that eliminates the worst model management risks. Within a month, you will wonder how you ever operated without one. The agencies that manage models professionally retain clients longer, respond to incidents faster, and scale their operations without proportionally scaling their headaches.
