Setting Up and Managing ML Model Registries — Version Control for Production Intelligence

An AI agency in Seattle managed 23 production models across 8 clients. Every model existed as a file on a shared drive with naming conventions like "fraud_model_v3_final_FINAL_fixed.pkl." When a client's recommendation model started producing wildly incorrect suggestions at 2 AM on a Tuesday, the on-call engineer needed to roll back to the previous model version. They spent 11 hours finding the right model file, figuring out which preprocessing pipeline it required, confirming which configuration it expected, and deploying it, all while the client's system served bad recommendations to 200,000 users. The post-mortem identified the root cause not as a bad model but as the absence of a model registry. There was no systematic way to track which model version was in production, what its dependencies were, or how to deploy a previous version. The agency spent the next month building a model registry that would have reduced that 11-hour incident to a 3-minute rollback.

A model registry is a centralized system for storing, versioning, and managing machine learning model artifacts throughout their lifecycle — from training through staging, production, and retirement. For AI agencies managing multiple models across multiple clients, a model registry is not a nice-to-have — it is the foundation of reliable ML operations.

Why Agencies Need Model Registries

The Model Management Problem

Without a registry, agencies accumulate model management debt that compounds with every new model and every new client.

Common symptoms of missing model management:

Nobody knows which model version is currently in production for a given client
Rolling back to a previous model version requires reverse-engineering the deployment
Training artifacts are scattered across laptops, shared drives, and cloud storage buckets
Reproducibility is impossible — the exact code, data, and configuration that produced a model are not recorded
Model dependencies (preprocessing code, feature engineering pipelines, configuration files) are not tracked alongside the model
Audit requests for regulated clients cannot be answered ("which model made this prediction and what data was it trained on?")

The Business Case

Quantifiable benefits of a model registry:

Incident response time: Reduce model rollback time from hours to minutes
Deployment frequency: Enable confident weekly or daily model updates instead of quarterly manual deployments
Compliance readiness: Provide immediate answers to audit questions about model lineage, training data, and performance history
Team efficiency: Eliminate the time engineers spend searching for model artifacts and reconstructing deployment procedures
Client confidence: Demonstrate professional model management practices that differentiate your agency from competitors

Model Registry Architecture

Core Components

Model artifact storage: The registry stores the serialized model file (pickle, ONNX, TorchScript, SavedModel) along with all artifacts needed to use the model — preprocessing code, configuration files, feature schemas, and dependency specifications.

Metadata store: A database that tracks metadata for every registered model version — training metrics, training data reference, hyperparameters, model lineage (which code and data produced the model), deployment status, and custom tags.

Version management: A versioning scheme that uniquely identifies every model version and supports state transitions — from "development" to "staging" to "production" to "archived."

Access control: Role-based access to models — data scientists can register new versions, ML engineers can promote to staging, and only authorized operators can promote to production.

API layer: A programmatic interface (REST API or SDK) that allows training pipelines to register models, deployment pipelines to fetch models, and monitoring systems to query model metadata.

Model Versioning Strategy

Semantic versioning for models:

Major version (v2.0.0): New model architecture, new training data schema, or breaking changes to input/output format
Minor version (v1.1.0): Retrained model with updated data, new features added, or hyperparameter changes
Patch version (v1.0.1): Bug fixes, configuration changes, or dependency updates that do not change model behavior
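
As a minimal sketch of the scheme above, the helper below maps a kind of change to the version component it bumps. The `Change` enum and the `bump` function are hypothetical names introduced for illustration, and the "vMAJOR.MINOR.PATCH" string format is assumed from the examples in the list.

```python
from enum import Enum


class Change(Enum):
    """Kinds of change, mapped to the version component they bump."""
    BREAKING = "major"  # new architecture, new data schema, changed I/O format
    RETRAIN = "minor"   # retrained on new data, new features, new hyperparameters
    FIX = "patch"       # bug fix, config or dependency update, same behavior


def bump(version: str, change: Change) -> str:
    """Return the next version string for a 'vMAJOR.MINOR.PATCH' input."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change is Change.BREAKING:
        return f"v{major + 1}.0.0"
    if change is Change.RETRAIN:
        return f"v{major}.{minor + 1}.0"
    return f"v{major}.{minor}.{patch + 1}"
```

Calling `bump("v1.0.0", Change.RETRAIN)` yields "v1.1.0"; lower components reset to zero whenever a higher one is bumped.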

Stage-based lifecycle:

Development: Model is being trained and evaluated. Not ready for any form of production use.
Staging: Model has passed automated validation and is being tested in a staging environment with production-like data.
Production: Model is serving live traffic. Only one version per model can be in the "Production" stage at a time.
Archived: Model has been replaced by a newer version. Retained for rollback and audit purposes.
Deprecated: Model is scheduled for deletion. A grace period allows dependent systems to migrate.
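
The lifecycle above can be sketched as a small state machine. The set of allowed transitions is an assumption inferred from the stage descriptions (for example, that an archived version may return to production during a rollback), and `Stage`, `ALLOWED`, and `transition` are illustrative names rather than any registry's API.

```python
from enum import Enum


class Stage(Enum):
    DEVELOPMENT = "Development"
    STAGING = "Staging"
    PRODUCTION = "Production"
    ARCHIVED = "Archived"
    DEPRECATED = "Deprecated"


# Allowed moves (assumed, not prescribed by any particular registry).
ALLOWED = {
    Stage.DEVELOPMENT: {Stage.STAGING},
    Stage.STAGING: {Stage.PRODUCTION, Stage.DEVELOPMENT},
    Stage.PRODUCTION: {Stage.ARCHIVED, Stage.DEVELOPMENT},
    Stage.ARCHIVED: {Stage.PRODUCTION, Stage.DEPRECATED},
    Stage.DEPRECATED: set(),
}


def transition(stages: dict[str, Stage], version: str, target: Stage) -> None:
    """Move one version to `target`, keeping at most one Production version."""
    current = stages[version]
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    if target is Stage.PRODUCTION:
        # Archive whichever version currently holds Production.
        for other, stage in stages.items():
            if stage is Stage.PRODUCTION:
                stages[other] = Stage.ARCHIVED
    stages[version] = target
```

Enforcing the single-Production rule inside the transition itself means no caller can leave two versions marked as live.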

What to Store in the Registry

For every model version, store:

Model artifact: The serialized model file in a format suitable for deployment
Training code reference: Git commit hash and repository URL for the exact code that produced the model
Training data reference: Dataset version identifier (DVC hash, data warehouse query, or feature store snapshot ID)
Hyperparameters: Every hyperparameter used during training
Training metrics: All evaluation metrics computed during training and validation
Feature schema: The expected input features, their data types, and expected value ranges
Output schema: The model's output format, including prediction types and confidence score ranges
Dependencies: Python package versions, system dependencies, and runtime requirements
Preprocessing artifacts: Tokenizers, encoders, scalers, and any other preprocessing components needed at inference time
Model card: A human-readable document describing the model's purpose, training data, performance characteristics, known limitations, and ethical considerations
Deployment configuration: Infrastructure requirements (GPU type, memory, replicas) and serving configuration
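
One way to make this list enforceable is to give each registry entry a fixed shape and report what is still empty. The sketch below assumes a flat record; the field names and the `missing_fields` helper are illustrative, not a standard schema.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ModelVersionRecord:
    """One registry entry; field names are illustrative, not a standard."""
    name: str
    version: str
    artifact_uri: str = ""
    git_commit: str = ""
    dataset_ref: str = ""
    hyperparameters: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)
    feature_schema: dict = field(default_factory=dict)
    output_schema: dict = field(default_factory=dict)
    dependencies: list = field(default_factory=list)
    preprocessing_uris: list = field(default_factory=list)
    model_card_uri: str = ""
    deployment_config: dict = field(default_factory=dict)


def missing_fields(record: ModelVersionRecord) -> list[str]:
    """Names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]
```

A training pipeline could refuse to register a version while `missing_fields` returns anything, which keeps partially documented models out of the registry.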

Choosing a Registry Solution

MLflow Model Registry

MLflow is the most widely adopted open-source model registry and the default choice for most agencies.

Strengths:

Mature, well-documented, large community
Built-in experiment tracking alongside model registry
Supports all major ML frameworks (scikit-learn, PyTorch, TensorFlow, XGBoost, LightGBM)
REST API and Python SDK for programmatic access
Integrates with major cloud platforms (AWS, GCP, Azure)
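Stage-based model lifecycle management (recent MLflow releases deprecate fixed stages in favor of version aliases and tags, which can model the same lifecycle)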

Limitations:

No built-in model serving — you need a separate serving infrastructure
Limited access control in the open-source version (use Databricks-managed MLflow for enterprise access control)
Artifact storage requires external configuration (S3, GCS, Azure Blob)

Best for: Agencies that want a proven, flexible registry with strong experiment tracking integration and are comfortable managing their own infrastructure.

Weights and Biases Model Registry

W&B provides a model registry as part of their broader ML development platform.

Strengths:

Excellent visualization and comparison tools
Strong artifact lineage tracking — trace any model back to its training run, data, and code
Built-in team collaboration features
Managed infrastructure — no servers to maintain
Integrates tightly with W&B experiment tracking

Limitations:

SaaS-first: the managed cloud service is the primary offering, although W&B also sells a self-hosted server product
Pricing can be significant for large teams
Less flexible than MLflow for custom workflows

Best for: Agencies that want a managed solution with excellent visualization and are willing to pay for reduced operational overhead.

Cloud-Native Registries

Amazon SageMaker Model Registry: Built into the SageMaker ecosystem. Best for agencies already using SageMaker for training and deployment. Provides automatic model approval workflows and integration with SageMaker endpoints.

Google Vertex AI Model Registry: Integrated with Vertex AI training and serving. Provides model evaluation and comparison tools. Best for agencies on Google Cloud.

Azure ML Model Registry: Part of Azure Machine Learning. Provides model cataloging, versioning, and deployment integration. Best for agencies on Azure.

Best for: Agencies that are committed to a single cloud provider and want tight integration with that provider's ML services.

Custom Registries

For agencies with unique requirements (multi-cloud deployments, strict data residency requirements, or integration with proprietary systems), a custom registry built on a metadata database and object storage may be the right choice.

Custom registry components:

PostgreSQL or DynamoDB for metadata storage
S3 or GCS for artifact storage
A Python SDK for model registration and retrieval
A simple web UI for browsing and managing models
CI/CD integration for automated model promotion

Build custom only when existing solutions do not meet specific requirements — the maintenance burden of a custom registry is significant and rarely justified for agencies managing fewer than 50 models.

Implementing Registry Workflows

Model Registration Workflow

Every model that could potentially be deployed to production should be registered automatically as part of the training pipeline.

Automated registration steps:

Training pipeline completes and produces a model artifact
Pipeline evaluates the model on the validation set and computes all required metrics
Pipeline registers the model in the registry with all metadata (metrics, hyperparameters, data reference, code reference, dependencies)
Model is assigned the "Development" stage
If the model passes automated validation gates (metrics above minimum thresholds), it is automatically promoted to "Staging"
Notification is sent to the ML engineer responsible for manual review and production promotion
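
The registration steps above can be sketched with an in-memory stand-in for a registry client. `Registry` and its `register` method are hypothetical, version numbers are a simple per-model counter, and the validation gate assumes every thresholded metric is higher-is-better.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """In-memory stand-in for a registry client (hypothetical)."""
    entries: list = field(default_factory=list)
    notifications: list = field(default_factory=list)

    def register(self, name, metrics, metadata, thresholds):
        # Version numbers here are a simple per-model counter.
        version = 1 + sum(e["name"] == name for e in self.entries)
        entry = {"name": name, "version": version, "metrics": metrics,
                 "metadata": metadata, "stage": "Development"}
        self.entries.append(entry)
        # Gate: every thresholded metric must be present and high enough.
        passed = all(metrics.get(key, float("-inf")) >= minimum
                     for key, minimum in thresholds.items())
        if passed:
            entry["stage"] = "Staging"
            self.notifications.append(f"{name} v{version} ready for review")
        return entry
```

A model that misses a threshold is still registered, but it stays in "Development" and triggers no review notification.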

Model Promotion Workflow

Promoting a model from staging to production is the highest-risk operation in the model lifecycle. It should be deliberate, validated, and reversible.

Pre-promotion checklist:

All evaluation metrics meet or exceed the current production model
No regression on any per-class or per-segment metric
Inference latency is within acceptable bounds
Model passes integration tests with the production serving infrastructure
Model passes fairness and bias checks
Model card is complete and reviewed
Rollback procedure is documented and tested
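
Much of this checklist can be automated as a single gate that returns the reasons a candidate is blocked. The sketch below assumes plain dictionaries for the candidate and production records, higher-is-better metrics, and a p95 latency budget; all key names are illustrative.

```python
def promotion_blockers(candidate, production, max_latency_ms):
    """Return human-readable reasons the candidate should not be promoted."""
    blockers = []
    # Overall and per-segment metrics share one dict; higher is assumed better.
    for metric, prod_value in production["metrics"].items():
        cand_value = candidate["metrics"].get(metric)
        if cand_value is None:
            blockers.append(f"{metric}: missing on candidate")
        elif cand_value < prod_value:
            blockers.append(f"{metric}: {cand_value} < production {prod_value}")
    if candidate["p95_latency_ms"] > max_latency_ms:
        blockers.append(f"p95 latency {candidate['p95_latency_ms']} ms over budget")
    # Manual checklist items are recorded as booleans by the reviewer.
    for item in ("integration_tests", "fairness_checks", "model_card", "rollback_tested"):
        if not candidate["checks"].get(item, False):
            blockers.append(f"{item}: not confirmed")
    return blockers
```

An empty list means the candidate clears every automated check; anything else gives the reviewer a concrete list to resolve before promotion.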

Promotion process:

ML engineer reviews the staging model and approves promotion
The registry updates the model stage from "Staging" to "Production"
The deployment pipeline detects the stage change and begins deployment
The new model is deployed alongside the current model in a canary or blue-green configuration
Production traffic is shifted to the new model gradually (5%, then 20%, then 50%, then 100%)
At each stage, metrics are compared to the previous model
If metrics degrade at any stage, traffic is shifted back to the previous model and the promotion is rolled back
Once 100% of traffic is on the new model and metrics are stable, the previous model is archived
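
The gradual traffic shift can be sketched as a loop that checks a metric at each step and backs out on degradation. The `observe` callback, the single higher-is-better metric, and the `tolerance` value are assumptions made for illustration; a real rollout would compare several metrics over a soak period.

```python
def canary_rollout(observe, baseline, tolerance=0.01, steps=(5, 20, 50, 100)):
    """Walk the traffic steps; return (final_percent, history)."""
    history = []
    for percent in steps:
        value = observe(percent)  # new model's metric at this traffic share
        history.append((percent, value))
        if value < baseline - tolerance:
            # Degraded: all traffic goes back to the previous model.
            return 0, history
    return steps[-1], history
```

The returned history records how far the rollout got, which is useful evidence for the incident record if the promotion is rolled back.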

Model Rollback Workflow

Rollback must be fast and reliable. When a production model is causing problems, every minute counts.

Rollback process:

Operator triggers rollback via CLI, API, or dashboard
Registry identifies the most recently archived version of this model
Deployment pipeline fetches the archived model and its dependencies
The archived model is deployed to the serving infrastructure
Traffic is shifted to the rolled-back model
The problematic model is demoted from "Production" to "Development" for investigation
An incident record is created, linking the model version, the trigger, and the resolution

Rollback time target: Under 5 minutes from trigger to full traffic shift. This requires pre-staged model artifacts and automated deployment pipelines. Manual rollbacks should be practiced during drills, not discovered during incidents.
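
The registry side of a rollback can be sketched as below. It assumes each version is a dictionary carrying a `stage` and, once archived, a sortable `archived_at` timestamp; the function and key names are hypothetical, and the deployment and traffic-shift steps are left out.

```python
def rollback(versions, incidents, trigger):
    """Swap the Production version for the most recently archived one."""
    current = next(v for v in versions if v["stage"] == "Production")
    archived = [v for v in versions if v["stage"] == "Archived"]
    if not archived:
        raise LookupError("no archived version to roll back to")
    target = max(archived, key=lambda v: v["archived_at"])
    target["stage"] = "Production"
    current["stage"] = "Development"  # demoted for investigation
    incidents.append({"rolled_back": current["version"],
                      "restored": target["version"],
                      "trigger": trigger})
    return target
```

Because the target is chosen from registry metadata rather than from memory, the operator only has to supply the reason for the rollback.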

Multi-Client Registry Management

Organization Structure

Agencies serving multiple clients need a registry structure that isolates client data while enabling cross-client operational efficiency.

Recommended structure:

Workspace per client: Each client has a separate workspace or project within the registry, ensuring data isolation
Shared model templates: Common model architectures and training pipelines shared across workspaces as templates, not as shared models
Centralized operational dashboard: A single view across all client workspaces showing model health, deployment status, and alerting

Naming Conventions

Consistent naming conventions across clients prevent confusion and enable automation.

Model naming convention: {client}_{domain}_{task}_{variant}

Examples:

acme_retail_churn_prediction_v1
globex_finance_fraud_detection_ensemble
initech_support_ticket_classification_multilabel

Artifact naming convention: {model_name}_{version}_{format}

Examples:

acme_retail_churn_prediction_v1_2.1.0_onnx
globex_finance_fraud_detection_ensemble_1.0.3_torchscript
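
Naming conventions hold up best when names are generated rather than typed. The helpers below are a sketch that assumes fields are joined with underscores and restricted to lowercase letters and digits; the function names and the validation rule are illustrative.

```python
import re

# Lowercase words joined by single underscores (an assumed rule).
FIELD = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def model_name(client, domain, task, variant):
    """Build a model name from its four components."""
    parts = (client, domain, task, variant)
    for part in parts:
        if not FIELD.match(part):
            raise ValueError(f"invalid name component: {part!r}")
    return "_".join(parts)


def artifact_name(name, version, fmt):
    """Build an artifact name from a model name, version, and format."""
    return f"{name}_{version}_{fmt}"


def split_artifact_name(artifact):
    """Recover (model_name, version, format) from an artifact name."""
    name_and_version, fmt = artifact.rsplit("_", 1)
    name, version = name_and_version.rsplit("_", 1)
    return name, version, fmt
```

Splitting from the right works because the version and format contain no underscores. The model name itself cannot be split back into client, domain, and task unambiguously when a task such as "churn_prediction" contains an underscore, so those fields are better stored as separate metadata.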

Access Control

Role-based access control for agencies:

Data scientist: Can register models, view all models in their client workspace, compare model versions
ML engineer: All data scientist permissions plus promote models to staging, configure deployment parameters
Operations lead: All ML engineer permissions plus promote models to production, trigger rollbacks
Client viewer: Can view model metadata, metrics, and deployment status for their models (read-only)
Agency admin: Full access across all client workspaces, manage users and permissions
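
The role hierarchy above can be expressed as permission sets that build on one another. This is a sketch under stated assumptions: the action names are invented for illustration, and workspace isolation is reduced to a simple equality check.

```python
PERMISSIONS = {
    "data_scientist": {"register", "view", "compare"},
    "client_viewer": {"view"},
}
# Each senior role inherits everything from the role below it.
PERMISSIONS["ml_engineer"] = PERMISSIONS["data_scientist"] | {
    "promote_staging", "configure_deployment"}
PERMISSIONS["operations_lead"] = PERMISSIONS["ml_engineer"] | {
    "promote_production", "rollback"}
PERMISSIONS["agency_admin"] = PERMISSIONS["operations_lead"] | {"manage_users"}


def is_allowed(role, action, user_workspace, target_workspace):
    """Admins act in every workspace; other roles only in their own."""
    if action not in PERMISSIONS.get(role, set()):
        return False
    return role == "agency_admin" or user_workspace == target_workspace
```

Unknown roles fall through to an empty permission set, so the check fails closed.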

Registry Maintenance and Operations

Artifact Lifecycle Management

Model artifacts consume storage. Without lifecycle management, storage costs grow without bound.

Retention policy:

Production models: Retained indefinitely (or until the client contract ends)
Archived models (previous production versions): Retained for 12 months or as required by the client's regulatory requirements
Staging models (never promoted to production): Retained for 3 months
Development models: Retained for 30 days, then automatically deleted
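
A retention policy like this one can be evaluated by a scheduled cleanup job. The sketch below approximates 12 months as 365 days and 3 months as 90 days, and the `overrides` parameter is a hypothetical hook for client-specific regulatory requirements.

```python
from datetime import date, timedelta

# Retention windows in days; None means keep indefinitely.
RETENTION_DAYS = {
    "Production": None,
    "Archived": 365,     # approximately 12 months
    "Staging": 90,       # approximately 3 months
    "Development": 30,
}


def is_expired(stage, last_stage_change, today, overrides=None):
    """True when an artifact has outlived its stage's retention window."""
    days = {**RETENTION_DAYS, **(overrides or {})}[stage]
    if days is None:
        return False
    return today - last_stage_change > timedelta(days=days)
```

For a regulated client, passing something like `overrides={"Archived": 7 * 365}` extends the archive window without touching the default policy.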

Storage optimization:

Compress model artifacts before storage
Use tiered storage — keep recent artifacts in hot storage (S3 Standard) and older artifacts in cold storage (S3 Glacier)
Deduplicate shared components — if multiple model versions use the same preprocessing artifacts, store them once
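
Deduplication of shared components is commonly done with content addressing: store each payload under a hash of its bytes. The sketch below uses a dictionary as a stand-in for object storage; `store_deduplicated` is a hypothetical helper.

```python
import hashlib


def store_deduplicated(store, payload: bytes) -> str:
    """Store bytes under their SHA-256 digest; identical payloads share one entry."""
    digest = hashlib.sha256(payload).hexdigest()
    store.setdefault(digest, payload)  # no-op when the digest already exists
    return digest
```

Each model version then records the digest of its scaler or tokenizer instead of carrying its own copy.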

Registry Monitoring

The registry itself is a critical system that needs monitoring.

Monitor:

Registry API availability and latency
Artifact storage capacity and growth rate
Registration and retrieval success rates
Deployment pipeline success rates triggered by registry events
User activity — who is promoting models, how frequently, at what times

Disaster Recovery

Registry backup strategy:

Metadata database: Daily backups with point-in-time recovery
Artifact storage: Cross-region replication for production model artifacts
Configuration: Infrastructure-as-code for the registry itself, stored in version control

Recovery procedure:

Restore metadata database from backup
Verify artifact storage integrity
Re-establish connections between the registry and deployment pipelines
Verify that production model serving is unaffected
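
The integrity check in the recovery procedure can be sketched as a checksum comparison, assuming the metadata store records a SHA-256 digest for each artifact at registration time. The `read_bytes` callback is a hypothetical stand-in for an object storage client.

```python
import hashlib


def corrupted_artifacts(expected_digests, read_bytes):
    """Return URIs whose stored bytes no longer match the recorded SHA-256."""
    bad = []
    for uri, expected in expected_digests.items():
        actual = hashlib.sha256(read_bytes(uri)).hexdigest()
        if actual != expected:
            bad.append(uri)
    return bad
```

Running the same check on a schedule, not only after a restore, surfaces silent corruption before a rollback depends on the affected artifact.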

Your Next Step

If you do not have a model registry today, start with MLflow. Install it, register every model you currently have in production (with whatever metadata you can reconstruct), and set up automated registration in one training pipeline. For a small portfolio this is roughly a day of work. Then add the pre-promotion checklist as a gate before any model goes to production. Within a week, you will have a functioning registry that eliminates the worst model management risks. Within a month, you will wonder how you ever operated without one. The agencies that manage models professionally retain clients longer, respond to incidents faster, and scale their operations without proportionally scaling their headaches.

Agency Script Editorial
