Your Best Engineer Just Quit. Can Anyone Read the Code?

Agency Script Editorial

Editorial Team

March 21, 2026 · 11 min read
Tags: ai agency code standards, code review, engineering quality, delivery consistency

A twenty-person AI agency in Charlotte won a new client engagement and assigned their best ML engineer to the project. Four months into the engagement, that engineer accepted a position at a large tech company and gave two weeks' notice. The project manager assigned a replacement engineer from a different delivery team.

The replacement engineer spent her first three days trying to understand the codebase. There were no type hints. Variable names were single letters. The data pipeline had no comments explaining why certain transformations were applied. The model training script used hardcoded paths that only worked on the original engineer's machine. Configuration was scattered across environment variables, YAML files, and inline constants with no documentation about which settings controlled what.

The replacement engineer eventually got up to speed, but it took two weeks instead of two days. The project timeline slipped by ten days. The client noticed and asked uncomfortable questions about the agency's ability to manage staffing transitions.

None of this would have happened if the agency had code standards. Not because standards prevent people from leaving, but because standards ensure that the code they leave behind is readable, maintainable, and transferable.

Why Code Standards Matter More in Agencies Than Product Companies

In a product company, the same team often works on the same codebase for years. Institutional knowledge accumulates organically. If the code is messy, the people who wrote it are usually around to explain it.

Agencies operate differently, and those differences make code standards essential.

Engineers rotate between projects. Your ML engineer might work on a healthcare NLP project this quarter and a retail computer vision project next quarter. When they return to a project or a teammate takes over, standards ensure the code is comprehensible without the original author's presence.

Multiple teams produce client-facing deliverables. If Team A's code follows one style and Team B's follows another, clients who work with both teams see inconsistency. That inconsistency undermines confidence in your agency's engineering quality.

Code is often handed off to clients. Unlike internal product code that stays in-house, agency code frequently becomes the client's property. Sloppy code reflects poorly on your agency and creates maintenance headaches that damage the relationship long after the project ends.

Subcontractors and freelancers contribute. When you bring in external help, code standards are the only way to ensure their contributions match your quality expectations.

Projects have finite lifespans but long maintenance tails. You might actively develop a system for six months, but the client will maintain it for years. Standards ensure the code is maintainable by people who were not involved in writing it.

Defining Your Code Standards

Code standards should cover four areas: style, structure, documentation, and AI-specific practices.

Style Standards

Style standards eliminate debates about formatting and make code visually consistent across projects.

For Python (the dominant language in AI work):

  • Follow PEP 8 for general formatting
  • Use a formatter (Black) to enforce style automatically
  • Use a linter (Ruff or Flake8) to catch style violations and common errors
  • Maximum line length of 88 characters (Black's default)
  • Use type hints for all function parameters and return values
  • Use descriptive variable names (no single-letter variables except for loop counters and mathematical notation where the variable's meaning is universally understood)
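As a rough illustration of the last two rules, here is the same filter written twice. Both functions and their names are hypothetical examples, not code from the agency in the story.

```python
# Before: no type hints and single-letter names. The reader has to guess
# what d, t, and the return value represent.
def f(d, t):
    return [x for x in d if x[1] >= t]


# After: type hints and descriptive names make the contract visible.
def filter_confident_predictions(
    predictions: list[tuple[str, float]],
    min_confidence: float,
) -> list[tuple[str, float]]:
    """Keep (label, confidence) pairs at or above min_confidence."""
    return [
        (label, confidence)
        for label, confidence in predictions
        if confidence >= min_confidence
    ]
```

The behavior is identical. Only the second version can be picked up by a replacement engineer without asking the author what it does.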

For TypeScript (common in API and frontend work):

  • Use ESLint with a consistent configuration across all projects
  • Use Prettier for formatting
  • Enable strict mode in TypeScript configuration
  • Use interfaces over type aliases for object shapes
  • Use named exports over default exports

Enforce style automatically. Do not rely on human reviewers to catch formatting issues. Configure pre-commit hooks and CI checks that reject code that does not pass the formatter and linter. This removes style from the review conversation entirely.
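One way to wire this up is a small script that a Git hook or CI step calls. The sketch below assumes Black and Ruff are installed and invoked as `black --check .` and `ruff check .`; the `CHECKS` list and the `run_checks` helper are illustrative names, not part of either tool.

```python
import subprocess
import sys
from typing import Callable, Sequence

# Commands assumed for a Python project. Swap in your own tools.
CHECKS: list[list[str]] = [
    ["black", "--check", "."],
    ["ruff", "check", "."],
]


def run_checks(
    checks: Sequence[Sequence[str]],
    run: Callable = subprocess.run,
) -> int:
    """Run every check and return how many of them failed."""
    failures = 0
    for command in checks:
        result = run(command)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(command)}", file=sys.stderr)
            failures += 1
    return failures
```

A hook or CI job could then end with `sys.exit(1 if run_checks(CHECKS) else 0)` so that any failing check rejects the change. The `run` parameter exists only so the function can be tested without the tools installed.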

Structure Standards

Structure standards define how code is organized within a project.

Repository structure: Define a standard project template that every new project starts from. This template should include:

  • A consistent directory layout (src, tests, configs, docs, scripts)
  • A pre-configured CI/CD pipeline
  • A standard README template
  • Pre-configured linting and formatting tools
  • A standard .gitignore

Module organization:

  • Separate data processing, model training, model serving, and utilities into distinct modules
  • Keep individual files under 300 lines where practical
  • Use a consistent naming convention for files and directories

Configuration management:

  • Configuration should live in dedicated config files or environment variables, never hardcoded in source code
  • Use environment variables for secrets and deployment-specific settings
  • Use configuration files (YAML, TOML, or JSON) for application settings
  • Document every configuration parameter
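A minimal sketch of these rules, using only the standard library. The field names, the JSON layout, and the `TRACKING_API_KEY` variable are all assumptions made for the example.

```python
import json
import os
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TrainingConfig:
    """Settings for one training run (all field names are illustrative)."""

    data_dir: Path        # Where the training data lives.
    learning_rate: float  # Optimizer step size.
    batch_size: int       # Examples per gradient update.
    api_key: str          # Secret: read from the environment, never the file.


def load_config(config_path: Path) -> TrainingConfig:
    """Read application settings from JSON and the secret from the environment."""
    settings = json.loads(config_path.read_text())
    # Fail loudly (KeyError) if the secret is missing instead of using a default.
    api_key = os.environ["TRACKING_API_KEY"]
    return TrainingConfig(
        data_dir=Path(settings["data_dir"]),
        learning_rate=float(settings["learning_rate"]),
        batch_size=int(settings["batch_size"]),
        api_key=api_key,
    )
```

Putting every setting in one typed object gives the next engineer a single place to see which parameters exist and what each one controls.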

Error handling:

  • Define a standard approach to error handling (exceptions vs. result types)
  • Log errors with sufficient context for debugging
  • Never silently swallow exceptions
  • Use custom exception classes for domain-specific errors
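The error-handling rules above might look like this in practice. `DataValidationError` and `parse_age` are hypothetical names chosen for the sketch.

```python
import logging

logger = logging.getLogger(__name__)


class DataValidationError(Exception):
    """Raised when an input record violates the pipeline's expectations."""


def parse_age(record: dict[str, str]) -> int:
    """Parse the 'age' field, raising DataValidationError on bad input."""
    raw_age = record.get("age", "")
    try:
        age = int(raw_age)
    except ValueError as error:
        # Log enough context to debug, then re-raise as a domain error
        # instead of silently returning a default value.
        logger.error("Invalid age %r in record id=%s", raw_age, record.get("id"))
        raise DataValidationError(f"age is not an integer: {raw_age!r}") from error
    if age < 0:
        raise DataValidationError(f"age must be non-negative, got {age}")
    return age
```

Using `raise ... from error` keeps the original traceback attached, so the person debugging sees both the domain-level message and the underlying cause.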

Documentation Standards

Documentation standards ensure that code is understandable without reading every line.

Every function and class needs a docstring that explains:

  • What the function does (one sentence)
  • What the parameters are and what they expect
  • What the function returns
  • Any side effects or important behavior notes
  • Example usage for complex functions
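For instance, a docstring covering those points could look like the following. The function is a made-up example, and the Args/Returns/Raises layout is one common convention rather than a requirement.

```python
def normalize_scores(scores: list[float]) -> list[float]:
    """Scale scores linearly so the smallest is 0.0 and the largest is 1.0.

    Args:
        scores: Raw model scores. Must contain at least one value.

    Returns:
        A new list of the same length with values in [0.0, 1.0]. The input
        list is not modified. If every score is identical, all outputs
        are 0.0.

    Raises:
        ValueError: If scores is empty.

    Example:
        >>> normalize_scores([2.0, 4.0, 6.0])
        [0.0, 0.5, 1.0]
    """
    if not scores:
        raise ValueError("scores must not be empty")
    lowest, highest = min(scores), max(scores)
    if highest == lowest:
        return [0.0 for _ in scores]
    return [(score - lowest) / (highest - lowest) for score in scores]
```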

Every module needs a module-level docstring that explains:

  • The purpose of the module
  • The key classes and functions it contains
  • How it relates to other modules in the project

Every repository needs a README that includes:

  • Project overview and purpose
  • Setup instructions (how to install dependencies, configure the environment, run the code)
  • Architecture overview (how the major components fit together)
  • Deployment instructions
  • Testing instructions

Data pipeline documentation:

  • Every data transformation should have a comment explaining why it exists, not just what it does
  • Data schema changes should be documented in a changelog
  • Data quality expectations should be documented alongside the pipeline
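The difference between a "what" comment and a "why" comment is easiest to see side by side. The dataset and the business rule in this sketch are invented for illustration.

```python
def clean_transactions(rows: list[dict]) -> list[dict]:
    """Return only the rows that represent purchases."""
    cleaned = []
    for row in rows:
        # A "what" comment would say: drop rows with amount <= 0.
        # The "why": in this hypothetical dataset, non-positive amounts are
        # refunds and reversals, which the model should not learn as purchases.
        if row["amount"] <= 0:
            continue
        cleaned.append(row)
    return cleaned
```

A replacement engineer can read the condition from the code. Only the comment tells her whether it is safe to change.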

AI-Specific Standards

AI agency code has concerns that standard software development standards do not cover.

Experiment tracking:

  • Every training run must be logged to the experiment tracking system (Weights & Biases, MLflow, etc.)
  • Logged information must include: hyperparameters, data version, model architecture, evaluation metrics, and training duration
  • Experiments must be reproducible from the logged information
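To make the required fields concrete, here is a standard-library stand-in that writes one JSON record per run. It is not a replacement for a real tracking system, and `RunRecord` and `log_run` are hypothetical names.

```python
import json
import time
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class RunRecord:
    """The minimum information logged for every training run."""

    hyperparameters: dict[str, float]
    data_version: str
    model_architecture: str
    metrics: dict[str, float] = field(default_factory=dict)
    training_seconds: float = 0.0


def log_run(record: RunRecord, log_dir: Path) -> Path:
    """Write the record as JSON and return the path of the new file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    run_path = log_dir / f"run-{time.time_ns()}.json"
    run_path.write_text(json.dumps(asdict(record), indent=2, sort_keys=True))
    return run_path
```

Whatever tool you use, the test of the record is the same: could someone else rerun the experiment from this file alone?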

Model evaluation:

  • Every model must have an evaluation script that produces a standardized report
  • Evaluation must use a held-out test set that is never used during training or hyperparameter tuning
  • Evaluation metrics must include the metrics specified in the client's SOW, not just the metrics the engineer prefers
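One simple way to keep the test set untouched is to carve it off once, with a fixed seed, before any training code runs. This is a sketch under that assumption; `split_dataset` and its defaults are illustrative.

```python
import random
from typing import Sequence, TypeVar

Example = TypeVar("Example")


def split_dataset(
    examples: Sequence[Example],
    test_fraction: float = 0.2,
    seed: int = 13,
) -> tuple[list[Example], list[Example]]:
    """Shuffle deterministically and return (train_and_validation, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(examples)
    # A dedicated Random instance keeps the split independent of global state.
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]
```

Hyperparameter tuning then works only with the first list. The second is read by the evaluation script and nothing else.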

Data versioning:

  • Training data must be versioned (using DVC, Delta Lake, or a similar tool)
  • Every training run must reference a specific data version
  • Data transformations must be repeatable from the versioned raw data
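If a dedicated versioning tool is not yet in place, a content hash is a workable interim identifier for "which data did this run use". The helper below is a stand-in built on the standard library, not a description of how any particular tool works.

```python
import hashlib
from pathlib import Path


def data_version(data_path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    digest = hashlib.sha256()
    with data_path.open("rb") as handle:
        # Read in chunks so large datasets do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Logging this digest with every training run ties the run to the exact bytes it was trained on, even if the file is later overwritten.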

Model serving:

  • Models must be served through a standardized API format
  • API contracts must be documented with input/output schemas
  • Health check and monitoring endpoints must be included
  • Inference latency must be profiled and documented
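The sketch below shows the shape of these requirements without committing to a web framework: typed request and response schemas, a health check payload, and a latency profiler. Every name here is hypothetical, and the 95th percentile uses a simple nearest-rank calculation.

```python
import math
import statistics
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PredictRequest:
    text: str  # Input schema: the raw text to classify.


@dataclass(frozen=True)
class PredictResponse:
    label: str         # Output schema: predicted class.
    confidence: float  # Output schema: score between 0.0 and 1.0.


def health_check() -> dict[str, str]:
    """Payload a monitoring endpoint could return."""
    return {"status": "ok"}


def profile_latency(
    predict: Callable[[PredictRequest], PredictResponse],
    request: PredictRequest,
    runs: int = 100,
) -> dict[str, float]:
    """Time repeated calls and report latency in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(request)
        durations_ms.append((time.perf_counter() - start) * 1000)
    durations_ms.sort()
    p95_index = math.ceil(runs * 95 / 100) - 1  # Nearest-rank percentile.
    return {
        "median_ms": statistics.median(durations_ms),
        "p95_ms": durations_ms[p95_index],
        "max_ms": durations_ms[-1],
    }
```

The numbers from `profile_latency` belong in the project documentation next to the hardware they were measured on, since latency figures mean little without that context.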

Reproducibility:

  • Random seeds must be fixed and documented
  • Dependencies must be pinned to exact versions
  • Container images used for training and serving must be tagged and stored
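The pinning rule is easy to check mechanically. This sketch scans the text of a requirements file and reports any line without an exact `==` pin. It is deliberately simplified: lines such as `-r other.txt` or direct URLs would also be flagged, and `find_unpinned` is a hypothetical helper.

```python
def find_unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that are not pinned with '=='."""
    unpinned = []
    for line in requirements_text.splitlines():
        # Strip trailing comments and surrounding whitespace.
        requirement = line.split("#", 1)[0].strip()
        if not requirement:
            continue
        if "==" not in requirement:
            unpinned.append(requirement)
    return unpinned
```

For example, given the lines `numpy==1.26.4`, `pandas>=2.0`, and `requests`, the function returns the last two. A CI step can fail the build whenever the returned list is not empty.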

The Code Review Process

Standards define what good code looks like. Code reviews enforce those standards and catch issues that automated tools miss.

Who Reviews

Every PR needs at least one review from a qualified engineer. "Qualified" means someone who understands the technology being used and has enough context about the project to evaluate the changes meaningfully.

For high-stakes changes (model architecture, data pipeline logic, production deployment configurations), require two reviews, at least one from a senior engineer or tech lead.

Rotate reviewers across projects. This builds cross-project knowledge and ensures multiple people can support any given codebase.

What Reviewers Should Check

Automated checks handle:

  • Code formatting and style
  • Type checking
  • Linting rules
  • Passing tests
  • Security scanning

Human reviewers should focus on:

  • Correctness. Does the code actually do what the PR description says it does? Are there edge cases that are not handled?
  • Design. Is the approach appropriate for the problem? Are there simpler alternatives? Will this design scale?
  • Readability. Can another engineer understand this code without asking the author? Are names clear? Is the logic straightforward?
  • Maintainability. Will this code be easy to modify in the future? Are there hidden dependencies or tight couplings?
  • Testing. Are the tests adequate? Do they cover the important cases? Are they testing behavior, not implementation?
  • AI-specific concerns. Are experiments logged properly? Are data transformations documented? Are evaluation metrics appropriate? Is the model reproducible?

Review Etiquette

For reviewers:

  • Be specific and constructive. "This is wrong" is unhelpful. "This approach might fail when the input is empty because X. Consider handling that case like Y" is useful.
  • Distinguish between blocking issues and suggestions. Use labels like "blocking" for things that must be fixed and "nit" or "suggestion" for things that are optional improvements.
  • Respond to review requests within four hours during business hours. Slow reviews slow the entire team.
  • Approve when the code meets standards, even if you would have written it differently. Standards, not personal preference, are the bar.

For authors:

  • Keep PRs small. Under 400 lines of changes whenever possible.
  • Write clear PR descriptions that explain what, why, and how to test.
  • Respond to review comments promptly and professionally.
  • Do not take feedback personally. The reviewer is improving the code, not critiquing you.

Adopting Standards in an Existing Agency

If your agency does not have code standards today, adopting them requires care. Dumping a hundred-page style guide on the team and demanding compliance will fail.

Start with automated enforcement. Add a formatter (Black for Python, Prettier for TypeScript) and a linter to your CI pipeline. This handles the majority of style issues without any cultural change.

Introduce standards incrementally. Start with the three to five most impactful standards (type hints, docstrings, configuration management) and enforce those for new code. Do not require the team to retroactively refactor existing codebases.

Get team buy-in. Present the standards to the team, explain the rationale (consistency, maintainability, client perception), and invite feedback. People follow standards they helped shape.

Lead by example. The tech lead and senior engineers should follow the standards rigorously. If leadership cuts corners, nobody else will take the standards seriously.

Review and update standards semi-annually. Standards should evolve with the team's needs and the industry's best practices. A standard that was appropriate at ten engineers might need revision at thirty.

Measuring Code Quality

Track a few metrics to assess whether your standards and review process are working.

  • PR review turnaround time. How quickly are reviews completed? Increasing times suggest the process is creating bottlenecks.
  • Defects per project. Are client-reported bugs decreasing over time? Standards should reduce the defect rate.
  • Onboarding time for new project assignments. How quickly can an engineer become productive on a new project? Better standards and documentation should reduce this.
  • Code review comment categories. What are reviewers commenting on most? If it is style issues, your automation needs improvement. If it is design issues, your standards might need more architectural guidance.
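The first metric can be computed from data most code hosting platforms already expose. The sketch assumes you have exported, for each review, the time it was requested and the time it was completed; it measures wall-clock hours, not business hours.

```python
from datetime import datetime
from statistics import median


def median_turnaround_hours(reviews: list[tuple[datetime, datetime]]) -> float:
    """Median hours between review request and completion.

    Each item is a (requested_at, completed_at) pair. Raises
    statistics.StatisticsError if the list is empty.
    """
    hours = [
        (completed_at - requested_at).total_seconds() / 3600
        for requested_at, completed_at in reviews
    ]
    return median(hours)
```

The median is used rather than the mean so that one review left open over a holiday does not distort the trend.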

Your Next Step

If your agency has no code standards today, start with three actions this week.

First, add Black (or your language's equivalent formatter) and a linter to every active repository's CI pipeline. This eliminates style inconsistency automatically.

Second, create a one-page code standards document covering your five most important rules: type hints, docstrings, configuration management, error handling, and testing expectations. Share it with the team.

Third, for your next PR, write a thorough review that models the standard you want to set. Show the team what a good review looks like through practice, not just policy.

Standards are not about bureaucracy. They are about making every project feel like it was built by the same professional team, regardless of who actually wrote the code. That consistency is what separates agencies that scale their engineering quality from those that struggle with every staffing transition.
