Two-Week Sprints Were Finishing Only Half the Planned Work

A nineteen-person AI agency in Seattle ran two-week sprints across all their delivery teams. It was the standard agile cadence they had adopted from day one, borrowed from a product engineering handbook without much thought. For three years, it worked well enough. Then the operations manager noticed a troubling pattern: sprint completion rates across the agency averaged fifty-eight percent. More than forty percent of the work planned for each sprint was not finishing on time.

The teams were not underperforming. They were planning against a cadence that did not match the nature of AI work. Data cleaning tasks that were estimated at three days took eight because the data was messier than expected. Model training runs were scheduled to complete within the sprint but ran over because hyperparameter tuning required more iterations. Client review cycles were crammed into the last two days of the sprint, creating a bottleneck that pushed approvals into the next sprint.

The operations manager experimented with different cadences for different project phases. Discovery sprints ran one week. Build sprints ran three weeks. Evaluation and client review periods ran two weeks. Sprint completion rates climbed to seventy-nine percent. Team morale improved because people were hitting their targets instead of carrying over unfinished work every cycle.

The sprint cadence that works for a product team building web features is not the cadence that works for an AI team building data pipelines, training models, and deploying inference systems. The rhythm of the work should dictate the rhythm of the process, not the other way around.

Why Default Two-Week Sprints Often Fail for AI Work

Two-week sprints are the most common cadence in software development, and for traditional feature development, they work well. A developer can spec, build, test, and ship a feature in two weeks. The work is relatively predictable, the feedback cycles are fast, and the deliverable at the end of the sprint is tangible.

AI work has different dynamics.

Data work is inherently unpredictable. You might plan three days for data cleaning and discover that the client's dataset has a format inconsistency that affects forty percent of the records. That discovery cannot be predicted during sprint planning. A two-week sprint does not have enough buffer for these surprises.

Training runs have fixed durations. A model training run might take four days on available hardware. You cannot make it faster without different infrastructure or a different approach. If the run starts on day three of the sprint, it finishes on day seven, leaving only three working days for evaluation, debugging, and iteration before the sprint ends. That is not enough time for meaningful work after the run completes.
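
The arithmetic in this example can be made explicit. Below is a minimal sketch that assumes a sprint is measured in working days and follows the text's counting convention: a four-day run that starts on day three finishes on day seven. It ignores weekends and hardware queueing, and the function name is hypothetical.

```python
def days_left_after_run(sprint_days: int, start_day: int, run_days: int) -> int:
    """Working days left in the sprint once a fixed-duration training run finishes.

    Counts days as in the example above: a run that starts on `start_day`
    and lasts `run_days` finishes on day start_day + run_days.
    """
    finish_day = start_day + run_days
    # If the run finishes after the sprint ends, nothing is left for evaluation.
    return max(sprint_days - finish_day, 0)


# Two-week sprint (10 working days) vs. three-week sprint (15 working days)
print(days_left_after_run(10, start_day=3, run_days=4))  # 3
print(days_left_after_run(15, start_day=3, run_days=4))  # 8
```

The same run leaves three days for evaluation in a two-week sprint and eight in a three-week sprint.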

Feedback cycles with clients are longer. In traditional software, a product owner can review a feature in thirty minutes. An AI model evaluation requires the client to review performance metrics, test edge cases, and often consult with domain experts. This review cycle typically takes three to five business days, which is thirty to fifty percent of the ten working days in a two-week sprint.

Experimentation does not fit into fixed boxes. If you plan three model experiments for a sprint and the first experiment reveals that the approach needs to change, the remaining two experiments may no longer be worth running. You need to replan mid-sprint, which the original sprint plan did not account for.

Integration work has external dependencies. Connecting a model to a client's production environment depends on the client's IT team, their change management processes, and their infrastructure availability. These dependencies operate on their own timeline, not yours.

Matching Sprint Cadence to Project Phase

Different phases of an AI project have different work rhythms. Instead of forcing a single cadence across the entire project, match the cadence to the phase.

Discovery Phase: One-Week Sprints

Discovery work is fast-moving and involves frequent direction changes. Short sprints keep the team aligned and allow rapid adjustment.

Typical discovery activities:

Stakeholder interviews
Data source identification and initial assessment
Technical feasibility analysis
Solution architecture drafting
Scope and estimate development

Why one week works: Discovery involves many small tasks and frequent client interaction. Weekly check-ins keep the client engaged and allow the team to adjust direction based on each week's findings.

Sprint deliverable: Each discovery sprint should produce a tangible output: a data assessment report, a technical feasibility analysis, or a solution design document.

Data Preparation Phase: Two- to Three-Week Sprints

Data work is the most unpredictable phase. Longer sprints provide buffer for the surprises that invariably appear.

Typical data preparation activities:

Data extraction and ingestion pipeline development
Data cleaning and transformation
Feature engineering
Data labeling or label quality assessment
Dataset splitting and versioning

Why two to three weeks works: Data issues cascade. Fixing one problem often reveals another. A one-week sprint does not provide enough time to work through a full cleaning and preparation cycle. A three-week sprint gives the team time to discover, diagnose, and resolve data issues without carrying over significant work.

Sprint deliverable: A prepared, versioned dataset that meets the quality criteria for the next phase.

Model Development Phase: Three-Week Sprints

Model development involves training runs, evaluation cycles, and iteration. Three weeks provides enough time for two to three full experiment cycles.

Typical model development activities:

Baseline model training and evaluation
Hyperparameter tuning
Architecture experimentation
Feature selection optimization
Error analysis and iteration

Why three weeks works: A typical experiment cycle (configure, train, evaluate, analyze) takes three to five days. A three-week sprint allows two to three full cycles, providing enough iteration to make meaningful progress while still having a clear end point for review and planning.
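
To see how sprint length bounds iteration, here is a minimal sketch that counts how many full experiment cycles fit in a sprint. It assumes roughly two working days per sprint go to planning and review. That allowance is an illustrative assumption, not a figure from the text, and the function name is hypothetical.

```python
def cycles_per_sprint(sprint_days: int, cycle_days: int, ceremony_days: int = 2) -> int:
    """Full experiment cycles (configure, train, evaluate, analyze) that fit in a sprint.

    `ceremony_days` is an assumed allowance for planning and review.
    """
    usable = max(sprint_days - ceremony_days, 0)
    return usable // cycle_days


for cycle_days in (3, 4, 5):
    two_week = cycles_per_sprint(10, cycle_days)
    three_week = cycles_per_sprint(15, cycle_days)
    print(f"{cycle_days}-day cycle: two-week sprint -> {two_week}, three-week sprint -> {three_week}")
```

At the slow end of the range (five-day cycles), a two-week sprint fits one full cycle and a three-week sprint fits two. With four-day cycles the counts are two and three.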

Sprint deliverable: Model evaluation results with documented metrics, comparison to baselines, and recommendations for the next sprint.

Integration and Deployment Phase: Two-Week Sprints

Integration work is more predictable than model development because it involves known systems and defined interfaces. Two-week sprints work well here.

Typical integration activities:

API development and testing
Client system integration
Performance optimization
Security hardening
Deployment pipeline setup

Why two weeks works: Integration tasks are more granular and predictable than data or model work. They also involve more frequent coordination with the client's technical team, which benefits from a regular two-week checkpoint.

Sprint deliverable: Working integration in the staging environment, verified through integration tests.

Evaluation and Handoff Phase: Two-Week Sprints

The final phase involves validation, documentation, and knowledge transfer.

Typical activities:

Production validation and monitoring setup
Documentation completion
Knowledge transfer sessions
Client acceptance testing
Project closure activities

Why two weeks works: These activities are largely predictable and benefit from a defined timeline that creates urgency around completion.

Alternative Cadence Models

Sprints are not the only option. Some AI work benefits from alternative delivery frameworks.

Kanban for Ongoing Work

If your team manages multiple client retainers with a mix of small tasks and larger initiatives, Kanban may be more effective than sprints.

How Kanban works for agencies: Instead of planning work into time-boxed sprints, tasks flow through a board with columns (backlog, in progress, review, done). Work-in-progress limits prevent overcommitment. The team pulls the next highest-priority task when capacity is available.
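
The pull mechanics and work-in-progress limits can be sketched as a small in-memory board. This is an illustration, not any particular tool's API. The class, the column names, and the "lower number means higher priority" convention are assumptions, and the task names are hypothetical.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple


class KanbanBoard:
    """Minimal Kanban board: a priority-ordered backlog plus WIP-limited columns."""

    def __init__(self, wip_limits: Dict[str, int]):
        self.wip_limits = wip_limits  # e.g. {"in progress": 3, "review": 2}
        self._backlog: List[Tuple[int, int, str]] = []  # min-heap of (priority, insertion order, task)
        self._order = itertools.count()  # tie-breaker: equal priorities leave first-in, first-out
        self.columns: Dict[str, List[str]] = {"in progress": [], "review": [], "done": []}

    def add(self, task: str, priority: int) -> None:
        # Lower number means higher priority (an assumed convention).
        heapq.heappush(self._backlog, (priority, next(self._order), task))

    def has_capacity(self, column: str) -> bool:
        limit = self.wip_limits.get(column)  # columns without a limit are unbounded
        return limit is None or len(self.columns[column]) < limit

    def pull(self) -> Optional[str]:
        """Pull the highest-priority backlog task if "in progress" is under its WIP limit."""
        if not self._backlog or not self.has_capacity("in progress"):
            return None
        _, _, task = heapq.heappop(self._backlog)
        self.columns["in progress"].append(task)
        return task

    def move(self, task: str, src: str, dst: str) -> bool:
        """Move a task between columns; refuse if the destination is at its WIP limit."""
        if not self.has_capacity(dst):
            return False
        self.columns[src].remove(task)
        self.columns[dst].append(task)
        return True


board = KanbanBoard({"in progress": 2, "review": 1})
board.add("Fix ingestion bug", priority=1)
board.add("Retrain churn model", priority=2)
board.add("Update API docs", priority=3)
print(board.pull())  # Fix ingestion bug
print(board.pull())  # Retrain churn model
print(board.pull())  # None: the "in progress" WIP limit of 2 is reached
board.move("Fix ingestion bug", "in progress", "review")
print(board.pull())  # Update API docs
```

The third `pull` returns nothing because the team is at its limit. Capacity only opens up once work moves forward on the board.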

When Kanban works well:

Retainer engagements with a mix of support, enhancement, and small project work
Teams that handle frequent interrupt-driven requests
Situations where work items vary dramatically in size and predictability

When Kanban works poorly:

Projects with fixed deadlines and defined milestones (sprints provide better time-awareness)
Teams that struggle with prioritization without the forcing function of sprint planning
Client relationships that expect regular sprint reviews and demonstrations

Time-Boxed Experiments

For the model development phase specifically, some agencies use "experiment timeboxes" instead of traditional sprints.

How it works: The team defines a set of experiments (model architectures, hyperparameter ranges, feature sets) and a time budget (for example, two weeks). The goal is to complete as many experiments as possible within the timebox and then review results with the client.

The difference from a sprint: An experiment timebox does not commit to specific outcomes. It commits to a specific investment of time. The output is a set of results and insights, not a guaranteed performance improvement. This framing is more honest about the uncertainty inherent in ML experimentation.
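
A timebox commits to time rather than outcomes, and a minimal sketch makes that visible. It runs planned experiments in priority order while the estimated time fits the budget, then returns whatever results exist along with the experiments it did not reach. The experiment names, the duration estimates, and the `run` callback are hypothetical stand-ins. A real timebox would track elapsed time rather than up-front estimates.

```python
from typing import Callable, Dict, List, Tuple


def run_timebox(
    experiments: List[Tuple[str, float]],
    budget_days: float,
    run: Callable[[str], object],
) -> Tuple[Dict[str, object], List[str], float]:
    """Run planned experiments in order until the time budget is used up.

    Returns the results collected, the experiments not reached, and days spent.
    """
    results: Dict[str, object] = {}
    not_run: List[str] = []
    spent = 0.0
    for name, est_days in experiments:
        if spent + est_days > budget_days:
            not_run.append(name)  # would overrun the timebox; report it instead
            continue
        results[name] = run(name)
        spent += est_days
    return results, not_run, spent


plan = [
    ("architecture A", 4),
    ("architecture B", 3),
    ("feature set X", 5),
    ("feature set Y", 2),
]
# `run` stands in for whatever launches and evaluates an experiment.
results, not_run, spent = run_timebox(plan, budget_days=10, run=lambda name: f"metrics for {name}")
print(sorted(results))  # ['architecture A', 'architecture B', 'feature set Y']
print(not_run)          # ['feature set X']
print(spent)            # 9.0
```

The experiments that did not fit are part of the output. They become inputs to the review conversation with the client, not a missed commitment.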

Client communication: "In the next two weeks, we will run experiments with three different model architectures and two different feature engineering approaches. At the end of that period, we will present the results and recommend the path forward." This sets appropriate expectations without overpromising.

Milestone-Based Delivery

For smaller projects or projects with very clear deliverables, milestone-based delivery may be simpler than sprints.

How it works: The project is divided into milestones with defined deliverables and target dates. Work between milestones is managed informally (daily standups, task boards) without the overhead of sprint planning, review, and retrospective ceremonies.

When this works: Projects under eight weeks with three to five clear milestones and a small team (two to four people). The ceremony overhead of sprints can consume a disproportionate amount of time for small teams.

Sprint Ceremonies Adapted for AI Work

If you use sprints, adapt the standard ceremonies to fit AI delivery.

Sprint Planning

Standard approach: Estimate stories in points, load the sprint with stories up to the team's velocity.

AI adaptation: For data and model work, estimate in ranges rather than points. "This data cleaning task will take three to five days" is more honest than "this is a five-point story." Plan the sprint to the lower end of the range and hold stretch tasks in reserve. This accounts for the uncertainty without padding every estimate.
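
Below is a minimal sketch of one literal reading of this advice. It commits tasks in priority order while the sum of their low-end estimates fits the team's capacity and holds the rest as stretch work. It also reports the committed work as a range rather than a single number. The data structure, the task names, and the rule that everything after the first non-fitting task becomes stretch are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RangeEstimate:
    name: str
    low_days: float   # optimistic end of the range
    high_days: float  # pessimistic end of the range


def plan_with_ranges(
    tasks: List[RangeEstimate], capacity_days: float
) -> Tuple[List[RangeEstimate], List[RangeEstimate], Tuple[float, float]]:
    """Commit tasks in priority order against the low end of their ranges.

    Once a task no longer fits, it and every lower-priority task become
    stretch work held in reserve. Also returns the committed (low, high) range.
    """
    committed: List[RangeEstimate] = []
    stretch: List[RangeEstimate] = []
    low_total = 0.0
    for task in tasks:
        if not stretch and low_total + task.low_days <= capacity_days:
            committed.append(task)
            low_total += task.low_days
        else:
            stretch.append(task)
    high_total = sum(t.high_days for t in committed)
    return committed, stretch, (low_total, high_total)


tasks = [
    RangeEstimate("clean claims data", 3, 5),
    RangeEstimate("feature engineering", 4, 6),
    RangeEstimate("label quality audit", 2, 4),
    RangeEstimate("data quality dashboard", 2, 3),
]
committed, stretch, committed_range = plan_with_ranges(tasks, capacity_days=10)
print([t.name for t in committed])
print([t.name for t in stretch])
print(committed_range)
```

Here the committed work spans nine to fifteen days against ten days of capacity. Surfacing that range during planning keeps the uncertainty visible instead of hiding it inside padded point estimates.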

Include client dependencies in the plan. If a sprint depends on client data delivery, approval, or feedback, identify those dependencies during planning and confirm timelines with the client before the sprint starts.
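
Confirming dependencies before the sprint starts can be as simple as a pre-planning check. This minimal sketch flags any client dependency without a confirmed date, or confirmed for later than the sprint needs it. The class, the field names, and the example dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ClientDependency:
    description: str
    needed_by: date                       # when the sprint needs this from the client
    confirmed_for: Optional[date] = None  # date the client committed to; None means unconfirmed


def planning_blockers(deps: List[ClientDependency]) -> List[str]:
    """List client dependencies to resolve before the sprint starts."""
    blockers = []
    for dep in deps:
        if dep.confirmed_for is None:
            blockers.append(f"{dep.description}: no date confirmed by the client")
        elif dep.confirmed_for > dep.needed_by:
            late_by = (dep.confirmed_for - dep.needed_by).days
            blockers.append(f"{dep.description}: confirmed {late_by} days after it is needed")
    return blockers


deps = [
    ClientDependency("Q3 transaction export", needed_by=date(2024, 3, 4), confirmed_for=date(2024, 3, 1)),
    ClientDependency("Sign-off on label taxonomy", needed_by=date(2024, 3, 6)),
    ClientDependency("Staging VPN access", needed_by=date(2024, 3, 8), confirmed_for=date(2024, 3, 12)),
]
for line in planning_blockers(deps):
    print(line)
```

An empty result means every dependency has a confirmed date that arrives in time.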

Sprint Review

Standard approach: Demo completed features to stakeholders.

AI adaptation: Present results, not just demos. For a model development sprint, present evaluation metrics, error analysis, and insights alongside any working software. For a data preparation sprint, present data quality metrics and sample outputs. Clients need to understand what the team learned, not just what they built.

Include a "what we tried that did not work" section. In AI work, failed experiments are valuable because they narrow the solution space. Presenting them shows the client that the team is being rigorous, not just lucky.

Retrospective

Standard approach: What went well, what did not, what to change.

AI adaptation: Add a specific question: "Was the sprint cadence right for the work we did this sprint?" If the team consistently finds that work is too compressed or too drawn out, adjust the cadence. The retro is where cadence experimentation should be discussed.

Finding Your Agency's Cadence

There is no universal correct cadence. The right approach depends on your team, your clients, and the type of work you do.

Start with the phase-matched cadence described above (one week for discovery, two to three weeks for data preparation, three weeks for model development, and two weeks for integration and for evaluation and handoff). Use it for two or three projects and observe the results.
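
The phase-matched cadence can be written down as a small configuration that lays out a project's sprint calendar. This is a minimal sketch: data preparation is given as two to three weeks, and three is assumed here. The number of sprints per phase in the usage example is hypothetical.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Weeks per sprint for each phase, following the phase-matched cadence.
# Data preparation is given as two to three weeks; three is assumed here.
PHASE_CADENCE_WEEKS = {
    "discovery": 1,
    "data preparation": 3,
    "model development": 3,
    "integration and deployment": 2,
    "evaluation and handoff": 2,
}


def sprint_calendar(start: date, plan: List[Tuple[str, int]]) -> List[Tuple[str, date, date]]:
    """Lay out back-to-back sprints.

    `plan` is a list of (phase, number_of_sprints). Each result entry is
    (phase, first_day, last_day), with lengths taken from PHASE_CADENCE_WEEKS.
    """
    calendar = []
    current = start
    for phase, sprint_count in plan:
        length = timedelta(weeks=PHASE_CADENCE_WEEKS[phase])
        for _ in range(sprint_count):
            calendar.append((phase, current, current + length - timedelta(days=1)))
            current += length
    return calendar


cal = sprint_calendar(date(2024, 1, 1), [("discovery", 2), ("data preparation", 1), ("model development", 2)])
for phase, first_day, last_day in cal:
    print(f"{phase:<18} {first_day} to {last_day}")
```

Keeping the cadence in one table makes it easy to change a single phase after a retro without replanning the rest by hand.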

Track sprint completion rate. If the rate is consistently below seventy percent, the sprints are probably too short for the work being done. Lengthen them.
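
Below is a minimal sketch of the completion-rate check. Completion rate is taken as completed planned work divided by planned work, in whatever unit the team plans in. "Consistently" is read here as every one of the last three sprints, which is an assumption. The sprint figures are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SprintRecord:
    planned: float    # work committed at sprint planning (tasks, points, or days; keep the unit consistent)
    completed: float  # portion of that planned work finished within the sprint


def completion_rate(sprint: SprintRecord) -> float:
    if sprint.planned <= 0:
        raise ValueError("planned work must be positive")
    return sprint.completed / sprint.planned


def consistently_below(sprints: List[SprintRecord], threshold: float = 0.70, window: int = 3) -> bool:
    """True if each of the last `window` sprints completed less than `threshold` of planned work.

    The three-sprint window is an assumed reading of "consistently".
    """
    recent = sprints[-window:]
    return len(recent) == window and all(completion_rate(s) < threshold for s in recent)


history = [SprintRecord(20, 12), SprintRecord(18, 11), SprintRecord(22, 13)]
print([round(completion_rate(s), 2) for s in history])  # [0.6, 0.61, 0.59]
print(consistently_below(history))                       # True
```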

Track carryover rate. How much work from one sprint carries into the next? Persistent carryover means the planning or estimation process is not accounting for the actual work rhythm.
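
Carryover can be tracked the same way. In this minimal sketch, carryover rate is the share of a sprint's planned work that rolled into the next sprint. "Persistent" is read as every recent sprint exceeding a threshold. The twenty percent threshold and the sprint figures are illustrative assumptions, not figures from the text.

```python
from typing import List, Tuple


def carryover_rate(planned: float, carried_over: float) -> float:
    """Share of a sprint's planned work that rolled into the next sprint."""
    if planned <= 0:
        raise ValueError("planned work must be positive")
    return carried_over / planned


def persistent_carryover(history: List[Tuple[float, float]], threshold: float = 0.20) -> bool:
    """True if every sprint in `history` carried over more than `threshold` of its plan.

    `history` holds (planned, carried_over) pairs; the default threshold is an assumption.
    """
    return bool(history) and all(carryover_rate(p, c) > threshold for p, c in history)


recent = [(20, 8), (18, 7), (22, 9)]
print([round(carryover_rate(p, c), 2) for p, c in recent])  # [0.4, 0.39, 0.41]
print(persistent_carryover(recent))                          # True
```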

Ask the team. In retros, ask whether the cadence felt right. Engineers who consistently feel rushed are working with too-short sprints. Engineers who feel like they are filling time at the end are working with too-long sprints.
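
The retro signals can be combined into one suggestion with a simple decision rule. This minimal sketch uses the seventy percent completion threshold together with the team's answers. Checking "too short" before "too long" is an assumed tie-break for mixed signals, and the function name is hypothetical.

```python
def cadence_signal(
    completion_rate: float,
    felt_rushed: bool,
    filling_time_at_end: bool,
    threshold: float = 0.70,
) -> str:
    """Turn retro answers into a suggested cadence change.

    Low completion or a rushed team points to sprints that are too short;
    a team filling time at the end points to sprints that are too long.
    """
    if completion_rate < threshold or felt_rushed:
        return "lengthen"
    if filling_time_at_end:
        return "shorten"
    return "keep"


print(cadence_signal(0.58, felt_rushed=True, filling_time_at_end=False))   # lengthen
print(cadence_signal(0.95, felt_rushed=False, filling_time_at_end=True))   # shorten
print(cadence_signal(0.82, felt_rushed=False, filling_time_at_end=False))  # keep
```

A rule like this is a prompt for discussion in the retro, not a replacement for it.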

Ask the clients. Do they feel they get enough visibility into progress? Are sprint reviews happening at a useful frequency? Some clients want weekly touchpoints regardless of sprint length, which you can accommodate through status updates separate from sprint ceremonies.

Your Next Step

Review the sprint completion rates across your active projects for the last three months. If the average is below seventy percent, your cadence is probably too short for the type of work being done.

For your next project, try the phase-matched cadence. Use one-week sprints during discovery, extend to three weeks during model development, and return to two weeks for integration. Track completion rates and team satisfaction.

If you are already comfortable with your cadence, audit whether you are adapting ceremonies appropriately. Are your sprint reviews presenting insights and results, or just demos? Is your planning accounting for data and model uncertainty? Is your retro evaluating the cadence itself?

Sprint cadence is not a religious commitment. It is a tool for organizing work. The best agencies adjust the tool to fit the work rather than forcing the work to fit the tool. Find the rhythm that matches how your team actually delivers, and both your productivity and your team's satisfaction will improve.

Agency Script Editorial

Related Articles

Understaffed or Overstaffed? Both Camps Were Right.

Optimizing Daily Standups for Distributed AI Agency Teams

Complete Utilization Rate Management Guide — The Metric That Makes or Breaks Agency Profitability

Ready to certify your AI capability?
