Running Technical Spikes for Uncertain AI Projects: De-Risk Before You Commit
An agency signed a fixed-price contract to build a product recommendation engine for a fashion retailer. The proposal assumed that collaborative filtering on purchase history would produce good recommendations. Eight weeks into the twelve-week project, they discovered that the client's purchase data was too sparse for collaborative filtering to work: most customers had only one or two purchases. They needed a content-based approach using product images and descriptions, which required a completely different architecture and a computer vision pipeline they had not planned for. The project ran six weeks past its timeline and $40,000 over budget. A two-week technical spike before signing the contract would have revealed the data sparsity issue and led to a proposal that reflected the actual technical approach needed. Two weeks of investigation would have saved six weeks of rework.
AI projects carry more technical uncertainty than traditional software projects. Will the model achieve the required accuracy? Will the data support the intended approach? Will the system meet latency requirements? Will the costs be sustainable at production scale? These questions have answers, but you often cannot answer them from experience alone; you need to investigate. Technical spikes are structured, time-boxed investigations designed to answer specific technical questions before you commit to a full implementation. They are the most effective risk management tool in an AI agency's arsenal.
What a Technical Spike Is (and Is Not)
A technical spike is a focused, time-boxed investigation aimed at answering a specific question or reducing a specific uncertainty. It is not a mini-prototype, not a proof of concept, and not a phase one of the actual project.
A spike answers a question. "Can we achieve 90 percent accuracy on this classification task with the available training data?" "Can we serve inference at sub-200 millisecond latency on the client's infrastructure?" "Can we integrate with the client's legacy data warehouse within reasonable effort?" Each spike has a clear question and produces a clear answer.
A spike is time-boxed. One to three weeks is typical. The time box forces focus. If you cannot answer the question in three weeks, the question might be too broad or the answer might be that the approach is not feasible.
A spike produces a decision, not a deliverable. The output is knowledge that informs a go/no-go decision, an architectural choice, or a revised estimate. The code written during a spike is throwaway; it exists to generate learning, not to ship.
A spike is cheap insurance. A two-week spike that costs $15,000 can prevent a $100,000 project failure. Clients who understand this math are happy to pay for spikes. Clients who do not yet understand it need education about the economics of uncertainty.
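The insurance math is simple expected-value arithmetic. A minimal sketch with illustrative numbers only: the $15,000 spike and $100,000 failure above, plus an assumed 30 percent chance that the unvalidated approach fails:

```python
def expected_spike_value(spike_cost, failure_cost, p_failure):
    """Expected net savings from a spike that catches a doomed approach.

    Assumes the spike reliably detects the failure mode it targets, so
    the expected loss avoided is p_failure * failure_cost.
    """
    return p_failure * failure_cost - spike_cost

# Illustrative numbers: a $15k spike against a $100k failure that has a
# 30% chance of occurring without validation.
value = expected_spike_value(15_000, 100_000, 0.30)
print(f"Expected net value of the spike: ${value:,.0f}")
# prints: Expected net value of the spike: $15,000
```

Even at a modest assumed failure probability, the spike pays for itself; the break-even point here is a 15 percent chance of failure.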
When to Run a Technical Spike
Not every AI project needs a spike. Run spikes when the cost of being wrong is high and the probability of being wrong is significant.
High-Uncertainty Scenarios That Warrant Spikes
Novel data types. If you have not worked with this type of data before (specialized imagery, unusual sensor data, domain-specific text), spike to validate that your approach works on this data.
Accuracy requirements near the state of the art. If the client needs accuracy that approaches the limits of what current technology can achieve, spike to verify that the requirement is achievable with available data and models.
Integration with unknown systems. If the project requires deep integration with client systems you have not worked with before, spike to validate that integration is feasible within the estimated effort.
Performance requirements at the edge of feasibility. If latency, throughput, or cost targets are aggressive, spike to validate that they are achievable with the intended architecture.
Data quality uncertainty. If you have not seen the client's actual data, spike to profile the data and validate your assumptions about quality, completeness, and suitability.
New technology or approach. If the project requires using a technology, model, or technique that your team has not used in production, spike to build familiarity and validate feasibility.
Scenarios That Usually Do Not Need Spikes
Well-understood tasks with proven approaches. If you have delivered similar projects multiple times with similar data and similar requirements, a spike adds cost without reducing risk.
Small, low-stakes projects. If the total project cost is comparable to the spike cost, just do the project. The spike is the project.
Clear go/no-go criteria. If you can determine feasibility from documentation, prior experience, or brief analysis without building anything, a spike is overkill.
Structuring a Technical Spike
A well-structured spike maximizes the learning per hour invested.
Define the Questions
Start by listing the specific questions the spike needs to answer. Be precise. "Will this work?" is too vague. "Can we achieve 85 percent classification accuracy on the client's document types using their existing labeled data of 2,000 examples?" is specific and answerable.
Prioritize questions by impact. If question one reveals that the project is not feasible, you do not need to answer questions two through five. Order your investigation so that high-impact questions are addressed first.
Good spike questions:
- Can we achieve X percent accuracy on task Y with dataset Z?
- Does the client's data contain sufficient signal for the intended prediction?
- Can we serve this model at sub-N millisecond latency on the target infrastructure?
- What is the per-inference cost at production scale?
- Can we extract the needed data from the client's legacy system through their API?
- Does the model produce acceptable quality on the client's edge cases and domain-specific inputs?
Set Success Criteria
For each question, define what constitutes a positive answer, a negative answer, and an inconclusive answer.
Positive: "We achieved 87 percent accuracy, exceeding the 85 percent threshold. The approach is validated."
Negative: "We achieved 62 percent accuracy, well below the 85 percent threshold. The approach does not work with this data."
Inconclusive: "We achieved 78 percent accuracy. With more data or prompt optimization, 85 percent might be achievable. Further investigation is warranted."
Having clear criteria prevents scope creep and post-spike ambiguity. Without them, a failed spike becomes "well, it sort of worked, let's keep going," which defeats the purpose.
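One way to keep the criteria unambiguous is to write the decision rule down before the spike starts. A minimal sketch, assuming the hypothetical 85 percent target from the examples above and an assumed 75 percent floor below which the approach is judged not to work:

```python
def classify_spike_result(accuracy: float,
                          target: float = 0.85,
                          inconclusive_floor: float = 0.75) -> str:
    """Map a measured metric onto the three spike outcomes.

    Thresholds are illustrative: `target` is the client requirement,
    `inconclusive_floor` marks results close enough that more data or
    tuning might still close the gap.
    """
    if accuracy >= target:
        return "positive"      # requirement met; approach validated
    if accuracy >= inconclusive_floor:
        return "inconclusive"  # close; further investigation warranted
    return "negative"          # approach does not work with this data

print(classify_spike_result(0.87))  # positive
print(classify_spike_result(0.78))  # inconclusive
print(classify_spike_result(0.62))  # negative
```

Agreeing on the thresholds with the client before the spike runs means the result interprets itself.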
Plan the Investigation
Break the spike into daily or multi-day chunks, each focused on a specific investigation activity.
Day one: Setup and data access. Get access to the client's data. Set up the development environment. Profile the data. This often takes longer than expected, so dedicating a full day prevents it from eating into investigation time.
Days two through four: Core investigation. Run the experiments that answer your primary questions. Build just enough code to produce meaningful results. Resist the temptation to build production-quality code โ this is throwaway.
Day five: Analysis and documentation. Analyze results, draw conclusions, prepare findings. Write the spike report.
Manage Scope Aggressively
Spike scope creep is real and dangerous. The temptation to "just try one more thing" can turn a one-week spike into a four-week project.
Stick to the questions. The spike exists to answer specific questions. If new questions emerge during the investigation, document them as future work rather than expanding the current spike.
Use shortcuts. Use sample data, not full data. Use pre-trained models, not custom-trained models. Use development infrastructure, not production infrastructure. The goal is learning, not building.
Time-box ruthlessly. When the time box expires, stop investigating and start analyzing. An incomplete spike that delivers clear partial answers is more valuable than a comprehensive spike that is delivered weeks late.
Spike Types for AI Projects
Different types of technical uncertainty call for different types of spikes.
Data Feasibility Spike
Purpose. Validate that the client's data can support the intended AI approach.
Activities. Profile the data: volume, completeness, quality, distribution. Identify systematic issues: missing values, inconsistencies, biases. Assess whether the data contains sufficient signal for the intended prediction. Estimate the data preparation effort required.
Common findings. Insufficient training data. Data quality too poor for supervised learning. Critical features missing or unreliable. Data distribution does not match intended use case. Integration complexity higher than expected.
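A data feasibility spike usually starts with a quick profiling pass over a sample export. A minimal sketch in plain Python over a list of records; the field names and sample data are hypothetical stand-ins for the client's actual schema:

```python
from collections import Counter

def profile_records(records, required_fields):
    """Quick data-feasibility profile: volume, completeness, label balance.

    `records` is a list of dicts; `required_fields` are the columns the
    intended approach depends on.
    """
    n = len(records)
    missing = {f: sum(1 for r in records if not r.get(f))
               for f in required_fields}
    labels = Counter(r["label"] for r in records if r.get("label"))
    return {
        "volume": n,
        "missing_rate": {f: missing[f] / n for f in required_fields},
        "label_distribution": dict(labels),
    }

# Toy sample standing in for the client's data export.
sample = [
    {"purchase_id": 1, "customer_id": "a", "label": "repeat"},
    {"purchase_id": 2, "customer_id": "a", "label": "repeat"},
    {"purchase_id": 3, "customer_id": None, "label": "one-off"},
]
print(profile_records(sample, ["customer_id", "label"]))
```

Even this crude pass surfaces the classic spike findings: a high missing rate on a critical field, or a label distribution too skewed or too small to train on.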
Model Feasibility Spike
Purpose. Validate that available models and approaches can achieve the required performance on the client's specific task.
Activities. Establish a baseline: how well does the simplest possible approach perform? Test the intended approach: how well does it perform? Analyze the gap between current performance and the requirement. Identify what it would take to close the gap.
Common findings. Off-the-shelf models achieve surprisingly good performance, validating the approach. Or off-the-shelf models fall significantly short, suggesting the need for custom training, alternative approaches, or adjusted expectations.
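The baseline-then-gap sequence can be sketched in a few lines. The label set and the 85 percent requirement below are hypothetical; the majority-class baseline is the floor any real model must beat for the approach to be worth pursuing:

```python
from collections import Counter

def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most common class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)

def gap_to_requirement(model_accuracy, requirement):
    """Non-negative means the requirement is met; negative is the gap to close."""
    return model_accuracy - requirement

labels = ["churn"] * 70 + ["retain"] * 30          # toy label set
baseline = majority_baseline_accuracy(labels)      # 0.70
print(f"baseline={baseline:.2f}, gap={gap_to_requirement(0.78, 0.85):+.2f}")
# prints: baseline=0.70, gap=-0.07
```

If the candidate approach barely clears the majority-class baseline, the spike has answered its question regardless of how the number compares to the requirement.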
Integration Spike
Purpose. Validate that the AI system can integrate with the client's existing systems within reasonable effort.
Activities. Test data access: can you query the client's systems at the required speed and volume? Test API connectivity: do the APIs work as documented? Identify integration gaps: what adapters, transformations, or middleware are needed?
Common findings. APIs are poorly documented and behave differently than described. Data access is slower than required, necessitating caching or precomputation. Security and access control requirements are more complex than anticipated.
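A quick way to test data-access speed during an integration spike is to time repeated calls and look at percentile latency rather than the average. A minimal sketch; `fake_fetch` is a hypothetical stand-in for whatever call reaches the client's system:

```python
import time

def measure_access_latency(fetch, n_requests=50):
    """Time repeated calls to a data-access function and report p50/p95
    latency in milliseconds. `fetch` stands in for a real client-API call."""
    timings = []
    for _ in range(n_requests):
        start = time.perf_counter()
        fetch()
        timings.append((time.perf_counter() - start) * 1000)
    timings.sort()
    return {
        "p50_ms": timings[len(timings) // 2],
        "p95_ms": timings[int(len(timings) * 0.95)],
    }

def fake_fetch():
    """Hypothetical stand-in; a real spike would call the client's endpoint."""
    time.sleep(0.002)  # simulate a ~2 ms round trip

stats = measure_access_latency(fake_fetch)
print(f"p50={stats['p50_ms']:.1f} ms, p95={stats['p95_ms']:.1f} ms")
```

Tail latency is what usually breaks integration assumptions: a system that is fast at p50 but slow at p95 may still force caching or precomputation into the architecture.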
Performance Spike
Purpose. Validate that the system can meet latency, throughput, and cost requirements at production scale.
Activities. Benchmark model inference on target hardware. Measure end-to-end latency including data preparation, inference, and post-processing. Estimate costs at production volume. Identify optimization opportunities.
Common findings. Raw inference latency meets requirements, but end-to-end latency including data preparation does not. Costs at production scale are higher than initial estimates. Optimization techniques such as quantization, caching, and batching can close the gap.
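The cost side of a performance spike is usually back-of-envelope arithmetic. A minimal sketch; the token counts, pricing, and traffic figures below are illustrative assumptions, not real provider rates:

```python
def monthly_inference_cost(tokens_per_request, cost_per_1k_tokens,
                           requests_per_day):
    """Back-of-envelope production cost estimate.

    All inputs are assumptions to be replaced with measured values from
    the spike: real token counts, the provider's actual pricing, and the
    client's projected traffic.
    """
    per_request = tokens_per_request / 1000 * cost_per_1k_tokens
    return per_request * requests_per_day * 30

# Illustrative numbers only.
cost = monthly_inference_cost(tokens_per_request=1_500,
                              cost_per_1k_tokens=0.002,
                              requests_per_day=100_000)
print(f"${cost:,.0f} per month")  # prints: $9,000 per month
```

Running this arithmetic with measured token counts rather than guessed ones is often the single most sobering output of a performance spike.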
Spike Deliverables
The spike report is the primary deliverable. It should be clear, actionable, and honest.
What to Include
Executive summary. Two to three sentences answering the spike's primary questions with clear conclusions.
Methodology. What you tested, how you tested it, what data you used, what assumptions you made. Enough detail for someone to understand and challenge your conclusions.
Results. Quantitative results with context. "87 percent accuracy" means nothing without the baseline, the dataset size, the evaluation methodology, and the statistical significance of the result.
Analysis. Interpretation of results. What do they mean for the project? What risks do they reveal? What opportunities do they suggest?
Recommendations. Clear recommendations for the path forward. Proceed as planned, modify the approach, adjust the timeline, adjust the budget, or do not proceed.
Open questions. Questions that emerged during the spike but were not answered. Future investigations that would reduce remaining uncertainty.
Effort and cost implications. If the spike reveals that the approach needs to change, estimate the impact on project timeline and budget.
Presenting to Clients
Present spike findings honestly, including negative results. Clients value transparency, and a spike that says "this approach won't work, here's what we should do instead" is far more valuable than a spike that says "probably fine" and leads to a failed project.
Frame negative results as savings. "This spike prevented us from pursuing an approach that would have cost $80,000 and failed. Now we know to use approach B instead, which has a higher probability of success."
Present alternatives with trade-offs. When the spike reveals issues, present alternative approaches with their pros, cons, costs, and risks. Give the client the information they need to make an informed decision.
Be specific about residual uncertainty. If the spike reduced but did not eliminate uncertainty, be explicit about what remains unknown and what would be required to resolve it.
Making Spikes Part of Your Process
The most effective agencies run spikes as a standard part of their pre-project process, not as an exception.
Include spike recommendations in discovery. During client discovery, identify technical uncertainties and recommend spikes where appropriate. Frame them as standard professional practice, not as a sign of weakness.
Price spikes appropriately. Spikes should be priced as standalone engagements, separate from the project proposal. This ensures the spike is paid for regardless of whether the project proceeds.
Build spike capability. Maintain the infrastructure, data access patterns, and evaluation tools needed to run spikes quickly. The faster you can stand up a spike, the more practical they become.
Learn from spikes. Track spike outcomes: how often did spike conclusions prove correct? How often did projects that skipped spikes encounter the issues that a spike would have caught? This data justifies the spike investment to future clients.
Technical spikes are not a delay tactic. They are the fastest path to a successful project because they prevent you from building the wrong thing. The agencies that run spikes deliver more projects on time and on budget because they start each project with validated assumptions instead of optimistic guesses. The agencies that skip spikes are faster to start but slower to finish, and some of them never finish at all.