Set Plays for Turning Models Into Idea Engines

A model that drafts emails is useful. A model that helps you see possibilities you had not considered is something else entirely. The difference is not the model—it is the prompt and the process wrapped around it. Most teams default to using language models as confirmation machines: they ask a question they already half-answered and accept the response that matches their prior. Hypothesis generation is the opposite discipline. You deliberately ask the model to widen the field, surface candidate explanations, and propose tests, so that you end up considering ideas you would never have reached alone.

The trouble is that this rarely survives contact with a busy week. One analyst gets good at it, everyone else asks the model to summarize, and the capability stays trapped in a single head. A playbook fixes that. It does not teach the technique in the abstract; it tells you, in the middle of real work, which move to make, who owns it, and what to do when the output is thin.

This article lays out the plays, their triggers, the owners who run them, and the sequence for installing the whole thing in a team. If you are new to the underlying mechanics, start with the step-by-step approach to grounding outputs and return when you want to operationalize idea generation itself.

What Hypothesis Generation Actually Asks Of A Model

Before the plays, get the framing right. A hypothesis is a testable claim about why something is happening or what will happen if you act. Generating good ones means producing claims that are specific, falsifiable, and worth the cost of testing. A model is well suited to the first half of that work—producing many candidate claims quickly—and poorly suited to the second half, which is judgment about which claims matter.

The division of labor

The model proposes; humans dispose. Treat every generated hypothesis as a draft, not a verdict.
Volume is the model's gift. Ask for fifteen explanations, not three, then cut.
Specificity is your job. A vague hypothesis is unfalsifiable and therefore useless.

Why naive prompting fails here

Asking "what's causing X?" yields the three most common textbook answers, not the situation-specific ones.
Without constraints, models converge on safe, generic claims that no one can act on.
A single round produces a list; real generation needs iteration that prunes and recombines.

The Plays

Each play has a trigger that tells you when to run it, a method, and an owner who is accountable for the output. Keep the set small enough to remember.

Play 1: The Divergent Sweep

Trigger: A surprising result, an unexplained metric move, or the start of a new initiative.
Method: Prompt the model to list as many distinct candidate explanations as it can, explicitly instructing it to include unlikely ones and to avoid repeating the same idea in different words.
Owner: The analyst or strategist closest to the data.
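
As a minimal sketch of the method above, the sweep prompt can be assembled from a small template so that every run carries the same constraints. The function name, the default of fifteen candidates, and the sample observation are illustrative assumptions, not a prescribed format.

```python
def build_divergent_sweep_prompt(observation: str, context: str, n_candidates: int = 15) -> str:
    """Assemble a Divergent Sweep prompt as plain text (hypothetical template)."""
    return "\n".join([
        f"Observation: {observation}",
        f"Context: {context}",
        f"List {n_candidates} distinct candidate explanations for the observation.",
        "Include unlikely explanations, not only the common ones.",
        "Do not repeat the same idea in different words.",
        "State each explanation as a specific, falsifiable claim.",
    ])


# Invented example inputs, for illustration only.
prompt = build_divergent_sweep_prompt(
    observation="Trial-to-paid conversion fell week over week",
    context="The pricing page was redesigned; traffic volume did not change",
)
print(prompt)
```

Keeping the constraints in code rather than retyping them makes it harder to drift back to a bare "what's causing X?" question.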

Play 2: The Adversarial Recast

Trigger: You already have a favored hypothesis and want to avoid confirmation bias.
Method: Ask the model to argue the strongest case against your leading idea and to propose alternatives that would explain the same evidence.
Owner: A peer who did not originate the favored hypothesis.

Play 3: The Test Designer

Trigger: A hypothesis has survived the first cut and needs a cheap way to be proven wrong.
Method: Prompt the model to propose the smallest experiment, query, or observation that would falsify the claim, ranked by cost.
Owner: Whoever will run the test.
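
Once the model has proposed tests, the ranking step is mechanical. The sketch below assumes cost is estimated in analyst hours and that a person has marked whether each test could actually falsify the claim; both fields, and the sample tests, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProposedTest:
    description: str
    cost_hours: float  # assumed cost unit: analyst hours
    falsifies: bool    # True if a negative result would rule the hypothesis out


def rank_tests(tests: list[ProposedTest]) -> list[ProposedTest]:
    """Keep only tests that can falsify the claim, cheapest first."""
    return sorted((t for t in tests if t.falsifies), key=lambda t: t.cost_hours)


# Invented example tests, for illustration only.
tests = [
    ProposedTest("Two-week A/B test of the old pricing page", 40.0, True),
    ProposedTest("Query conversion split by landing page", 1.5, True),
    ProposedTest("Ask the sales team for impressions", 0.5, False),
]
ranked = rank_tests(tests)
for t in ranked:
    print(f"{t.cost_hours:>5.1f}h  {t.description}")
```

Filtering before sorting matters: the cheapest item on the list is useless if no outcome of it could prove the hypothesis wrong.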

Play 4: The Recombination Pass

Trigger: Your list has gone stale and ideas feel incremental.
Method: Feed the model the surviving hypotheses and ask it to combine pairs into novel claims or to invert assumptions shared across all of them.
Owner: The session facilitator.
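
One way to make the pass systematic is to enumerate every pair of survivors and add a single inversion prompt over the whole set. This is a sketch under assumptions: the prompt wording and the function name are illustrative, and the sample hypotheses are invented.

```python
from itertools import combinations


def recombination_prompts(survivors: list[str]) -> list[str]:
    """Build one prompt per pair of hypotheses, plus one shared-assumption inversion prompt."""
    pair_prompts = [
        f"Combine these two hypotheses into one new, testable claim:\n1. {a}\n2. {b}"
        for a, b in combinations(survivors, 2)
    ]
    numbered = "\n".join(f"{i}. {h}" for i, h in enumerate(survivors, start=1))
    inversion_prompt = (
        "Name an assumption shared by all of these hypotheses, "
        f"then propose a claim that holds if that assumption is false:\n{numbered}"
    )
    return pair_prompts + [inversion_prompt]


# Invented example hypotheses, for illustration only.
survivors = [
    "The redesigned pricing page hides the annual plan",
    "New signups skew toward a segment that rarely converts",
    "A checkout error affects one browser",
]
prompts = recombination_prompts(survivors)
print(len(prompts))  # 3 pairs + 1 inversion prompt = 4
```

With two or three survivors the number of pairs stays small enough to read every response; with a longer list the pair count grows quickly, so prune first.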

Triggers And Sequencing

Plays run in a loop, not a line. A typical cycle moves from divergence to pruning to testing and back.

The standard sequence

Run the Divergent Sweep to flood the field with candidates.
Cluster the output by hand, collapsing near-duplicates, then cut the list to the two or three strongest candidates.
Apply the Adversarial Recast to the two or three survivors.
Hand each survivor to the Test Designer.
After early test results, run a Recombination Pass on what remains.
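
The clustering step is manual in this playbook, but a rough text-similarity pass can flag obvious restatements before a person does the real grouping. The sketch below uses Python's standard `difflib`; the 0.8 threshold is an arbitrary assumption, and surface similarity will miss paraphrases, so treat the output as a hint for the human doing the cut, not a replacement for it.

```python
import string
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and drop punctuation so trivial differences do not count."""
    return text.lower().translate(str.maketrans("", "", string.punctuation)).strip()


def flag_near_duplicates(candidates: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose normalized text similarity meets the threshold."""
    flagged = []
    for i in range(len(candidates)):
        for j in range(i + 1, len(candidates)):
            ratio = SequenceMatcher(
                None, normalize(candidates[i]), normalize(candidates[j])
            ).ratio()
            if ratio >= threshold:
                flagged.append((i, j))
    return flagged


# Invented example candidates, for illustration only.
candidates = [
    "Pricing page redesign confused buyers",
    "pricing page redesign confused buyers.",
    "Seasonal dip in demand",
]
pairs = flag_near_duplicates(candidates)
print(pairs)
```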

When to break the sequence

Skip straight to the Test Designer when a hypothesis is already obvious and you only need a cheap falsification.
Repeat the Divergent Sweep with a different framing if the first round was too narrow.
Stop entirely once the cost of generating another idea exceeds the value you expect it to add.

Owners And Accountability

A play without an owner is a suggestion. Assign one person per play per cycle, and make the handoffs explicit so ideas do not die between steps.

Defining ownership

The generator runs the sweep and is accountable for producing genuine variety, not a list of synonyms.
The pruner owns the cut and must be able to defend why each survivor stayed.
The tester owns falsification and reports results back into the loop.

Avoiding the lone-genius trap

Rotate the generator role so the skill spreads across the team.
Document the prompts that worked in a shared library, the way you would with any reusable asset—see how teams manage a prompt library.
Review generated hypotheses in a standing meeting so judgment is collective.
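
Rotation is easy to agree on and easy to forget, so it can help to compute it. A minimal sketch, assuming a fixed team list and a simple round-robin in which the next person in line takes the pruner role so that generator and pruner always differ; the names and the scheme are illustrative.

```python
def roles_for_cycle(team: list[str], cycle_index: int) -> dict[str, str]:
    """Round-robin assignment: generator rotates each cycle, pruner is the next person."""
    if len(team) < 2:
        raise ValueError("need at least two people to separate generator and pruner")
    generator = team[cycle_index % len(team)]
    pruner = team[(cycle_index + 1) % len(team)]
    return {"generator": generator, "pruner": pruner}


# Invented example team, for illustration only.
team = ["Ana", "Ben", "Chen"]
schedule = [roles_for_cycle(team, i) for i in range(4)]
for cycle, roles in enumerate(schedule):
    print(cycle, roles)
```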

Guarding Against Confident Nonsense

The fastest way to discredit this practice is to let a fabricated hypothesis reach a decision-maker dressed as fact. Models will happily invent plausible-sounding causes.

Built-in safeguards

Require that any hypothesis citing data be checked against the source before it advances.
Separate the generation step from any claim of evidence; a hypothesis is a question, not an answer.
Watch for the failure mode where the model asserts a mechanism it cannot support, a pattern covered in depth in the common mistakes teams make with generative tools.

A lightweight verification gate

Before a hypothesis enters testing, one person confirms it is falsifiable.
Before any result is shared, the underlying evidence is traceable.
Anything the model presents as established fact gets treated as a claim to verify, not a conclusion.
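
The gate can be expressed as two checks on a small record. This is a sketch, not a prescribed schema: the field names are assumptions, and "traceable" is reduced here to "someone recorded where the evidence lives".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    claim: str
    falsifiable_confirmed_by: Optional[str] = None  # reviewer name; None until confirmed
    evidence_source: Optional[str] = None           # where the supporting data lives


def can_enter_testing(h: Hypothesis) -> bool:
    """Gate 1: one person has confirmed the claim is falsifiable."""
    return h.falsifiable_confirmed_by is not None


def can_share_result(h: Hypothesis) -> bool:
    """Gate 2: the hypothesis passed gate 1 and its evidence is traceable."""
    return can_enter_testing(h) and h.evidence_source is not None


# Invented example, for illustration only.
h = Hypothesis(claim="The redesigned pricing page hides the annual plan")
blocked = can_enter_testing(h)           # False: nobody has confirmed falsifiability
h.falsifiable_confirmed_by = "Ben"
testable = can_enter_testing(h)          # True
shareable_before = can_share_result(h)   # False: no evidence source recorded
h.evidence_source = "conversion query, pricing page cohort"  # hypothetical source label
shareable_after = can_share_result(h)    # True
```

A freshly generated hypothesis starts with both fields empty, which matches the rule that generation and evidence are separate steps.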

Installing The Playbook In A Team

A playbook only works if running it is easier than not running it. The installation is mostly about lowering friction.

The first thirty days

Pick one recurring decision—a weekly metrics review, a campaign retrospective—and run the full sequence on it.
Capture every prompt that produced a usable hypothesis in a shared document.
Hold a short debrief on what the model surfaced that a human would have missed.

Making it stick

Tie the practice to an existing ritual so it does not require a new meeting.
Track a simple metric: how often a generated hypothesis changed a decision.
Pair this work with broader standards for prompt quality, such as a shared set of prompt review standards.
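
The metric above needs a denominator to be comparable week to week. A minimal sketch, assuming you log one boolean per reviewed decision (True when a generated hypothesis changed the outcome); the logging scheme and the sample log are assumptions.

```python
def decision_change_rate(outcomes: list[bool]) -> float:
    """Share of reviewed decisions that a generated hypothesis changed."""
    if not outcomes:
        return 0.0
    return sum(outcomes) / len(outcomes)


# Invented example log: five reviewed decisions, two of them changed.
log = [True, False, False, True, False]
rate = decision_change_rate(log)
print(f"{rate:.0%}")  # prints "40%"
```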

Frequently Asked Questions

How is hypothesis generation different from just brainstorming with a model?

Brainstorming produces ideas; hypothesis generation produces testable claims. The discipline is in the constraint. A brainstorm can end with a list you feel good about. A hypothesis generation cycle ends with claims specific enough that you could design an experiment to prove each one wrong. The plays in this article exist to force that specificity rather than leaving you with a pile of vague observations.

Won't the model just produce generic, obvious hypotheses?

It will if you let it. Generic output is a symptom of generic prompting. The Divergent Sweep deliberately asks for unlikely candidates and forbids restating the same idea. The Recombination Pass pushes past the obvious by combining and inverting. The work is in refusing to accept the first safe list and pushing the model toward situation-specific claims grounded in your actual context.

Who should own this in a small team?

Start with one person who runs the generator role and one peer who runs the adversarial recast, then rotate. The point of rotation is to prevent the capability from living in a single head. Even on a two-person team, separating the person who generates from the person who prunes adds real value, because it breaks the loop where someone falls in love with their own first idea.

How do I keep fabricated claims from slipping through?

Treat every generated hypothesis as a question until evidence is attached. The verification gate is simple: a hypothesis cannot enter testing until someone confirms it is falsifiable, and no result can be shared until its evidence is traceable. Generation and evidence are kept as separate steps so the model's fluency is never mistaken for proof.

How many hypotheses should a session produce?

Generate many, advance few. A good Divergent Sweep yields a dozen or more candidates; a healthy cycle advances two or three to testing. If you are consistently advancing most of what you generate, your sweep was too narrow. If you advance none, your pruning criteria may be unrealistic for the cost of testing available to you.
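
The rule of thumb in this answer can be written as a quick check on a single session. The sketch assumes "most" means more than half, which is an arbitrary cutoff, and it looks at one session only, whereas the warning above is about a consistent pattern across sessions.

```python
def sweep_health(generated: int, advanced: int) -> str:
    """Classify one session by how many generated hypotheses advanced to testing."""
    if generated <= 0:
        raise ValueError("generated must be positive")
    if advanced == 0:
        return "none advanced: pruning criteria may be unrealistic for your testing budget"
    if advanced / generated > 0.5:  # assumed cutoff for "most"
        return "most advanced: the sweep was probably too narrow"
    return "healthy: many generated, few advanced"


print(sweep_health(generated=12, advanced=3))
```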

Does this work for forecasting as well as diagnosis?

Yes, with a shift in framing. Diagnostic hypotheses explain why something happened; predictive ones claim what will happen if you act. The plays are the same, but the Test Designer becomes more important, because forecasts are only useful when you can name the cheap observation that would tell you the forecast was wrong before you have committed real resources to it.

Key Takeaways

Hypothesis generation uses models to widen the field of ideas, the opposite of using them to confirm what you already believe.
The four core plays—Divergent Sweep, Adversarial Recast, Test Designer, and Recombination Pass—each have a trigger, a method, and an owner.
Plays run in a loop: diverge, prune, test, recombine, with judgment kept firmly in human hands.
Every generated hypothesis is a question until evidence is attached; a verification gate keeps fabricated claims out of decisions.
The playbook only sticks when it attaches to an existing ritual and the generator role rotates across the team.


Agency Script Editorial
