
Set Plays for Turning Models Into Idea Engines

Agency Script Editorial

Editorial Team

November 29, 2020 · 8 min read

A model that drafts emails is useful. A model that helps you see possibilities you had not considered is something else entirely. The difference is not the model—it is the prompt and the process wrapped around it. Most teams default to using language models as confirmation machines: they ask a question they already half-answered and accept the response that matches their prior. Hypothesis generation is the opposite discipline. You deliberately ask the model to widen the field, surface candidate explanations, and propose tests, so that you end up considering ideas you would never have reached alone.

The trouble is that this rarely survives contact with a busy week. One analyst gets good at it, everyone else asks the model to summarize, and the capability stays trapped in a single head. A playbook fixes that. It does not teach the technique in the abstract; it tells you, in the middle of real work, which move to make, who owns it, and what to do when the output is thin.

This article lays out the plays, their triggers, the owners who run them, and the sequence for installing the whole thing in a team. If you are new to the underlying mechanics, start with the step-by-step approach to grounding outputs and return when you want to operationalize idea generation itself.

What Hypothesis Generation Actually Asks Of A Model

Before the plays, get the framing right. A hypothesis is a testable claim about why something is happening or what will happen if you act. Generating good ones means producing claims that are specific, falsifiable, and worth the cost of testing. A model is well suited to the first half of that work—producing many candidate claims quickly—and poorly suited to the second half, which is judgment about which claims matter.

The division of labor

  • The model proposes; humans dispose. Treat every generated hypothesis as a draft, not a verdict.
  • Volume is the model's gift. Ask for fifteen explanations, not three, then cut.
  • Specificity is your job. A vague hypothesis is unfalsifiable and therefore useless.

Why naive prompting fails here

  • Asking "what's causing X?" yields the three most common textbook answers, not the situation-specific ones.
  • Without constraints, models converge on safe, generic claims that no one can act on.
  • A single round produces a list; real generation needs iteration that prunes and recombines.

The Plays

Each play has a trigger that tells you when to run it, a method, and an owner who is accountable for the output. Keep the set small enough to remember.

Play 1: The Divergent Sweep

  • Trigger: A surprising result, an unexplained metric move, or the start of a new initiative.
  • Method: Prompt the model to list as many distinct candidate explanations as it can, explicitly instructing it to include unlikely ones and to avoid repeating the same idea in different words.
  • Owner: The analyst or strategist closest to the data.
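
As a rough illustration, the method above can be captured in a reusable prompt template. This is a minimal sketch, not a prescribed wording: the function name, the default of fifteen candidates (borrowed from the "ask for fifteen" guidance earlier), and the example observation and context are all assumptions made for the example.

```python
def build_divergent_sweep_prompt(observation: str, context: str, n: int = 15) -> str:
    """Assemble a Divergent Sweep prompt. The wording is illustrative only."""
    return (
        f"Observation: {observation}\n"
        f"Context: {context}\n\n"
        f"List {n} distinct candidate explanations for this observation.\n"
        "Rules:\n"
        "- Include unlikely explanations, not only the common ones.\n"
        "- Do not repeat the same idea in different words.\n"
        "- State each explanation as a specific, falsifiable claim."
    )


# Hypothetical inputs, invented for this example.
prompt = build_divergent_sweep_prompt(
    observation="Trial-to-paid conversion fell this week",
    context="B2B SaaS product; the pricing page was redesigned last Monday",
)
print(prompt)
```

Supplying the context alongside the observation is what pushes the model away from textbook answers and toward situation-specific ones.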

Play 2: The Adversarial Recast

  • Trigger: You already have a favored hypothesis and want to avoid confirmation bias.
  • Method: Ask the model to argue the strongest case against your leading idea and to propose alternatives that would explain the same evidence.
  • Owner: A peer who did not originate the favored hypothesis.
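
One possible shape for the recast prompt is sketched below. The function name and the sample hypothesis and evidence are hypothetical; the two instructions simply restate the method above.

```python
def build_adversarial_recast_prompt(favored: str, evidence: list[str]) -> str:
    """Assemble an Adversarial Recast prompt. The wording is illustrative only."""
    evidence_lines = "\n".join(f"- {item}" for item in evidence)
    return (
        f"Leading hypothesis: {favored}\n"
        f"Evidence so far:\n{evidence_lines}\n\n"
        "1. Argue the strongest case against the leading hypothesis.\n"
        "2. Propose alternative hypotheses that would explain the same evidence."
    )


# Hypothetical inputs, invented for this example.
recast = build_adversarial_recast_prompt(
    favored="The pricing page redesign caused the conversion drop",
    evidence=[
        "Conversion fell the week of the redesign",
        "Traffic volume was flat",
    ],
)
print(recast)
```

Listing the evidence explicitly keeps the alternatives anchored to what has actually been observed rather than to whatever the model finds plausible.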

Play 3: The Test Designer

  • Trigger: A hypothesis has survived the first cut and needs a cheap way to be proven wrong.
  • Method: Prompt the model to propose the smallest experiment, query, or observation that would falsify the claim, ranked by cost.
  • Owner: Whoever will run the test.
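
Once the model returns candidate tests, ranking them by cost is a one-line sort. The sketch below assumes cost is estimated in hours, which is only one possible unit; the record fields and sample tests are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProposedTest:
    description: str
    cost_hours: float   # assumed unit; use whatever cost measure your team trusts
    falsified_if: str   # the observation that would prove the hypothesis wrong


def rank_by_cost(tests: list[ProposedTest]) -> list[ProposedTest]:
    """Return the proposed tests ordered from cheapest to most expensive."""
    return sorted(tests, key=lambda t: t.cost_hours)


# Hypothetical tests, invented for this example.
tests = [
    ProposedTest("Run an A/B test against the old page", 4.0, "No difference between variants"),
    ProposedTest("Query conversion by landing page", 0.5, "Drop is the same on every page"),
    ProposedTest("Interview five churned trial users", 2.0, "No one mentions pricing"),
]
ranked = rank_by_cost(tests)
for t in ranked:
    print(f"{t.cost_hours:>4} h  {t.description}")
```

Recording the falsifying observation next to each test makes it harder for a test to be run without anyone knowing what result would count against the hypothesis.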

Play 4: The Recombination Pass

  • Trigger: Your list has gone stale and ideas feel incremental.
  • Method: Feed the model the surviving hypotheses and ask it to combine pairs into novel claims or to invert assumptions shared across all of them.
  • Owner: The session facilitator.
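
The pairing step can be mechanized so that no combination is skipped. The following is a sketch under the assumption that each pair gets its own prompt, plus one final prompt for the shared-assumption inversion; the function name and sample hypotheses are hypothetical.

```python
from itertools import combinations


def build_recombination_prompts(survivors: list[str]) -> list[str]:
    """Build one prompt per pair of survivors, plus one inversion prompt."""
    prompts = [
        "Combine these two hypotheses into one novel, testable claim:\n"
        f"A: {a}\nB: {b}"
        for a, b in combinations(survivors, 2)
    ]
    listed = "\n".join(f"- {h}" for h in survivors)
    prompts.append(
        "Identify an assumption shared by all of these hypotheses, "
        f"then propose a claim that inverts it:\n{listed}"
    )
    return prompts


# Hypothetical survivors, invented for this example.
survivors = [
    "The redesign hid the annual plan",
    "A competitor launched a discount",
    "Trial length is too short for larger teams",
]
recombination_prompts = build_recombination_prompts(survivors)
print(len(recombination_prompts))
```

The number of pairs grows quickly with the list, so this pass is best run on the two or three survivors rather than on the full sweep.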

Triggers And Sequencing

Plays run in a loop, not a line. A typical cycle moves from divergence to pruning to testing and back.

The standard sequence

  1. Run the Divergent Sweep to flood the field with candidates.
  2. Cluster the output by hand, collapsing near-duplicates.
  3. Apply the Adversarial Recast to the two or three candidates that survive the cut.
  4. Hand each survivor to the Test Designer.
  5. After early test results, run a Recombination Pass on what remains.
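
Step 2 is manual, but a small helper can surface likely near-duplicates for the person doing the clustering. The sketch below uses character-level similarity from Python's standard `difflib`, which is a stand-in technique chosen for this example, not something the playbook prescribes. The 0.8 threshold is an arbitrary assumption, and the helper will miss ideas that are reworded heavily, so the final call stays with the human.

```python
from difflib import SequenceMatcher
from itertools import combinations


def flag_near_duplicates(hypotheses: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose text similarity meets the (assumed) threshold."""
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(hypotheses), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            flagged.append((i, j))
    return flagged


# Hypothetical sweep output, invented for this example.
candidates = [
    "Latency rose after deploy",
    "latency rose after deploy",
    "Pricing change drove churn",
]
print(flag_near_duplicates(candidates))
```

Treat the flagged pairs as a review queue, not as an automatic merge.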

When to break the sequence

  • Skip straight to the Test Designer when a hypothesis is already obvious and you only need a cheap falsification.
  • Repeat the Divergent Sweep with a different framing if the first round was too narrow.
  • Stop entirely once generating another idea costs more than that idea is likely to be worth.

Owners And Accountability

A play without an owner is a suggestion. Assign one person per play per cycle, and make the handoffs explicit so ideas do not die between steps.

Defining ownership

  • The generator runs the sweep and is accountable for producing genuine variety, not a list of synonyms.
  • The pruner owns the cut and must be able to defend why each survivor stayed.
  • The tester owns falsification and reports results back into the loop.

Avoiding the lone-genius trap

  • Rotate the generator role so the skill spreads across the team.
  • Document the prompts that worked in a shared library, the way you would with any reusable asset—see how teams manage a prompt library.
  • Review generated hypotheses in a standing meeting so judgment is collective.

Guarding Against Confident Nonsense

The fastest way to discredit this practice is to let a fabricated hypothesis reach a decision-maker dressed as fact. Models will happily invent plausible-sounding causes.

Built-in safeguards

  • Require that any hypothesis citing data be checked against the source before it advances.
  • Separate the generation step from any claim of evidence; a hypothesis is a question, not an answer.
  • Watch for the failure mode where the model asserts a mechanism it cannot support, a pattern covered in depth in the common mistakes teams make with generative tools.

A lightweight verification gate

  • Before a hypothesis enters testing, one person confirms it is falsifiable.
  • Before any result is shared, the underlying evidence is traceable.
  • Anything the model presents as established fact gets treated as a claim to verify, not a conclusion.
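
The gate above can be expressed as two checks on a simple record. This is a sketch only: the field names are hypothetical, and requiring the falsifiability sign-off before a result is shared is an assumption about how the two conditions stack.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hypothesis:
    claim: str
    falsifiable_confirmed_by: Optional[str] = None  # who confirmed it can be proven wrong
    evidence_source: Optional[str] = None           # where the supporting evidence can be traced


def can_enter_testing(h: Hypothesis) -> bool:
    """A hypothesis enters testing only after someone confirms it is falsifiable."""
    return h.falsifiable_confirmed_by is not None


def can_share_result(h: Hypothesis) -> bool:
    """A result is shareable only if the gate was passed and evidence is traceable."""
    return can_enter_testing(h) and h.evidence_source is not None


# Hypothetical record, invented for this example.
h = Hypothesis(claim="The redesign hid the annual plan")
print(can_enter_testing(h), can_share_result(h))

h.falsifiable_confirmed_by = "reviewer"
h.evidence_source = "conversion-by-plan query"
print(can_enter_testing(h), can_share_result(h))
```

Keeping the reviewer and the evidence source as explicit fields means a missing check shows up as an empty value rather than as an unspoken assumption.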

Installing The Playbook In A Team

A playbook only works if running it is easier than not running it. The installation is mostly about lowering friction.

The first thirty days

  • Pick one recurring decision—a weekly metrics review, a campaign retrospective—and run the full sequence on it.
  • Capture every prompt that produced a usable hypothesis in a shared document.
  • Hold a short debrief on what the model surfaced that a human would have missed.

Making it stick

  • Tie the practice to an existing ritual so it does not require a new meeting.
  • Track a simple metric: how often a generated hypothesis changed a decision.
  • Pair this work with broader standards for prompt quality, such as those in prompt review standards.
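
The tracking metric mentioned above is a simple proportion. A minimal sketch, assuming each cycle is logged as a yes or no answer to "did a generated hypothesis change the decision?" (the log format is hypothetical):

```python
def decision_change_rate(cycles: list[bool]) -> float:
    """Share of cycles in which a generated hypothesis changed the decision."""
    if not cycles:
        return 0.0
    return sum(cycles) / len(cycles)


# Hypothetical log of four cycles, invented for this example.
log = [True, False, False, True]
print(decision_change_rate(log))
```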

Frequently Asked Questions

How is hypothesis generation different from just brainstorming with a model?

Brainstorming produces ideas; hypothesis generation produces testable claims. The discipline is in the constraint. A brainstorm can end with a list you feel good about. A hypothesis generation cycle ends with claims specific enough that you could design an experiment to prove each one wrong. The plays in this article exist to force that specificity rather than leaving you with a pile of vague observations.

Won't the model just produce generic, obvious hypotheses?

It will if you let it. Generic output is a symptom of generic prompting. The Divergent Sweep deliberately asks for unlikely candidates and forbids restating the same idea. The Recombination Pass pushes past the obvious by combining and inverting. The work is in refusing to accept the first safe list and pushing the model toward situation-specific claims grounded in your actual context.

Who should own this in a small team?

Start with one person who runs the generator role and one peer who runs the adversarial recast, then rotate. The point of rotation is to prevent the capability from living in a single head. Even on a two-person team, separating the person who generates from the person who prunes adds real value, because it breaks the loop where someone falls in love with their own first idea.

How do I keep fabricated claims from slipping through?

Treat every generated hypothesis as a question until evidence is attached. The verification gate is simple: a hypothesis cannot enter testing until someone confirms it is falsifiable, and no result can be shared until its evidence is traceable. Generation and evidence are kept as separate steps so the model's fluency is never mistaken for proof.

How many hypotheses should a session produce?

Generate many, advance few. A good Divergent Sweep yields a dozen or more candidates; a healthy cycle advances two or three to testing. If you are consistently advancing most of what you generate, your sweep was too narrow. If you advance none, your pruning criteria are probably too strict for the testing budget you have.

Does this work for forecasting as well as diagnosis?

Yes, with a shift in framing. Diagnostic hypotheses explain why something happened; predictive ones claim what will happen if you act. The plays are the same, but the Test Designer becomes more important, because forecasts are only useful when you can name the cheap observation that would tell you the forecast was wrong before you have committed real resources to it.

Key Takeaways

  • Hypothesis generation uses models to widen the field of ideas, the opposite of using them to confirm what you already believe.
  • The four core plays—Divergent Sweep, Adversarial Recast, Test Designer, and Recombination Pass—each have a trigger, a method, and an owner.
  • Plays run in a loop: diverge, prune, test, recombine, with judgment kept firmly in human hands.
  • Every generated hypothesis is a question until evidence is attached; a verification gate keeps fabricated claims out of decisions.
  • The playbook only sticks when it attaches to an existing ritual and the generator role rotates across the team.
