Walking Through Code Prompts That Worked and Ones That Didn't

Principles are easier to remember when you have seen them play out. This article works through specific scenarios of prompting for code generation, each one showing a realistic task, the kind of prompt a developer might write, what the model tends to return, and the concrete reason behind the result. Some succeed, some fail, and the failures are as instructive as the wins.

These are composite scenarios drawn from common situations, not transcripts of any single session. The point is not the exact wording but the pattern: what kind of prompt produces usable code and what kind produces work you have to throw away. Read them as a set, because the contrast between a weak and a strong version of the same task is where the lesson lives.

By the end you should be able to look at your own prompts and predict, before sending them, whether they will land.

Scenario: A Data Transformation Utility

A developer needs a function that takes a list of order records and returns total revenue grouped by month.

The Weak Prompt and Why It Fails

The weak version: "write a function to group orders by month and sum revenue." The model has to guess the shape of an order record, the date format, the currency representation, and the return structure. It invents a plausible record shape that does not match the real data, so the code runs in the model's imagined world and breaks in yours. The cost is a full rewrite once you supply what was missing.

The Strong Prompt and Why It Works

The strong version includes a sample record ("each order looks like { id, createdAt (ISO string), amountCents (integer) }"), states the return shape ("return an object mapping YYYY-MM to total cents"), and names the edge case ("skip orders with a null createdAt"). Now the model has nothing important left to guess, and the output drops in cleanly. The difference was three sentences of context, exactly the discipline the step-by-step process builds in.
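
Here is a minimal TypeScript sketch of the kind of function that prompt pins down. The Order interface comes from the sample record in the prompt. Two details are assumptions: id is typed as a string, and months are grouped by UTC date. Even the strong prompt leaves the time zone unstated.

```typescript
// Record shape taken from the prompt; createdAt may be null per the stated edge case.
// Typing id as string is an assumption, since the prompt does not say.
interface Order {
  id: string;
  createdAt: string | null; // ISO 8601 string, e.g. "2024-03-15T10:00:00Z"
  amountCents: number;      // integer cents, which avoids floating-point currency math
}

// Returns an object mapping "YYYY-MM" to total cents for that month.
// Assumption: the month is taken from the UTC calendar date.
function revenueByMonth(orders: Order[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const order of orders) {
    if (order.createdAt === null) continue; // edge case named in the prompt
    const date = new Date(order.createdAt);
    const month = date.getUTCMonth() + 1; // getUTCMonth is zero-based
    const key = `${date.getUTCFullYear()}-${month < 10 ? "0" + month : month}`;
    totals[key] = (totals[key] ?? 0) + order.amountCents;
  }
  return totals;
}
```

The time-zone choice matters. An order placed at 23:30 on 31 January in New York falls in February under UTC. If your reports depend on local months, say so in the prompt.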

Scenario: Adding a Method to Existing Code

A developer wants to add a caching layer to an existing service class.

The Failure: Ignoring the Existing Code

Asking "add caching to my user service" without showing the service produces a generic cache wrapper that does not match the class's structure, uses a different naming style, and assumes dependencies that are not present. The generated method looks fine in isolation and is useless in place.

The Fix: Show the Class First

Pasting the existing class and saying "add a cached version of getUser that expires after five minutes, matching the style and error handling of the existing methods" yields a method that slots in. The model mirrors the surrounding code because it can see it. Showing beats describing, every time—a point the best practices guide makes the foundation of good context.
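
For illustration, here is roughly what such a cached method could look like if the pasted class injects its data source. UserService, loadUser, the User shape, and the injectable clock are all hypothetical stand-ins for whatever the real class contains. The point is that the new method reuses the existing getUser instead of inventing its own data access.

```typescript
// Hypothetical types standing in for the real service's dependencies.
interface User { id: string; name: string; }
type Clock = () => number;

const FIVE_MINUTES_MS = 5 * 60 * 1000;

class UserService {
  // Cached promises keyed by user id, each with an absolute expiry time.
  private cache = new Map<string, { value: Promise<User>; expiresAt: number }>();

  constructor(
    private readonly loadUser: (id: string) => Promise<User>, // hypothetical data access
    private readonly now: Clock = Date.now,                   // injectable for testing
  ) {}

  // Stand-in for an existing, uncached method.
  getUser(id: string): Promise<User> {
    return this.loadUser(id);
  }

  // Cached variant: reuses a result for up to five minutes.
  getUserCached(id: string): Promise<User> {
    const t = this.now();
    const entry = this.cache.get(id);
    if (entry && entry.expiresAt > t) return entry.value;

    const value = this.getUser(id);
    this.cache.set(id, { value, expiresAt: t + FIVE_MINUTES_MS });
    // Do not keep failures: if the load rejects, drop the entry so the next call retries.
    value.catch(() => {
      if (this.cache.get(id)?.value === value) this.cache.delete(id);
    });
    return value;
  }
}
```

Caching the promise instead of the resolved user means concurrent calls for the same id share a single load. Dropping rejected entries keeps a transient failure from being cached for five minutes. If the existing methods handle errors differently, state that in the prompt.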

Scenario: Generating Tests

A developer has a working function and wants a test suite for it.

What Tends to Go Wrong

A bare "write tests for this function" often produces tests that assert whatever the function currently does, including its bugs. The tests pass, which feels reassuring, but they verify behavior rather than correctness. This is a subtle trap: green tests that prove nothing.

The Better Approach

Specify the behavior the tests should verify: "write tests covering a normal case, an empty input, a null value, and a boundary at zero, asserting the documented behavior—not just the current output." Then read the tests to confirm they check what you care about. This guards against the failure mode detailed in 7 Common Mistakes, where unverified tests give false confidence.
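
Here is a sketch of spec-driven tests, using a hypothetical countPositive function whose documented contract is written in its comment. The expected values come from that contract, not from running the function. That is what separates these tests from ones that merely record current output. Plain throw-based checks stand in for whatever test runner the project actually uses.

```typescript
/**
 * Documented contract (hypothetical): counts entries strictly greater than zero.
 * Nulls are ignored, an empty list returns 0, and zero itself is NOT counted.
 */
function countPositive(values: Array<number | null>): number {
  let count = 0;
  for (const v of values) {
    if (v !== null && v > 0) count++;
  }
  return count;
}

// Expected values are derived from the contract above, one case per named scenario.
const cases: Array<{ name: string; input: Array<number | null>; expected: number }> = [
  { name: "normal case", input: [3, -1, 7], expected: 2 },
  { name: "empty input", input: [], expected: 0 },
  { name: "null value", input: [null, 5], expected: 1 },
  { name: "boundary at zero", input: [0, 0, 1], expected: 1 },
];

for (const c of cases) {
  const actual = countPositive(c.input);
  if (actual !== c.expected) {
    throw new Error(`${c.name}: expected ${c.expected}, got ${actual}`);
  }
}
```

Suppose the implementation had used v >= 0. A test generated from its current output would have locked in the bug. The boundary-at-zero case, written from the contract, fails instead.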

Scenario: Debugging With a Stack Trace

A developer's code throws an error and they want help fixing it.

The Low-Information Version

"My code is broken, here it is" forces the model to hunt for the problem across the whole snippet, often guessing wrong and proposing changes that do not address the actual fault.

The High-Information Version

Pasting the code, the exact error message with its stack trace, and the triggering condition ("this fails when the input list is empty") points the model straight at the fault. The stack trace is the highest-value feedback you can give because it locates the divergence between expectation and reality. The trace says where the code failed, and your note says on what input. Errors are the model's native currency for correction.
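
Given that error and that condition, the fix can be narrow. Here is a sketch, assuming a hypothetical largestOrderCents function whose original first line read orders[0] unconditionally. Returning null for an empty list is one possible design choice. Throwing or returning 0 are others, and the prompt should settle which one you want.

```typescript
interface Order {
  id: string;
  amountCents: number;
}

// Hypothetical function from the scenario. The original first line was
//   let max = orders[0].amountCents;
// which throws a TypeError on an empty list, because orders[0] is undefined.
// The fix targets exactly that condition and leaves the loop alone.
function largestOrderCents(orders: Order[]): number | null {
  if (orders.length === 0) return null; // empty input: no largest order (a design choice)
  let max = orders[0]!.amountCents;     // safe after the length check
  for (const order of orders) {
    if (order.amountCents > max) max = order.amountCents;
  }
  return max;
}
```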

Scenario: Scaffolding Boilerplate

A developer needs a standard CRUD endpoint that mirrors a dozen existing ones.

Why This Is a Best-Case Use

This is where code generation shines. The pattern is well-established, repetitive, and tedious to write by hand. Show one existing endpoint and say "create the same for the Product resource with these fields." The output tends to need little or no editing, because the model has a precise template to follow and almost nothing to invent.

The Reusable Lesson

Tasks that are patterned and repetitive are the sweet spot. The more your task resembles something with a clear template, the better the model performs—which is also why saving these as prompt templates pays off. A team that systematized this is described in the case study.
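
One way to save such a prompt as a reusable template is a small TypeScript function. The CrudPromptInput shape, the buildCrudPrompt name, and the exact instruction wording are hypothetical. The structure follows the scaffolding prompt described earlier: one pasted example endpoint, the new resource, and its fields.

```typescript
// Hypothetical inputs for the "mirror an existing endpoint" prompt template.
interface CrudPromptInput {
  exampleEndpoint: string;        // source of one existing endpoint, pasted verbatim
  resource: string;               // e.g. "Product"
  fields: Record<string, string>; // field name -> type description
}

// Assembles the prompt text; the template stays fixed while the inputs vary per resource.
function buildCrudPrompt(input: CrudPromptInput): string {
  const fieldLines = Object.keys(input.fields)
    .map((name) => `- ${name}: ${input.fields[name]}`)
    .join("\n");
  return [
    "Here is an existing endpoint from this codebase:",
    "",
    input.exampleEndpoint,
    "",
    `Create the same endpoint for the ${input.resource} resource with these fields:`,
    fieldLines,
    "",
    "Match the naming, validation, and error handling of the example exactly.",
  ].join("\n");
}
```

Keeping the example endpoint as a parameter, not baked into the template, lets the same template follow the codebase as its conventions change.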

Scenario: A Novel Algorithm

A developer needs a non-standard algorithm with specific performance constraints.

Why This Is the Hardest Case

When the task is genuinely novel and constrained—an unusual data structure, a tight performance budget—the model has fewer patterns to draw on and is more likely to produce plausible but suboptimal or incorrect code. The polish stays high while the correctness drops, which makes errors harder to catch.

How to Handle It

Break the problem into smaller, verifiable pieces, state the constraints explicitly, ask the model to explain its approach before coding so you can catch a flawed plan early, and verify each piece independently. Heavy verification is non-negotiable here. This is the case where reading every line earns its keep most clearly.
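
One concrete way to verify a piece independently is differential testing. You compare the fast implementation against a slow reference that is easy to trust by inspection, across many small random inputs. The sketch below uses the maximum-subarray problem (Kadane's algorithm versus an O(n^2) brute force) only as a stand-in for the novel piece. It is a well-known algorithm, not an example of the hard case itself. The seeded generator keeps runs reproducible, so a failing input can be replayed.

```typescript
// Seeded 32-bit linear congruential generator (Numerical Recipes constants).
// Expects a non-negative integer seed. state * 1664525 stays below 2^53,
// so the arithmetic is exact. Fine for test data, not for security.
function makeRng(seed: number): () => number {
  let state = seed % 4294967296;
  return () => {
    state = (state * 1664525 + 1013904223) % 4294967296;
    return state / 4294967296; // in [0, 1)
  };
}

// Fast candidate: Kadane's algorithm, O(n). Assumes a non-empty array.
function maxSubarrayFast(xs: number[]): number {
  let best = xs[0]!;
  let current = xs[0]!;
  for (let i = 1; i < xs.length; i++) {
    current = Math.max(xs[i]!, current + xs[i]!); // extend the run or restart at xs[i]
    best = Math.max(best, current);
  }
  return best;
}

// Slow reference: sum every contiguous subarray, O(n^2). Easy to trust by reading.
function maxSubarrayReference(xs: number[]): number {
  let best = -Infinity;
  for (let i = 0; i < xs.length; i++) {
    let sum = 0;
    for (let j = i; j < xs.length; j++) {
      sum += xs[j]!;
      best = Math.max(best, sum);
    }
  }
  return best;
}

// Runs both on random small inputs; returns the first disagreeing input, or null.
function findMismatch(trials: number, seed: number): number[] | null {
  const rand = makeRng(seed);
  for (let t = 0; t < trials; t++) {
    const length = 1 + Math.floor(rand() * 8);
    const xs: number[] = [];
    for (let k = 0; k < length; k++) xs.push(Math.floor(rand() * 21) - 10);
    if (maxSubarrayFast(xs) !== maxSubarrayReference(xs)) return xs;
  }
  return null;
}
```

When a mismatch turns up, the returned array is a small, concrete failing case. You can paste it back into the conversation, which is exactly the kind of feedback the debugging scenario recommends.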

Scenario: Translating Code Between Languages

A developer needs to port a working utility from one language to another—say, a parser from Python to TypeScript.

Why It Looks Easy and Isn't

Translation feels low-risk because the logic already exists and works. But languages differ in subtle ways the model can paper over incorrectly: integer division, None becoming null or undefined, rounding rules, integer precision, and how each handles unexpected types. A naive "translate this to TypeScript" often produces code that mirrors the structure faithfully while quietly changing behavior at the edges.

The Stronger Approach

Provide the source code, name the target language and version, and call out the behaviors that must be preserved exactly: "keep the same rounding, treat missing keys as errors not defaults." Then test the translated version against the same cases as the original, comparing outputs directly. Translation is verification-heavy precisely because the surface similarity hides the differences—the structure looks identical while a boundary case diverges.
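
As a sketch of what "preserve behavior exactly" can mean for a Python-to-TypeScript port, here are helpers for three of those differences. In Python, // floors toward negative infinity. Python 3's round() sends exact halves to the nearest even integer. Indexing a dict with a missing key raises KeyError. The obvious TypeScript stand-ins (Math.trunc(a / b) or a bare a / b, Math.round, and obj[key]) behave differently in each case. The helper names are hypothetical.

```typescript
// Python's // floors toward negative infinity; Math.trunc would round toward zero instead.
// Exact for integers whose magnitude is within Number.MAX_SAFE_INTEGER.
function pyFloorDiv(a: number, b: number): number {
  if (b === 0) throw new RangeError("integer division by zero"); // Python raises ZeroDivisionError
  return Math.floor(a / b);
}

// Python 3 round(x): exact halves go to the even neighbor ("banker's rounding").
// Math.round instead rounds halves toward +Infinity, so Math.round(2.5) is 3.
function pyRound(x: number): number {
  const floor = Math.floor(x);
  const diff = x - floor; // distance above the floor
  if (diff < 0.5) return floor;
  if (diff > 0.5) return floor + 1;
  return floor % 2 === 0 ? floor : floor + 1; // exact tie: pick the even integer
}

// Python's d[key] raises KeyError; obj[key] in TypeScript quietly yields undefined.
function requireKey<T>(record: Record<string, T>, key: string): T {
  if (!Object.prototype.hasOwnProperty.call(record, key)) {
    throw new Error(`KeyError: ${key}`);
  }
  return record[key] as T;
}
```

Inputs like 2.5 for rounding, -7 and 2 for floor division, and a lookup of a missing key are exactly the boundary cases to run against both the original and the port. In Python, round(2.5) is 2 and -7 // 2 is -4.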

Frequently Asked Questions

Are these real transcripts?

They are composites of common situations rather than logs of a single session, chosen to isolate the pattern that drives success or failure. The exact wording matters less than the structure—what context was present and what was missing.

Which scenario is most representative of daily work?

For most developers, scaffolding boilerplate and adding to existing code dominate. These are the patterned, context-heavy tasks where good prompting produces the biggest, most reliable wins, which is why they make up the bulk of practical use.

Why do the failures all look avoidable in hindsight?

Because they are. Nearly every failure here traces to missing context or a vague request—mistakes that feel obvious once named but are easy to commit in the moment. Seeing them laid out is what makes them avoidable in practice.

How do I know in advance if a prompt will work?

Ask whether you have left the model anything important to guess. If the inputs, outputs, environment, and edge cases are all stated, it will likely land. If you are relying on the model to infer your situation, expect to iterate.

Key Takeaways

The same task produces a rewrite or a clean drop-in depending entirely on whether context is supplied.
Show existing code when adding to a project; the model mirrors what it can see and invents what it cannot.
Specify the behavior tests should verify, then read them—green tests that assert current behavior prove nothing.
A stack trace is the highest-value feedback you can give; errors are the model's native correction currency.
Patterned, repetitive tasks like boilerplate are the sweet spot; novel constrained algorithms need heavy verification.
Before sending a prompt, ask what you have left the model to guess—the gaps predict the failures.

Agency Script Editorial
