Conceptual articles tell you what a sandbox is. This one tells you what to do, in order, starting now. By the time you reach the end, you will have a working isolated environment and a way to prove it actually contains what you put inside it.
The sequence below is deliberately concrete. Each step depends on the one before it, so resist the urge to skip ahead. The whole point of a sandbox is that nothing leaks, and leaks usually come from the step somebody decided was optional.
You do not need exotic infrastructure to follow along. A laptop, a container runtime, and a willingness to test your own work adversarially are enough to get a credible sandbox running. If you are completely new to the concept, the beginner's introduction is worth a quick read first.
Step 1: Define what you are protecting
Before you build a wall, decide what is on the other side of it. Write down the specific things the sandbox must never reach: production databases, customer records, billing systems, internal APIs, your email. This list is your blast radius, the set of things a mistake could damage.
Be concrete. "Production stuff" is not a list. "The orders table, the Stripe key, and the customer email service" is. Everything you build afterward exists to keep the sandbox away from this list.
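One way to keep the list honest is to store it as data that later steps can check against. The sketch below is a minimal illustration, not a standard format: the class name, the hostnames, and the secret name are all hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BlastRadius:
    """Things the sandbox must never reach. All values are placeholders."""
    hostnames: frozenset = field(default_factory=frozenset)
    secret_names: frozenset = field(default_factory=frozenset)

    def forbids_host(self, host: str) -> bool:
        # Match a protected host itself or any subdomain of it.
        host = host.lower().rstrip(".")
        return any(host == h or host.endswith("." + h) for h in self.hostnames)


# Hypothetical entries standing in for "the orders table, the Stripe key,
# and the customer email service".
BLAST_RADIUS = BlastRadius(
    hostnames=frozenset({"orders-db.internal.example", "mail.internal.example"}),
    secret_names=frozenset({"STRIPE_API_KEY"}),
)
```

Keeping the list in one place means the network rules and containment checks in later steps can read from it instead of restating it.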
Step 2: Choose your isolation boundary
Now pick the technical wall. The most common options, in rough order of strength:
- A container (such as Docker), which isolates the filesystem and processes but shares the host kernel.
- A microVM (such as Firecracker), which runs the workload on its own guest kernel behind a hardware-virtualization boundary, a stronger wall for untrusted code.
- A separate cloud account or project, which isolates everything at the billing and permission level.
For most first sandboxes, a container is the right starting point. It is fast, disposable, and good enough for code an agent writes against fake data. Choose the strongest boundary your use case justifies, not the strongest one that exists.
Step 3: Provision fake or masked data
Never put real data in your first sandbox. Generate synthetic records that match the shape of your real data: same fields, same formats, invented values. If you need realism, mask production data by replacing every sensitive value with a fake one before it enters the box.
The rule is simple: if a record leaked from the sandbox tomorrow, you should not care. If you would care, it does not belong inside.
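A minimal sketch of both approaches, using only the Python standard library. The field names (`email`, `phone`, `balance_cents`) are assumptions for illustration; substitute the shape of your own records.

```python
import random
import string


def fake_customer(rng: random.Random, customer_id: int) -> dict:
    """Build one invented customer record with realistic field formats."""
    name = "".join(rng.choices(string.ascii_lowercase, k=8))
    return {
        "id": customer_id,
        "email": f"{name}@example.com",  # example.com is reserved for documentation
        "phone": f"555-01{rng.randrange(100):02d}",  # 555-01xx is reserved for fiction
        "balance_cents": rng.randrange(0, 500_000),
    }


def mask_record(record: dict, sensitive_fields: set, rng: random.Random) -> dict:
    """Return a copy with every sensitive field replaced by a fake value."""
    fake = fake_customer(rng, record.get("id", 0))
    return {
        key: (fake.get(key, "REDACTED") if key in sensitive_fields else value)
        for key, value in record.items()
    }


rng = random.Random(42)  # fixed seed so every rebuild gets the same data
customers = [fake_customer(rng, i) for i in range(1, 101)]
```

The fixed seed is a deliberate choice: the same records appear each time the sandbox is rebuilt, which keeps test runs comparable.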
Step 4: Lock the network by default
This is the step people skip, and it is the one that bites them. Configure the sandbox so that all outbound network access is denied unless explicitly allowed.
How to do it
- Start the container with networking disabled, then add back only the specific endpoints you need.
- Maintain an allowlist of domains the AI may reach (for example, the model API itself) and deny everything else.
- Block access to internal hostnames and your cloud metadata endpoint (169.254.169.254 on the major clouds), a classic exfiltration path because it can hand out credentials.
A sandbox with open networking is not a sandbox. It is a container with a nice name.
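The default-deny idea can be sketched in a few lines. This assumes Docker as the runtime (`--network none` is its flag for starting a container with no network) and assumes the allowlist check runs in an egress proxy you control; the allowlisted hostname is a hypothetical placeholder.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical allowlist: replace with the endpoints your agent really needs.
ALLOWED_HOSTS = frozenset({"api.model-provider.example"})


def docker_run_args(image: str) -> list:
    """Argument list for a container started with no network at all."""
    return ["docker", "run", "--rm", "--network", "none", image]


def egress_allowed(url: str) -> bool:
    """Default-deny: permit only HTTPS requests to allowlisted hostnames."""
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:
        return False  # malformed URL
    if parts.scheme != "https" or not host:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return host in ALLOWED_HOSTS  # not an IP literal: check the allowlist
    # Raw IP addresses are never allowlisted, which also rejects the
    # metadata address 169.254.169.254 and private ranges.
    return False
```

An application-level check like this does not replace network-level enforcement. It only decides what a proxy or firewall rule should let through.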
Step 5: Wire in the AI under tight permissions
Now connect the model or agent. Give it the minimum set of tools it needs and nothing more. If the agent only needs to read files and run code, do not also hand it the ability to send emails or make purchases.
Each capability you grant is a new way for a mistake to escape. Grant deliberately. The best practices guide goes deep on least-privilege design if you want the reasoning behind this.
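As an illustration of granting deliberately, here is a minimal tool registry that refuses anything not explicitly granted. `ToolRegistry` and the tool names are hypothetical; real agent frameworks expose tools in their own ways, so treat this as a sketch of the pattern.

```python
from typing import Callable, Dict


class ToolRegistry:
    """Expose only explicitly granted tools to the agent."""

    def __init__(self, granted: set):
        self._granted = frozenset(granted)
        self._tools: Dict[str, Callable] = {}

    def register(self, name: str, func: Callable) -> None:
        # Refuse to wire in a tool that was never granted.
        if name not in self._granted:
            raise PermissionError(f"tool {name!r} is not granted in this sandbox")
        self._tools[name] = func

    def call(self, name: str, *args, **kwargs):
        if name not in self._tools:
            raise PermissionError(f"tool {name!r} is not available")
        return self._tools[name](*args, **kwargs)


# Hypothetical grant: the agent may read files and run code, nothing else.
registry = ToolRegistry(granted={"read_file", "run_code"})
registry.register("read_file", lambda path: f"<contents of {path}>")
```

With this in place, an attempt to register or call `send_email` raises `PermissionError` instead of quietly succeeding.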
Step 6: Turn on observability before you run anything
You want a record of everything the agent does, captured from the very first run, not added after something goes wrong. Log every prompt, every tool call, every command executed, and every output produced.
Two reasons. First, when something behaves strangely, the log is how you understand why. Second, the log is your evidence that the sandbox behaved as designed. Run nothing meaningful until logging is on.
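A minimal sketch of such a log, written as one JSON object per line. The `AuditLog` class and the event kinds are assumptions for illustration; the in-memory buffer stands in for a file that should live somewhere the agent cannot modify.

```python
import io
import json
import time
from typing import IO


class AuditLog:
    """Append-only JSON Lines log of everything the agent does."""

    def __init__(self, stream: IO[str]):
        self._stream = stream

    def record(self, kind: str, **details) -> None:
        entry = {"ts": time.time(), "kind": kind, **details}
        # default=str keeps logging from failing on non-JSON values.
        self._stream.write(json.dumps(entry, default=str) + "\n")
        self._stream.flush()  # do not lose entries if the sandbox is killed


# Usage: an in-memory buffer here; a real run would use a file or log service.
buffer = io.StringIO()
log = AuditLog(buffer)
log.record("prompt", text="List the files in /work")
log.record("tool_call", tool="run_code", command="ls /work")
```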
Step 7: Set spend and rate limits
Autonomous agents can loop. A buggy agent that retries forever can run up a startling token bill overnight. Before you let it run unattended, set a hard cap on token spend and a rate limit on actions.
Treat these caps as a circuit breaker. They will not improve your results, but they will prevent the 3 a.m. surprise that turns an experiment into an expense report.
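The circuit-breaker idea might look like the sketch below: a hard token cap plus a sliding-window rate limit on actions. The class names and the 60-second window are assumptions, and your model provider may offer its own spend limits that should be set as well.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when the agent hits a hard cap and must stop."""


class CircuitBreaker:
    def __init__(self, max_tokens: int, max_actions_per_minute: int,
                 clock=time.monotonic):
        self.max_tokens = max_tokens
        self.max_actions_per_minute = max_actions_per_minute
        self._clock = clock
        self._tokens_used = 0
        self._action_times = []

    def charge(self, tokens: int) -> None:
        """Account for tokens before a model call; raise once the cap is hit."""
        if self._tokens_used + tokens > self.max_tokens:
            raise BudgetExceeded("token cap reached")
        self._tokens_used += tokens

    def allow_action(self) -> None:
        """Record one action; raise if too many happened in the last minute."""
        now = self._clock()
        # Keep only actions from the last 60 seconds (sliding window).
        self._action_times = [t for t in self._action_times if now - t < 60]
        if len(self._action_times) >= self.max_actions_per_minute:
            raise BudgetExceeded("action rate limit reached")
        self._action_times.append(now)
```

The agent loop would call `charge` before each model request and `allow_action` before each tool call, and stop entirely when `BudgetExceeded` is raised, with no retry.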
Step 8: Run an experiment and watch it
Now actually use the thing. Give the agent a realistic task against your synthetic data and watch the logs in real time. Do not walk away during the first run. You are not just testing the agent; you are testing the sandbox.
Pay attention to anything the agent attempts that you did not expect, especially network calls or file access outside its working directory. Those attempts are exactly what your walls are meant to stop.
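File access outside the working directory is easy to miss by eye. A small helper can flag it when you review logged paths. This is a sketch with a hypothetical function name, and it assumes the check runs in the same filesystem view as the agent.

```python
from pathlib import Path


def escapes_workdir(workdir: str, requested: str) -> bool:
    """True if `requested` resolves to a location outside `workdir`."""
    root = Path(workdir).resolve()
    # Joining handles relative paths; an absolute `requested` replaces root.
    target = (root / requested).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        return True  # target is not under root
    return False
```

For example, `escapes_workdir("/tmp/work", "../secrets.txt")` flags the path, while `escapes_workdir("/tmp/work", "notes/a.txt")` does not.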
Step 9: Verify containment adversarially, then tear down
The final step is the one that earns trust. Try to break out of your own sandbox.
Containment checks to run
- Instruct the agent to reach a known external URL it should not have access to. It should fail.
- Have it try to connect to a production hostname or query your cloud metadata endpoint. It should fail.
- Have it write a file, then destroy and recreate the sandbox. The file should be gone.
If either of the first two attempts succeeds, or the file survives the rebuild, you have found a hole, and finding it now is the whole point. Once the checks pass, destroy the environment. Disposability is a feature; use it. For more ways environments fail these checks, see the common mistakes breakdown, and for the full conceptual picture, the complete guide.
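The network checks can also be scripted so that they run the same way every time. The sketch below is meant to be run from inside the sandbox. The target list is a hypothetical example (a public host and the cloud metadata address), and the probe is injectable so the logic can be tested without touching the network.

```python
import socket
from typing import Callable, List, Tuple

# Hypothetical targets the sandbox must NOT be able to reach.
# Add your own production hostnames here.
TARGETS: List[Tuple[str, int]] = [
    ("example.com", 443),
    ("169.254.169.254", 80),
]


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Try a TCP connection; any failure counts as blocked."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refusals, and timeouts
        return False


def containment_holes(targets, probe: Callable[[str, int], bool] = can_connect):
    """Return the targets that were reachable. An empty list means a pass."""
    return [(host, port) for host, port in targets if probe(host, port)]
```

Inside the sandbox, `containment_holes(TARGETS)` should return an empty list. Anything it returns is a hole to close before the environment is trusted.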
Frequently Asked Questions
How long does this whole process take the first time?
Expect a few hours for your first sandbox if you are comfortable with containers, longer if you are learning the tooling alongside it. The good news is that once you script the steps, recreating a sandbox drops to seconds. The time cost is almost entirely upfront.
Can I skip the synthetic data step if my data is not sensitive?
Even non-sensitive data can carry surprises, like an internal URL embedded in a record that the agent then tries to reach. Synthetic data also makes your tests reproducible. Skipping it is a shortcut that occasionally turns expensive, so generate fakes even when the stakes feel low.
What if my agent genuinely needs internet access to work?
Then allowlist exactly the endpoints it needs and deny everything else. "Needs the internet" almost never means "needs all of the internet." Identify the specific domains, permit those, and keep the default-deny posture for everything else.
Do I need a microVM, or is a container enough?
For code your own team writes and trusts, a container is usually enough. For running code that an AI generates from untrusted prompts, the stronger kernel boundary of a microVM is worth it. Match the boundary to how much you trust what runs inside.
How often should I run the adversarial containment checks?
Run them every time you change the sandbox's configuration, and on a recurring schedule even when you have not. Isolation tends to erode quietly as people add allowlist entries for convenience. Periodic checks catch that drift before it becomes a leak.
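One simple way to notice that drift is to compare the current allowlist against a reviewed baseline. This is a sketch with hypothetical names; where the two sets come from depends on how your allowlist is stored.

```python
def allowlist_drift(baseline: set, current: set) -> dict:
    """Report hosts added to or removed from the allowlist since the baseline."""
    return {
        "added": sorted(current - baseline),
        "removed": sorted(baseline - current),
    }
```

Any entry under `added` that nobody can justify is a candidate for removal.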
Key Takeaways
- Start by writing down your blast radius, the specific real things the sandbox must never reach.
- Choose an isolation boundary that matches your trust level: containers for trusted code, microVMs for untrusted, separate accounts for full isolation.
- Always use synthetic or masked data, and deny outbound network access by default, the step most often skipped.
- Turn on observability and set spend and rate limits before running anything unattended.
- Verify containment adversarially by trying to break out yourself, and tear the environment down afterward since disposability is the feature.