Every AI coding agent you run has the same permissions you do. Claude Code, Cursor, Codex, Aider. They can read your SSH keys, write to your shell config, and run any command your user account can. We accept this because the alternative is setting up Docker containers and dealing with volume mounts and broken toolchains every time we want an agent to help with a project.
That trade-off has always felt wrong to me. Not because I think my AI agent is malicious, but because I know it executes code from dependencies I haven't read, runs shell commands it hallucinated, and sometimes rms things it shouldn't. The blast radius of a mistake is my entire home directory.
I went looking for something between "full trust" and "Docker wrapper," and I found a project named after the barrier between humanity and rogue AIs in Cyberpunk 2077.
What Is greywall?
greywall is a container-free sandbox for AI coding agents. It uses kernel-level enforcement on Linux (bubblewrap, seccomp, Landlock, eBPF) and Seatbelt profiles on macOS to isolate your agent's filesystem access, network traffic, and syscalls. Deny by default. No Docker, no VMs. One binary, four direct dependencies.
It ships with built-in profiles for 13 agents (Claude Code, Cursor, Codex, Aider, and more), and it has a learning mode that traces what your agent actually touches and generates a least-privilege profile from the results. The project is three weeks old, has about 110 stars, and the sole maintainer merges external PRs within hours.
The Snapshot
| Project | greywall |
| --- | --- |
| Stars | ~109 at time of writing |
| Maintainer | Solo (tito), mass-committing daily |
| Code health | 17,400 lines of Go, 151 tests, clean layered architecture |
| Docs | ARCHITECTURE.md, CONTRIBUTING.md, 18 doc files, a full docs site |
| Contributor UX | Merged my PR same-day, CI catches lint, good first issues labeled |
| Worth using | Yes if you run AI agents on Linux or macOS |
Under the Hood
The codebase is ~17,400 lines of Go with only four direct dependencies: cobra for CLI, doublestar for glob matching, jsonc for config with comments, and x/sys for kernel syscalls. Everything else is hand-rolled against the kernel API.
On Linux, greywall stacks five security layers, each covering what the others can't:
Bubblewrap namespaces (linux.go, 1,642 lines) handle the heavy lifting. In DefaultDenyRead mode, the sandbox starts from an empty root filesystem (--tmpfs /) and selectively mounts system paths read-only and your project directory read-write. Network isolation drops all connectivity, then three bridge types restore controlled access: a ProxyBridge for SOCKS5 traffic, a DnsBridge for DNS resolution, and a ReverseBridge for inbound port forwarding. All of them relay over Unix sockets via socat.
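To make the deny-by-default shape concrete, here is a minimal sketch of how such a bubblewrap argument list could be assembled. This is not greywall's actual code; the function name and the mount list are illustrative, and a real implementation would also wire up /proc, /dev, and the socat bridges described above.

```go
package main

import "fmt"

// buildBwrapArgs sketches a deny-by-default bubblewrap invocation:
// start from an empty tmpfs root, remount a few system paths read-only,
// and bind only the project directory read-write.
func buildBwrapArgs(project string, cmd []string) []string {
	args := []string{
		"--tmpfs", "/", // empty root: nothing exists unless mounted below
		"--unshare-net", // drop all network; bridges restore access selectively
	}
	for _, p := range []string{"/usr", "/etc", "/lib"} {
		args = append(args, "--ro-bind", p, p)
	}
	args = append(args, "--bind", project, project) // the only writable path
	return append(args, cmd...)
}

func main() {
	fmt.Println(buildBwrapArgs("/home/me/proj", []string{"claude"}))
}
```

The ordering matters: the tmpfs root comes first, so every later bind is an explicit exception to "nothing exists."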
Seccomp BPF (linux_seccomp.go) blocks 30+ dangerous syscalls: ptrace, mount, reboot, bpf, perf_event_open. If your kernel doesn't support seccomp, greywall skips it and continues. This graceful fallback pattern repeats at every layer.
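The fallback pattern is simple enough to sketch in a few lines. The names below are illustrative, not greywall's API; the point is that an unsupported layer is reported and skipped rather than aborting the sandbox.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnsupported stands in for whatever sentinel a layer returns when the
// kernel lacks the feature (e.g. seccomp not compiled in).
var errUnsupported = errors.New("kernel feature unavailable")

// applyLayer sketches the graceful-fallback pattern: each security layer
// is best-effort, and a missing kernel feature downgrades to a skip.
func applyLayer(name string, apply func() error) string {
	switch err := apply(); {
	case errors.Is(err, errUnsupported):
		return name + ": unsupported, skipping"
	case err != nil:
		return name + ": failed: " + err.Error()
	default:
		return name + ": active"
	}
}

func main() {
	fmt.Println(applyLayer("seccomp", func() error { return errUnsupported }))
	fmt.Println(applyLayer("landlock", func() error { return nil }))
}
```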
Landlock (linux_landlock.go) adds kernel-level filesystem access control. It opens paths with O_PATH and uses fstat to avoid TOCTOU races between checking a path and applying a rule to it. It handles ABI versions 1 through 5, stripping directory-only rights from non-directory paths to avoid EINVAL from the kernel.
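The rights-stripping trick is worth seeing in miniature. The sketch below defines its own access bits (the values mirror the kernel's LANDLOCK_ACCESS_FS_* constants, but are local so it builds anywhere) and shows the adjustment the article describes: directory-only rights get masked off for regular files so landlock_add_rule doesn't return EINVAL.

```go
package main

import "fmt"

// Illustrative Landlock access-right bits; values mirror the kernel ABI
// but are defined locally so this sketch compiles on any platform.
const (
	accessReadFile  uint64 = 1 << 2
	accessReadDir   uint64 = 1 << 3
	accessRemoveDir uint64 = 1 << 4
	accessMakeDir   uint64 = 1 << 7
)

// dirOnlyRights only make sense on directories; applying them to a
// regular file makes landlock_add_rule fail with EINVAL.
const dirOnlyRights = accessReadDir | accessRemoveDir | accessMakeDir

// rightsFor strips directory-only bits when the target is not a directory.
func rightsFor(requested uint64, isDir bool) uint64 {
	if !isDir {
		return requested &^ dirOnlyRights
	}
	return requested
}

func main() {
	want := accessReadFile | accessReadDir
	fmt.Printf("file: %#x dir: %#x\n", rightsFor(want, false), rightsFor(want, true))
}
```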
eBPF monitoring traces violations in real time via bpftrace. Learning mode runs strace under the hood, captures every file your agent touches, and collapses the results into a reusable profile.
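The collapse step can be sketched as a small parser over strace output. This is a toy version, not greywall's implementation: it only looks at openat lines, while a real tracer would also handle stat, execve, connect, and relative paths.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
)

// openatRe pulls the path argument out of strace openat lines.
var openatRe = regexp.MustCompile(`openat\([^,]+, "([^"]+)"`)

// collapse turns raw strace output into a sorted, deduplicated path list,
// the raw material a learned profile would be generated from.
func collapse(lines []string) []string {
	seen := map[string]bool{}
	for _, l := range lines {
		if m := openatRe.FindStringSubmatch(l); m != nil {
			seen[m[1]] = true
		}
	}
	paths := make([]string, 0, len(seen))
	for p := range seen {
		paths = append(paths, p)
	}
	sort.Strings(paths)
	return paths
}

func main() {
	trace := []string{
		`openat(AT_FDCWD, "/home/me/proj/main.go", O_RDONLY) = 3`,
		`openat(AT_FDCWD, "/home/me/proj/main.go", O_RDONLY) = 3`,
		`openat(AT_FDCWD, "/etc/resolv.conf", O_RDONLY) = 4`,
	}
	fmt.Println(collapse(trace))
}
```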
On macOS, greywall generates Seatbelt profiles for sandbox-exec with deny-by-default network rules and selective file access via regex patterns. macOS actually has a cleaner security model here. Seatbelt supports both allow and deny rules with regex, so you can write "allow ~/.claude.json*, deny everything else in home." Linux's Landlock is additive-only. Once you grant write access to a directory, you can't deny individual files inside it. This is the project's most interesting architectural tension, and it surfaces as a real bug: issue #62, where programs that do atomic file writes (create a temp file, rename over the target) break because the temp file and the target live on different filesystems inside the sandbox.
Command blocking (command.go, 524 lines) doesn't just match command names. It parses shell syntax: pipes, &&, ||, semicolons, subshells, and quoted strings. echo foo | shutdown gets caught. bash -c "rm -rf /" gets caught. It's more parser than filter.
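To show why parsing beats name matching, here is a deliberately simplified sketch of the idea. It is far less complete than greywall's command.go (no subshells, and && is handled crudely as two delimiters), but it catches both examples above.

```go
package main

import (
	"fmt"
	"strings"
)

var blocked = map[string]bool{"shutdown": true, "reboot": true, "rm": true}

// splitCommands breaks a shell line into individual commands on |, ;, and
// &, ignoring operators inside single or double quotes.
func splitCommands(line string) []string {
	var cmds []string
	var cur strings.Builder
	var quote rune
	flush := func() {
		if s := strings.TrimSpace(cur.String()); s != "" {
			cmds = append(cmds, s)
		}
		cur.Reset()
	}
	for _, r := range line {
		switch {
		case quote != 0: // inside quotes: operators are literal
			if r == quote {
				quote = 0
			}
			cur.WriteRune(r)
		case r == '\'' || r == '"':
			quote = r
			cur.WriteRune(r)
		case r == '|' || r == ';' || r == '&':
			flush() // treats && and || as two delimiters, fine for a sketch
		default:
			cur.WriteRune(r)
		}
	}
	flush()
	return cmds
}

// isBlocked checks each command's name and recurses into sh/bash -c
// payloads so quoting can't hide a blocked command.
func isBlocked(line string) bool {
	for _, cmd := range splitCommands(line) {
		fields := strings.Fields(cmd)
		if len(fields) == 0 {
			continue
		}
		name := fields[0]
		if blocked[name] {
			return true
		}
		if (name == "bash" || name == "sh") && len(fields) >= 3 && fields[1] == "-c" {
			if isBlocked(strings.Trim(strings.Join(fields[2:], " "), `"'`)) {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(isBlocked(`echo foo | shutdown`))       // blocked via pipe
	fmt.Println(isBlocked(`bash -c "rm -rf /"`))        // blocked via -c payload
	fmt.Println(isBlocked(`echo "shutdown is a word"`)) // quoted word, allowed
}
```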
The architecture makes sense for what it's doing. Each layer has a clear file, clear responsibility, and a fallback path. The build tags (//go:build linux, //go:build darwin) keep platform code separated without runtime conditionals. The test suite has 151 tests across 13 files covering command blocking, Landlock rules, Seatbelt profile generation, learning mode, and config validation. For a three-week-old project, that's unusually disciplined.
What's rough: the project is pre-1.0 and moving fast. Eight releases in 23 days. The DefaultDenyRead mode is ambitious and still has edge cases (the atomic writes bug, WSL DNS issues, AppArmor conflicts with TUN devices). The documentation is comprehensive but assumes you already know what bubblewrap and Landlock are. If you're new to Linux security primitives, the onboarding curve is steep.
The Contribution
Issue #5 asked for a greywall profiles edit command. The learning mode generates JSON profiles and saves them to ~/.config/greywall/learned/, but there was no way to edit them without hunting for the file path and hand-validating the JSON. The maintainer wanted an editor command that validates on close.
Getting into the codebase was straightforward. The existing profiles list and profiles show commands were right there in main.go, following the standard cobra subcommand pattern. The config validation was already built: config.Load() parses JSON (with comments via jsonc) and runs Validate(). I just needed to wire up an editor loop.
The implementation opens the profile in $EDITOR (splitting on whitespace to support code --wait and emacs -nw), saves the original content for rollback, and, after the editor closes, detects no-change exits, validates the JSON, and on failure prompts to re-edit or discard. Discard restores the original file. About 95 lines total.
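The two testable pieces of that loop are small enough to sketch. This is a simplified stand-in, not the PR's actual code: validateProfile below only checks JSON well-formedness, where the real command runs the full config validation.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"
)

// editorArgv splits $EDITOR on whitespace so values like "code --wait"
// or "emacs -nw" work, then appends the file to edit.
func editorArgv(editor, path string) []string {
	if editor == "" {
		editor = "vi"
	}
	return append(strings.Fields(editor), path)
}

// validateProfile stands in for greywall's config.Load + Validate: here
// it only checks that the edited file is still well-formed JSON.
func validateProfile(data []byte) error {
	if !json.Valid(data) {
		return fmt.Errorf("profile is not valid JSON")
	}
	return nil
}

func main() {
	fmt.Println(editorArgv(os.Getenv("EDITOR"), "/tmp/profile.json"))
	fmt.Println(validateProfile([]byte(`{"allow": ["/home/me/proj"]}`)))
}
```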
CI caught two lint issues I couldn't test locally (the project requires Go 1.25; I had 1.22): gocritic flagged an append to a different variable, and gofumpt wanted explicit octal syntax (0o600 instead of 0600). The maintainer approved the code immediately, asked only for the lint fix, and merged the whole thing within hours of submission. That's a three-week-old project with a same-day merge for a first-time contributor. PR #64.
The Verdict
greywall is for anyone running AI coding agents who wants more than trust and less than Docker. If you use Claude Code or Cursor on a machine with real credentials, SSH keys, or cloud configs, this fills a gap that nothing else does at this weight class.
The project is young and moving fast. Three weeks old, 109 stars, eight releases. The maintainer is clearly using it daily and fixing bugs as they surface. The contributor experience is excellent: labeled issues, fast merges, CI that catches real problems. The Landlock limitation (no per-file deny inside a writable directory) is a genuine technical constraint that will shape the project's future, and the maintainer's detailed write-up on issue #62 shows someone who understands the problem deeply and isn't reaching for shortcuts.
What would push greywall to the next level? Solving the atomic writes problem would unblock a lot of real-world usage. A guided setup wizard (instead of requiring users to understand profiles and config files) would lower the barrier for non-security-minded developers. And more built-in profiles for common development workflows beyond AI agents could widen the audience. But the foundation is solid, the security model is sound, and the code is cleaner than most projects ten times its age.
Go Look At This
If you run AI agents on your dev machine, go install greywall and try greywall -- claude or greywall -- cursor. The built-in profiles work out of the box. If you want tighter control, run greywall --learning -- <your-agent> to generate a profile from actual usage, then greywall profiles edit to fine-tune it.
Star the repo. Try the learning mode. If something breaks in your setup, open an issue. The maintainer responds fast and the codebase is navigable enough that you might end up fixing it yourself.
This is Review Bomb #9, a series where I find under-the-radar projects on GitHub, read the code, contribute something, and write it up. If you know a project that deserves more eyeballs, drop it in the comments.
This post was originally published at wshoffner.dev/blog. If you liked it, the Review Bomb series lives there too.
Top comments (8)
greywall addresses the blast radius problem well -- deny by default at the kernel level is the right direction.
The complementary problem is the audit trail: after the agent has acted within its allowed scope, can you reconstruct why it made each decision? Filesystem isolation prevents catastrophic mistakes, but it doesn't help when the agent does something within-bounds that you didn't expect.
For the cases where you need to explain to a stakeholder what the agent did and why -- not just what it accessed -- you still need a separate layer. The two approaches are orthogonal, not competing.
Spot on. Isolation and audit are orthogonal problems, and most of the tooling conversation right now is focused on the isolation half.
It gets worse with unsanctioned agents. greywall as a system-wide kernel policy can at least constrain blast radius even when nobody sandboxed the agent. But the audit trail requires instrumentation the agent operator chose to install, which means it fails exactly when you need it most: when nobody was following process.
What's ArkForge building in this space? The runtime audit layer for agent decisions is still mostly ad hoc from what I've seen, and the standards work happening around agent discovery and capability declaration hasn't caught up to it yet.
The proxy layer is the right call. Same philosophy as greywall's kernel-level enforcement: don't trust the thing you're constraining to cooperate with being constrained. If the agent doesn't need to know it's being audited, you've removed the single point of failure that kills most observability approaches.
The cryptographic proof chain is interesting for a different reason too. Right now there's no standard way for an agent to declare what it's capable of, what it's allowed to do, and what it actually did. Those are three separate problems and most of the ecosystem is focused on the first one at best.
We've been working on the declaration side through a set of open specs: Agent Site Manifest (ASM) for capability discovery and Agent Context Protocol (ACP) for scoped context delivery. Both ship via a single agents.json file at the site root. The idea is that if an agent can declare its capabilities and constraints up front, and a proxy like yours can verify what it actually did at runtime, you've closed the loop between intent and action.
Specs are published at clocktowerassoc.com/specs if you want to take a look. I'd be curious whether the proof chain model maps cleanly onto declared capabilities, or if there's a gap between what ASM describes and what your proxy can actually verify.
The three-problem framing is the right cut. Most of the ecosystem treats capability declaration as the whole problem when it's actually just the precondition. What you're allowed to do doesn't tell you what you did.
The proof chain sits in the third bucket. At the proxy layer we capture the full request context (what the agent saw), the decision (what it asked for), and the outcome (what was returned) anchored cryptographically so the record can't be altered retroactively. The agent's declared capabilities aren't in the loop right now.
The gap you're pointing to is real. If ASM declares the agent is scoped to read-only operations, and the proof chain shows it attempting a write, you've got a verifiable deviation between declared intent and actual behavior. Right now those two records exist in separate systems with no link between them.
The natural integration point: the proof references the capability declaration at creation time — proof includes the ASM fingerprint alongside the action record. Anyone auditing the trail can then verify not just what happened, but whether it was within declared scope. That's the loop closure.
I'll look at the specs at clocktowerassoc.com/specs. The agents.json approach is interesting for the same reason we sit at the proxy layer: if the declaration mechanism doesn't require the agent to cooperate, you get the same enforcement properties without the single point of failure.
If there's a concrete gap between what ASM describes and what the proxy can verify, my guess is it's around dynamic capability scoping: declarations that change based on runtime context rather than being fixed at deploy time. Worth a closer look.
The ASM fingerprint in the proof record is a clean integration point. If the proof carries a reference to the capability declaration that was active at creation time, any auditor can diff declared scope against actual behavior without trusting either system independently. That's a stronger guarantee than either layer provides alone.
Your instinct on dynamic capability scoping is right, and it's the hardest open question in the spec right now. ASM currently declares capabilities as static: here's what this agent can do, period. But real-world agents negotiate scope at runtime based on authentication context, user role, resource state. A read-only agent that escalates to write access when a certain condition is met isn't violating its declaration if the declaration doesn't model conditional scoping. The proof chain would catch the write, but without a dynamic declaration to diff against, you can't distinguish a violation from a legitimate escalation.
That's exactly the kind of problem that's easier to solve with two perspectives on it. If you're interested in going deeper on how the proof chain and ASM could talk to each other, I'd be happy to continue the conversation outside of a comment thread. I'm easy to find through Clocktower's site or on GitHub as TickTockBent.
A common misconception is that AI agents inherently understand security protocols, but in reality, they mirror your permissions without discretion. In our experience with enterprise teams, we've seen success by implementing sandbox environments where agents can safely experiment without risking critical data. This setup not only mitigates potential breaches but also allows for controlled testing and learning. - Ali Muwwakkil (ali-muwwakkil on LinkedIn)
Sandbox environments are definitely the enterprise default, but I think greywall opens up a threat model most people aren't considering yet. We tend to think of agents as tools we deliberately run, like apps we choose to launch inside a controlled environment. But what happens when an agent is running on your infrastructure that you didn't sanction?
A developer installs Cursor on a shared build server. An intern runs Claude Code against a repo with prod credentials in the environment. A malicious actor uses an agent as a force multiplier for lateral movement. In those scenarios, your sandbox doesn't help because nobody put the agent inside one.
What makes greywall interesting for enterprise is that kernel-level enforcement via Landlock and seccomp could work as a system-wide policy layer, not just a wrapper you voluntarily put around your own tools. Deny-by-default at the OS level, applied whether or not the person spinning up the agent thought to sandbox it.
Are your enterprise teams thinking about unsanctioned agent usage as a threat vector yet? I feel like most orgs are still in the "agents are tools we control" mindset and haven't caught up to the reality that anyone with a terminal and an API key can spin one up.