Starting local is not thinking small. It's thinking strategically.
Stripe recently published a series on how they build "Minions" — fully auton...
The three-repo separation is clean and the Backstage catalog integration is a smart choice. What strikes me is that your resource-catalog is essentially solving agent discovery at the individual/team scale -- it answers the question "what exists and where does it live" for your agents.
The challenge I keep thinking about is what happens when this needs to work across organizational boundaries. Your library.yaml manifest is a single source of truth for your agent ecosystem, but when agents from different orgs need to discover each other's capabilities (which MCP servers are available, what skills a remote agent exposes, what protocols it speaks), there is no equivalent of library.yaml at the network level.
The symlink-based deployment is elegant for the single-developer case, but it also highlights the gap: symlinks only work when you control both ends. For cross-org agent interop, you would need something more like a DNS-style discovery mechanism where agents can resolve capabilities by querying a well-known endpoint rather than relying on pre-configured paths.
The token efficiency design is probably the most underrated part of this architecture. Scoping skills by directory is such a simple idea but the savings compound fast. Have you measured the actual token reduction versus a flat AGENTS.md approach? I would be curious to see the numbers.
Thank you so much for this insightful feedback! Your concerns are spot on, and I can tell you I've been thinking about them ever since I read your comment. Thanks for sharing! It gave me a real thinking boost XD.
Inspired by your comment, I've been giving it some thought and you are completely right. So I propose:
In library.yaml, allow a dual system where:
Even so, we need a strategy to expose this library.yaml to the internet so other agents can consume it. So I also propose adopting the .well-known/ directory pattern.
Instead of an agent needing pre-configured knowledge of another organization's tools or relying on reading a remote library.yaml (which statically couples them to our Git structure and creates security/token overhead), we treat library.yaml solely as our GitOps Single Source of Truth.
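To make that concrete, here is a minimal sketch of how a library.yaml could mark which entries are safe to publish. The field names (mcp_servers, visibility, auth) are my own illustrative assumptions, not the post's actual schema:

```yaml
# Hypothetical library.yaml fragment -- field names are illustrative.
mcp_servers:
  - name: internal-docs
    endpoint: http://localhost:8931
    visibility: internal        # stays out of the public capability file
  - name: terraform-runner
    endpoint: https://api.example.com/mcp/terraform
    visibility: public          # candidate for the .well-known export
    auth: oauth2
    skills: [plan, apply]
```

A visibility flag like this lets the same manifest serve as both the internal GitOps source of truth and the input to the public export, without maintaining two files.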
Our CI/CD pipelines will parse this YAML, extract the publicly available MCP endpoints, and 'compile' an agent-capabilities.json file. This is then published to a standard endpoint like https://api.yourcompany.com/.well-known/agent-capabilities.json (via a static GCP bucket behind a CDN, a Kubernetes Ingress, etc.).
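A rough sketch of what that 'compile' step could look like in a CI job. The manifest is shown as an already-parsed dict (in CI you would load it with a YAML parser such as PyYAML); all field names are hypothetical, not the actual schema:

```python
import json

# Parsed library.yaml, shown inline for the sketch. Field names
# (mcp_servers, visibility, auth, skills) are illustrative assumptions.
manifest = {
    "mcp_servers": [
        {"name": "internal-docs", "endpoint": "http://localhost:8931",
         "visibility": "internal"},
        {"name": "terraform-runner",
         "endpoint": "https://api.example.com/mcp/terraform",
         "visibility": "public", "auth": "oauth2", "skills": ["plan", "apply"]},
    ],
}

def compile_capabilities(manifest: dict) -> dict:
    """Filter the GitOps manifest down to the public capability document."""
    public = [s for s in manifest["mcp_servers"]
              if s.get("visibility") == "public"]
    return {
        "version": "1.0",
        "servers": [
            {"name": s["name"], "endpoint": s["endpoint"],
             "auth": s.get("auth", "none"), "skills": s.get("skills", [])}
            for s in public
        ],
    }

capabilities = compile_capabilities(manifest)
# The CI job would write this to .well-known/agent-capabilities.json
print(json.dumps(capabilities, indent=2))
```

The key property is that internal-only servers never leave the repo: the public file is a derived artifact, so a bad edit is caught in PR review before it is ever published.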
When an external agent needs to interact with our infrastructure, it simply queries that public 'reception desk' endpoint. It discovers dynamically where the MCP servers live, what skills are exposed (e.g., Terraform runner), and what authentication is required (OAuth/mTLS).
What do you think?
Regarding token efficiency—you are absolutely right, it's the hidden superpower of the hierarchical design. While I haven't measured the exact token reduction yet, moving from a monolithic flat AGENTS.md (which would inject 10k-20k tokens mixing Terraform, React, personal, and work rules into every prompt) to a scoped directory approach keeps the context hyper-focused (around 2k-4k tokens). It saves cost, reduces latency (TTFT), and significantly mitigates the 'lost in the middle' phenomenon for LLMs. I'll definitely run some metrics on this for a follow-up post!
Thanks again for sparking this evolution in the design! All this transition is definitely blowing my mind right now and I think it is the next frontier for Agent Platform Engineering.
PS: In the coming days I will publish a repo with this methodology and structure so others can iterate and work with it. Honestly, the results I am getting are amazing, so I hope it helps someone else and, with some luck, attracts external ideas and contributions like yours ;). Again, thanks.
Made an account just to ask when you're publishing that repo. I'm sure it would help a lot of people. This is unreal: I'm not a coder, but I can still follow it.
You have the repo already published at the end of the post ;)
If I have an OpenClaw setup on a VPS with missioncontrol, and I'm setting up a whole multi-agent workflow with APIs etc. (ideally with a model router) plus a dev aspect, will this be good for my setup? For any agentic setup?
Of course. I am using pi-coding-agent and OpenClawd, and both work perfectly fine, although I use them for different reasons. In fact, the project started with OpenClawd, and this whole design was first conceived to let OpenClawd work across my entire codebase.
Good framing on the laptop-to-enterprise path. One thing I'd add: the gap between 'works on my laptop' and 'runs in production' for agents is mostly about state management and failure recovery, not compute. I run agent workflows on a single VPS with systemd services and structured task queues. The infra complexity only needs to scale when the concurrency demands actually justify it.
Hey Saul, this is great. My new venture, an agentic orchestration layer/adapter for the enterprise, is about to come out of stealth mode. My architecture covers everything in your article, plus your not-yet-built section and more.
My name is Daniel Novitzkas and my startup is Astrohive (dot) ai
Come find me!
It looks amazing! Share or contact if you want user testing and feedback ;)
Really clean separation of concerns here... the brain/bridge/catalog distinction resolves something most people just ignore until it bites them.
One genuine question: how do you think about context overload as the agent-library grows? The directory scoping is elegant for coding contexts but I'm curious whether you've hit ceilings on the flat file approach as skill count scales.
I ended up going a different direction for similar reasons: a database-backed custom RAG implementation where each entry carries a summary, topic tree, relationships and metadata rather than full content. Sidesteps the chunking loss problem and lets agents pull exactly what's relevant per task rather than loading by directory proximity.
Curious whether you've considered a hybrid or whether the file approach has held up better than I'd expect.
This is a fantastic question. Context overload is definitely the final boss of agentic systems, and it's part of my focus when thinking about agentic platform engineering design.
To answer your question: the flat-file approach has held up surprisingly well and hasn't hit a ceiling yet. The main reason is lazy loading, plus the fact that directory scoping acts as a highly accurate proxy for "task context".
When the agent is in a directory, the AGENTS.md layer doesn't inject the entire content of all available skills into the context window. It only injects the index (the skill names and a one-line description). The agent only reads the actual step-by-step execution file when the skill is explicitly invoked. Because of the directory scoping, a single layer rarely exposes more than 5-10 highly relevant skills anyway.
However, the reason I actively avoid a database for agent instructions comes down to core Platform Engineering principles: GitOps and Determinism.
Agent behavior, safety constraints, and standard operating procedures are essentially Infrastructure as Code. If a team's agent misbehaves or executes a destructive action, I need to be able to look at a Git commit history, review a PR, and perform a deterministic rollback. DB-backed RAG systems are fantastic for dynamic retrieval, but they lose that strict version control and peer-reviewability. It's hard to PR a vector database update.
That said, I completely agree with your approach for a different tier of context, and I think the hybrid model is the ultimate sweet spot.
In my roadmap, this hybrid model is achieved via MCP (Model Context Protocol):
So instead of the agent trying to load a massive API spec or relationship tree from a file, the flat-file skill simply instructs: "Use your internal knowledge MCP tool to query the database for relationships and summaries before writing the code."
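A hedged sketch of what such a skill file could look like. The skill name, tool name, and file layout are all assumptions for illustration:

```markdown
# skill: generate-api-client
Generate a typed client for an internal API.

## Steps
1. Use the `knowledge` MCP tool to query the database for the API's
   relationship tree and endpoint summaries. Do NOT load the full spec
   file into context.
2. Write the client using only the returned summaries.
3. Run the test suite before committing.
```

The skill stays small, deterministic, and PR-reviewable, while the heavy knowledge lives behind the MCP tool and is fetched on demand.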
By the way, and talking about MCP, have you considered wrapping your DB-backed system into an MCP server? It feels like it would be the perfect bridge between deterministic agent instructions and dynamic knowledge retrieval.
Thanks for your comment! It gave me a lot to think about :)
Really appreciate you taking the time to break that down. The behavior/knowledge split you're describing is exactly the right mental model, and the GitOps determinism argument is one I hadn't seen articulated that cleanly before.
I did end up wrapping the DB-backed system into an MCP server, and it's working well... but building it surfaced something your roadmap might want to account for early: GitOps solves auditability of rules, but it doesn't solve auditability of retrieval. You can version-control what the agent is supposed to do, but the specific vector or graph result it actually got at runtime is still opaque. When something goes wrong, you can roll back the skill file, but you can't easily reconstruct why the agent pulled the context it did.
Still working through the right answer to that honestly. Logging the retrieval payloads alongside task execution helps, but it adds complexity fast.
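One cheap version of that retrieval logging, sketched under assumptions (the function name, record shape, and the idea of logging result ids plus scores rather than full payloads are all mine, not an established API):

```python
import time
import uuid

def log_retrieval(task_id: str, query: str, results: list, log: list) -> None:
    """Append an auditable record of what the agent actually retrieved."""
    log.append({
        "task_id": task_id,
        "ts": time.time(),
        "query": query,
        # Store ids and scores, not full payloads, to keep the log lean;
        # ids are enough to re-fetch the exact documents later.
        "result_ids": [r["id"] for r in results],
        "scores": [r.get("score") for r in results],
    })

audit_log: list = []
task_id = str(uuid.uuid4())
results = [{"id": "doc-42", "score": 0.91}, {"id": "doc-7", "score": 0.83}]
log_retrieval(task_id, "terraform module relationships", results, audit_log)
```

Pairing each record with the task id means a post-mortem can answer "which documents did this run actually see, and how confident was the retriever", without replaying the whole vector search.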
The hybrid model is definitely the sweet spot though in my experience so far. Curious whether you're thinking about that retrieval auditability problem or approaching it differently.