Why Most Agentic Engineering Tools Plateau

Q: How does knowledge compounding work in AI-assisted development?

A compound knowledge system follows a four-stage loop: work (execute a task), capture (document what was learned, including failures), route (feed the insight back to the specific agent or workflow that was active), and enforce (promote critical learnings to code-level guardrails that prevent recurrence). The key distinction from documentation-only approaches is the enforcement stage — where learnings change the system's behavior, not just its memory.

Your AI coding tools stop getting better after week two.

Session one hundred starts from the same blank slate as session one. The autocomplete gets faster. The models get smarter. But the system around them — the accumulated knowledge of what works, what broke, what to avoid — resets every time.

This is the plateau. And it is the central unsolved problem in AI-assisted engineering.

The industry has moved fast. Andrej Karpathy coined "vibe coding" in February 2025, then declared it passé exactly one year later when he introduced "agentic engineering" — the practice of orchestrating AI agents with human oversight instead of prompting one model at a time. That shift matters. But it only describes what changed in how we write code. It says nothing about what happens to the knowledge generated along the way.

The question that separates the tools that plateau from the ones that compound: does your system actually get better with use? Can you prove it?

This article traces where the industry is, where it stops, and what breaks through.

The Landscape: Where Most Tools Stop

AI-assisted development split into three distinct approaches over the past two years. Each one solved a real problem. None solved the deeper one.

Vibe Coding (2024–2025)

Ad-hoc prompting. Conversation as IDE. You describe what you want, the model generates code, you accept or reject. GitHub Copilot, Cursor, and Windsurf built massive businesses on this model — and for good reason. For prototypes and greenfield projects, it is fast.

Where it breaks: no memory between sessions, no specifications, no quality gates, no enforcement. Every conversation starts from scratch. The hundredth session is no smarter than the first.

Spec-Driven Development (2025–2026)

The first correction. Instead of prompting directly, you write a specification — a structured document describing what to build — and let agents execute against it.

Spec Kit, open-sourced by GitHub with over 76,000 stars, organizes work into four gated phases: specify, plan, tasks, implement. OpenSpec, backed by Y Combinator, takes a brownfield-first approach where specs live alongside code as long-term documentation. Kiro, from AWS, formalizes intent into structured specs using EARS notation. Tessl, founded by Snyk's Guy Podjarny and backed by $125 million in venture funding, maintains a registry of over 10,000 specs that prevent AI hallucinations about library APIs.

These are real advances. Capturing intent before coding produces better output than ad-hoc prompting.

But specs alone do not compound. A specification describes what to build. It does not describe what the team learned while building it, what mistakes to avoid next time, or how the workflow itself should change. The spec from project twelve looks the same as the spec from project one.

Compound Engineering (2025–2026)

The second correction. Every's Compound Engineering introduced a learning capture step after each unit of work. The workflow — plan, work, assess, compound — creates a loop where each feature generates documentation that informs the next. With 29 specialized agents, it brought the concept of compounding to the Claude Code ecosystem and inspired an important conversation about what it means for systems to learn.

This is closer. But compound engineering, as implemented by most tools, captures learnings into documentation files. Documentation is a starting point, not an endpoint. The deeper question is whether those learnings change how the system behaves — not just what it knows.

What Compound Knowledge Actually Looks Like

Compound knowledge is not a feature. It is an architectural property. A system either compounds or it does not, and the difference becomes visible over time.

Soleur is built on this principle. Every task it executes generates knowledge that feeds back into the system's rules, agents, and workflows — not just its documentation. Here is what that looks like in practice, drawn from real incidents in the project's compounding knowledge base.

Failure, Documentation, Rule, Enforcement

An AI agent edited files outside its designated workspace. Two hours of work disappeared — applied to the wrong directory, invisible until the session ended. In most systems, this is a lesson learned by a human and forgotten by the next session.

Here, the failure triggered a four-stage response:

Documentation. The incident was captured as a structured learning with root cause, symptoms, and prevention guidance.
Governance rule. The learning was promoted to the project's governance document — a living constitution of rules that grows with every failure.
Enforcement hook. A code-level guardrail was added that makes the mistake mechanically impossible. Not discouraged. Not documented. Impossible.
Routing. The insight was fed back to the specific skill that was active during the incident, making that skill's instructions permanently smarter.

This is the compounding arc. It took seventeen days from the initial failure to automated prevention. The system can never make that mistake again. No team member needs to remember the rule. No agent needs to read and follow a document. The guardrail is structural.

Hooks Beat Documentation

This leads to a contrarian insight about AI-assisted development: documentation-only rules fail.

Every enforcement hook in Soleur exists because a written rule was insufficient. Agents rationalize skipping prose instructions the way developers rationalize skipping code review at 5 PM on a Friday. The escalation path — prose rule fails, incident documented, code guardrail added — has repeated across dozens of failure classes. The result is a system with four mechanical guards that block known failure modes before they happen: direct commits to the main branch, destructive operations on isolated workspaces, merges without upstream synchronization, and commits containing unresolved conflicts.

These same guards enable multiple agents to work on separate features in parallel — each in its own isolated workspace, mechanically prevented from interfering with the others. Parallel execution is not a configuration option. It is a byproduct of the compounding arc: the guardrails that make it safe were themselves discovered through failures and enforced through hooks.

This is not a theoretical position. It is an empirical finding from hundreds of sessions.

The System Validates Its Own Workflow

Across eight features, a workflow gate called plan review — where parallel specialized reviewers analyze an implementation plan before any code is written — reduced scope by 30 to 96 percent.

Feature	Before Review	After Review	Reduction
Deduplication system	65 tasks	4 tasks	94%
Agent discovery	14+ files	1 file	93%
Rule automation	~395 lines	~15 lines	96%
Pre-flight checks	3 agents + 150 lines	23 lines inline	85%
Brand marketing	4 components	2 components	50%
Content generation	5 phases	4 phases	20%
Pipeline compliance	257 lines	55 lines	78%
Deviation analysis	7+ files, 30 criteria	1 file edit	86%

The shape is always the same: remove infrastructure that serves hypothetical future scale, keep the behavior change that delivers immediate value.

The compound system did not just execute this pattern — it generated the data that proved the pattern works. By the eighth confirmation, plan review was no longer an opinion. It was an empirically validated workflow gate. The system compounded its own evidence.

Self-Improving Instructions

The governance document that guides every session started as 26 lines. It now contains over 200 rules, each triggered by a real failure. When external research showed that oversized instruction files increase reasoning costs by 10–22 percent per interaction, the system applied that finding to itself — restructuring its own governance to contain only rules the AI would violate without being told on every turn.

The compound step does not just capture learnings into a file. It routes insights back to the specific agent or workflow that was active during the session. A lesson learned while using the planning workflow makes the planning workflow permanently better. A guardrail discovered during code review makes the review process permanently safer.

This is what it looks like when a system's governance document contains a rule that reads:

- Never commit directly to main [hook-enforced: guardrails.sh guardrails:block-commit-on-main]
- Never edit files in the main repo when a worktree is active [hook-enforced: worktree-write-guard.sh]
- Before merging any PR, merge origin/main into the feature branch [hook-enforced: pre-merge-rebase.sh]

Each line is a scar from a real incident. Each annotation — [hook-enforced] — means the system no longer relies on the AI reading and following the instruction. It is mechanically enforced.

How This Compares

What you need	Spec-driven	Compound engineering	Soleur
Capture intent before coding	Yes	Partial	Yes
Remember learnings across sessions	No	Yes	Yes
Self-improving rules and guardrails	No	No	Yes
Mechanical prevention of known failures	No	No	Yes
Full lifecycle (brainstorm through ship)	No	Partial (4 stages)	Yes (7+ stages)

Spec-driven development captures intent. Compound engineering captures learnings. Soleur compounds both — and feeds them back into the system's behavior, not just its documentation.

Beyond Engineering

Everything described above operates within engineering. But the principle extends further.

If compound knowledge transforms how engineering teams build software, what happens when the same architecture runs across every department — legal, marketing, sales, finance, operations, product, and support?

A brand guide created by a marketing agent informs the content strategy. A competitive analysis shapes pricing decisions. A legal audit references the privacy policy. Knowledge flows across domains because every agent reads from and writes to the same compounding knowledge base. The same principle extends to how those agents actually execute outside the codebase — by calling vendor APIs directly rather than driving server-side browsers, so the operational layer compounds the same way the engineering layer does.

This is the thesis behind Company-as-a-Service — a model where a single AI organization runs every department of a business. Not a copilot for code. Not an assistant for tasks. A full AI organization that plans, builds, reviews, remembers, and self-improves.

The engineering depth described in this article is the foundation. The full vision is bigger.

Start Building

Soleur runs 67 agents across 8 departments, all sharing a compounding knowledge base. Every decision teaches the system. Every project starts faster and more informed than the last.

The first billion-dollar company run by one person is not science fiction. It is an engineering problem.

Start building →

Frequently Asked Questions

What is compound engineering?

Compound engineering is the practice of designing AI-assisted development systems where each unit of work makes subsequent work easier. Unlike traditional development where technical debt accumulates, compound engineering inverts the curve: every feature, bug fix, and code review generates learnings that are captured, routed, and — in the most mature implementations — enforced mechanically.

How does knowledge compounding work in AI-assisted development?

A compound knowledge system follows a four-stage loop: work (execute a task), capture (document what was learned, including failures), route (feed the insight back to the specific agent or workflow that was active), and enforce (promote critical learnings to code-level guardrails that prevent recurrence). The key distinction from documentation-only approaches is the enforcement stage — where learnings change the system’s behavior, not just its memory.

What is the difference between vibe coding and agentic engineering?

Vibe coding, coined by Andrej Karpathy in February 2025, describes ad-hoc AI-assisted development: prompting a model conversationally and accepting the output. Agentic engineering, which Karpathy introduced in February 2026, describes the structured orchestration of AI agents with human oversight — using specifications, workflow gates, and quality checks to produce reliable output. The shift is from conversation to governance: from “tell the AI what you want” to “define the constraints, delegate execution, verify the results.”