AI can write code. Shipping software is a different problem.
Most AI coding today is a prompt loop. You describe what you want, the AI generates code, you review it, you go back and forth. That works for small tasks — a utility function, a config change, a quick script.
But real development isn’t small tasks. It’s navigating a 200-file codebase you’ve never seen before. It’s planning a multi-step feature that touches six services. It’s debugging a production issue where the error message is three layers removed from the actual cause.
And the prompt loop breaks down fast:
Agents forget context. Long sessions lose the thread. You re-explain things the AI already knew two messages ago.
Mistakes repeat. Without structure, AI makes the same bad assumptions over and over. No memory, no learning.
Brownfield is hard. Large, complex existing codebases don’t fit in a context window. The AI can’t reason about what it can’t see.
Nothing ships. Generating code is not the same as researching, planning, implementing, validating, and delivering a working change.
Catalyst solves this by giving Claude Code structure.
Catalyst is developed by Ryan Rozich. The methodology behind it — inverting time allocation to roughly 80% research and planning, 20% execution — is described in detail in Beyond Prompt and Pray: A Field Guide to Structured AI Engineering. That guide captures the daily workflow that the catalyst-dev plugin implements: defining tasks clearly before agents engage, fanning out sub-agents for parallel research, compacting findings into focused documents, planning interactively with human approval gates, and executing against an agreed spec with continuous validation.
If you want to understand why Catalyst works the way it does — why each phase clears context, why research comes before planning, why sub-agents get their own context windows — that guide is the place to start.
Catalyst is a plugin system for Claude Code. It chains together four types of building blocks into repeatable development workflows — so the AI works reliably and predictably, and code actually ships.
Skills
The core unit of Catalyst functionality. Some skills are user-invocable — structured workflows you trigger with a slash command (/commit). Others are model-invocable — reference knowledge that Claude activates automatically when it detects relevant context, like seeing a ticket ID. Skills orchestrate multi-step processes, spawn agents, and save artifacts.
Agents
Specialized roles that skills delegate to. A locator agent finds files. An analyzer agent reads implementation details. A pattern-finder agent discovers reusable code. Skills spawn these in parallel to gather information fast — and each agent gets its own context window, so it can focus deeply on one thing.
Memory
Persistent, git-backed storage that survives across sessions. Catalyst uses the HumanLayer thoughts system as long-term memory — research findings, implementation plans, handoff checkpoints, and PR descriptions all persist in thoughts/shared/. Any agent, in any session, can come back and understand what happened before. This is what makes multi-day features and team handoffs possible.
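As a rough sketch, a thoughts repo groups those four artifact types under thoughts/shared/. The subdirectory and file names below are illustrative, not Catalyst's actual layout:

```
thoughts/
└── shared/
    ├── research/
    │   └── 2025-01-15-rate-limiting.md
    ├── plans/
    │   └── 2025-01-16-rate-limiting.md
    ├── handoffs/
    │   └── 2025-01-16-session-checkpoint.md
    └── prs/
        └── feature-rate-limiting.md
```

Because the whole tree lives in git, every artifact gets history, diffs, and sharing for free.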
Hooks
Automation that runs at specific moments in a workflow. A hook injects plan structure guidance, syncs plans to persistent storage, or tracks workflow context for skill chaining. Guardrails you don’t have to remember.
These four pieces compose. A skill spawns agents, agents use their focused context to research and analyze, memory preserves artifacts across sessions, and hooks enforce quality and track state at each gate. The result is a workflow — not a conversation, but a structured process that produces predictable output.
The Core Workflow: Research, Plan, Implement, Ship
The central pattern in Catalyst is the RPI workflow — research, plan, implement. Every phase produces a persistent artifact that feeds the next.
```mermaid
graph LR
    R["/research-codebase"] --> P["/create-plan"]
    P --> I["/implement-plan"]
    I --> V["/validate-plan"]
    V --> S["/create-pr"]
    style R fill:#2563eb,color:#fff,stroke:none
    style P fill:#7c3aed,color:#fff,stroke:none
    style I fill:#059669,color:#fff,stroke:none
    style V fill:#d97706,color:#fff,stroke:none
    style S fill:#dc2626,color:#fff,stroke:none
```
Research — /research-codebase spawns parallel agents (locator, analyzer, pattern-finder) that fan out across the codebase and report back. External tools like DeepWiki and Context7 pull in library docs and open-source patterns. The output is a research document saved to persistent storage.
Plan — /create-plan reads the research, then works with you interactively to build an implementation plan. Phases, success criteria, file-level specifics. The plan is a contract — both you and the AI agree on what gets built before any code is written.
Implement — /implement-plan reads the full plan and executes it phase by phase. Each phase has automated verification. Checkboxes update as work completes. Subagents handle individual tasks — not just for parallelism, but so each agent gets a focused job and can use its full context window on that one thing.
Validate — /validate-plan verifies the implementation against the plan’s success criteria. Tests pass. Behavior matches. Deviations are documented.
Ship — /create-pr generates a PR description from the work, links to the Linear ticket, and pushes. /merge-pr handles the merge with verification.
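End to end, the happy path is five commands. The research prompt is illustrative; the annotations restate what each phase does:

```
/research-codebase "How does rate limiting work in the API gateway?"
/create-plan        # picks up the latest research automatically
/implement-plan     # executes the plan phase by phase
/validate-plan      # checks the result against the plan's success criteria
/create-pr          # drafts the PR description and pushes
```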
Each phase clears context and starts fresh — that’s by design. Long-running sessions are where AI loses the thread. Instead, Catalyst uses handoff checkpoints (/create-handoff and /resume-handoff) to compress what the AI knows into a persistent document before moving on. The next phase picks up exactly where the last one left off, with clean context and no drift.
This happens automatically between workflow phases, but you can also create handoffs manually — mid-implementation when context is getting long, at the end of a session, or when passing work to a teammate. Every handoff is a snapshot: what was done, what’s left, and what decisions were made.
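The exact template is Catalyst's own, but conceptually a handoff captures those three things. A hypothetical sketch, not the actual format:

```markdown
# Handoff: rate-limiting feature (2025-01-16)

## Done
- Phases 1-2 of the plan implemented; middleware tests passing

## Remaining
- Phase 3: wire the limiter into the gateway config

## Decisions
- Chose a token-bucket limiter over fixed windows (burst tolerance)
```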
The workflow runs on a few key pieces of infrastructure:
Thoughts system — Think of it like a shared S3 bucket for everything the AI has done: research, plans, handoffs, PR descriptions. It’s git-backed, shared across worktrees and sessions, and it’s how any agent — in any session — can come back and understand what happened before.
Worktrees — Catalyst is designed around git worktrees so you can have multiple features in flight at once. Each worktree gets its own workspace, its own branch, and its own local state — but they all share the same thoughts repo.
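The mechanics here are plain git. Setting up a second feature in parallel might look like this (paths and branch names are illustrative):

```shell
# From the main checkout, create a sibling worktree on a new branch
git worktree add ../myapp-rate-limiting -b feature/rate-limiting

# Each worktree is a full working directory with its own checked-out branch
git worktree list
```

Both directories point at the same repository, so commits made in either are visible from both.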
Workflow context — Local state that tracks what you’ve done in this worktree. It’s what lets /create-plan automatically find your latest research and /implement-plan find your latest plan — no file paths to remember.
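Conceptually, that state is just a small record per worktree. A hypothetical sketch (the field names and file paths are invented for illustration, not Catalyst's actual schema):

```json
{
  "worktree": "myapp-rate-limiting",
  "latest_research": "thoughts/shared/research/2025-01-15-rate-limiting.md",
  "latest_plan": "thoughts/shared/plans/2025-01-16-rate-limiting.md"
}
```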
Subagents — Commands spawn specialized agents, each with a focused job and a full context window dedicated to it. A locator agent searches the whole codebase for relevant files. An analyzer agent reads those files deeply. They work in parallel, but the real benefit is focus — each agent reasons about one thing well instead of juggling everything at once.
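Catalyst's agent runtime is Claude Code's own, but the fan-out pattern itself is easy to sketch in ordinary Python. The agent bodies below are stubs, and every name and return value is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

# Stub agents: in Catalyst each would be a subagent with its own context window.
def locator(query):
    # Would search the codebase for files relevant to the query.
    return {"agent": "locator", "files": ["gateway/middleware.py", "gateway/config.py"]}

def analyzer(query):
    # Would read those files deeply and report implementation details.
    return {"agent": "analyzer", "notes": "middleware chain applies limits per route"}

def pattern_finder(query):
    # Would look for existing, reusable code patterns.
    return {"agent": "pattern-finder", "patterns": ["decorator-based rate limiter"]}

def fan_out(query, agents):
    """Run every agent in parallel and collect their reports."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(agent, query) for agent in agents]
        return [f.result() for f in futures]

# Each report would feed into the research document.
reports = fan_out("how does rate limiting work?", [locator, analyzer, pattern_finder])
print([r["agent"] for r in reports])
```

The parallelism is a side benefit; the point, as above, is that each agent reasons about exactly one job.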
Third-party integrations — Linear for tickets, GitHub for PRs, DeepWiki and Context7 for external documentation, Sentry for error investigation.
You start a session. You run /research-codebase and describe what you need to understand. Three agents fan out across the codebase, find relevant files, analyze patterns, and report back. A research document lands in your thoughts repo. Context is clean.
You run /create-plan. Catalyst picks up the research automatically, and you work together to design the implementation. Phases, files, success criteria. When you’re satisfied, the plan saves. If the session is getting long, /create-handoff compresses everything into a checkpoint — you can resume later or hand it to a fresh session.
You run /implement-plan. Catalyst reads the full plan and starts building — phase by phase, with verification at each step. When it’s done, you run /validate-plan to confirm everything matches the spec.
You run /create-pr. A PR with a structured description, linked to your Linear ticket, ready for review.
Research to shipped PR. Each step is structured, each handoff is clean, and humans approve at every gate.