How to combine Claude Code, Codex, Ralph Loop, and agent teams to build high-quality software with maximum efficiency.
Before diving into tools and workflows, it helps to understand the three building blocks that make all of this possible. Every tool on this page—Claude Code, Codex, Ralph Loop, agent teams—is built from these same primitives.
A large language model (LLM) takes text in and produces text out. That's it. It has no memory between calls, no access to your filesystem, and no ability to run code. On its own, it's a very sophisticated autocomplete. Everything else is built on top of this.
To make an LLM useful, you give it tools—structured actions it can request. Instead of just outputting text, the model can output: "I want to read src/app.ts" or "run npm test". The system executes that action and feeds the result back. This is called function calling or tool use.
An agent is an LLM running in a loop: think → tool call → observe result → think → tool call → ... until the task is done. This is what separates an agent from a chatbot. A chatbot answers once. An agent keeps working—reading files, running tests, fixing errors—until it reaches the goal or gets stuck.
1. **Think**: the LLM reads the conversation so far and decides what to do next.
2. **Act**: the LLM requests an action: read a file, write a file, run a command, search the code.
3. **Observe**: the result of the action is fed back into the conversation as context.
4. **Repeat**: back to Think. The loop continues until the task is complete or the agent asks for help.
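In pseudocode terms, the whole loop fits in a few lines of bash. This is a conceptual sketch only: `call_llm` and `run_tool` are hypothetical placeholders for the model call and the tool executor, not real commands.

```bash
#!/usr/bin/env bash
# Conceptual sketch of the agent loop. `call_llm` and `run_tool` are
# hypothetical placeholders for an LLM API call and a tool executor
# (file reads/writes, shell commands, code search).
context="$(cat task.md)"   # the task description seeds the conversation

while true; do
  # Think: the model reads the conversation so far and decides what to do next.
  action="$(call_llm "$context")"

  # Done (or stuck): the model signals there is nothing left to do.
  [[ "$action" == DONE* ]] && break

  # Act: execute the requested tool call.
  result="$(run_tool "$action")"

  # Observe: feed the result back into the conversation as new context.
  context+=$'\n'"$action"$'\n'"$result"
done
```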
LLMs only know what was in their training data. Retrieval-augmented generation (RAG) is the pattern of retrieving relevant information first, then generating a response grounded in that information. When Claude Code searches your codebase, reads your files, or looks up documentation—that's RAG. It's why the agent can work with code it has never seen before: it reads your code first, then reasons about it.
Every tool on this page is a different way of orchestrating the same agent loop. Claude Code is one agent looping. Agent teams are multiple agents looping in parallel. Ralph Loop is a bash script that restarts the agent loop when it finishes, pointed at the next task. The differences are in orchestration, not in kind. Once you understand think → act → observe, you understand all of them.
Both Claude Code and Codex rely on instruction files that act as persistent memory. These files are the single most important thing to get right—they compound knowledge across every session and every team member.
Claude Code uses a layered memory hierarchy: managed policy → user memory (~/.claude/CLAUDE.md) → project memory (./CLAUDE.md or .claude/CLAUDE.md) → modular rules (.claude/rules/*.md) → local memory (CLAUDE.local.md, gitignored) → auto memory. Use @path imports to pull in external files without bloating the root file. The entire team should contribute—every time Claude makes a mistake, add a rule so it never happens again.
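A minimal project CLAUDE.md along these lines might look as follows; the rules and imported paths are illustrative, not prescriptive.

```bash
# A minimal project CLAUDE.md; the rules and imported paths are illustrative.
# The @path lines are imports that pull in external files without bloating
# the root file.
cat <<'EOF' > CLAUDE.md
# Project memory

- Run `npm test` and `npm run lint` before declaring a task done.
- Never commit directly to main; open a PR for every change.
- Prefer editing existing modules over adding parallel "v2" copies.

@docs/architecture.md
@.claude/rules/testing.md
EOF
```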
AGENTS.md is an open standard under the Linux Foundation's Agentic AI Foundation, meaning it works across tools—not just Codex. Codex reads it through a layered discovery system: global (~/.codex/AGENTS.md) first, then from the project root down to the current directory. Files concatenate root-to-current, with closer files overriding earlier guidance, and AGENTS.override.md is supported for temporary changes.
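For example, a layered setup might look like this; the paths and rules are illustrative.

```bash
# Layered AGENTS.md discovery, per the description above:
#   ~/.codex/AGENTS.md        global defaults
#   ./AGENTS.md               project-wide guidance
#   ./services/api/AGENTS.md  closer file, overrides the layers above it
mkdir -p services/api
cat <<'EOF' > services/api/AGENTS.md
# API service guidance (overrides repo-root AGENTS.md where they conflict)
- Wrap errors with the shared helpers in src/errors; never return raw errors.
- Integration tests live in services/api/tests and must pass before any PR.
EOF
```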
| Aspect | Claude Code | Codex |
|---|---|---|
| Instruction file | CLAUDE.md | AGENTS.md |
| Scope | Managed policy → user → project → .claude/rules/ → local | Global + layered per-directory |
| Override mechanism | CLAUDE.local.md + .claude/rules/*.md + @path imports | AGENTS.override.md files |
| Personal/local config | CLAUDE.local.md (gitignored) | AGENTS.override.md |
| Size limit | ~2.5k tokens recommended | 32 KiB default (configurable) |
| Team sharing | Checked into git | Checked into git |
Both files serve the same purpose: preventing repeated mistakes and encoding team knowledge. Maintain both if you use both tools. Keep them concise—treat them like code, not documentation. Every rule should earn its place.
Claude Code is an agentic coding tool that runs in your terminal. It can read files, execute commands, write code, and create pull requests. The key to using it well is understanding its core workflow patterns.
1. **Plan**: enter Plan Mode with Shift+Tab twice. Iterate on the approach before any code is written.
2. **Build**: switch to auto-accept. Claude executes the plan, typically in one shot.
3. **Verify**: run tests, lint, build. Give Claude a feedback loop to self-correct.
4. **Ship**: use /commit-push-pr or a similar slash command to create the PR.
**Skills.** Reusable prompts that automate repeated workflows—/commit-push-pr, /verify-app, /code-simplifier. Defined in .claude/skills/<name>/SKILL.md (legacy .claude/commands/ still works). Inline bash pre-computes context to avoid wasted model calls.
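A sketch of what such a skill file could look like; the frontmatter fields, inline bash, and wording are illustrative rather than canonical.

```bash
mkdir -p .claude/skills/commit-push-pr
cat <<'EOF' > .claude/skills/commit-push-pr/SKILL.md
---
name: commit-push-pr
description: Commit the current work, push the branch, and open a pull request.
---

Current branch: !`git branch --show-current`
Pending changes: !`git status --porcelain`

Write a conventional commit message for the pending changes, push the branch,
and open a PR with `gh pr create`, summarizing the diff in the PR body.
EOF
```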
**Subagents.** Lightweight child agents that run focused tasks within your session. Use them for code simplification, build validation, architecture review. Results return to your main context. Lower token cost than agent teams.
**Permissions.** Use /permissions to pre-allow safe commands instead of --dangerously-skip-permissions. Share via .claude/settings.json in git so the whole team has consistent behavior.
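A sketch of a shared settings file; the specific allow/deny rules are examples to adapt to your own toolchain.

```bash
cat <<'EOF' > .claude/settings.json
{
  "permissions": {
    "allow": [
      "Bash(npm run test:*)",
      "Bash(npm run lint)",
      "Bash(git status)",
      "Bash(git diff:*)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}
EOF
```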
**MCP servers.** Connect Claude to external tools (Slack, databases, Sentry) via the Model Context Protocol. Config lives in .mcp.json, checked into git for team consistency.
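A minimal .mcp.json sketch; the server name, package, and environment variable are placeholders for whatever MCP servers your team actually runs.

```bash
cat <<'EOF' > .mcp.json
{
  "mcpServers": {
    "issue-tracker": {
      "command": "npx",
      "args": ["-y", "@example/issue-tracker-mcp"],
      "env": { "ISSUE_TRACKER_TOKEN": "${ISSUE_TRACKER_TOKEN}" }
    }
  }
}
EOF
```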
**Hooks.** Shell commands triggered on PreToolUse and PostToolUse events. Auto-format after edits, validate parameters, enforce rules on every tool call. Hooks communicate via stdout/stderr and exit codes only—they can't trigger slash commands or tool calls directly.
Use Stop and UserPromptSubmit hooks to block PR creation if tests fail, enforce no secrets in diffs, run linters before commits.
TeammateIdle and TaskCompleted hooks prevent agent team members from going idle or marking tasks done without passing checks. Exit code 2 blocks the action and feeds the error message (stderr) back to Claude as feedback. The programmatic equivalent of Mitchell's "engineer the harness."
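A hedged sketch of the pattern: auto-format after edits, plus a Stop hook that exits 2 to block stopping while tests fail (the same mechanism TeammateIdle and TaskCompleted hooks use). The matchers, commands, and script path are illustrative; in practice the hooks block is merged into the same settings file as the permissions above.

```bash
cat <<'EOF' > .claude/settings.json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [{ "type": "command", "command": "npx prettier --write ." }]
      }
    ],
    "Stop": [
      {
        "hooks": [{ "type": "command", "command": ".claude/hooks/require-tests.sh" }]
      }
    ]
  }
}
EOF

mkdir -p .claude/hooks
cat <<'EOF' > .claude/hooks/require-tests.sh
#!/usr/bin/env bash
# Exit code 2 blocks the action; stderr is fed back to Claude as feedback.
if ! npm test --silent; then
  echo "Tests are failing. Fix them before stopping or opening a PR." >&2
  exit 2
fi
EOF
chmod +x .claude/hooks/require-tests.sh
```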
**Parallel sessions.** Run multiple Claude Code instances simultaneously, each in its own git worktree to avoid file conflicts. This is the foundation of high-throughput development.
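One way to set this up, assuming tmux and two example feature branches:

```bash
# Each worktree is an isolated checkout, so parallel sessions never collide.
git worktree add -b feature/auth    "$PWD/../myapp-auth"
git worktree add -b feature/billing "$PWD/../myapp-billing"

# One tmux window per worktree, each running its own Claude Code instance.
tmux new-session -d -s agents
tmux new-window -t agents -n auth    -c "$PWD/../myapp-auth"    claude
tmux new-window -t agents -n billing -c "$PWD/../myapp-billing" claude
tmux attach -t agents
```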
OpenAI Codex runs tasks in sandboxed environments with the full repository pre-loaded. This execution model makes it ideal for code review, quality assurance, and parallel task queues.
**Parallel task queues.** Queue multiple tasks that run independently in sandboxed environments. Each one has the repo pre-loaded, can run tests, and presents a PR when done. This is Codex's killer feature for throughput.
**Code review.** Point Codex at a PR and ask it to review for security, performance, or correctness. Its sandboxed environment means it can actually run the code to verify claims, not just read it.
Codex provides openai/codex-action@v1 to automatically review PRs and post feedback. This makes the cross-review pipeline fully automated—every PR gets agent-powered review without manual triggering.
The most powerful pattern: use Claude Code for implementation and Codex for independent review. This creates a two-model adversarial check that catches errors neither tool would find alone.
1. **Claude Code** implements the feature, runs tests, and creates a PR.
2. **Codex** reviews the PR in a sandbox: runs the tests, checks quality, leaves feedback.
3. **Claude Code** addresses the review feedback and updates the PR.
4. **Human** gives final approval with high confidence from dual-model QA.
Set up a GitHub Action that triggers Codex review on every PR created by Claude Code. This makes the cross-review pipeline fully automated.
Agent teams coordinate multiple Claude Code instances working together. One session is the team lead, spawning teammates that work independently with their own context windows, communicating via a shared task list and mailbox.
Agent teams are experimental. Enable via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in your environment or settings.
"Agent teams are most effective for tasks where parallel exploration adds real value."
— Claude Code Documentation

- **Parallel research**: multiple teammates investigate different aspects simultaneously, then share and challenge each other's findings.
- **Module ownership**: teammates each own a separate module or layer without stepping on each other's work.
- **Hypothesis testing**: test competing hypotheses in parallel. Teammates actively try to disprove each other's theories.
- **Full-stack features**: frontend, backend, and test changes each owned by a different teammate, coordinating via the shared task list.
| Aspect | Subagents | Agent Teams |
|---|---|---|
| Communication | Report back to main agent only | Message each other directly |
| Context | Results summarized back | Fully independent context windows |
| Token cost | Lower | Higher (scales with team size) |
| Best for | Focused tasks, quick results | Complex work needing collaboration |
| Display mode | Within main session | tmux split panes or in-process |
The team lead can focus purely on coordination—spawning teammates, sending messages, managing the task list—without implementing tasks itself. This prevents the lead from burning context on implementation details. Toggle this coordination-only mode with Shift+Tab.
If you use tmux, set "teammateMode": "tmux" in your settings so each teammate gets its own pane. Use tmux -CC in iTerm2 for the best experience, or standard tmux on Linux.
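A sketch of the setup, using the environment variable and settings key described above; in practice, merge the key into your existing settings file rather than overwriting it.

```bash
# Opt in to the experimental agent teams feature for this shell.
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# Run teammates in tmux panes; merge this key into your existing settings.
cat <<'EOF' > ~/.claude/settings.json
{ "teammateMode": "tmux" }
EOF

# Start the lead session; in iTerm2, `tmux -CC` gives native split panes.
tmux -CC new-session -s team claude
```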
Agents are autonomous executors that need a clear target. A prompt is a one-shot instruction that gets lost in context; a spec is a verifiable contract between you and the agent. It tells the agent what "done" looks like, gives it criteria to check its own work, and enables the outer loop (Ralph) to run unattended overnight. Writing the spec first is the single highest-leverage thing you can do before starting any agent-driven work.
- **Make it verifiable**: every line in the spec should be verifiable—include test commands, expected behaviors, edge cases. If the agent can't check its own work, it can't self-correct.
- **Size stories to one context window**: break features into stories that fit in one context window. One story per iteration keeps the agent focused and prevents context exhaustion. Too large = wasted tokens. Too small = overhead.
- **Describe outcomes, not implementation**: state acceptance criteria and let the agent choose the approach. Over-specifying implementation constrains the agent and often produces worse results.
The spec IS the prompt. A well-written PRD replaces long, fragile prompt chains. Write the spec, point the agent at it, and let the loop run. Formats (PRD.md, prd.json) are covered in the Ralph Loop section.
Ralph Loop is a community-built bash script that runs Claude Code repeatedly until all requirements in a PRD are complete. It is not an Anthropic product—it's an open-source tool you run in your own terminal. Named after the "Ralph Wiggum" technique, it's designed for unattended operation—perfect for overnight feature development.
Strip away the name and Ralph Loop is just a bash while loop. It runs Claude Code, waits for it to finish, checks if there's more work in the PRD, and starts Claude Code again. That's the entire trick.
The problem it solves: Claude Code's agent loop (think → act → observe) runs inside a single session with a finite context window. For large features, one session isn't enough—the context fills up, or the model finishes one story and stops. Ralph Loop solves this by giving the agent loop an outer loop. Each session gets a fresh context window, picks up where the last one left off by reading the PRD, and works on the next incomplete item. It turns a single-session tool into an overnight assembly line.
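A minimal sketch of that outer loop, assuming a PRD.md tracked with "- [ ]" checkboxes and the claude CLI in non-interactive print mode; real Ralph scripts add logging, iteration caps, and git hygiene.

```bash
#!/usr/bin/env bash
# Keep restarting Claude Code until no unchecked PRD items remain.
while grep -qF -- '- [ ]' PRD.md; do
  claude -p "Read PRD.md, pick the next unchecked item, implement it, run the
tests, commit the work, and check the item off in PRD.md." \
    --dangerously-skip-permissions
done
echo "All PRD items complete."
```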
1. **Write the PRD**: define requirements in PRD.md or prd.json with checkable items.
2. **Start the loop**: the bash script begins autonomous iteration.
3. **Iterate**: Claude reads the PRD, implements the next item, runs tests, and marks it done.
4. **Finish**: the loop exits when all items pass. PR ready for review.
- **PRD.md**: Markdown with checkboxes. Best for small features. Claude checks off items as it completes them. Each major item gets a commit.
- **prd.json**: JSON with stories. Best for large features. One story per iteration keeps context windows clean. Each story gets its own commit.
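An illustrative PRD.md for a small feature: every item is checkable and names its own verification step, so both the agent and the outer loop can tell when it is done (the endpoints and test names are invented for the example).

```bash
cat <<'EOF' > PRD.md
# Feature: password reset

- [ ] POST /auth/reset-request emails a reset link with a one-hour token
      (verify: `npm test -- reset-request`)
- [ ] POST /auth/reset sets the new password and invalidates the token
      (verify: `npm test -- reset-confirm`)
- [ ] Expired or reused tokens return 400 with a clear error message
      (verify: `npm test -- reset-edge-cases`)
EOF
```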
"End-of-day agents handling research, exploration, and triage during low-energy periods."
— Mitchell Hashimoto, on using AI agents during off-hours

Boris Cherny created Claude Code and uses it to ship ~100 PRs per week. His setup is surprisingly vanilla—no exotic hacks, just disciplined application of fundamentals at scale.
- **Parallel sessions**: 5 in the terminal (numbered, with OS notifications), 5-10 on claude.ai, plus mobile sessions started in the morning. Each uses its own git checkout to avoid conflicts. He expects a 10-20% abandonment rate.
- **Model**: uses Opus with thinking exclusively. "Even though it's bigger & slower than Sonnet, since you have to steer it less, it is almost always faster in the end."
- **Plan Mode**: starts most sessions in Plan Mode (Shift+Tab twice), iterates on the plan, then switches to auto-accept. "A good plan is really important!"
- **Verification**: "The most important thing: give Claude a way to verify its work. This feedback loop 2-3x the quality of the final result."
- **Skills**: /commit-push-pr is used dozens of times daily. Skills live in .claude/skills/<name>/SKILL.md (legacy .claude/commands/ still works) with inline bash for context.
- **Permissions**: /permissions to pre-allow safe commands, with settings shared in .claude/settings.json via git.
- **MCP**: servers configured in .mcp.json, checked into git.

The co-founder of HashiCorp describes a deliberate six-step evolution from chatbot usage to continuous agent operation. The key insight: you must push through the inefficiency phase to reach transformation.
"Break complex sessions into separate, clear tasks rather than attempting everything simultaneously. Provide agents with verification mechanisms to self-correct."
— Mitchell Hashimoto

- **One task per session**: don't overload context windows with unrelated work. Clear, focused prompts produce better results.
- **Verification mechanisms**: give agents a way to check their own work. Tests, linters, type checkers—anything that provides automated feedback.
- **Compounding memory**: when an agent makes a mistake, add it to CLAUDE.md/AGENTS.md. This compounds over time into increasingly reliable behavior.
Mitchell's principle: you should always have at least one agent working in the background. While you focus on one task, an agent researches the next, reviews previous work, or explores alternatives. Dead time is wasted compute.
Here is the optimized workflow combining all tools and techniques for maximum efficiency and quality.
1. **Plan**: use Plan Mode to define the approach. Use agent teams for research if the problem is complex.
2. **Build**: Claude Code in parallel worktrees via tmux. One feature per session.
3. **Test**: run /verify. Give Claude the feedback loop. Fix issues in the same session.
4. **Ship**: Claude creates the PR, Codex auto-reviews, feedback gets fixed, a human approves.
While working through the Plan → Build → Test → Ship cycle, always keep a background agent running on the next task, reviewing the last PR, or researching an upcoming feature. The goal: zero idle compute.
1. **Write the spec**: create PRD.md or prd.json with clear, testable requirements and verification steps.
2. **Run Ralph overnight**: start the Ralph script in tmux. Claude iterates through the stories autonomously.
3. **Morning review**: review the git log and check completed stories. Codex reviews the overnight PR.
4. **Ship**: address any review feedback, then merge and deploy.
| Tool | Role | When |
|---|---|---|
| Claude Code | Primary implementation, planning, shipping | Active development hours |
| Codex | Code review, parallel task queues, QA | After every PR, parallel tasks |
| Ralph Loop | Autonomous iteration through PRD stories | Overnight, long-running features |
| Agent Teams | Parallel research, multi-module features | Complex tasks needing coordination |
| tmux + Worktrees | Session management, parallel isolation | Always (infrastructure layer) |
| CLAUDE.md / AGENTS.md | Institutional memory, mistake prevention | Always (knowledge layer) |
Verification is everything. Every workflow above depends on giving the AI a way to check its own work. Tests, linters, type checkers, build scripts—these aren't just quality tools, they're the feedback loop that makes autonomous development possible. Invest in your test suite and CI pipeline before anything else.
This analysis is based on official documentation, creator insights, and practitioner experience.