Deep Analysis

The AI-Native Development Workflow

How to combine Claude Code, Codex, Ralph Loop, and agent teams to build high-quality software with maximum efficiency.

How AI Agents Work

Before diving into tools and workflows, it helps to understand the three building blocks that make all of this possible. Every tool on this page—Claude Code, Codex, Ralph Loop, agent teams—is built from these same primitives.

🧠 The LLM

A large language model (LLM) takes text in and produces text out. That's it. It has no memory between calls, no access to your filesystem, and no ability to run code. On its own, it's a very sophisticated autocomplete. Everything else is built on top of this.

🔧 Tool Calls

To make an LLM useful, you give it tools—structured actions it can request. Instead of just outputting text, the model can output: "I want to read src/app.ts" or "run npm test". The system executes that action and feeds the result back. This is called function calling or tool use.
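
A schematic example of a single exchange (the field names are illustrative, not any vendor's actual wire format):

# What the model emits: a structured request instead of prose
{ "tool": "read_file", "input": { "path": "src/app.ts" } }

# What the system feeds back after executing the request
{ "tool_result": { "tool": "read_file", "content": "export function App() { ... }" } }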

🔄 The Agent Loop

An agent is an LLM running in a loop: think → tool call → observe result → think → tool call → ... until the task is done. This is what separates an agent from a chatbot. A chatbot answers once. An agent keeps working—reading files, running tests, fixing errors—until it reaches the goal or gets stuck.

The Agent Loop Visualized

1. Think (Reason): The LLM reads the conversation so far and decides what to do next.

2. Act (Tool Call): The LLM requests an action: read a file, write a file, run a command, search the code.

3. See (Observe): The result of the action is fed back into the conversation as context.

4. Loop (Repeat): Back to Think. The loop continues until the task is complete or the agent asks for help.
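
In code, the whole diagram collapses into a loop. A minimal shell sketch, where llm_decide, execute_tool, and task_complete are hypothetical placeholders for the model call, the tool executor, and the completion check:

# Conceptual shape of the agent loop (placeholder commands, not a real API)
transcript="$1"                              # the user's task
while true; do
  action=$(llm_decide "$transcript")         # Think: model picks the next action
  result=$(execute_tool "$action")           # Act: harness executes the request
  transcript+=$'\n'"$action"$'\n'"$result"   # See: result becomes new context
  task_complete "$transcript" && break       # Loop: repeat until done or stuck
done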

RAG: Retrieval-Augmented Generation

LLMs only know what was in their training data. RAG is the pattern of retrieving relevant information first, then generating a response grounded in that information. When Claude Code searches your codebase, reads your files, or looks up documentation—that's RAG. It's why the agent can work with code it has never seen before: it reads your code first, then reasons about it.

Why This Matters

Every tool on this page is a different way of orchestrating the same agent loop. Claude Code is one agent looping. Agent teams are multiple agents looping in parallel. Ralph Loop is a bash script that restarts the agent loop when it finishes, pointed at the next task. The differences are in orchestration, not in kind. Once you understand think → act → observe, you understand all of them.

Foundation: CLAUDE.md & AGENTS.md

Both Claude Code and Codex rely on instruction files that act as persistent memory. These files are the single most important thing to get right—they compound knowledge across every session and every team member.

📋 CLAUDE.md

Claude Code uses a layered memory hierarchy: managed policy → user memory (~/.claude/CLAUDE.md) → project memory (./CLAUDE.md or .claude/CLAUDE.md) → modular rules (.claude/rules/*.md) → local memory (CLAUDE.local.md, gitignored) → auto memory. Use @path imports to pull in external files without bloating the root file. The entire team should contribute—every time Claude makes a mistake, add a rule so it never happens again.
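
A sketch of what a lean project file might look like (the rules and paths are purely illustrative):

# ./CLAUDE.md - project memory (illustrative)
- Run `pnpm test` and `pnpm lint` before marking any task complete.
- Never edit generated files under src/generated/.
- Use the shared fetch wrapper in src/lib/http.ts, never raw fetch.

# Import larger docs without bloating this file
@docs/architecture.md
@.claude/rules/testing.md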

📜 AGENTS.md

AGENTS.md is an open standard under the Linux Foundation's Agentic AI Foundation, meaning it works across tools—not just Codex. Codex reads it using a layered discovery system: global (~/.codex/AGENTS.md), then project root to current directory. Files concatenate root-to-current, with closer files overriding earlier guidance. Supports AGENTS.override.md for temporary changes.

Key Differences

Aspect | Claude Code | Codex
Instruction file | CLAUDE.md | AGENTS.md
Scope | Managed policy → user → project → .claude/rules/ → local | Global + layered per-directory
Override mechanism | CLAUDE.local.md + .claude/rules/*.md + @path imports | AGENTS.override.md files
Personal/local config | CLAUDE.local.md (gitignored) | AGENTS.override.md
Size limit | ~2.5k tokens recommended | 32 KiB default (configurable)
Team sharing | Checked into git | Checked into git

Best Practice

Both files serve the same purpose: preventing repeated mistakes and encoding team knowledge. Maintain both if you use both tools. Keep them concise—treat them like code, not documentation. Every rule should earn its place.

Claude Code Mastery

Claude Code is an agentic coding tool that runs in your terminal. It can read files, execute commands, write code, and create pull requests. The key to using it well is understanding its core workflow patterns.

The Inner Loop

1. Plan Mode: Shift+Tab twice. Iterate on the approach before any code is written.

2. Implement: Switch to auto-accept. Claude executes the plan, typically in one shot.

3. Verify: Run tests, lint, build. Give Claude a feedback loop to self-correct.

4. Ship: Use /commit-push-pr or a similar slash command to create the PR.

Core Concepts

Skills

Reusable prompts that automate repeated workflows—/commit-push-pr, /verify-app, /code-simplifier. Defined in .claude/skills/<name>/SKILL.md (legacy .claude/commands/ still works). Inline bash pre-computes context to avoid wasted model calls.
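
A sketch of a skill definition (the frontmatter fields and the !`...` inline-bash syntax follow the documented command conventions; verify the details against the current docs):

# .claude/skills/commit-push-pr/SKILL.md (sketch)
---
name: commit-push-pr
description: Commit staged changes, push the branch, and open a PR
---
Current diff: !`git diff --stat HEAD`

Write a conventional commit message for the changes above, commit, push the
branch, and open a pull request with `gh pr create --fill`.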

🤖 Subagents

Lightweight child agents that run focused tasks within your session. Use them for code simplification, build validation, architecture review. Results return to your main context. Lower token cost than agent teams.

🔒 Permissions

Use /permissions to pre-allow safe commands instead of --dangerously-skip-permissions. Share via .claude/settings.json in git so the whole team has consistent behavior.
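
A sketch of a shared settings file pre-allowing safe commands (the rule strings follow the documented permission syntax; tailor the list to your stack):

# .claude/settings.json (checked into git)
{
  "permissions": {
    "allow": [
      "Bash(pnpm test:*)",
      "Bash(pnpm lint)",
      "Bash(git status)"
    ],
    "deny": [
      "Bash(rm -rf:*)"
    ]
  }
}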

🔌 MCP Servers

Connect Claude to external tools (Slack, databases, Sentry) via Model Context Protocol. Config lives in .mcp.json, checked into git for team consistency.
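
A sketch of a minimal config (the server package name is an assumption; substitute the MCP server you actually use):

# .mcp.json (checked into git; package name below is an assumption)
{
  "mcpServers": {
    "sentry": {
      "command": "npx",
      "args": ["-y", "@sentry/mcp-server"]
    }
  }
}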

Hooks: Automated Guardrails

🛡 Pre/Post Tool Hooks

Shell commands triggered on PreToolUse and PostToolUse events. Auto-format after edits, validate parameters, enforce rules on every tool call. Hooks communicate via stdout/stderr and exit codes only—they can't trigger slash commands or tool calls directly.
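
A sketch of a PostToolUse hook that auto-formats every edited file. Hooks receive a JSON payload on stdin, so the file path is pulled out with jq; the matcher and field names follow the documented hooks schema, but verify them against your version:

# .claude/settings.json (hooks section)
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          {
            "type": "command",
            "command": "jq -r '.tool_input.file_path' | xargs -r npx prettier --write"
          }
        ]
      }
    ]
  }
}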

🛑 Quality Gates

Use Stop and UserPromptSubmit hooks to block PR creation if tests fail, enforce no secrets in diffs, run linters before commits.

👥 Team Hooks

TeammateIdle and TaskCompleted hooks prevent agent team members from going idle or marking tasks done without passing checks. Exit code 2 blocks the action and feeds the error message (stderr) back to Claude as feedback. The programmatic equivalent of Mitchell's "engineer the harness."

Parallel Sessions with Git Worktrees

Run multiple Claude Code instances simultaneously, each in its own git worktree to avoid file conflicts. This is the foundation of high-throughput development.

# Create worktrees for parallel features
git worktree add -b feat/auth ../project-auth
git worktree add -b feat/dashboard ../project-dashboard
git worktree add -b fix/perf ../project-perf

# Launch Claude Code in each (separate tmux windows)
tmux new-session -d -s auth -c ../project-auth claude
tmux new-window -t auth -c ../project-dashboard claude
tmux new-window -t auth -c ../project-perf claude
tmux attach -t auth

Codex as Reviewer & Parallel Worker

OpenAI Codex runs tasks in sandboxed environments (locally via the CLI, or in cloud containers) with the full repository pre-loaded. This execution model makes it ideal for code review, quality assurance, and parallel task queues.

📦 Task Queue

Queue multiple tasks that run independently in sandboxed environments. Each one has the repo pre-loaded, can run tests, and presents a PR when done. This is Codex's killer feature for throughput.

🔍 Review Mode

Point Codex at a PR and ask it to review for security, performance, or correctness. Its sandboxed environment means it can actually run the code to verify claims, not just read it.

GitHub Action

Codex provides openai/codex-action@v1 to automatically review PRs and post feedback. This makes the cross-review pipeline fully automated—every PR gets agent-powered review without manual triggering.

AGENTS.md Configuration

# ~/.codex/AGENTS.md - Global defaults
Always run tests before creating a PR.
Follow conventional commits for messages.
Never modify files outside the src/ directory.

# project/AGENTS.md - Project-specific
This is a TypeScript project using Next.js 15.
Run `pnpm test` for tests, `pnpm lint` for linting.
Database migrations go in prisma/migrations/.

The Cross-Review Workflow

The most powerful pattern: use Claude Code for implementation and Codex for independent review. This creates a two-model adversarial check that catches errors neither tool would find alone.

1. Claude Code: implements the feature, runs the tests, creates a PR.

2. Codex: reviews the PR in a sandbox. It runs the tests, checks quality, and leaves feedback.

3. Claude Code: addresses the review feedback and updates the PR.

4. You: final approval, with high confidence from dual-model QA.

Why This Works

Claude and Codex are different models with different blind spots. An independent reviewer that can actually run the code, not just read it, catches the class of errors the implementing model is systematically inclined to miss.

Automation Tip

Set up a GitHub Action that triggers Codex review on every PR created by Claude Code. This makes the cross-review pipeline fully automated.
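
A sketch of such a workflow. The openai/codex-action input names here are assumptions; check the action's README for the real interface:

# .github/workflows/codex-review.yml (sketch; inputs are assumptions)
name: codex-review
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}  # assumed input name
          prompt: "Review this PR for correctness, security, and performance."  # assumed input name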

Claude Code Agent Teams

Agent teams coordinate multiple Claude Code instances working together. One session is the team lead, spawning teammates that work independently with their own context windows, communicating via a shared task list and mailbox.

Experimental

Agent teams are experimental. Enable via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in your environment or settings.
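
Two ways to turn it on (the env block follows the documented settings schema):

# Per shell
export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1

# Or persistently, in .claude/settings.json
{ "env": { "CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS": "1" } }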

"Agent teams are most effective for tasks where parallel exploration adds real value."

— Claude Code Documentation

Best Use Cases

🔎 Research & Review

Multiple teammates investigate different aspects simultaneously, then share and challenge each other's findings.

New Features

Teammates each own a separate module or layer without stepping on each other's work.

🐛 Debugging

Test competing hypotheses in parallel. Teammates actively try to disprove each other's theories.

📚 Cross-Layer

Frontend, backend, and test changes each owned by a different teammate, coordinating via the shared task list.

Agent Teams vs. Subagents

Aspect | Subagents | Agent Teams
Communication | Report back to main agent only | Message each other directly
Context | Results summarized back | Fully independent context windows
Token cost | Lower | Higher (scales with team size)
Best for | Focused tasks, quick results | Complex work needing collaboration
Display mode | Within main session | tmux split panes or in-process

Delegate Mode

🎯 Delegate Mode

Lead focuses purely on coordination—spawning teammates, sending messages, managing the task list—without implementing tasks itself. This prevents the lead from burning context on implementation details. Toggle with Shift+Tab.

tmux Users

Since you use tmux, set "teammateMode": "tmux" in your settings. Each teammate gets its own pane. Use tmux -CC in iTerm2 for the best experience, or standard tmux on Linux.
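
For example (file location assumed to be your project settings):

# .claude/settings.json
{
  "teammateMode": "tmux"
}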

Why Specs Beat Prompts

Agents are autonomous executors that need a clear target. A prompt is a one-shot instruction that gets lost in context; a spec is a verifiable contract between you and the agent. It tells the agent what "done" looks like, gives it criteria to check its own work, and enables the outer loop (Ralph) to run unattended overnight. Writing the spec first is the single highest-leverage thing you can do before starting any agent-driven work.

Testable Requirements

Every line in the spec should be verifiable—include test commands, expected behaviors, edge cases. If the agent can't check its own work, it can't self-correct.

📐 Right-Sized Stories

Break features into stories that fit in one context window. One story per iteration keeps the agent focused and prevents context exhaustion. Too large = wasted tokens. Too small = overhead.

🎯 What, Not How

Describe acceptance criteria, not implementation steps. Let the agent choose the approach. Over-specifying implementation constrains the agent and often produces worse results.

Key Insight

The spec IS the prompt. A well-written PRD replaces long, fragile prompt chains. Write the spec, point the agent at it, and let the loop run. Formats (PRD.md, prd.json) are covered in the Ralph Loop section.

Ralph Loop & PRD-Driven Development

Ralph Loop is a community-built bash script that runs Claude Code repeatedly until all requirements in a PRD are complete. It is not an Anthropic product—it's an open-source tool you run in your own terminal. Named after the "Ralph Wiggum" technique, it's designed for unattended operation—perfect for overnight feature development.

What It Really Is

Strip away the name and Ralph Loop is just a bash while loop. It runs Claude Code, waits for it to finish, checks if there's more work in the PRD, and starts Claude Code again. That's the entire trick.

# The core idea (simplified); all_stories_pass stands in for your own
# completion check, e.g. no unchecked boxes left in PRD.md
while ! all_stories_pass; do
  claude "Read PRD, implement next story, run tests"
done

The problem it solves: Claude Code's agent loop (think → act → observe) runs inside a single session with a finite context window. For large features, one session isn't enough—the context fills up, or the model finishes one story and stops. Ralph Loop solves this by giving the agent loop an outer loop. Each session gets a fresh context window, picks up where the last one left off by reading the PRD, and works on the next incomplete item. It turns a single-session tool into an overnight assembly line.

How It Works

1. Write PRD: Define requirements in PRD.md or prd.json with checkable items.

2. Start Loop: The bash script begins autonomous iteration.

3. Iterate: Claude reads the PRD, implements the next item, runs tests, and marks it done.

4. Complete: The loop exits when all items pass. A PR is ready for review.

PRD Formats

📝 PRD.md (Simple)

Markdown with checkboxes. Best for small features. Claude checks off items as it completes them. Each major item gets a commit.

# Feature: User Profile
- [ ] Add profile page route
- [ ] Create profile form component
- [ ] Add avatar upload
- [ ] Write tests

📊 prd.json (Structured)

JSON with stories. Best for large features. One story per iteration keeps context windows clean. Each story gets its own commit.

{
  "stories": [
    { "title": "...",
      "passes": false }
  ]
}

The Overnight Workflow

"End-of-day agents handling research, exploration, and triage during low-energy periods."

— Mitchell Hashimoto, on using AI agents during off-hours

Boris Cherny's Workflow

Boris Cherny created Claude Code and uses it to ship ~100 PRs per week. His setup is surprisingly vanilla—no exotic hacks, just disciplined application of fundamentals at scale.

💻 Parallel Sessions

5 in terminal (numbered, with OS notifications), 5-10 on claude.ai, plus mobile sessions started in the morning. Each uses its own git checkout to avoid conflicts. Expects 10-20% abandonment rate.

🎯 Opus for Everything

Uses Opus with thinking exclusively. "Even though it's bigger & slower than Sonnet, since you have to steer it less, it is almost always faster in the end."

🗺 Plan First

Starts most sessions in Plan Mode (Shift+Tab twice). Iterates on the plan, then switches to auto-accept. "A good plan is really important!"

Verify Everything

"The most important thing: give Claude a way to verify its work. This feedback loop 2-3x the quality of the final result."

Key Practices

Mitchell Hashimoto's Framework

The co-founder of HashiCorp describes a deliberate six-step evolution from chatbot usage to continuous agent operation. The key insight: you must push through the inefficiency phase to reach transformation.

The Six Steps

"Break complex sessions into separate, clear tasks rather than attempting everything simultaneously. Provide agents with verification mechanisms to self-correct."

— Mitchell Hashimoto

Practical Takeaways

🎯 Task Isolation

One task per session. Don't overload context windows with unrelated work. Clear, focused prompts produce better results.

🧪 Verification

Give agents a way to check their own work. Tests, linters, type checkers—anything that provides automated feedback.

📓 Document Mistakes

When an agent makes a mistake, add it to CLAUDE.md/AGENTS.md. This compounds over time into increasingly reliable behavior.

Always Have an Agent Running

Mitchell's principle: you should always have at least one agent working in the background. While you focus on one task, an agent researches the next, reviews previous work, or explores alternatives. Dead time is wasted compute.

The Unified Workflow

Here is the optimized workflow combining all tools and techniques for maximum efficiency and quality.

Daytime: Active Development

1. Plan (Design): Use Plan Mode to define the approach. Use agent teams for research if the problem is complex.

2. Build (Implement): Claude Code in parallel worktrees via tmux. One feature per session.

3. Test (Verify): Run /verify. Give Claude the feedback loop. Fix issues in the same session.

4. Ship (PR + Review): Claude creates the PR. Codex auto-reviews. Fix the feedback. A human approves.

Background Agent

While working through the Plan → Build → Test → Ship cycle, always keep a background agent running on the next task, reviewing the last PR, or researching an upcoming feature. The goal: zero idle compute.

Overnight: Autonomous Development

1. Write Spec (PRD): Create PRD.md or prd.json with clear, testable requirements and verification steps.

2. Start Loop (Ralph): Run the Ralph script in tmux. Claude iterates through the stories autonomously.

3. Morning QA: Review the git log and check the completed stories. Codex reviews the overnight PR.

4. Polish (Fix): Address any review feedback. Merge and deploy.

The Complete Stack

Tool | Role | When
Claude Code | Primary implementation, planning, shipping | Active development hours
Codex | Code review, parallel task queues, QA | After every PR, parallel tasks
Ralph Loop | Autonomous iteration through PRD stories | Overnight, long-running features
Agent Teams | Parallel research, multi-module features | Complex tasks needing coordination
tmux + Worktrees | Session management, parallel isolation | Always (infrastructure layer)
CLAUDE.md / AGENTS.md | Institutional memory, mistake prevention | Always (knowledge layer)

The Meta-Rule

Verification is everything. Every workflow above depends on giving the AI a way to check its own work. Tests, linters, type checkers, build scripts—these aren't just quality tools, they're the feedback loop that makes autonomous development possible. Invest in your test suite and CI pipeline before anything else.

Sources

This analysis is based on official documentation, creator insights, and practitioner experience.