How AI Agents Work
Before diving into tools and workflows, it helps to understand the three building blocks that make all of this possible. Every tool on this page—Claude Code, Codex, Ralph Loop, agent teams—is built from these same primitives.
🧠 The LLM
A large language model (LLM) takes text in and produces text out. That's it. It has no memory between calls, no access to your filesystem, and no ability to run code. On its own, it's a very sophisticated autocomplete. Everything else is built on top of this.
🔧 Tool Calls
To make an LLM useful, you give it tools—structured actions it can request. Instead of just outputting text, the model can output: "I want to read src/app.ts" or "run npm test". The system executes that action and feeds the result back. This is called function calling or tool use.
🔄 The Agent Loop
An agent is an LLM running in a loop: think → tool call → observe result → think → tool call → ... until the task is done. This is what separates an agent from a chatbot. A chatbot answers once. An agent keeps working—reading files, running tests, fixing errors—until it reaches the goal or gets stuck.
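The loop is small enough to sketch end to end. Everything below (the stub model, the tool name, the `done:` convention) is illustrative, not any vendor's real API:

```python
# Minimal agent loop sketch: think -> tool call -> observe, repeated.
# `fake_model` stands in for a real LLM call; the filesystem is stubbed.
from typing import Callable

def read_file(path: str) -> str:
    # Stubbed filesystem for the example.
    return {"src/app.ts": "export const x = 1"}.get(path, "<not found>")

TOOLS: dict[str, Callable[[str], str]] = {"read_file": read_file}

def agent_loop(model: Callable[[list[str]], str], goal: str, max_steps: int = 10) -> str:
    history = [goal]                      # the growing conversation
    for _ in range(max_steps):
        action = model(history)           # think: model decides the next step
        if action.startswith("done:"):    # model signals completion
            return action.removeprefix("done:").strip()
        tool, _, arg = action.partition(" ")
        result = TOOLS[tool](arg)         # act: execute the requested tool
        history.append(result)            # observe: feed the result back
    return "gave up"

# A scripted "model": first read the file, then finish.
def fake_model(history: list[str]) -> str:
    return "read_file src/app.ts" if len(history) == 1 else "done: file contains x = 1"
```

The `max_steps` cap is the simplest version of "or gets stuck": a real harness also watches for repeated failures and asks the human for help.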
The Agent Loop Visualized
Reason
LLM reads the conversation so far and decides what to do next.
Tool Call
LLM requests an action: read file, write file, run command, search code.
Observe
The result of the action is fed back into the conversation as context.
Repeat
Back to Reason. The loop continues until the task is complete or the agent asks for help.
RAG: Retrieval-Augmented Generation
LLMs only know what was in their training data. RAG is the pattern of retrieving relevant information first, then generating a response grounded in that information. When Claude Code searches your codebase, reads your files, or looks up documentation—that's RAG. It's why the agent can work with code it has never seen before: it reads your code first, then reasons about it.
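The retrieval step can be sketched with naive keyword overlap. Real systems typically use embeddings and a vector index (or, as with Claude Code, plain file search); the corpus below is invented for the example:

```python
# Toy RAG: score documents by keyword overlap with the query, then build
# a prompt grounded in the top matches.
def retrieve(query: str, docs: dict[str, str], k: int = 2) -> list[str]:
    q = set(query.lower().split())
    scored = sorted(docs.items(), key=lambda kv: -len(q & set(kv[1].lower().split())))
    return [name for name, _ in scored[:k]]

def build_prompt(query: str, docs: dict[str, str]) -> str:
    context = "\n".join(f"--- {n} ---\n{docs[n]}" for n in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = {
    "auth.md": "login uses session cookies and a refresh token",
    "billing.md": "invoices are generated monthly by a cron job",
}
```

Because the retrieved text lands in the prompt, the model reasons over your actual files rather than its training data.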
Every tool on this page is a different way of orchestrating the same agent loop. Claude Code is one agent looping. Agent teams are multiple agents looping in parallel. Ralph Loop is a bash script that restarts the agent loop when it finishes, pointed at the next task. The differences are in orchestration, not in kind. Once you understand think → act → observe, you understand all of them.
Foundation: CLAUDE.md & AGENTS.md
Both Claude Code and Codex rely on instruction files that act as persistent memory. These files are the single most important thing to get right—they compound knowledge across every session and every team member.
📋 CLAUDE.md
Claude Code uses a layered memory hierarchy: managed policy → user memory (~/.claude/CLAUDE.md) → project memory (./CLAUDE.md or .claude/CLAUDE.md) → modular rules (.claude/rules/*.md) → local memory (CLAUDE.local.md, gitignored) → auto memory. Use @path imports to pull in external files without bloating the root file. The entire team should contribute—every time Claude makes a mistake, add a rule so it never happens again.
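A sketch of what a project-level file might look like (the rules and paths are invented; the `@path` import lines are the mechanism described above):

```markdown
# CLAUDE.md (project memory, checked into git)

- Run `pnpm test` before claiming a task is done.
- Never edit generated files under `dist/`.

@docs/architecture.md
@.claude/rules/commit-style.md
```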
📜 AGENTS.md
AGENTS.md is an open standard under the Linux Foundation's Agentic AI Foundation, meaning it works across tools—not just Codex. Codex reads it using a layered discovery system: global (~/.codex/AGENTS.md), then project root to current directory. Files concatenate root-to-current, with closer files overriding earlier guidance. Supports AGENTS.override.md for temporary changes.
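The root-to-current concatenation order can be sketched as a pure function. The lookup rules below are an assumption based on the layered behavior described above (real discovery reads the filesystem and also includes `~/.codex/AGENTS.md`):

```python
# Sketch of layered AGENTS.md discovery: candidate files from the project
# root down to the current directory, in apply order, so files closer to
# cwd come last and can override earlier guidance.
def agents_md_chain(root: str, cwd: str, existing: set[str]) -> list[str]:
    """Paths of AGENTS.md files that apply in `cwd`, root first."""
    parts = [] if cwd == root else cwd[len(root):].strip("/").split("/")
    dirs = [root] + ["/".join([root] + parts[: i + 1]) for i in range(len(parts))]
    return [p for p in (d + "/AGENTS.md" for d in dirs) if p in existing]

def combined_instructions(root: str, cwd: str, files: dict[str, str]) -> str:
    return "\n".join(files[p] for p in agents_md_chain(root, cwd, set(files)))
```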
Key Differences
| Aspect | Claude Code | Codex |
|---|---|---|
| Instruction file | CLAUDE.md | AGENTS.md |
| Scope | Managed policy → user → project → .claude/rules/ → local | Global + layered per-directory |
| Override mechanism | CLAUDE.local.md + .claude/rules/*.md + @path imports | AGENTS.override.md files |
| Personal/local config | CLAUDE.local.md (gitignored) | AGENTS.override.md |
| Size limit | ~2.5k tokens recommended | 32 KiB default (configurable) |
| Team sharing | Checked into git | Checked into git |
Both files serve the same purpose: preventing repeated mistakes and encoding team knowledge. Maintain both if you use both tools. Keep them concise—treat them like code, not documentation. Every rule should earn its place.
Claude Code Mastery
Claude Code is an agentic coding tool that runs in your terminal. It can read files, execute commands, write code, and create pull requests. The key to using it well is understanding its core workflow patterns.
The Inner Loop
Plan Mode
Shift+Tab twice. Iterate on the approach before any code is written.
Implement
Switch to auto-accept. Claude executes the plan, typically in one shot.
Verify
Run tests, lint, build. Give Claude a feedback loop to self-correct.
Ship
Use /commit-push-pr or similar slash command to create the PR.
Core Concepts
⚡ Skills
Reusable prompts that automate repeated workflows—/commit-push-pr, /verify-app, /code-simplifier. Defined in .claude/skills/<name>/SKILL.md (legacy .claude/commands/ still works). Inline bash pre-computes context to avoid wasted model calls.
🤖 Subagents
Lightweight child agents that run focused tasks within your session. Use them for code simplification, build validation, architecture review. Results return to your main context. Lower token cost than agent teams.
🔒 Permissions
Use /permissions to pre-allow safe commands instead of --dangerously-skip-permissions. Share via .claude/settings.json in git so the whole team has consistent behavior.
🔌 MCP Servers
Connect Claude to external tools (Slack, databases, Sentry) via Model Context Protocol. Config lives in .mcp.json, checked into git for team consistency.
Hooks: Automated Guardrails
🛡 Pre/Post Tool Hooks
Shell commands triggered on PreToolUse and PostToolUse events. Auto-format after edits, validate parameters, enforce rules on every tool call. Hooks communicate via stdout/stderr and exit codes only—they can't trigger slash commands or tool calls directly.
🛑 Quality Gates
Use Stop and UserPromptSubmit hooks to block PR creation if tests fail, enforce no secrets in diffs, run linters before commits.
👥 Team Hooks
TeammateIdle and TaskCompleted hooks prevent agent team members from going idle or marking tasks done without passing checks. Exit code 2 blocks the action and feeds the error message (stderr) back to Claude as feedback. The programmatic equivalent of Mitchell's "engineer the harness."
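A hook is just a script that reads the event from stdin and signals its verdict through the exit code. The sketch below blocks dangerous Bash commands; the event field names (`tool_input`, `command`) are assumptions for illustration, so check the hooks documentation for the real schema:

```python
# Sketch of a PreToolUse hook: exit code 0 allows the tool call, exit
# code 2 blocks it and feeds stderr back to Claude as feedback.
import json, sys

BLOCKED = ("rm -rf", "git push --force")

def verdict(event: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message) for a tool-use event."""
    cmd = event.get("tool_input", {}).get("command", "")
    for bad in BLOCKED:
        if bad in cmd:
            return 2, f"Blocked: {bad!r} is not allowed here. Propose a safer command."
    return 0, ""

# A real hook script would end with:
#   code, msg = verdict(json.load(sys.stdin))
#   print(msg, file=sys.stderr); sys.exit(code)
```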
Parallel Sessions with Git Worktrees
Run multiple Claude Code instances simultaneously, each in its own git worktree to avoid file conflicts. This is the foundation of high-throughput development.
git worktree add -b feat/auth ../project-auth
git worktree add -b feat/dashboard ../project-dashboard
git worktree add -b fix/perf ../project-perf
# Launch Claude Code in each (separate tmux panes)
tmux new-session -s auth -c ../project-auth
tmux new-window -t auth -c ../project-dashboard
tmux new-window -t auth -c ../project-perf
Codex as Reviewer & Parallel Worker
OpenAI Codex CLI runs tasks locally in sandboxed containers with the full repository pre-loaded. This execution model makes it ideal for code review, quality assurance, and parallel task queues.
📦 Task Queue
Queue multiple tasks that run independently in sandboxed environments. Each one has the repo pre-loaded, can run tests, and presents a PR when done. This is Codex's killer feature for throughput.
🔍 Review Mode
Point Codex at a PR and ask it to review for security, performance, or correctness. Its sandboxed environment means it can actually run the code to verify claims, not just read it.
⚙ GitHub Action
Codex provides openai/codex-action@v1 to automatically review PRs and post feedback. This makes the cross-review pipeline fully automated—every PR gets agent-powered review without manual triggering.
AGENTS.md Configuration
# ~/.codex/AGENTS.md - Global
Always run tests before creating a PR.
Follow conventional commits for messages.
Never modify files outside the src/ directory.
# project/AGENTS.md - Project-specific
This is a TypeScript project using Next.js 15.
Run `pnpm test` for tests, `pnpm lint` for linting.
Database migrations go in prisma/migrations/.
The Cross-Review Workflow
The most powerful pattern: use Claude Code for implementation and Codex for independent review. This creates a two-model adversarial check that catches errors neither tool would find alone.
Claude Code
Implements the feature, runs tests, creates a PR.
Codex Reviews
Reviews the PR in a sandbox. Runs tests, checks quality, leaves feedback.
Claude Fixes
Addresses review feedback, updates the PR.
Human Review
Final approval with high confidence from dual-model QA.
Why This Works
- Different model architectures catch different classes of bugs. What one model overlooks, the other often finds.
- Sandboxed execution in Codex means the reviewer can actually run the code, not just read diffs.
- Automated feedback loop—Claude Code can pick up Codex's review comments and fix them without human intervention for routine issues.
- Human review becomes final verification rather than first-pass bug hunting, dramatically improving efficiency.
Set up a GitHub Action that triggers Codex review on every PR created by Claude Code. This makes the cross-review pipeline fully automated.
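A minimal workflow for that trigger might look like the following. The action name comes from the section above, but the input names are placeholders, so verify them against the action's README before use:

```yaml
# .github/workflows/codex-review.yml (sketch; input names are assumptions)
name: codex-review
on:
  pull_request:
jobs:
  review:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: openai/codex-action@v1
        with:
          openai-api-key: ${{ secrets.OPENAI_API_KEY }}
          prompt: "Review this PR for correctness, security, and performance."
```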
Claude Code Agent Teams
Agent teams coordinate multiple Claude Code instances working together. One session is the team lead, spawning teammates that work independently with their own context windows, communicating via a shared task list and mailbox.
Agent teams are experimental. Enable via CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1 in your environment or settings.
"Agent teams are most effective for tasks where parallel exploration adds real value."
— Claude Code Documentation
Best Use Cases
🔎 Research & Review
Multiple teammates investigate different aspects simultaneously, then share and challenge each other's findings.
✨ New Features
Teammates each own a separate module or layer without stepping on each other's work.
🐛 Debugging
Test competing hypotheses in parallel. Teammates actively try to disprove each other's theories.
📚 Cross-Layer
Frontend, backend, and test changes each owned by a different teammate, coordinating via the shared task list.
Agent Teams vs. Subagents
| Aspect | Subagents | Agent Teams |
|---|---|---|
| Communication | Report back to main agent only | Message each other directly |
| Context | Results summarized back | Fully independent context windows |
| Token cost | Lower | Higher (scales with team size) |
| Best for | Focused tasks, quick results | Complex work needing collaboration |
| Display mode | Within main session | tmux split panes or in-process |
Delegate Mode
🎯 Delegate Mode
Lead focuses purely on coordination—spawning teammates, sending messages, managing the task list—without implementing tasks itself. This prevents the lead from burning context on implementation details. Toggle with Shift+Tab.
If you use tmux, set "teammateMode": "tmux" in your settings. Each teammate gets its own pane. Use tmux -CC in iTerm2 for the best experience, or standard tmux on Linux.
Why Specs Beat Prompts
Agents are autonomous executors that need a clear target. A prompt is a one-shot instruction that gets lost in context; a spec is a verifiable contract between you and the agent. It tells the agent what "done" looks like, gives it criteria to check its own work, and enables the outer loop (Ralph) to run unattended overnight. Writing the spec first is the single highest-leverage thing you can do before starting any agent-driven work.
✅ Testable Requirements
Every line in the spec should be verifiable—include test commands, expected behaviors, edge cases. If the agent can't check its own work, it can't self-correct.
📐 Right-Sized Stories
Break features into stories that fit in one context window. One story per iteration keeps the agent focused and prevents context exhaustion. Too large = wasted tokens. Too small = overhead.
🎯 What, Not How
Describe acceptance criteria, not implementation steps. Let the agent choose the approach. Over-specifying implementation constrains the agent and often produces worse results.
The spec IS the prompt. A well-written PRD replaces long, fragile prompt chains. Write the spec, point the agent at it, and let the loop run. Formats (PRD.md, prd.json) are covered in the Ralph Loop section.
Ralph Loop & PRD-Driven Development
Ralph Loop is a community-built bash script that runs Claude Code repeatedly until all requirements in a PRD are complete. It is not an Anthropic product—it's an open-source tool you run in your own terminal. Named after the "Ralph Wiggum" technique, it's designed for unattended operation—perfect for overnight feature development.
What It Really Is
Strip away the name and Ralph Loop is just a bash while loop. It runs Claude Code, waits for it to finish, checks if there's more work in the PRD, and starts Claude Code again. That's the entire trick.
while ! all_stories_pass; do
claude "Read PRD, implement next story, run tests"
done
The problem it solves: Claude Code's agent loop (think → act → observe) runs inside a single session with a finite context window. For large features, one session isn't enough—the context fills up, or the model finishes one story and stops. Ralph Loop solves this by giving the agent loop an outer loop. Each session gets a fresh context window, picks up where the last one left off by reading the PRD, and works on the next incomplete item. It turns a single-session tool into an overnight assembly line.
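That outer loop can be sketched as a small driver over a prd.json-style story list. The `run_agent` callable is a stand-in for launching Claude Code (e.g. via subprocess); the schema here is a simplified assumption:

```python
# Sketch of the Ralph outer loop: each pass is a fresh agent session,
# pointed at the first incomplete story, until every story passes.
from typing import Callable, Optional

def next_story(prd: dict) -> Optional[dict]:
    return next((s for s in prd["stories"] if not s.get("passes")), None)

def ralph(prd: dict, run_agent: Callable[[str], bool], max_iters: int = 50) -> dict:
    for _ in range(max_iters):
        story = next_story(prd)
        if story is None:   # all stories pass: loop exits, PR is ready
            break
        # Fresh session, fresh context window, one story per iteration.
        story["passes"] = run_agent(f"Read the PRD, implement and verify: {story['title']}")
    return prd

prd = {"stories": [{"title": "profile page", "passes": False},
                   {"title": "avatar upload", "passes": False}]}
```

The real script adds the details that matter in practice: running inside tmux, committing after each story, and writing a BLOCKED.md when a story cannot be completed.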
How It Works
Write PRD
Define requirements in PRD.md or prd.json with checkable items.
Start Loop
The bash script begins autonomous iteration.
Iterate
Claude reads PRD, implements next item, runs tests, marks done.
Complete
Loop exits when all items pass. PR ready for review.
PRD Formats
📝 PRD.md (Simple)
Markdown with checkboxes. Best for small features. Claude checks off items as it completes them. Each major item gets a commit.
- [ ] Add profile page route
- [ ] Create profile form component
- [ ] Add avatar upload
- [ ] Write tests
📊 prd.json (Structured)
JSON with stories. Best for large features. One story per iteration keeps context windows clean. Each story gets its own commit.
{ "stories": [
    { "title": "...",
      "passes": false }
  ]
}
The Overnight Workflow
- Before bed: Write a detailed PRD with clear, testable requirements. Include verification commands.
- Start Ralph Loop: Run the ralph script in a tmux session so it survives terminal disconnect.
- Morning review: Check the git log. Each completed story is a separate commit. Review the PR, run final QA.
- If blocked: Ralph creates a BLOCKED.md documenting what it couldn't resolve. Pick up from there.
"End-of-day agents handling research, exploration, and triage during low-energy periods."
— Mitchell Hashimoto, on using AI agents during off-hours
Boris Cherny's Workflow
Boris Cherny created Claude Code and uses it to ship ~100 PRs per week. His setup is surprisingly vanilla—no exotic hacks, just disciplined application of fundamentals at scale.
💻 Parallel Sessions
5 in terminal (numbered, with OS notifications), 5-10 on claude.ai, plus mobile sessions started in the morning. Each uses its own git checkout to avoid conflicts. Expects 10-20% abandonment rate.
🎯 Opus for Everything
Uses Opus with thinking exclusively. "Even though it's bigger & slower than Sonnet, since you have to steer it less, it is almost always faster in the end."
🗺 Plan First
Starts most sessions in Plan Mode (Shift+Tab twice). Iterates on the plan, then switches to auto-accept. "A good plan is really important!"
✅ Verify Everything
"The most important thing: give Claude a way to verify its work. This feedback loop 2-3x the quality of the final result."
Key Practices
- CLAUDE.md as institutional memory: The team updates it multiple times weekly. During code review, Boris tags @.claude to add learnings. Their CLAUDE.md is ~2.5k tokens.
- Skills for every repeated workflow: `/commit-push-pr` is used dozens of times daily. Skills live in `.claude/skills/<name>/SKILL.md` (legacy `.claude/commands/` still works) with inline bash for context.
- Strategic permissions: Use `/permissions` to pre-allow safe commands. Share settings in `.claude/settings.json` via git.
- MCP for external tools: Slack, BigQuery, Sentry connected via MCP servers. Config in `.mcp.json`, checked into git.
- Dedicated subagents: code-simplifier, verify-app, build-validator, code-architect, each focused on one job.
Mitchell Hashimoto's Framework
The co-founder of HashiCorp describes a deliberate six-step evolution from chatbot usage to continuous agent operation. The key insight: you must push through the inefficiency phase to reach transformation.
The Six Steps
- Abandon chatbots for agents. Conversational interfaces have limited utility. Agents that read files, execute programs, and loop are transformative.
- Reproduce work agentic-ally. Do work twice intentionally—once manually, once with the agent—to build expertise and calibrate trust.
- End-of-day agents. Use low-energy periods for agent-driven research, exploration, and triage. Check results the next morning.
- Delegate high-confidence tasks. Keep engaging deep work for yourself. Delegate well-defined, verifiable tasks to agents.
- Engineer the harness. Create instruction files (AGENTS.md or equivalent) and programmed tools. Document mistakes systematically to prevent recurrence.
- Continuous agent operation. Always have at least one agent running in the background. Target 10-20% of workday with agents active. While you focus on one task, an agent researches the next, reviews previous work, or explores alternatives. Disable notifications to control context-switching.
“Break down sessions into separate clear, actionable tasks. Don’t try to ‘draw the owl’ in one mega session.”
— Mitchell Hashimoto
Practical Takeaways
🎯 Task Isolation
One task per session. Don't overload context windows with unrelated work. Clear, focused prompts produce better results.
🧪 Verification
Give agents a way to check their own work. Tests, linters, type checkers—anything that provides automated feedback.
📓 Document Mistakes
When an agent makes a mistake, add it to CLAUDE.md/AGENTS.md. This compounds over time into increasingly reliable behavior.
♾ Always Have an Agent Running
Mitchell's principle: you should always have at least one agent working in the background. While you focus on one task, an agent researches the next, reviews previous work, or explores alternatives. Dead time is wasted compute.
Peter Steinberger's Workflow
Peter Steinberger ships solo at extreme velocity. His workflow is Codex-first and minimal-ceremony—conversations replace plan mode, commits go straight to main, and issue trackers didn't stick. He juggles 3–8 projects simultaneously, uses image-based prompting for UI iteration, and built a custom Oracle tool for when agents get stuck.
🧠 Codex-First
Uses GPT 5.2-codex with “high” reasoning. It silently reads files for 10–15 min before writing. 4x slower than Opus per task but needs fewer iterations, so net faster. Better for large features and refactors; Opus still preferred for smaller edits and his AI agent.
⌨ CLI Everything
Builds CLIs first, UIs second—agents can call CLIs directly and verify output. “Almost all MCPs really should be CLIs.” Custom tools for food delivery, tweets, email, cameras, home automation, music, bed temperature—all agent-accessible.
🖼 Image Prompting
“At least 50% of my prompts contain a screenshot.” Just a few words plus a screenshot is enough. Annotation improves results but isn’t required. His prompts have gotten much shorter over time—brief, image-supplemented requests beat lengthy descriptions.
🔮 Oracle
Custom CLI that gives agents access to GPT 5 Pro when they’re stuck. A “massive unlock” for research—agents can search across ~50 websites in one run. Used multiple times daily at first, now a few times per week as GPT 5.2 handles more independently.
📋 Task Queueing
Uses Codex’s built-in queue to pipeline tasks—“as I get a new idea, I add it to the pipeline.” He’s “usually the bottleneck,” not the model. Avoids multi-agent orchestration in favor of a simple, iterative approach.
Key Practices
- Just talk to it: “Don’t waste your time on stuff like RAG, subagents, Agents 2.0 or other things that are mostly just charade.” Instead: talk to the model, play with it, develop intuition. The more you work with agents, the better your results. Short prompts (1–2 sentences + an image) beat elaborate prompt engineering.
- Conversations replace planning: Calls plan mode “a hack that was necessary for older generations of models.” Instead, starts a conversation—asks a question, lets the model search the web, explore code, and create a plan together. When satisfied, writes “build” to trigger implementation. Commits to main directly for linear git history. Only works solo—“if you work in a bigger team that workflow obv won't fly.”
- Docs folder pattern: Each project maintains `docs/*.md` files. The model chooses filenames. Optimized for agent navigation, not human browsing. Markdown-first structure.
- Cross-project reference: Tells the agent “look at `../vibetunnel` and do the same” to reuse patterns directly via filesystem access instead of re-explaining.
- Skills for domain automation: Domain registration, DNS management, Tailscale network commands, remote Mac terminal access. Reduces context explanation on repeated tasks.
- Multi-machine setup: MacBook Pro + Mac Studio via Jump Desktop. Long-running tasks continue on Studio. Git sync for cross-machine edits. Travel flexibility with background processing.
- Opus for creative work: Despite being Codex-first for coding, uses Claude Opus for his Clawdis AI agent and general tasks. “My AI agent wouldn’t be half as fun running on GPT 5.” Picks the best tool per job, not one tool for everything.
- Ad-hoc refactoring: No more dedicated refactoring days. “Whenever prompts start taking too long or I see sth ugly flying by in the code stream, I’ll deal with it right away.” The agent’s output stream becomes the trigger for cleanup.
- Compaction as review: Token compaction (context summarization) is reframed as a feature. “Often acts like a review, and the model will find bugs when it looks at code again.” Tasks run across many compactions successfully.
- No orchestration layers, no RAG: Rejects agent orchestrators (Conductor, Terragon, Sculptor): “thin wrappers around Anthropic’s SDK + work tree management. There’s no moat.” Also dismisses RAG—modern models search well enough without a separate vector index. Prefers separate terminal windows for visibility and control over subagent abstractions.
- Blast radius thinking: Before every change, estimates how many files it will touch and how long it will take. Multiple small, atomic changes beat one massive refactor—easier to recover from errors and maintain clean git history.
“The amount of software I can create is now mostly limited by inference time and hard thinking.”
— Peter Steinberger
“Most software does not require hard thinking. Most apps shove data from one form to another, maybe store it somewhere, and then show it to the user.”
— Peter Steinberger
Codex Config
model = "gpt-5.2-codex"
model_reasoning_effort = "high"
tool_output_token_limit = 25000
model_auto_compact_token_limit = 233000
[features]
unified_exec = true
skills = true
web_search_request = true
Boris plans first, uses Opus exclusively, and runs parallel sessions with worktrees. Steinberger skips planning, uses Codex primarily, and commits to main linearly. Both ship at extreme velocity—the takeaway is that the tools matter less than having a consistent, disciplined workflow that plays to each tool's strengths.
The Unified Workflow
Here is the optimized workflow combining all tools and techniques for maximum efficiency and quality.
Daytime: Active Development
Design
Use Plan Mode to define the approach. Use agent teams for research if the problem is complex.
Implement
Claude Code in parallel worktrees via tmux. One feature per session.
Verify
Run /verify. Give Claude the feedback loop. Fix issues in the same session.
PR + Review
Claude creates PR. Codex auto-reviews. Fix feedback. Human approves.
While working through the Plan → Build → Test → Ship cycle, always keep a background agent running on the next task, reviewing the last PR, or researching an upcoming feature. The goal: zero idle compute.
Overnight: Autonomous Development
Write Spec
Create PRD.md or prd.json with clear, testable requirements and verification steps.
Start Loop
Ralph script in tmux. Claude iterates through stories autonomously.
Morning QA
Review git log, check completed stories. Codex reviews the overnight PR.
Polish
Address any review feedback. Merge and deploy.
The Complete Stack
| Tool | Role | When |
|---|---|---|
| Claude Code | Primary implementation, planning, shipping | Active development hours |
| Codex | Code review, parallel task queues, QA | After every PR, parallel tasks |
| Ralph Loop | Autonomous iteration through PRD stories | Overnight, long-running features |
| Agent Teams | Parallel research, multi-module features | Complex tasks needing coordination |
| tmux + Worktrees | Session management, parallel isolation | Always (infrastructure layer) |
| CLAUDE.md / AGENTS.md | Institutional memory, mistake prevention | Always (knowledge layer) |
Verification is everything. Every workflow above depends on giving the AI a way to check its own work. Tests, linters, type checkers, build scripts—these aren't just quality tools, they're the feedback loop that makes autonomous development possible. Invest in your test suite and CI pipeline before anything else.
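The verification feedback loop reduces to a few lines: run the check, and on failure hand the output back to the agent as its next prompt. The check command below runs Python itself so the sketch stays self-contained; in practice it would be your test, lint, or build command:

```python
# Sketch of a verification step that turns failures into agent feedback.
import subprocess, sys

def run_check(cmd: list[str]) -> tuple[bool, str]:
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout + proc.stderr

def verify_then_feedback(cmd: list[str]) -> str:
    ok, output = run_check(cmd)
    if ok:
        return "PASS"
    # In a real loop, this string becomes the agent's next prompt.
    return f"Tests failed, fix and retry:\n{output}"
```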
Sources
This analysis is based on official documentation, creator insights, and practitioner experience.