Episodic AI Agents: A Reconciliation-Based Approach to Autonomous Software Development

Abstract

Traditional "genius agent" architectures maintain persistent context across long-running software development tasks, which leads to context drift, token bloat, and unpredictable failures. We present RalphSharp, a system built on episodic agents: stateless workers that wake, execute a single task, and terminate. State persists exclusively in the filesystem, which makes runs naturally resumable and eliminates context drift. A four-stage pipeline (EXPLORE → REFINE → PLAN → EXECUTE) is orchestrated by a reconciliation loop that repeatedly moves actual state toward desired state, analogous to a Kubernetes controller.


1. Introduction

1.1 Problem Statement

Long-running AI agents that hold entire projects in context memory suffer from:

  1. Context Drift: Accumulating irrelevant context leads to hallucinations
  2. Token Bloat: Context grows without bound until token limits are hit
  3. Unpredictable Failures: Long sessions produce inconsistent behavior
  4. Non-Resumability: Interruptions lose all progress

1.2 Core Insight

Software development tasks are naturally decomposable into atomic, verifiable units. Rather than maintaining a long-lived "genius" agent, we propose dumb, episodic workers that:

  1. Wake with zero memory
  2. Read state from files
  3. Execute ONE task
  4. Terminate
  5. Loop restarts fresh

This turns AI agent execution from a stateful process into a reconciliation loop.
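
The wake/read/execute/terminate cycle can be sketched as an outer loop that keeps launching fresh episodes until actual state matches desired state or the episode budget runs out. This is a minimal sketch, not RalphSharp's code: RunLoop, isReconciled, and runEpisode are hypothetical names, and an in-memory list stands in for PLAN.md so the example runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Outer reconciliation loop (hypothetical names): run fresh episodes until the
// desired state is reached or the budget (maxEpisodes, as in Appendix A) is spent.
int RunLoop(Func<bool> isReconciled, Action runEpisode, int maxEpisodes)
{
    int episodes = 0;
    while (!isReconciled() && episodes < maxEpisodes)
    {
        runEpisode();   // each call is a new worker; only persisted state survives it
        episodes++;
    }
    return episodes;
}

// Usage: tasksDone stands in for PLAN.md with three pending tasks.
// Each episode completes exactly one task.
var tasksDone = new List<bool> { false, false, false };
int used = RunLoop(
    isReconciled: () => !tasksDone.Contains(false),
    runEpisode: () => tasksDone[tasksDone.IndexOf(false)] = true,
    maxEpisodes: 50);
Console.WriteLine($"Reconciled in {used} episodes");   // Reconciled in 3 episodes
```

The loop never inspects what an episode did. It only re-checks state, which is what makes it a reconciliation loop rather than a long-lived agent.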


2. Architecture

2.1 The Two Rooms Model

The system enforces strict separation of concerns through two isolated domains:

Room                    Role                  Access                  Operations
Clean Room (Thinker)    Planning, Analysis    Database, Memory        Refine ideas, generate plans
Workshop (Doer/Ralph)   Implementation        Filesystem, Terminal    Execute code, modify files

A StateSync Bridge provides unidirectional flow: Database → Filesystem. The Workshop never writes to the database; it only modifies files.
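
A one-way bridge of this kind can be sketched as a pure render step followed by an overwrite of the file. The sketch assumes that the database values reach StateSync as plain parameters and that AGENTS.md uses a title plus a conventions list. RenderAgentsMd, SyncToWorkspace, and that layout are hypothetical. What carries over from the design is that no function reads the file back into the database.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Render Clean Room data (passed in as parameters here) to AGENTS.md text.
// The section layout is an assumption for illustration.
string RenderAgentsMd(string project, IReadOnlyList<string> conventions)
{
    var sb = new StringBuilder();
    sb.AppendLine($"# {project}");
    sb.AppendLine();
    sb.AppendLine("## Conventions");
    foreach (string rule in conventions)
        sb.AppendLine($"- {rule}");
    return sb.ToString();
}

// Overwrite, never merge: the database is the single source of truth, and
// nothing here reads AGENTS.md back.
void SyncToWorkspace(string workspace, string project, IReadOnlyList<string> conventions) =>
    File.WriteAllText(Path.Combine(workspace, "AGENTS.md"), RenderAgentsMd(project, conventions));

// Usage with a throwaway workspace directory.
string demoWorkspace = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(demoWorkspace);
SyncToWorkspace(demoWorkspace, "Demo", new[] { "Nullable enabled", "No new dependencies" });
Console.Write(File.ReadAllText(Path.Combine(demoWorkspace, "AGENTS.md")));
```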

2.2 The Four-Stage Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   EXPLORE   │───▶│   REFINE    │───▶│    PLAN     │───▶│   EXECUTE   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
  Understand         Clarify            Decompose         Implement
  codebase           requirements       into tasks        one task

Stage 1: EXPLORE. The agent analyzes codebase structure, conventions, and patterns.

Stage 2: REFINE. A raw idea becomes a structured specification with scope boundaries.

Stage 3: PLAN. The specification becomes an ordered task list with dependencies.

Stage 4: EXECUTE. Loop: pick a task, implement it, verify it, mark it complete, repeat.

2.3 The Episode Lifecycle

┌──────────────────────────────────────────────────────────────┐
│                      EPISODE LIFECYCLE                        │
└──────────────────────────────────────────────────────────────┘

    RECONSTRUCT          PERCEIVE & ACT              PERSIST
    ┌─────────┐         ┌──────────────┐         ┌──────────┐
    │  Read   │────────▶│   Execute    │────────▶│  Write   │
    │  State  │         │   Task       │         │  State   │
    │  Files  │         │              │         │  Die     │
    └─────────┘         └──────────────┘         └──────────┘
         │                     │                       │
         │                     │                       │
    AGENTS.md              Modify code            PLAN.md [x]
    PLAN.md                Run tests             Source files
    specs/*.md             Mark complete         DISCOVERY.md

Each episode is atomic: the agent reads current state, performs one bounded operation, persists results, and terminates. No memory carries between episodes.
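
The three phases of an episode can be sketched as a single function that reads PLAN.md, acts on exactly one task, and writes the outcome back. The sketch assumes one task per line in the form "- [ ] description". RunEpisode, SwapMarker, and the execute delegate are hypothetical; in the real system, execute would be an agent CLI invocation.

```csharp
using System;
using System.IO;

// One episode. The function keeps no state between calls: everything it
// knows is read from planPath, and everything it learns is written back.
bool RunEpisode(string planPath, Func<string, bool> execute)
{
    // RECONSTRUCT: read state from disk and find the first pending task.
    string[] lines = File.ReadAllLines(planPath);
    int i = Array.FindIndex(lines, l => l.TrimStart().StartsWith("- [ ]", StringComparison.Ordinal));
    if (i < 0) return false;                          // nothing pending

    // Mark [~] before acting so an interrupted episode is visible on disk.
    lines[i] = SwapMarker(lines[i], "[ ]", "[~]");
    File.WriteAllLines(planPath, lines);

    // PERCEIVE & ACT: exactly one task.
    string task = lines[i].Substring(lines[i].IndexOf("[~]", StringComparison.Ordinal) + 3).Trim();
    bool ok = execute(task);

    // PERSIST, then return: the next episode starts from the file again.
    lines[i] = SwapMarker(lines[i], "[~]", ok ? "[x]" : "[!]");
    File.WriteAllLines(planPath, lines);
    return true;
}

string SwapMarker(string line, string from, string to)
{
    int at = line.IndexOf(from, StringComparison.Ordinal);
    return line.Substring(0, at) + to + line.Substring(at + from.Length);
}

// Usage: one task already done, two pending; "Update docs" fails.
string demoPlan = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
File.WriteAllLines(demoPlan, new[] { "- [x] Add parser", "- [ ] Add tests", "- [ ] Update docs" });
while (RunEpisode(demoPlan, t => t != "Update docs")) { }
Console.Write(File.ReadAllText(demoPlan));
```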


3. State Management

3.1 File-Based State Machine

State is encoded in markdown files using a simple marker syntax:

Marker   Meaning        Transition Rules
[ ]      Pending        → [~] when started
[~]      In Progress    → [x] on success, [!] on failure
[x]      Complete       Terminal state
[!]      Needs Review   Human intervention required
[S]      Skipped        Entered after 3 consecutive failures
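
One reading of the marker table is an explicit transition map that rejects illegal moves, such as completing a task that was never started. The [~] → [S] edge is an assumption: the table says when a task becomes skipped but not which state it leaves. The map and CanTransition are illustrative and are not taken from RalphSharp.

```csharp
using System;
using System.Collections.Generic;

// Allowed marker transitions. The [~] -> [S] edge is an assumption.
// [x], [!] and [S] have no outgoing edges; any follow-up on [!] is a human action.
var transitions = new Dictionary<string, string[]>
{
    ["[ ]"] = new[] { "[~]" },
    ["[~]"] = new[] { "[x]", "[!]", "[S]" },
    ["[x]"] = Array.Empty<string>(),
    ["[!]"] = Array.Empty<string>(),
    ["[S]"] = Array.Empty<string>(),
};

bool CanTransition(string current, string target) =>
    transitions.TryGetValue(current, out var next) && Array.IndexOf(next, target) >= 0;

Console.WriteLine(CanTransition("[ ]", "[~]"));   // True
Console.WriteLine(CanTransition("[ ]", "[x]"));   // False: must be started first
Console.WriteLine(CanTransition("[x]", "[~]"));   // False: [x] is terminal
```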

3.2 Derived Files

File           Created By    Purpose
AGENTS.md      StateSync     Project context, rules, conventions
PLAN.md        PLAN stage    Ordered task queue
DISCOVERY.md   Any stage     "I learned something that changes the plan"
IDEAS.md       Any stage     "I noticed something unrelated"

3.3 Resume Semantics

If PLAN.md exists and still contains pending tasks, the pipeline skips directly to the EXECUTE stage. This gives natural checkpoint/restart behavior without explicit serialization.
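
The resume rule reduces to a check on the filesystem at startup. This sketch assumes two things: an interrupted "[~]" task counts as unfinished, because a crash mid-episode would leave one behind, and stages are named with plain strings. StartStage is a hypothetical name.

```csharp
using System;
using System.IO;
using System.Linq;

// An existing PLAN.md with unfinished work means EXECUTE; anything else
// means a fresh start at EXPLORE. Treating "[~]" as unfinished is an assumption.
string StartStage(string workspace)
{
    string planPath = Path.Combine(workspace, "PLAN.md");
    if (!File.Exists(planPath)) return "EXPLORE";
    bool unfinished = File.ReadLines(planPath)
        .Any(line => line.Contains("[ ]") || line.Contains("[~]"));
    return unfinished ? "EXECUTE" : "EXPLORE";
}

string ws = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(ws);
Console.WriteLine(StartStage(ws));   // EXPLORE (no PLAN.md yet)
File.WriteAllLines(Path.Combine(ws, "PLAN.md"), new[] { "- [x] Add parser", "- [ ] Add tests" });
Console.WriteLine(StartStage(ws));   // EXECUTE
```

No checkpoint format is needed here because the plan file is the checkpoint.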


4. Resilience Patterns

4.1 Transient Error Detection

Network hiccups and API rate limits are expected in long-running pipelines. The system detects transient errors via pattern matching:

  • "No messages returned" (API hiccup)
  • HTTP 429, 502, 503, 529 (rate limits, server errors)
  • "ETIMEDOUT", "ECONNRESET" (network issues)

Transient errors trigger retry with exponential backoff; they don't count toward failure limits.
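
Classification and backoff can be sketched together. The patterns are the ones listed above. Matching raw substrings of the message, the retry cap, and the base delay are assumptions, and IsTransient and RetryTransient are hypothetical names.

```csharp
using System;
using System.Threading;

// Naive substring matching on error text; a stricter matcher would inspect
// HTTP status codes directly instead of searching messages for "503".
string[] transientPatterns = { "No messages returned", "429", "502", "503", "529", "ETIMEDOUT", "ECONNRESET" };

bool IsTransient(string error) =>
    Array.Exists(transientPatterns, p => error.Contains(p, StringComparison.OrdinalIgnoreCase));

// Retries only transient failures, sleeping baseDelay * 2^retry between attempts.
// Non-transient errors, and transient ones after maxRetries, propagate to the
// caller, which is where the consecutive-failure limit would count them.
T RetryTransient<T>(Func<T> attempt, int maxRetries, TimeSpan baseDelay)
{
    for (int retry = 0; ; retry++)
    {
        try
        {
            return attempt();
        }
        catch (Exception ex) when (IsTransient(ex.Message) && retry < maxRetries)
        {
            Thread.Sleep(baseDelay * Math.Pow(2, retry));
        }
    }
}

// Usage: two simulated 503s, then success.
int calls = 0;
string result = RetryTransient(() =>
{
    calls++;
    if (calls < 3) throw new Exception("HTTP 503 Service Unavailable");
    return "ok";
}, maxRetries: 5, baseDelay: TimeSpan.FromMilliseconds(10));
Console.WriteLine($"{result} after {calls} calls");   // ok after 3 calls
```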

4.2 Consecutive Failure Limit

If the same task fails 3 consecutive times with non-transient errors, it's marked [S] (Skipped) and the pipeline continues. This prevents infinite loops on fundamentally broken tasks.
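
The limit needs a per-task counter that outlives individual episodes, so it belongs to the loop controller rather than to the agent. The document does not say where RalphSharp keeps this counter. This sketch holds it in memory, and RecordFailure and RecordSuccess are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

// Consecutive non-transient failures per task, held by the controller.
const int MaxConsecutiveFailures = 3;
var failureCounts = new Dictionary<string, int>();

// Returns true once the task has failed MaxConsecutiveFailures times in a row,
// i.e. when the caller should mark it [S] and move on to the next task.
bool RecordFailure(string task)
{
    failureCounts[task] = failureCounts.GetValueOrDefault(task) + 1;
    return failureCounts[task] >= MaxConsecutiveFailures;
}

// A success breaks the streak.
void RecordSuccess(string task) => failureCounts.Remove(task);

Console.WriteLine(RecordFailure("Add tests"));   // False
Console.WriteLine(RecordFailure("Add tests"));   // False
Console.WriteLine(RecordFailure("Add tests"));   // True: mark [S]
```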

4.3 Process Tree Management

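On Windows, killing a process does not kill its children, so tools spawned during an episode can outlive it. The system calls Process.Kill(entireProcessTree: true) to terminate the whole tree and prevent orphaned processes from accumulating.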
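
A per-episode timeout (the timeout setting in Appendix A) combined with tree killing can be sketched as follows. Process.Kill(bool entireProcessTree) is a real .NET API, available since .NET Core 3.0. RunWithTimeout is a hypothetical name, and the sleep/ping commands are only stand-ins for the agent CLI.

```csharp
using System;
using System.Diagnostics;

// Run a process with a timeout; on timeout, kill it together with every
// process it spawned (shells, compilers, test runners).
bool RunWithTimeout(ProcessStartInfo startInfo, TimeSpan timeout)
{
    using var process = Process.Start(startInfo)
        ?? throw new InvalidOperationException("process did not start");
    if (process.WaitForExit((int)timeout.TotalMilliseconds))
        return true;                                  // finished on its own
    process.Kill(entireProcessTree: true);            // children are killed too
    process.WaitForExit();                            // wait for the root process to exit
    return false;
}

// Stand-in for a long-running agent CLI call.
ProcessStartInfo LongRunning() => OperatingSystem.IsWindows()
    ? new ProcessStartInfo("cmd.exe", "/c ping -n 30 127.0.0.1 > NUL")
    : new ProcessStartInfo("sleep", "30");

Console.WriteLine(RunWithTimeout(LongRunning(), TimeSpan.FromMilliseconds(500)) ? "finished" : "timed out");
```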


5. Design Principles

5.1 Simplicity Constraints

Principle              Implementation
YAGNI                  No features until needed; no plugin system
KISS                   Prefer 10 lines over a dependency
One file > Two files   Minimize file proliferation

5.2 Code Quality

Principle                   Implementation
DRY                         Extract patterns seen 3+ times
SRP                         One class/function = one responsibility
Composition > Inheritance   Small composable pieces, not deep hierarchies

5.3 Reliability

Principle                 Implementation
Fail Fast                 Detect errors early, surface clearly
Defensive at Boundaries   Validate input at edges, trust internals
Idempotency               Operations safe to retry

5.4 Performance

Principle         Implementation
Measure First     No optimization without evidence
Big O Awareness   Know algorithmic complexity
Lazy Evaluation   Don't compute what isn't needed

6. Autonomous Execution Model

6.1 No-Pause Philosophy

Once started, Ralph runs to completion without human intervention:

  • Can't proceed → Mark [!], log to DISCOVERY.md, continue
  • Notice unrelated issue → Append to IDEAS.md, continue
  • Success → Mark [x], continue
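
The three rules above map every outcome to a marker or an appended note, and none of them waits for input. In this sketch, Conclude and AppendNote are hypothetical names, and the one-line note format for DISCOVERY.md and IDEAS.md is an assumption.

```csharp
#nullable enable
using System;
using System.IO;

void AppendNote(string workspace, string file, string text) =>
    File.AppendAllText(Path.Combine(workspace, file), text + Environment.NewLine);

// Returns the marker to write for the task; never blocks on a human.
string Conclude(string workspace, string task, bool completed, string? blocker = null, string? idea = null)
{
    if (idea is not null)
        AppendNote(workspace, "IDEAS.md", $"- ({task}) {idea}");     // unrelated: note it, continue
    if (completed)
        return "[x]";                                                 // success: continue
    AppendNote(workspace, "DISCOVERY.md", $"- {task}: {blocker ?? "blocked"}");
    return "[!]";                                                     // can't proceed: flag it, continue
}

string notesDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(notesDir);
Console.WriteLine(Conclude(notesDir, "Add tests", completed: true, idea: "Utils.cs is unused"));           // [x]
Console.WriteLine(Conclude(notesDir, "Wire CLI flag", completed: false, blocker: "spec omits flag name"));  // [!]
```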

6.2 Minimal Blast Radius

Each task specifies exactly which files may be modified. Modifying any file not listed in the task is a red flag that halts execution.
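
The guard reduces to a set difference between the files an episode changed and the files its task allows. This sketch leaves out how the changed list is obtained (for example, `git diff --name-only`). It normalizes separators and compares paths case-sensitively, which is an assumption; a Windows-only setup might prefer OrdinalIgnoreCase. NormalizePath and OutOfScope are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Forward slashes, no leading "./", so "src/a.cs" and ".\src\a.cs" compare equal.
string NormalizePath(string path)
{
    string p = path.Replace('\\', '/');
    return p.StartsWith("./", StringComparison.Ordinal) ? p.Substring(2) : p;
}

// Files that were changed but are not permitted by the task.
List<string> OutOfScope(IEnumerable<string> changed, IEnumerable<string> allowed)
{
    var permitted = new HashSet<string>(allowed.Select(p => NormalizePath(p)), StringComparer.Ordinal);
    return changed.Select(p => NormalizePath(p)).Where(p => !permitted.Contains(p)).ToList();
}

var stray = OutOfScope(
    changed: new[] { "src/Parser.cs", @".\tests\ParserTests.cs", "README.md" },
    allowed: new[] { "src/Parser.cs", "tests/ParserTests.cs" });
if (stray.Count > 0)
    Console.WriteLine($"Red flag, halting: {string.Join(", ", stray)}");   // Red flag, halting: README.md
```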

6.3 Post-Execution Review

After completion, the system surfaces:

  • All [!] items requiring human review
  • All entries in DISCOVERY.md
  • All entries in IDEAS.md

A human reviews these items, makes adjustments, and reruns the pipeline if needed.
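
Because every outcome is already on disk, the post-run report is a read of three files. This sketch collects [!] lines from PLAN.md and every non-blank line of DISCOVERY.md and IDEAS.md. Treating one line as one entry, the labels, and the name ReviewItems are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

List<string> ReviewItems(string workspace)
{
    // Non-blank lines of a workspace file, or nothing if the file is absent.
    IEnumerable<string> Lines(string file)
    {
        string path = Path.Combine(workspace, file);
        return File.Exists(path)
            ? File.ReadAllLines(path).Where(l => l.Trim().Length > 0)
            : Enumerable.Empty<string>();
    }

    var items = new List<string>();
    items.AddRange(Lines("PLAN.md").Where(l => l.Contains("[!]")).Select(l => "REVIEW     " + l.Trim()));
    items.AddRange(Lines("DISCOVERY.md").Select(l => "DISCOVERY  " + l.Trim()));
    items.AddRange(Lines("IDEAS.md").Select(l => "IDEA       " + l.Trim()));
    return items;
}

string runDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(runDir);
File.WriteAllLines(Path.Combine(runDir, "PLAN.md"), new[] { "- [x] Add parser", "- [!] Wire CLI flag" });
File.WriteAllLines(Path.Combine(runDir, "DISCOVERY.md"), new[] { "- Wire CLI flag: spec omits flag name" });
foreach (string item in ReviewItems(runDir))
    Console.WriteLine(item);
```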


7. Quality Bar

A valid implementation must satisfy:

  1. Compiles without warnings
  2. No new dependencies (exception: Terminal.Gui v2 for TUI)
  3. Each prompt fits one screen (~50 lines)
  4. Code is obvious without comments

Red Flags (Stop and Ask)

  • Adding > 10 new files
  • Creating interfaces before implementations
  • "Service" or "Manager" class names
  • Configuration for single-choice options
  • Modifying files not in task description

8. Comparison to Related Work

System       Memory Model         Task Granularity   State Persistence
AutoGPT      Persistent context   Arbitrary          In-memory
BabyAGI      Persistent context   Fine-grained       Vector DB
MetaGPT      Role-based context   Role-dependent     Mixed
RalphSharp   Episodic (zero)      One task           Filesystem

The key differentiator is episodic execution with filesystem-based state, which provides:

  • Deterministic resumability
  • Zero context drift
  • Natural debugging (just read the files)
  • Human-readable state at all times


9. Conclusion

RalphSharp demonstrates that dumb, episodic agents can outperform "genius" long-lived agents for software development tasks by:

  1. Eliminating context drift through memory-free episodes
  2. Providing natural resumability through file-based state
  3. Enabling human oversight through readable markdown artifacts
  4. Reducing complexity through strict architectural boundaries

The reconciliation loop pattern, borrowed from infrastructure orchestration, proves effective for AI agent orchestration: define desired state, let agents reconcile actual toward desired, one atomic step at a time.


Appendix A: Configuration Reference

new LoopController(
    workspacePath: "path/to/project",
    timeout: TimeSpan.FromMinutes(10),    // Per episode
    maxTurnsPerEpisode: 20,               // Claude CLI turns
    maxEpisodes: 50                       // Total episodes
);

Appendix B: CLI Reference

dotnet run "Your idea here"              # Full pipeline
dotnet run -w /path/to/project "idea"    # Specify workspace
dotnet run --no-tui "idea"               # Raw ANSI mode (no TUI)

Document generated from RalphSharp project artifacts, January 2026