Episodic AI Agents: Reconciliation-Based Autonomous Development

Abstract

Traditional "genius agent" architectures maintain persistent context across long-running software development tasks, leading to context drift, token bloat, and unpredictable failures. We present RalphSharp, a system implementing episodic agents: stateless workers that wake, execute a single task, and terminate. State persists exclusively in the filesystem, which makes runs naturally resumable and avoids context drift. The system uses a four-stage pipeline (EXPLORE → REFINE → PLAN → EXECUTE) orchestrated by a reconciliation loop that continuously moves actual state toward desired state, analogous to Kubernetes controllers.

1. Introduction

1.1 Problem Statement

Long-running AI agents that hold entire projects in context memory suffer from:

Context Drift: Accumulating irrelevant context leads to hallucinations
Token Bloat: Context size grows without bound until token limits are hit
Unpredictable Failures: Long sessions produce inconsistent behavior
Non-Resumability: Interruptions lose all progress

1.2 Core Insight

Software development tasks are naturally decomposable into atomic, verifiable units. Rather than maintaining a long-lived "genius" agent, we propose dumb, episodic workers that:

Wake with zero memory
Read state from files
Execute ONE task
Terminate
Hand control back to a loop that starts a fresh worker

This transforms AI agent execution from a long-lived stateful process into a reconciliation loop.
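The loop described above can be sketched in a few lines. This is a minimal illustration, not RalphSharp's actual controller: `ReconcileLoop`, `readPlan`, and `runEpisode` are hypothetical names, and the sketch assumes task lines have the form `- [ ] description`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ReconcileLoop
{
    // Runs episodes until no task is pending or the episode budget is spent.
    // readPlan and runEpisode are hypothetical hooks: the first re-reads the
    // plan from disk, the second launches one fresh, memoryless worker.
    public static int Run(
        Func<IReadOnlyList<string>> readPlan,
        Action<string> runEpisode,
        int maxEpisodes)
    {
        int episodes = 0;
        while (episodes < maxEpisodes)
        {
            // Actual state is derived from the files on every pass, never from memory.
            var next = readPlan()
                .FirstOrDefault(l => l.StartsWith("- [ ]", StringComparison.Ordinal));
            if (next is null) break;   // actual state matches desired state
            runEpisode(next);          // one task, then the worker terminates
            episodes++;
        }
        return episodes;
    }
}
```

Because the loop holds no state of its own beyond the episode counter, stopping it at any point and starting it again picks up from whatever the plan file says.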

2. Architecture

2.1 The Two Rooms Model

The system enforces strict separation of concerns through two isolated domains:

Room	Role	Access	Operations
Clean Room (Thinker)	Planning, Analysis	Database, Memory	Refine ideas, generate plans
Workshop (Doer/Ralph)	Implementation	Filesystem, Terminal	Execute code, modify files

A StateSync Bridge provides unidirectional flow: Database → Filesystem. The Workshop never writes to the database; it only modifies files.

2.2 The Four-Stage Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   EXPLORE   │───▶│   REFINE    │───▶│    PLAN     │───▶│   EXECUTE   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
  Understand         Clarify            Decompose         Implement
  codebase           requirements       into tasks        one task

Stage 1: EXPLORE. The agent analyzes codebase structure, conventions, and patterns.
Stage 2: REFINE. A raw idea becomes a structured specification with scope boundaries.
Stage 3: PLAN. The specification becomes an ordered task list with dependencies.
Stage 4: EXECUTE. The loop picks a task, implements it, verifies it, marks it complete, and repeats.

2.3 The Episode Lifecycle

┌──────────────────────────────────────────────────────────────┐
│                      EPISODE LIFECYCLE                        │
└──────────────────────────────────────────────────────────────┘

    RECONSTRUCT          PERCEIVE & ACT              PERSIST
    ┌─────────┐         ┌──────────────┐         ┌──────────┐
    │  Read   │────────▶│   Execute    │────────▶│  Write   │
    │  State  │         │   Task       │         │  State   │
    │  Files  │         │              │         │  Die     │
    └─────────┘         └──────────────┘         └──────────┘
         │                     │                       │
         │                     │                       │
    AGENTS.md              Modify code            PLAN.md [x]
    PLAN.md                Run tests             Source files
    specs/*.md             Mark complete         DISCOVERY.md

Each episode is atomic: the agent reads current state, performs one bounded operation, persists results, and terminates. No memory carries between episodes.
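A single episode could look roughly like the following. This is a sketch under stated assumptions: the `Episode` class and the `executeTask` callback are hypothetical, task lines are assumed to look like `- [ ] description`, and the real system delegates the "act" step to an external agent rather than a delegate.

```csharp
using System;
using System.IO;

public static class Episode
{
    // Index of the first pending task line, or -1 if none remain.
    public static int FindPending(string[] planLines) =>
        Array.FindIndex(planLines, l => l.StartsWith("- [ ]", StringComparison.Ordinal));

    // Copy of the plan with the marker on one task line replaced.
    // Assumes the line at index starts with a five-character "- [?]" prefix.
    public static string[] Mark(string[] planLines, int index, char marker)
    {
        var copy = (string[])planLines.Clone();
        copy[index] = "- [" + marker + "]" + copy[index].Substring(5);
        return copy;
    }

    // One episode: reconstruct state, act on a single task, persist, return.
    // Returns false when there was nothing to do.
    public static bool RunOnce(string planPath, Func<string, bool> executeTask)
    {
        var lines = File.ReadAllLines(planPath);                    // RECONSTRUCT
        int i = FindPending(lines);
        if (i < 0) return false;

        File.WriteAllLines(planPath, Mark(lines, i, '~'));          // claim the task
        bool ok = executeTask(lines[i].Substring(5).Trim());        // PERCEIVE & ACT
        File.WriteAllLines(planPath, Mark(lines, i, ok ? 'x' : '!')); // PERSIST
        return true;
    }
}
```

Nothing survives the call except what was written to the plan file, which is the property the lifecycle diagram describes.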

3. State Management

3.1 File-Based State Machine

State is encoded in markdown files using a simple marker syntax:

Marker	Meaning	Transition Rules
`[ ]`	Pending	→ `[~]` when started
`[~]`	In Progress	→ `[x]` on success, `[!]` on failure
`[x]`	Complete	Terminal state
`[!]`	Needs Review	Human intervention required
`[S]`	Skipped	Set after 3 consecutive non-transient failures (Section 4.2)
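The transition rules in the table can be encoded as a small lookup. The sketch below is illustrative only: `TaskMarkers` is a hypothetical name, and the table does not say which state `[S]` is entered from, so the `[~]` → `[S]` rule is an assumption.

```csharp
public static class TaskMarkers
{
    // True when the marker change is one the table permits.
    public static bool CanTransition(char from, char to) => (from, to) switch
    {
        (' ', '~') => true,   // pending -> in progress
        ('~', 'x') => true,   // in progress -> complete
        ('~', '!') => true,   // in progress -> needs review
        ('~', 'S') => true,   // assumption: skip is applied from in progress
        _ => false            // [x] is terminal; [!] waits for a human
    };
}
```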

3.2 Derived Files

File	Created By	Purpose
`AGENTS.md`	StateSync	Project context, rules, conventions
`PLAN.md`	PLAN stage	Ordered task queue
`DISCOVERY.md`	Any stage	"I learned something that changes the plan"
`IDEAS.md`	Any stage	"I noticed something unrelated"

3.3 Resume Semantics

If PLAN.md already exists and contains pending tasks, the pipeline skips directly to the EXECUTE stage. Because every completed task is already recorded in the file, this gives checkpoint/restart behavior without any explicit serialization step.
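The resume check might be as small as the sketch below. The `Resume` class and `Stage` enum are hypothetical names; the sketch assumes PLAN.md sits at the workspace root, that task lines start with `- [ ]`, and that a missing or fully completed plan means starting again from EXPLORE, which the text does not state.

```csharp
using System;
using System.IO;
using System.Linq;

public enum Stage { Explore, Refine, Plan, Execute }

public static class Resume
{
    // Chooses the stage to start from based only on what is on disk.
    public static Stage StartingStage(string workspacePath)
    {
        string plan = Path.Combine(workspacePath, "PLAN.md");
        bool hasPending = File.Exists(plan) &&
            File.ReadLines(plan).Any(l => l.StartsWith("- [ ]", StringComparison.Ordinal));
        return hasPending ? Stage.Execute : Stage.Explore;
    }
}
```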

4. Resilience Patterns

4.1 Transient Error Detection

Network hiccups and API rate limits are expected in long-running pipelines. The system detects transient errors via pattern matching:

"No messages returned" (API hiccup)
HTTP 429, 502, 503, 529 (rate limits, server errors)
"ETIMEDOUT", "ECONNRESET" (network issues)

Transient errors trigger retry with exponential backoff; they don't count toward failure limits.
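One way to express this classification and the retry delay is sketched below. The `TransientErrors` class is hypothetical; the strings and status codes come from the list above, while passing the status code as a separate number and the specific base delay and cap are assumptions.

```csharp
using System;
using System.Linq;

public static class TransientErrors
{
    private static readonly string[] Patterns =
        { "No messages returned", "ETIMEDOUT", "ECONNRESET" };

    private static readonly int[] RetryableStatus = { 429, 502, 503, 529 };

    // True when the error looks like a network or API hiccup worth retrying.
    public static bool IsTransient(string errorText, int? httpStatus = null) =>
        (httpStatus.HasValue && RetryableStatus.Contains(httpStatus.Value)) ||
        Patterns.Any(p => errorText.Contains(p, StringComparison.OrdinalIgnoreCase));

    // Delay before retry number `attempt` (0-based): baseDelay * 2^attempt, capped.
    public static TimeSpan Backoff(int attempt, TimeSpan baseDelay, TimeSpan cap)
    {
        double ms = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        return TimeSpan.FromMilliseconds(Math.Min(ms, cap.TotalMilliseconds));
    }
}
```

With a one-second base and a thirty-second cap, for example, the delays would run 1 s, 2 s, 4 s, 8 s, 16 s and then stay at 30 s.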

4.2 Consecutive Failure Limit

If the same task fails 3 consecutive times with non-transient errors, it's marked [S] (Skipped) and the pipeline continues. This prevents infinite loops on fundamentally broken tasks.
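The counting rule can be sketched as a pure function over a task's attempt history. `Outcome` and `FailureLimit` are hypothetical names, and the sketch assumes a transient error neither extends nor resets the streak, which is one reading of "don't count toward failure limits".

```csharp
using System.Collections.Generic;

public enum Outcome { Success, TransientError, Failure }

public static class FailureLimit
{
    public const int MaxConsecutiveFailures = 3;

    // Length of the current streak of non-transient failures.
    // Transient errors are ignored; a success resets the streak.
    public static int ConsecutiveFailures(IEnumerable<Outcome> history)
    {
        int streak = 0;
        foreach (var outcome in history)
        {
            if (outcome == Outcome.Failure) streak++;
            else if (outcome == Outcome.Success) streak = 0;
        }
        return streak;
    }

    // True when the task should be marked [S] and the pipeline should move on.
    public static bool ShouldSkip(IEnumerable<Outcome> history) =>
        ConsecutiveFailures(history) >= MaxConsecutiveFailures;
}
```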

4.3 Process Tree Management

On Windows, killing a process does not terminate its child processes. The system calls Process.Kill(entireProcessTree: true) so that an abandoned episode does not leave orphaned child processes running.
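A timeout wrapper around that call might look like the sketch below. `EpisodeProcess.RunWithTimeout` is a hypothetical helper, not RalphSharp's code; `Process.Kill(bool entireProcessTree)` itself is a standard .NET API, available since .NET Core 3.0.

```csharp
using System;
using System.Diagnostics;

public static class EpisodeProcess
{
    // Runs a command and, if it exceeds the timeout, kills it together with
    // every descendant process. Returns the exit code, or null if killed.
    public static int? RunWithTimeout(string fileName, string arguments, TimeSpan timeout)
    {
        using var process = Process.Start(new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
        }) ?? throw new InvalidOperationException("Process failed to start.");

        if (process.WaitForExit((int)timeout.TotalMilliseconds))
            return process.ExitCode;

        process.Kill(entireProcessTree: true);  // also terminates child processes
        process.WaitForExit();                  // wait until the kill has taken effect
        return null;
    }
}
```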

5. Design Principles

5.1 Simplicity Constraints

Principle	Implementation
YAGNI	No features until needed; no plugin system
KISS	Prefer 10 lines over a dependency
One file > Two files	Minimize file proliferation

5.2 Code Quality

Principle	Implementation
DRY	Extract patterns seen 3+ times
SRP	One class/function = one responsibility
Composition > Inheritance	Small composable pieces, not deep hierarchies

5.3 Reliability

Principle	Implementation
Fail Fast	Detect errors early, surface clearly
Defensive at Boundaries	Validate input at edges, trust internals
Idempotency	Operations safe to retry

5.4 Performance

Principle	Implementation
Measure First	No optimization without evidence
Big O Awareness	Know algorithmic complexity
Lazy Evaluation	Don't compute what isn't needed

6. Autonomous Execution Model

6.1 No-Pause Philosophy

Once started, Ralph runs to completion without human intervention:

Can't proceed → Mark [!], log to DISCOVERY.md, continue
Notice unrelated issue → Append to IDEAS.md, continue
Success → Mark [x], continue

6.2 Minimal Blast Radius

Each task specifies exactly which files may be modified. Touching files outside the task description is a red flag that halts execution.
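The check itself can be a simple set difference. In this sketch `BlastRadius` is a hypothetical name, how the list of modified files is obtained (for example from version control) is left open, and the case-insensitive path comparison is an assumption suited to Windows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BlastRadius
{
    // Returns the modified paths that the task did not declare.
    // An empty result means the episode stayed inside its allowed files.
    public static IReadOnlyList<string> Violations(
        IEnumerable<string> allowed, IEnumerable<string> modified)
    {
        var allowedSet = new HashSet<string>(
            allowed.Select(Normalize), StringComparer.OrdinalIgnoreCase);
        return modified.Select(Normalize)
                       .Where(p => !allowedSet.Contains(p))
                       .ToList();
    }

    // Uses forward slashes and drops a leading "./" so equivalent paths compare equal.
    private static string Normalize(string path)
    {
        string p = path.Replace('\\', '/');
        return p.StartsWith("./", StringComparison.Ordinal) ? p.Substring(2) : p;
    }
}
```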

6.3 Post-Execution Review

After completion, the system surfaces:

All [!] items requiring human review
All entries in DISCOVERY.md
All entries in IDEAS.md

A human then reviews the results, adjusts the plan, and reruns the pipeline if needed.
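A review summary could be assembled from the three sources as follows. `ReviewReport` is a hypothetical name and the output layout is invented for illustration; only the three inputs come from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class ReviewReport
{
    // Builds a plain-text summary of everything a human should look at.
    public static string Build(
        IEnumerable<string> planLines,
        IEnumerable<string> discoveries,
        IEnumerable<string> ideas)
    {
        var sb = new StringBuilder();
        Append(sb, "Needs review", planLines.Where(l => l.Contains("[!]")));
        Append(sb, "Discoveries", discoveries);
        Append(sb, "Ideas", ideas);
        return sb.ToString();
    }

    // Writes a titled section with an item count, skipping blank entries.
    private static void Append(StringBuilder sb, string title, IEnumerable<string> items)
    {
        var list = items.Where(i => !string.IsNullOrWhiteSpace(i)).ToList();
        sb.Append(title).Append(" (").Append(list.Count).Append(")\n");
        foreach (var item in list)
            sb.Append("  ").Append(item.Trim()).Append('\n');
    }
}
```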

7. Quality Bar

A valid implementation must satisfy:

Compiles without warnings
No new dependencies (exception: Terminal.Gui v2 for TUI)
Each prompt fits one screen (~50 lines)
Code is obvious without comments

Red Flags (Stop and Ask)

Adding > 10 new files
Creating interfaces before implementations
"Service" or "Manager" class names
Configuration for single-choice options
Modifying files not in task description

8. Comparison to Related Work

System	Memory Model	Task Granularity	State Persistence
AutoGPT	Persistent context	Arbitrary	In-memory
BabyAGI	Persistent context	Fine-grained	Vector DB
MetaGPT	Role-based context	Role-dependent	Mixed
RalphSharp	Episodic (zero)	One task	Filesystem

The key differentiator is episodic execution with filesystem-based state, which provides:

Deterministic resumability
Zero context drift
Natural debugging (just read the files)
Human-readable state at all times

9. Conclusion

RalphSharp makes the case that dumb, episodic agents can be a better fit than "genius" long-lived agents for software development tasks by:

Eliminating context drift through memory-free episodes
Providing natural resumability through file-based state
Enabling human oversight through readable markdown artifacts
Reducing complexity through strict architectural boundaries

The reconciliation loop pattern, borrowed from infrastructure orchestration, proves effective for AI agent orchestration: define desired state, let agents reconcile actual toward desired, one atomic step at a time.

Appendix A: Configuration Reference

new LoopController(
    workspacePath: "path/to/project",
    timeout: TimeSpan.FromMinutes(10),    // Per episode
    maxTurnsPerEpisode: 20,               // Claude CLI turns
    maxEpisodes: 50                        // Total episodes
);

Appendix B: CLI Reference

dotnet run "Your idea here"              # Full pipeline
dotnet run -w /path/to/project "idea"    # Specify workspace
dotnet run --no-tui "idea"               # Raw ANSI mode (no TUI)

Document generated from RalphSharp project artifacts, January 2026
