Abstract
Traditional "genius agent" architectures maintain persistent context across long-running software development tasks, leading to context drift, token bloat, and unpredictable failures. We present RalphSharp, a system implementing episodic agents: stateless workers that wake, execute a single task, and terminate. State persists exclusively in the filesystem, enabling natural resumability and eliminating memory drift. The system uses a four-stage pipeline (EXPLORE → REFINE → PLAN → EXECUTE) orchestrated by a reconciliation loop that continuously moves actual state toward desired state, analogous to Kubernetes controllers.
1. Introduction
1.1 Problem Statement
Long-running AI agents that hold entire projects in context memory suffer from:
- Context Drift: Accumulating irrelevant context leads to hallucinations
- Token Bloat: Memory consumption grows unbounded until limits are hit
- Unpredictable Failures: Long sessions produce inconsistent behavior
- Non-Resumability: Interruptions lose all progress
1.2 Core Insight
Software development tasks are naturally decomposable into atomic, verifiable units. Rather than maintaining a long-lived "genius" agent, we propose dumb, episodic workers that:
- Wake with zero memory
- Read state from files
- Execute ONE task
- Terminate
- Are replaced by a fresh worker on the next loop iteration
This transforms AI agent execution from a stateful process to a reconciliation loop.
2. Architecture
2.1 The Two Rooms Model
The system enforces strict separation of concerns through two isolated domains:
| Room | Role | Access | Operations |
|---|---|---|---|
| Clean Room (Thinker) | Planning, Analysis | Database, Memory | Refine ideas, generate plans |
| Workshop (Doer/Ralph) | Implementation | Filesystem, Terminal | Execute code, modify files |
A StateSync Bridge provides unidirectional flow: Database → Filesystem. The Workshop never writes to the database; it only modifies files.
2.2 The Four-Stage Pipeline
┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   EXPLORE   │───▶│   REFINE    │───▶│    PLAN     │───▶│   EXECUTE   │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
  Understand           Clarify           Decompose          Implement
   codebase         requirements        into tasks          one task
- Stage 1: EXPLORE. The agent analyzes codebase structure, conventions, and patterns.
- Stage 2: REFINE. A raw idea becomes a structured specification with scope boundaries.
- Stage 3: PLAN. The specification becomes an ordered task list with dependencies.
- Stage 4: EXECUTE. Loop through: pick a task, implement, verify, mark complete, repeat.
2.3 The Episode Lifecycle
┌──────────────────────────────────────────────────────────────┐
│                      EPISODE LIFECYCLE                       │
└──────────────────────────────────────────────────────────────┘
RECONSTRUCT          PERCEIVE & ACT            PERSIST
┌─────────┐         ┌──────────────┐         ┌──────────┐
│  Read   │────────▶│   Execute    │────────▶│  Write   │
│  State  │         │     Task     │         │  State   │
│  Files  │         │              │         │   Die    │
└─────────┘         └──────────────┘         └──────────┘
     │                     │                      │
     │                     │                      │
 AGENTS.md            Modify code            PLAN.md [x]
 PLAN.md              Run tests              Source files
 specs/*.md           Mark complete          DISCOVERY.md
Each episode is atomic: the agent reads current state, performs one bounded operation, persists results, and terminates. No memory carries between episodes.
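The lifecycle above can be sketched as a loop in which each iteration re-reads the plan from disk, runs exactly one task, and writes the outcome back. This is a minimal illustration, not RalphSharp's actual code: the `EpisodeLoop` class, the `runEpisode` delegate (a stand-in for launching one stateless worker), and the assumption that tasks are written as `- [ ] description` lines in PLAN.md are all hypothetical. Failures are simply marked `[!]` here; retry handling is left out.

```csharp
using System;
using System.IO;

// Hypothetical sketch of the reconciliation loop; names are illustrative.
public static class EpisodeLoop
{
    // Runs episodes until no pending task remains or maxEpisodes is reached.
    // runEpisode stands in for one stateless worker handling one task line.
    // Returns the number of episodes that ran.
    public static int Run(string planPath, Func<string, bool> runEpisode, int maxEpisodes = 50)
    {
        int episodes = 0;
        while (episodes < maxEpisodes)
        {
            // RECONSTRUCT: state is re-read from disk; nothing is cached between iterations.
            string[] lines = File.ReadAllLines(planPath);
            int index = Array.FindIndex(
                lines, l => l.TrimStart().StartsWith("- [ ]", StringComparison.Ordinal));
            if (index < 0) break; // no pending work: actual state matches desired state

            // PERCEIVE & ACT: exactly one task per episode.
            bool succeeded = runEpisode(lines[index]);
            episodes++;

            // PERSIST: record the outcome in the file before the episode ends.
            lines[index] = ReplaceFirst(lines[index], "[ ]", succeeded ? "[x]" : "[!]");
            File.WriteAllLines(planPath, lines);
        }
        return episodes;
    }

    // Replaces only the first occurrence so task text containing "[ ]" is untouched.
    private static string ReplaceFirst(string text, string search, string replacement)
    {
        int pos = text.IndexOf(search, StringComparison.Ordinal);
        return pos < 0
            ? text
            : text.Substring(0, pos) + replacement + text.Substring(pos + search.Length);
    }
}
```

Because the only input to each iteration is the file, stopping the loop at any point and starting it again continues from the first line still marked pending.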
3. State Management
3.1 File-Based State Machine
State is encoded in markdown files using a simple marker syntax:
| Marker | Meaning | Transition Rules |
|---|---|---|
| [ ] | Pending | → [~] when started |
| [~] | In Progress | → [x] on success, [!] on failure |
| [x] | Complete | Terminal state |
| [!] | Needs Review | Human intervention required |
| [S] | Skipped | After 3 consecutive failures |
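One way to encode the marker table is an enum plus a transition check, sketched below. The `TaskState` and `TaskMarkers` names are hypothetical, and the parser assumes each task line looks like `- [ ] description`. The table does not say which state a task is in when it becomes `[S]`; this sketch assumes it is skipped from In Progress.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical encoding of the marker table; type and member names are illustrative.
public enum TaskState { Pending, InProgress, Complete, NeedsReview, Skipped }

public static class TaskMarkers
{
    private static readonly Dictionary<string, TaskState> ByMarker = new()
    {
        ["[ ]"] = TaskState.Pending,
        ["[~]"] = TaskState.InProgress,
        ["[x]"] = TaskState.Complete,
        ["[!]"] = TaskState.NeedsReview,
        ["[S]"] = TaskState.Skipped,
    };

    // Reads the marker at the start of a task line such as "- [~] Add login".
    public static bool TryParse(string line, out TaskState state)
    {
        string text = line.TrimStart();
        if (text.StartsWith("- ", StringComparison.Ordinal)) text = text.Substring(2);
        if (text.Length >= 3 && ByMarker.TryGetValue(text.Substring(0, 3), out state))
            return true;
        state = default;
        return false;
    }

    // Transitions listed in the table; every other move is rejected.
    public static bool CanTransition(TaskState from, TaskState to) => (from, to) switch
    {
        (TaskState.Pending, TaskState.InProgress) => true,
        (TaskState.InProgress, TaskState.Complete) => true,
        (TaskState.InProgress, TaskState.NeedsReview) => true,
        (TaskState.InProgress, TaskState.Skipped) => true, // assumption: skipped from In Progress
        _ => false,
    };
}
```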
3.2 Derived Files
| File | Created By | Purpose |
|---|---|---|
| AGENTS.md | StateSync | Project context, rules, conventions |
| PLAN.md | PLAN stage | Ordered task queue |
| DISCOVERY.md | Any stage | "I learned something that changes the plan" |
| IDEAS.md | Any stage | "I noticed something unrelated" |
3.3 Resume Semantics
If PLAN.md exists and contains pending tasks, the pipeline skips directly to the EXECUTE stage. This provides natural checkpoint/restart without explicit serialization.
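A minimal sketch of that resume check follows. The `PipelineStage` enum and `ResumePoint` class are hypothetical names, and the sketch treats only `[ ]` lines as pending; how an interrupted `[~]` task is picked up again is not specified in this document and is left out.

```csharp
using System.IO;
using System.Linq;

// Hypothetical names; the stage list mirrors the four-stage pipeline.
public enum PipelineStage { Explore, Refine, Plan, Execute }

public static class ResumePoint
{
    // Chooses the starting stage from what is already on disk.
    public static PipelineStage Choose(string workspacePath)
    {
        string planPath = Path.Combine(workspacePath, "PLAN.md");

        // Assumption: a pending task is any line containing the "[ ]" marker.
        bool hasPending = File.Exists(planPath)
            && File.ReadLines(planPath).Any(line => line.Contains("[ ]"));

        return hasPending ? PipelineStage.Execute : PipelineStage.Explore;
    }
}
```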
4. Resilience Patterns
4.1 Transient Error Detection
Network hiccups and API rate limits are expected in long-running pipelines. The system detects transient errors via pattern matching:
- "No messages returned" (API hiccup)
- HTTP 429, 502, 503, 529 (rate limits, server errors)
- "ETIMEDOUT", "ECONNRESET" (network issues)
Transient errors trigger retry with exponential backoff; they don't count toward failure limits.
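The pattern matching and backoff described above might look like the following sketch. The patterns come from the list; the `TransientErrors` class, the 2-second base delay, and the 60-second cap are assumptions, since this document gives no concrete backoff values. Status codes are matched as whole numbers so that a string such as "line 4291" does not count as HTTP 429.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper; delay values are assumptions, not RalphSharp's settings.
public static class TransientErrors
{
    private static readonly Regex Pattern = new Regex(
        @"No messages returned|\b(?:429|502|503|529)\b|ETIMEDOUT|ECONNRESET",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant);

    // True when the worker output matches one of the known transient patterns.
    public static bool IsTransient(string output) => Pattern.IsMatch(output);

    // Delay before retry number `attempt` (1-based): baseSeconds * 2^(attempt - 1), capped.
    public static TimeSpan Backoff(int attempt, double baseSeconds = 2, double maxSeconds = 60) =>
        TimeSpan.FromSeconds(Math.Min(maxSeconds, baseSeconds * Math.Pow(2, attempt - 1)));
}
```

With these defaults the delays run 2 s, 4 s, 8 s, and so on, up to the 60 s cap.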
4.2 Consecutive Failure Limit
If the same task fails 3 consecutive times with non-transient errors, it's marked [S] (Skipped) and the pipeline continues. This prevents infinite loops on fundamentally broken tasks.
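The skip rule can be sketched as a per-task counter that resets on success. The `FailureTracker` class is hypothetical and keeps its counts in memory, which assumes they live in the long-running loop controller; a version that must survive a controller restart would need to persist them to a file.

```csharp
using System.Collections.Generic;

// Hypothetical tracker for consecutive non-transient failures per task.
public sealed class FailureTracker
{
    private const int Limit = 3; // from the text: three consecutive failures
    private readonly Dictionary<string, int> _failures = new();

    // Records one non-transient failure; returns true when the task should be marked [S].
    public bool RecordFailure(string taskId)
    {
        _failures.TryGetValue(taskId, out int count); // count stays 0 for an unseen task
        _failures[taskId] = ++count;
        return count >= Limit;
    }

    // A success breaks the streak, so the count starts over.
    public void RecordSuccess(string taskId) => _failures.Remove(taskId);
}
```

Transient errors would bypass `RecordFailure` entirely, which is how they avoid counting toward the limit.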
4.3 Process Tree Management
On Windows, killing a process does not kill its child processes. The system uses Process.Kill(entireProcessTree: true) to terminate the whole tree and prevent orphaned processes from accumulating.
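A minimal sketch of a timeout that takes the whole tree down with it is shown below. `Process.Kill(bool entireProcessTree)` is the .NET API (available since .NET Core 3.0); the `EpisodeProcess` wrapper and its signature are hypothetical.

```csharp
using System;
using System.Diagnostics;

// Hypothetical wrapper around one episode's worker process.
public static class EpisodeProcess
{
    // Runs a command and returns its exit code, or null if it exceeded the
    // timeout and was killed together with all of its descendants.
    public static int? RunWithTimeout(string fileName, string arguments, TimeSpan timeout)
    {
        using var process = Process.Start(new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
        });
        if (process is null)
            throw new InvalidOperationException($"Could not start {fileName}.");

        if (process.WaitForExit((int)timeout.TotalMilliseconds))
            return process.ExitCode;

        // Kill the worker and every child it spawned, then wait for teardown to finish.
        process.Kill(entireProcessTree: true);
        process.WaitForExit();
        return null;
    }
}
```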
5. Design Principles
5.1 Simplicity Constraints
| Principle | Implementation |
|---|---|
| YAGNI | No features until needed; no plugin system |
| KISS | Prefer 10 lines over a dependency |
| One file > Two files | Minimize file proliferation |
5.2 Code Quality
| Principle | Implementation |
|---|---|
| DRY | Extract patterns seen 3+ times |
| SRP | One class/function = one responsibility |
| Composition > Inheritance | Small composable pieces, not deep hierarchies |
5.3 Reliability
| Principle | Implementation |
|---|---|
| Fail Fast | Detect errors early, surface clearly |
| Defensive at Boundaries | Validate input at edges, trust internals |
| Idempotency | Operations safe to retry |
5.4 Performance
| Principle | Implementation |
|---|---|
| Measure First | No optimization without evidence |
| Big O Awareness | Know algorithmic complexity |
| Lazy Evaluation | Don't compute what isn't needed |
6. Autonomous Execution Model
6.1 No-Pause Philosophy
Once started, Ralph runs to completion without human intervention:
- Can't proceed → Mark [!], log to DISCOVERY.md, continue
- Notice unrelated issue → Append to IDEAS.md, continue
- Success → Mark [x], continue
6.2 Minimal Blast Radius
Each task specifies exactly which files may be modified. Touching files outside the task description is a red flag that halts execution.
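The blast-radius check reduces to a set difference between the files a task is allowed to modify and the files that actually changed. In the sketch below the `BlastRadius` class is hypothetical, and so is the source of the changed-file list (for example, the output of `git diff --name-only`). Case-insensitive path comparison is also an assumption, chosen to suit Windows.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical check; how the two file lists are obtained is up to the caller.
public static class BlastRadius
{
    // Returns the changed paths that the task did not list as modifiable.
    public static IReadOnlyList<string> Violations(
        IEnumerable<string> allowedFiles, IEnumerable<string> changedFiles)
    {
        var allowed = new HashSet<string>(
            allowedFiles.Select(p => Normalize(p)), StringComparer.OrdinalIgnoreCase);

        return changedFiles
            .Select(p => Normalize(p))
            .Where(p => !allowed.Contains(p))
            .ToList();
    }

    // Treats "src\A.cs" and "src/A.cs" as the same path.
    private static string Normalize(string path) => path.Replace('\\', '/').Trim();
}
```

A non-empty result is the red flag described above. State files that every episode writes, such as PLAN.md, would need to be included in the allowed set by the caller.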
6.3 Post-Execution Review
After completion, the system surfaces:
- All [!] items requiring human review
- All entries in DISCOVERY.md
- All entries in IDEAS.md
A human then reviews these items, adjusts the plan, and reruns the pipeline if needed.
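Since all three sources are plain files, the review summary can be assembled with a few reads. This is a rough sketch: the `ReviewReport` class and its output format are hypothetical, and it assumes one entry per non-empty line in DISCOVERY.md and IDEAS.md.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical report builder; the "REVIEW:" and file-name prefixes are illustrative.
public static class ReviewReport
{
    public static List<string> Collect(string workspacePath)
    {
        var report = new List<string>();

        // Tasks marked [!] in PLAN.md need a human decision.
        string planPath = Path.Combine(workspacePath, "PLAN.md");
        if (File.Exists(planPath))
        {
            report.AddRange(File.ReadLines(planPath)
                .Where(line => line.Contains("[!]"))
                .Select(line => "REVIEW: " + line.Trim()));
        }

        // Every non-empty line of the two note files is surfaced as-is.
        foreach (string name in new[] { "DISCOVERY.md", "IDEAS.md" })
        {
            string path = Path.Combine(workspacePath, name);
            if (!File.Exists(path)) continue;
            report.AddRange(File.ReadLines(path)
                .Where(line => !string.IsNullOrWhiteSpace(line))
                .Select(line => name + ": " + line.Trim()));
        }
        return report;
    }
}
```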
7. Quality Bar
A valid implementation must satisfy:
- Compiles without warnings
- No new dependencies (exception: Terminal.Gui v2 for TUI)
- Each prompt fits one screen (~50 lines)
- Code is obvious without comments
Red Flags (Stop and Ask)
- Adding > 10 new files
- Creating interfaces before implementations
- "Service" or "Manager" class names
- Configuration for single-choice options
- Modifying files not in task description
8. Comparison to Related Work
| System | Memory Model | Task Granularity | State Persistence |
|---|---|---|---|
| AutoGPT | Persistent context | Arbitrary | In-memory |
| BabyAGI | Persistent context | Fine-grained | Vector DB |
| MetaGPT | Role-based context | Role-dependent | Mixed |
| RalphSharp | Episodic (zero) | One task | Filesystem |
The key differentiator is episodic execution with filesystem-based state, which provides:
- Deterministic resumability
- Zero context drift
- Natural debugging (just read the files)
- Human-readable state at all times
9. Conclusion
RalphSharp demonstrates that dumb, episodic agents can outperform "genius" long-lived agents for software development tasks by:
- Eliminating context drift through memory-free episodes
- Providing natural resumability through file-based state
- Enabling human oversight through readable markdown artifacts
- Reducing complexity through strict architectural boundaries
The reconciliation loop pattern, borrowed from infrastructure orchestration, proves effective for AI agent orchestration: define desired state, let agents reconcile actual toward desired, one atomic step at a time.
Appendix A: Configuration Reference
new LoopController(
    workspacePath: "path/to/project",
    timeout: TimeSpan.FromMinutes(10),   // Per episode
    maxTurnsPerEpisode: 20,              // Claude CLI turns
    maxEpisodes: 50                      // Total episodes
);
Appendix B: CLI Reference
dotnet run "Your idea here" # Full pipeline
dotnet run -w /path/to/project "idea" # Specify workspace
dotnet run --no-tui "idea" # Raw ANSI mode (no TUI)
Document generated from RalphSharp project artifacts, January 2026