Solution · Multi-agent build pipeline

IWO-DBOS

A Postgres-backed rebuild of the 6-agent build pipeline I use to develop ebatt.ai. Every agent step is durable, every human gate survives a reboot, and every LLM call is idempotent.

Why I rebuilt it on DBOS

The legacy daemon ran the same six agents in tmux panes against the same spec files. It worked — until it didn't. State lived in in-process Python objects and a .active-specs.json reconciliation file. A laptop reboot mid-spec meant manually replaying handoffs. A crash between "agent finished" and "state written" meant paying for a second claude -p call.

DBOS is a lightweight library — not a platform — that turns ordinary Python functions into crash-safe, replayable workflows backed by Postgres. No new orchestrator service to run. No vendor. Just rows in a database I can SELECT when something goes wrong.

Three properties the legacy daemon could not give me

Crash-safe state

Kill the daemon mid-spec and restart it — every in-flight agent resumes from its last completed step. No reconciliation file to hand-patch, no missed handoffs to re-drop.
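The resume behaviour can be illustrated with a toy model. This is a sketch of the checkpoint-and-replay idea only, not DBOS internals: the in-memory `store` dict stands in for the Postgres checkpoint table, and `run_step`, `build_spec`, and `crash_after` are hypothetical names invented for the illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical stand-in for the Postgres checkpoint table.
Checkpoints = Dict[Tuple[str, int], Any]


def run_step(store: Checkpoints, workflow_id: str, step_no: int,
             fn: Callable[[], Any]) -> Any:
    """Run fn at most once per (workflow_id, step_no); later runs reuse the saved output."""
    key = (workflow_id, step_no)
    if key in store:
        return store[key]          # completed before the crash: skip re-execution
    result = fn()
    store[key] = result            # checkpoint the output before moving on
    return result


def build_spec(store: Checkpoints, workflow_id: str, log: List[str],
               crash_after: Optional[int] = None) -> str:
    """Deterministic glue: the same steps in the same order on every run."""
    outputs = []
    for step_no, agent in enumerate(["planner", "builder", "reviewer"]):
        def work(agent: str = agent) -> str:
            log.append(agent)      # side effect that must not be repeated
            return f"{agent}-done"
        outputs.append(run_step(store, workflow_id, step_no, work))
        if crash_after == step_no:
            raise RuntimeError("simulated crash")
    return ",".join(outputs)
```

Calling `build_spec` with `crash_after=1`, catching the error, and calling it again with the same `store` and workflow ID runs only the reviewer step on the second pass. The planner and builder outputs come back from the store.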

Durable human gates

DBOS.recv(timeout_seconds=N) blocks a workflow until a message arrives or the timeout expires, returning None on timeout. The deadline and any buffered approval message live in Postgres, so the wait survives daemon restarts and laptop reboots.
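A gate built on that call might look like the sketch below. The receive function is passed in as a parameter so the logic can be exercised without a database; the assumption is that production code would pass `DBOS.recv` from inside a `@DBOS.workflow()` function. The `human_gate` name, the "approval" topic, and the "approve" message value are hypothetical.

```python
from typing import Any, Callable, Optional

# Called as recv(topic, timeout_seconds=...); returns the message,
# or None if the timeout expires first.
Recv = Callable[..., Optional[Any]]


def human_gate(recv: Recv, topic: str, timeout_seconds: float) -> str:
    """Wait for a human decision on `topic`, or give up at the deadline."""
    message = recv(topic, timeout_seconds=timeout_seconds)
    if message is None:
        return "timed_out"     # nobody answered before the deadline
    if message == "approve":
        return "approved"
    return "rejected"          # assumption: anything else counts as a rejection
```

Under those assumptions the workflow would call `human_gate(DBOS.recv, "approval", 86400)`, and the approver's side would call `DBOS.send(workflow_id, "approve", topic="approval")`.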

Idempotent LLM dispatch

A step's output is checkpointed to Postgres; an already-written handoff on disk short-circuits re-dispatch. A crash in the narrow window between 'agent finished' and 'DBOS checkpointed' does not cost a second claude -p invocation.
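The on-disk short-circuit can be sketched as follows. This is a minimal illustration, assuming the handoff is a single text file and that this function forms the body of a checkpointed step; `dispatch_once` and `run_agent` are hypothetical names. Here the wrapper writes the file itself, but if the agent writes its own handoff the existence check works the same way.

```python
import os
import tempfile
from pathlib import Path
from typing import Callable


def dispatch_once(handoff: Path, run_agent: Callable[[], str]) -> str:
    """Return the agent's handoff, invoking the agent only if none exists yet."""
    if handoff.exists():
        # An earlier attempt already finished: reuse it, no second LLM call.
        return handoff.read_text()
    output = run_agent()       # the expensive call, e.g. a claude -p invocation
    # Write to a temp file and rename, so a crash never leaves a partial handoff.
    fd, tmp = tempfile.mkstemp(dir=str(handoff.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        f.write(output)
    os.replace(tmp, handoff)
    return output
```

If the step is re-executed after a crash, the second call finds the file and returns its contents without calling `run_agent` again.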

The pipeline

Six agents, one spec, durable handoffs between every step. The workflow body is deterministic glue — every side effect (dispatch, handoff write, ops action, human gate) is a checkpointed step that DBOS can replay after a crash.

  1. Planner

    Decompose the spec, write a plan, hand off to Builder.

  2. Builder

    Implement against the plan, write code + tests.

  3. Reviewer

    Read the diff cold, raise blockers, request changes.

  4. Tester

    Run the suite, exercise edge cases, file failures.

  5. Deployer

    Ship the artifact, verify health, record version.

  6. Docs

    Update changelog, as-built, and operator docs.
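The "deterministic glue" shape of the workflow body can be sketched like this. It assumes a strictly linear pass with no review loop or human gate, which simplifies the real pipeline. `run_pipeline` and `run_agent` are hypothetical names; in a DBOS build, `run_agent` would presumably be a function decorated with `@DBOS.step()` and `run_pipeline` one decorated with `@DBOS.workflow()`.

```python
from typing import Callable, List

# Agent order taken from the list above.
AGENTS: List[str] = ["planner", "builder", "reviewer", "tester", "deployer", "docs"]


def run_pipeline(spec: str, run_agent: Callable[[str, str], str]) -> List[str]:
    """Feed each agent the previous agent's handoff; return all handoffs in order."""
    handoffs: List[str] = []
    incoming = spec
    for agent in AGENTS:
        # The only side-effecting call; everything else is plain control flow.
        incoming = run_agent(agent, incoming)
        handoffs.append(incoming)
    return handoffs
```

Because the loop itself has no side effects, replaying it after a crash is safe: only the step calls need checkpointing.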

What it actually ships for ebatt.ai

  • Specs survive reboots. Restart the daemon and every in-flight spec resumes from its last completed step. No .active-specs.json reconciliation.
  • Approvals can wait. A human gate set for tomorrow morning still fires tomorrow morning, even if the laptop sleeps overnight. The deadline lives in Postgres.
  • No double LLM spend. If the process dies after the agent has written its handoff but before DBOS checkpoints the step, the next run finds the handoff on disk and skips the re-dispatch.
  • State is queryable. Debugging a stuck spec is a few SELECTs, not parsing a black-box dashboard.

Status

  • Phases 0–2 complete. 9/9 tests passing. State engine, dispatcher abstraction, and human-gate primitives all in place.
  • Phase 3 in flight. Real claude -p dispatch via tmux panes; shadow-run against live ebatt.ai specs.
  • Scope boundary. Additive — runs alongside the legacy daemon against different handoff and directives roots. Cutover is repointing the launcher. Legacy stays in the tree as a rollback for at least one week of production use.

Want a durable build pipeline for your team?

If you're running Claude Code or another agentic build loop at any scale, the same crash-safety, human-gate, and idempotency properties apply. I can help you adopt the pattern.