Autonomous AI Agents for Software Development: How We Built a Multi-Agent AI System with Claude Code

Artem Rozumenko

Chief Technologist, Testing

DATE

Apr 21, 2026

AI coding assistants are powerful but they're solo players. They can write functions, fix bugs, and autocomplete your thoughts. What they don't do is think in systems. These AI agents struggle with handoffs, drop context between sessions, and have no concept of whether the feature even makes sense before implementation starts.

That's the gap we set out to close by building autonomous AI agents that work the way real software teams do: a PM that breaks down requirements, a BA that writes user stories, developers that implement in parallel, and QA that catches regressions. All of these agents run as separate Claude Code processes, communicate through a shared queue, and report progress back to you via Telegram.

Here is how we designed the AI development team, the failures we hit, and exactly how you can set up your own Claude Code multi-agent setup.

*This is a demo setup for quick start: tmux dashboard, visible browser windows, interactive supervisor prompt. It's designed for verifying the pipeline works and tuning role behavior on your codebase. Production runs are headless. The supervisor polls taskbox and GitHub in the background. Issues get assigned, teams spin up per feature, workers scale horizontally. The same roles, the same pipeline, but fully autonomous.

Claude Code multi-agent architecture: Roles, sessions, and a shared task queue

Each role is a separate Claude Code process with its own personality, instructions, and persistent memory. They communicate through a SQLite message queue (taskbox) and document everything on GitHub Issues.

   Octobots tmux layout

You (Telegram)
│ send-keys
▼
tmux "octobots" ─── 6 Claude Code sessions, tiled
├── 📋 pm ← receives your messages, distributes work
├── 📝 ba ← breaks goals into user stories
├── 🏗️ tl ← decomposes stories into tasks
├── 🐍 py ← Python backend development
├── ⚡ js ← JS/TS frontend development
└── 🧪 qa ← testing and verification

1. Build reliable communication between Claude Code agents

The first question was deceptively simple: how do two Claude Code processes actually talk to each other?

We went through the obvious candidates:

MCP servers are elegant but too many corporate networks block them.
WebSockets felt like bringing a crane to move a couch.

So we went with the most basic solution that would actually work, one that is almost embarrassingly low-tech: a SQLite database with a thin Python CLI on top. There are no servers, daemons, or external dependencies to manage. It’s just sqlite3 from Python’s standard library, running in WAL mode so agents can keep reading the queue while another agent writes to it.

That’s what became Taskbox.
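As a rough illustration of how little is needed, here is a minimal sketch of a SQLite-backed queue in that spirit. The table layout, column names, and function names are assumptions for illustration, not Taskbox's actual schema or the code in relay.py.

```python
import sqlite3
import tempfile
from pathlib import Path


def connect(db_path):
    """Open the queue database in WAL mode and create the table if needed."""
    conn = sqlite3.connect(db_path, timeout=5.0)
    conn.execute("PRAGMA journal_mode=WAL")  # readers don't block the writer
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               sender TEXT NOT NULL,
               recipient TEXT NOT NULL,
               body TEXT NOT NULL,
               ack TEXT  -- NULL until the recipient acknowledges
           )"""
    )
    conn.commit()
    return conn


def send(conn, sender, recipient, body):
    """Queue a message and return its id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (sender, recipient, body) VALUES (?, ?, ?)",
            (sender, recipient, body),
        )
    return cur.lastrowid


def inbox(conn, recipient):
    """Return unacknowledged messages for a role, oldest first."""
    rows = conn.execute(
        "SELECT id, sender, body FROM messages "
        "WHERE recipient = ? AND ack IS NULL ORDER BY id",
        (recipient,),
    )
    return rows.fetchall()


def ack(conn, msg_id, note):
    """Mark a message as handled, storing the reply note."""
    with conn:
        conn.execute("UPDATE messages SET ack = ? WHERE id = ?", (note, msg_id))


# Demo against a throwaway database file
db = Path(tempfile.mkdtemp()) / "relay.db"
conn = connect(db)
msg_id = send(conn, "py", "qa", "Issue #103 fixed. PR #45 ready for testing.")
pending = inbox(conn, "qa")
ack(conn, msg_id, "All tests pass.")
remaining = inbox(conn, "qa")
```

Each agent process would open its own connection to the same file; WAL mode lets them read while one of them writes.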

   Taskbox Relay CLI

# Send a message
python3 octobots/skills/taskbox/scripts/relay.py send \
  --from py \
  --to qa \
  "Issue #103 fixed. PR #45 ready for testing."

# Check inbox
python3 octobots/skills/taskbox/scripts/relay.py inbox \
  --id qa

# Acknowledge (replace MSG_ID with the message's id)
python3 octobots/skills/taskbox/scripts/relay.py ack \
  MSG_ID \
  "All tests pass."

2. Define the AI agent roles for Claude Code (with personality, instructions, and persistent memory)

A Claude Code instance becomes a "role" through two files and a .claude/ directory:

   roles/python-dev/

roles/python-dev/
├── SOUL.md      ← personality, voice, quirks
├── CLAUDE.md    ← technical instructions, workflow, constraints
└── .claude/
    ├── skills/  ← shared capabilities (symlinked)
    └── agents/  ← sub-agents (symlinked)

SOUL.md, inspired by OpenClaw’s SOUL.md, is where things get interesting. This is where each role gets its personality.

Py is calm and methodical, with a bit of dry humor. Runs py_compile almost out of habit. If tests pass on the first try, something feels off.
Jay is energetic and a little opinionated. Says “shipped” after deploys. Notices immediately when someone uses any.
Sage is all about evidence. Takes screenshots for everything, calls flaky tests “trust erosion,” and doesn’t let them slide.
Max is sharp and focused. Keeps track of tasks and spots scope creep early.

The personality changes how the agent communicates. A QA agent that cares about evidence will write much better bug reports than something generic.

3. Create the development pipeline: From user story to verified PR

Work flows through the team like a real development process:

   Workflow Execution Trace

1. You → pm: "We need user authentication"

2. pm → ba: "Analyze auth requirements"

3. ba creates epic + user stories with acceptance criteria

4. ba → tl: "Stories ready for decomposition"

5. tl reads the codebase, designs interfaces, creates tasks

6. tl → pm: "6 tasks, 3 parallel groups, 1 risk"

7. pm → py: "TASK-003 (#103): login endpoint"

8. pm → js: "TASK-004 (#104): login form"

9. py & js work in parallel (isolated per-worker clones)

10. py → pm: "Done. PR #45."

11. pm → qa: "Verify #103"

12. qa tests, finds bug, reports on GitHub issue

13. qa → pm → py: "Bug: uppercase email causes 500"

14. py fixes, qa re-verifies

15. All roles → You (Telegram): status updates throughout

4. Orchestrate parallel AI coding agents with a rich terminal dashboard

The “supervisor” TUI manages all workers from a single terminal:

octobots/supervisor.sh

   Octobots CLI

octobots> /status       ← worker states + last output
octobots> /health       ← system health check
octobots> /tasks        ← taskbox queue stats
octobots> /logs py 20   ← last 20 lines from py's pane
octobots> /bridge       ← start Telegram bridge
octobots> /restart qa   ← relaunch QA
octobots> /board        ← team whiteboard
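The supervisor's slash commands sit on top of tmux. As a rough sketch (not the actual supervisor code), delivering a message to a worker and reading back its recent output could map onto standard tmux commands like this. The pane order, the window index 0, and the helper names are assumptions for illustration.

```python
SESSION = "octobots"  # tmux session name from the layout above

# Assumed pane order: roles tiled in one window, in the order listed earlier.
PANES = {"pm": 0, "ba": 1, "tl": 2, "py": 3, "js": 4, "qa": 5}


def pane_target(role):
    """tmux target string for a role's pane (window 0 is an assumption)."""
    return f"{SESSION}:0.{PANES[role]}"


def send_keys_cmd(role, text):
    """argv that types `text` into the role's pane and presses Enter."""
    return ["tmux", "send-keys", "-t", pane_target(role), text, "Enter"]


def capture_cmd(role):
    """argv that prints the visible contents of the role's pane to stdout."""
    return ["tmux", "capture-pane", "-p", "-t", pane_target(role)]


def last_lines(output, n):
    """Keep the last n lines of captured pane output, as /logs py 20 would."""
    lines = output.rstrip("\n").splitlines()
    return lines[-n:] if n > 0 else []
```

Passing either argv list to subprocess.run would execute it against a running tmux session; the functions themselves only build the commands.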

5. Build a mobile command channel via Telegram to control your agents and route tasks to AI

You don’t have to sit in front of tmux to use it. You can just talk to the team from your phone. Telegram is the default bridge, but the system isn’t tied to it. The same setup works with Slack, Discord, Teams, or anything else that can act as a simple connector.

   Agent workflow log

You: @py fix the login bug in authService.ts
→ 🐍 py

[py] Started. Reading authService.ts...
[py] Found it — missing null check on line 47. Fixing.
[py] Done. PR #52. All tests pass.

You: @qa verify PR #52
→ 🧪 qa

[qa] Testing login flow...
[qa] ✅ Verified. Login works with all email formats.

Reply routing works naturally. Swipe-reply to a `[py]` message and it goes back to py.

Short aliases: @pm, @ba, @tl, @py, @js, @qa — two letters, fast to type.
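The routing rules described here are simple enough to sketch in a few lines. This is an illustration, not the bridge's actual code: the function name, the regexes, and the fallback to pm for messages without a mention are assumptions (the fallback mirrors the layout, where pm receives your messages).

```python
import re

ALIASES = {"pm", "ba", "tl", "py", "js", "qa"}
DEFAULT_ROLE = "pm"  # assumed fallback for messages with no @mention

_MENTION = re.compile(r"^@(\w+)\s+(.*)$", re.DOTALL)
_REPLY_TAG = re.compile(r"^\[(\w+)\]")


def route(text, replied_to=None):
    """Return (role, message) for an incoming chat message.

    `replied_to` is the text of the message being swipe-replied to, if any.
    """
    text = text.strip()
    mention = _MENTION.match(text)
    if mention and mention.group(1).lower() in ALIASES:
        return mention.group(1).lower(), mention.group(2).strip()
    if replied_to:
        tag = _REPLY_TAG.match(replied_to.strip())
        if tag and tag.group(1).lower() in ALIASES:
            return tag.group(1).lower(), text
    return DEFAULT_ROLE, text
```

An explicit @mention wins over the reply target, so you can redirect a thread to another role mid-conversation.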

6. Solve context blowup in autonomous AI coding workflows with named sessions

Each GitHub issue maps to a Claude Code named session:

   Session Context Switching

├── Issue #103    → session "py-issue-103"   ← full context preserved
├── Issue #107    → session "py-issue-107"   ← separate context
└── Back to #103  → /resume py-issue-103     ← context restored

The supervisor switches sessions via /resume. Each task gets its own context window. No blowup from accumulating unrelated work. Come back to an issue a week later with full context restored.
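The naming scheme in the diagram can be captured in two small helpers. The function names are hypothetical; only the "py-issue-103" pattern and the /resume command come from the setup described here.

```python
def session_name(role, issue):
    """Session name for a role working on a GitHub issue, e.g. py-issue-103."""
    return f"{role}-issue-{issue}"


def resume_command(role, issue):
    """Slash command that switches the worker back to that issue's session."""
    return f"/resume {session_name(role, issue)}"
```

Because the name is derived from the role and issue number alone, the supervisor does not need to store a mapping to find a session again.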

7. Eliminate merge conflicts in multi-agent Claude Code with per-worker isolation

Code-writing roles get isolated environments with their own repo clones:

   Worker Directory Structure

.octobots/workers/
├── py/
│   ├── core/          ← own git clone, own branch
│   ├── services/      ← own clones
│   ├── venv → shared  ← symlinked dependencies
│   └── .mcp.json      ← own Playwright browser
├── js/
│   └── ...            ← same structure
└── qa/
    └── ...            ← same structure

Each developer works in their own directory, on their own branch. So they’re not stepping on each other while coding. Integration only happens at the PR stage. The workers also get their own browser instance for Playwright testing. So one agent navigating somewhere doesn’t suddenly break another agent’s test.

8. Fix Git worktree limitations in multi-repo setups with full clones for AI agents

Most production projects don’t live in a single repo. My test project — OneTest — had 14 of them: a React frontend, 8 Python services, a few shared libraries, test automation, and deployment configs.

All of it sat in one workspace without a root git repo tying things together:

git worktree add .worktrees/py-issue-42 ← fails: workspace root isn't a git repo

So the usual approach with git worktrees just didn’t work. Worktrees assume a single repository. Here, there was nothing at the root to branch from. I tried it anyway. The supervisor logged “worktree creation failed,” and the workers couldn’t even start.

So we dropped that idea and went with something simpler: full workspace clones per worker.

During init-project.sh, the setup script walks through all the repos in the workspace and clones each one into the worker’s own directory.

init-project.sh

# What init-project.sh does for each code worker
for repo in core services/gateway services/membership ...; do
  # Clone from the same origin as the workspace copy of this repo
  git clone "$(git -C "$repo" remote get-url origin)" \
    ".octobots/workers/py/$repo"
done

The result? Each worker gets its own complete copy of the workspace. Every repo is independently branchable. Shared pieces like the venv, database, and .env are just symlinked instead of duplicated.

Shared (one copy):
├── venv/
├── PostgreSQL
├── .env
├── .mcp.json
└── octobots/


Isolated (per worker):
├── core/       ← own clone
├── services/*/ ← own clones
├── lib/*/      ← own clones
└── .mcp.json  ← own Playwright browser
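The shared side of this split can be set up with plain symlinks. Below is a minimal sketch, assuming a hypothetical link_shared helper and an illustrative subset of shared entries; it is not the actual init-project.sh logic.

```python
import tempfile
from pathlib import Path

SHARED = ["venv", ".env"]  # illustrative subset of the shared entries


def link_shared(workspace, worker_dir, names=SHARED):
    """Symlink shared workspace entries into a worker directory.

    Returns the names that were newly linked; existing entries are left alone.
    """
    linked = []
    worker_dir.mkdir(parents=True, exist_ok=True)
    for name in names:
        source = workspace / name
        target = worker_dir / name
        if not source.exists() or target.exists() or target.is_symlink():
            continue  # nothing to link, or already in place
        target.symlink_to(source, target_is_directory=source.is_dir())
        linked.append(name)
    return linked


# Demo in a throwaway workspace
workspace = Path(tempfile.mkdtemp())
(workspace / "venv").mkdir()
(workspace / ".env").write_text("DEBUG=1\n")
worker = workspace / ".octobots" / "workers" / "py"
linked = link_shared(workspace, worker)
```

Running it twice is harmless, which matters if the setup script is re-run after adding a worker.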

Why not just use branches in shared repos? Because that only solves versioning, not isolation. Two agents editing the same file in the same directory don’t create merge conflicts; they overwrite each other’s changes. Branches don’t protect you from that.

The trade-off is disk space. Each worker clones all 14 repos. In this case, it came out to around 200MB per worker. With three workers, that’s about 600MB total. On a modern machine, that’s not something you notice.

The setup takes about a minute, and after that, each worker runs completely isolated.

For runtime, workers share the same development database. That works because most features touch different data anyway. Each worker gets its own port through .env.worker, so they can run independently.

The virtual environment is shared as well, so there’s no need to reinstall dependencies unless a branch changes them.

9. Ensure a traceable audit trail with GitHub issues as source of truth

Every meaningful action gets a comment on the relevant GitHub issue:

   Octobots Activity Log

octobotsai [bot]
│ 📋 [pm] Assigned to py. Priority: high.

octobotsai [bot]
│ 🐍 [py] Started. Approach: using existing auth middleware.

octobotsai [bot]
│ 🐍 [py] Done. PR #45. JWT login + rate limiting. All tests pass.

octobotsai [bot]
│ 🧪 [qa] Testing:
│ • Login
│ • Invalid creds
│ • Rate limit
│ • Token expiry

octobotsai [bot]
│ 🧪 [qa] ✅ Verified. All scenarios pass.

All posted by the octobotsai GitHub App with full traceability and zero manual documentation.
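The comment format in the log is easy to keep consistent if every role goes through one helper. The sketch below is an assumption about how that could look, not the project's code: it builds a GitHub CLI invocation (gh issue comment with --body), and whether Octobots uses gh or calls the API directly is not stated here. The gh CLI reads GH_TOKEN from the environment, which fits the token injection described later.

```python
# Emoji per role, taken from the tmux layout and the activity log
ROLE_EMOJI = {"pm": "📋", "ba": "📝", "tl": "🏗️", "py": "🐍", "js": "⚡", "qa": "🧪"}


def format_comment(role, text):
    """Prefix a comment with the role's emoji and tag, e.g. '🐍 [py] Done.'"""
    return f"{ROLE_EMOJI[role]} [{role}] {text}"


def gh_comment_cmd(issue, role, text):
    """argv for posting the comment with the GitHub CLI (assumed transport)."""
    return ["gh", "issue", "comment", str(issue), "--body", format_comment(role, text)]
```

The role prefix is what keeps the trail readable, since every comment is posted under the same bot account.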

10. Keep parallel AI coding agents aligned with a shared state layer

Teams need shared state beyond tickets. BOARD.md is the team's whiteboard, which all roles read and write:

   Work Coordination


Active Work

ROLE   TASK             ISSUE   STATUS
────   ───────────────  ──────  ──────────────
py     Login endpoint   #103    PR submitted
js     Login form       #104    in progress


Decisions

• JWT over sessions (microservice architecture)  
  → decided by tl


Blockers

• qa blocked on #103  
  waiting for py's PR to merge


Shared Findings

• Auth middleware is deprecated  
  found by py → flagged for tl

Framework vs runtime: How Octobots keeps multi-agent Claude Code upgradeable

You can re-run install.sh or just do a git pull in octobots/ anytime. It’s safe, since that folder only holds the framework.

All your project-specific stuff stays in .octobots/ and .claude/, and those don’t get touched.

If you want to change how an agent behaves, just copy it into .octobots/roles/<role>/ and edit it there.

   Octobots Architecture Layout

Framework (read-only, version-controlled)

octobots/          ← framework (git pull for updates, read-only)
├── roles/         ← base templates
├── skills/        ← 10 shared skills
├── shared/        ← conventions, agents
└── scripts/       ← supervisor, bridge, relay

Runtime (project-specific, writable)

.octobots/         ← runtime (project-specific, workers write here)
├── board.md       ← team whiteboard
├── memory/py.md   ← Py remembers across sessions
├── workers/py/    ← isolated repo clones
├── roles/         ← override base roles
└── relay.db       ← taskbox database
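The override rule (a role copied into .octobots/roles/ takes precedence over the framework template) amounts to a two-step lookup. Here is a minimal sketch, assuming a hypothetical resolve_role_dir helper and whole-directory overrides; the real launcher may resolve files individually.

```python
import tempfile
from pathlib import Path


def resolve_role_dir(project_root, role):
    """Prefer the project's override of a role, else the framework template."""
    override = project_root / ".octobots" / "roles" / role
    if override.is_dir():
        return override
    return project_root / "octobots" / "roles" / role


# Demo: python-dev is overridden, the second role is not
root = Path(tempfile.mkdtemp())
(root / "octobots" / "roles" / "python-dev").mkdir(parents=True)
(root / "octobots" / "roles" / "qa").mkdir(parents=True)  # hypothetical role name
(root / ".octobots" / "roles" / "python-dev").mkdir(parents=True)
```

Since the framework directory is never edited, a git pull in octobots/ cannot clobber a customized role.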

11. Give your AI development team its own GitHub identity with a custom bot app

All comments show up as octobotsai [bot]. Each role prefixes its messages:

   Agent Activity Log


octobotsai [bot]
→ 📋 [pm] Bug #41 received. Routed to tech lead.

octobotsai [bot]
→ 🏗️ [tl] Root cause: fetchTags reducer doesn't unwrap .items. One-line fix.

octobotsai [bot]
→ 🐍 [py] Fix applied. PR #45

octobotsai [bot]
→ 🧪 [qa] ✅ Verified. Tag autocomplete working.

Setting it up takes 5 minutes: create a GitHub App, generate a private key, add credentials to .env.octobots. The supervisor injects GH_TOKEN into every worker automatically.

GitHub projects as a visual pipeline for task routing

Issues assigned to octobotsai[bot] get picked up automatically. A GitHub Projects board tracks the pipeline visually:

   Workflow Pipeline


📥 Inbox     │ 🔍 Triage │ 🐍 Dev: py │ ⚡ Dev: js │ 🧪 Testing │ ✅ Done
──────────────┼────────────┼────────────┼────────────┼────────────┼────────────

#45 new bug  │ #42 pm     │ #41 py     │ #43 js     │ #39 qa     │ #38 ✓
#46 feature  │            │            │            │            │ #37 ✓

The GitHub bridge polls the board, routes new inbox items to PM, and syncs column moves as workers progress. Drag a card manually? The bridge picks it up and notifies the right role.
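One way to picture the bridge's job is as a diff between two snapshots of the board, followed by a column-to-role lookup. The sketch below works on plain dictionaries rather than the GitHub API; the snapshot shape, the function names, and the idea that moves are detected by comparing polls are all assumptions for illustration. The column names come from the board shown here.

```python
# Which role to notify when a card lands in a column (Done notifies nobody)
COLUMN_ROLE = {
    "Inbox": "pm",
    "Triage": "pm",
    "Dev: py": "py",
    "Dev: js": "js",
    "Testing": "qa",
}


def notify_target(column):
    """Role to notify for a column, or None if nobody needs to act."""
    return COLUMN_ROLE.get(column)


def diff_moves(previous, current):
    """Compare two {issue_number: column} snapshots.

    Returns sorted (issue, new_column) pairs for cards that moved or appeared.
    """
    return sorted(
        (issue, column)
        for issue, column in current.items()
        if previous.get(issue) != column
    )


before = {41: "Triage", 45: "Inbox"}
after = {41: "Dev: py", 45: "Inbox", 46: "Inbox"}
moves = diff_moves(before, after)
```

A manually dragged card shows up in the diff exactly like a move made by a worker, so both paths end in the same notification.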

Setting up for a new project: Deploy the Claude Code AI agents in six steps

   Octobots Quick Start

# 1. Clone octobots into your project
git clone git@github.com:arozumenko/octobots.git octobots

# 2. Install dependencies
pip install -r octobots/requirements.txt

# 3. Initialize runtime directory (creates .octobots/, clones repos for workers)
octobots/scripts/init-project.sh

# 4. Run the scout (explores codebase, generates AGENTS.md)
octobots/start.sh scout

# 5. Start the team
octobots/supervisor.sh
octobots> /bridge
octobots> /health

# 6. Talk to Max via Telegram
"Hey Max, we need to fix the tag autocomplete bug in issue #41"

Inspiration & prior art: Multi-agent AI development tools that shaped Octobots

Multi-agent coding is moving fast. Octobots isn’t the first to try this, and a lot of the ideas here come from projects that were already exploring the space:

OpenClaw SOUL.md introduced the idea of a personality file. “Your agent reads itself into being.” This is something we used directly.
claude-squad showed that a tmux setup can actually work for running multiple Claude instances in parallel.
Composio Agent Orchestrator focused on parallel coding agents with worktrees, CI fixes, and code reviews. Probably the closest in spirit.
Agentrooms takes a desktop app approach, with @mentions routing and a more polished UI.
CrewAI popularized the idea of agents with defined roles.
Anthropic’s work ran 16 parallel Claudes to build a C compiler. That proved the scale is real.
GitHub Agentic Workflows shows where this is heading, with agents embedded directly into the SDLC.

The main inspiration was OpenClaw, but we wanted something simpler. No framework to learn, no SDK to wire up. Just markdown files, bash scripts, and a SQLite queue. The whole setup runs on a single Claude Code Max+ subscription. That’s enough to run a team of six agents on a real project, continuously. In practice, it ends up costing less than a junior developer for a day.

Where Octobots is different: it’s not a framework. It’s just config and scripts. You can open any file and understand what it’s doing. It’s messenger-first, works across multiple repos, and keeps a clean separation between the framework and runtime so updates don’t break your project setup.

The stack behind our Claude Code team

Component	Technology	Why
Agents	Claude Code (Opus)	Full coding capability + tool use
Communication	SQLite (taskbox)	Zero deps, concurrent safe, dead simple
Orchestration	Python + Rich TUI	Interactive supervisor with slash commands
Panes	tmux	Tiled dashboard, detach/attach, themed borders
User interface	Telegram Bot API	Mobile-friendly, reply routing, @aliases
Audit trail	GitHub Issues + App	`octobotsai[bot]` identity, full traceability
Task tracking	GitHub Projects v2	Visual board, column-based pipeline
Skills	agentskills.io format	10 skills, cross-tool compatible
Isolation	Git clones per worker	Parallel devs, own browsers, zero conflicts
Auth	GitHub App + JWT	Bot identity, auto-rotating tokens

*Built with Claude Code. Orchestrated by Octobots. Tested on a real project with 14 repositories and 8 microservices.

8 hard lessons from running a multi-agent team on a real project

Personality matters: A QA agent that’s “evidence-obsessed” and takes screenshots for everything writes much better bug reports than something with generic instructions.
“Act, don’t ask” has to be enforced: Early on, the PM kept asking things like “should I route this?” which slowed everything down. The fix was simple: acting autonomously became a core rule.
Every message needs a response: Agents would finish work but never acknowledge it. The pipeline would stall. Now every task ends the same way: comment on the issue, ack in taskbox, notify the user.
Testing can’t be optional: If you don’t force it, agents skip it. Every coding role now has an explicit “you must test your changes” step.
Session-per-ticket just works: Context buildup was a real concern. Mapping each GitHub issue to its own Claude session solved it cleanly.
SQLite is enough for IPC: I considered MCP, WebSockets, and Redis. None of it was necessary. A single SQLite file in WAL mode handled multiple agents without issues.
Bot identity makes a difference: If everything shows up under your personal account, it’s hard to tell what’s human and what’s not. A GitHub App gives the system its own identity and makes the audit trail clear.
Deduplication matters more than you think: The same task can come from Telegram, GitHub, or taskbox at the same time. Using GitHub labels as the source of truth, plus a simple “check before starting” rule, keeps things from duplicating.
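The "check before starting" rule from the last lesson can be sketched as a label test. The label scheme ("in-progress:" plus the role) and the function names are hypothetical; the lesson only says that GitHub labels are the source of truth.

```python
CLAIM_PREFIX = "in-progress:"  # hypothetical label scheme


def should_start(issue_labels):
    """Check before starting: True only if no role has claimed the issue."""
    return not any(label.startswith(CLAIM_PREFIX) for label in issue_labels)


def claim(issue_labels, role):
    """Return the label list with this role's claim added."""
    return [*issue_labels, f"{CLAIM_PREFIX}{role}"]


labels = ["bug"]
first = should_start(labels)   # task arrives via Telegram: nobody has it yet
labels = claim(labels, "py")
second = should_start(labels)  # same task arrives again via GitHub: skipped
```

In a real setup the check and the claim would need to happen as one step, or two workers could both pass the check before either adds its label.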

The stack itself is simple, and you can build a similar setup with Claude Code or any agent stack you prefer. If you want to skip the setup and focus on outcomes, EPAM AI/Run can handle the orchestration, infra, and workflows for you.

Explore how to build your own autonomous development team
