Harness & Agent Frameworks

Three harnesses to test: Pi, OpenCode, and Gastown

The Three Candidates

HarnessArchitectureCeilingRiskLocal Model Support
Pi Coding Agent Single-agent CLI, extensible High Dev created company OpenAI-compatible API (llama.cpp server)
OpenCode Needs investigation Unknown Less ecosystem Needs testing
Gastown Multi-agent orchestration (20-30 agents) Highest Even SOTA LLMs struggle Coordinates multiple agent instances

1. Pi Coding Agent

github.com/badlogic/pi-mono — by Mario Zechner (@badlogic)

An open-source agentic coding CLI similar to Claude Code. Monorepo with packages for agent core, coding CLI, TUI, web UI, Slack bot, and GPU pod management.

Key Features

Local Model Setup

# Start llama.cpp server
llama-server -hf unsloth/Qwen3.5-9B-GGUF:Q5_K_M \
  -ngl 99 -fa --jinja --port 8080

# Configure Pi to use local model
# Set provider to OpenAI-compatible, point to localhost:8080

Ecosystem

735+ related repos on GitHub including MCP adapters, Emacs frontends, web access extensions. Comparison: pi-vs-claude-code. Awesome list: awesome-pi-agent.

Company risk

The developer created a company around Pi. This could mean better development resources, or it could mean the open-source version gets deprioritized in favor of a paid product. Watch for signs of enshittification.

Qwen 3.5 Compatibility

From community reports: Pi with Qwen 3.5 models is "unbeatable in daily use" — several hundred tool calls and agent loops per day. The key was fixing Jinja template / thinking tag issues specific to Qwen models. Extensions/fixes for Qwen tool call and thinking tags are available in the Pi ecosystem.

2. OpenCode

Needs investigation as an alternative to Pi. Key questions to answer:

3. Gastown

github.com/steveyegge/gastown — by Steve Yegge

A multi-agent orchestration system that coordinates 20-30 AI coding agents (Claude Code, Copilot, Codex, Gemini) working on different tasks. The highest-ceiling option.

Architecture

Uses git-backed "hooks" for work persistence across agent crashes. Requires: Go 1.25+, Git 2.25+, Dolt 1.82.4+.

Even SOTA LLMs struggle with Gastown

The multi-agent coordination pattern requires models to:

This is the hardest agent coordination problem. Local models at 10 t/s will be slow and may struggle with the protocol complexity. But if it works, it is the most powerful architecture available.

Why Try It Anyway

Other Frameworks Worth Knowing

Aider — Best for Code Editing

github.com/Aider-AI/aider

# With llama.cpp server on port 8080
aider --model openai/qwen3.5-9b --openai-api-base http://localhost:8080/v1

OpenHands — Autonomous SWE Agent

github.com/All-Hands-AI/OpenHands

Open Interpreter

github.com/openinterpreter/open-interpreter

Goose (by Block/Square)

github.com/block/goose

Models That Can't Run Locally

These were mentioned in your notes but don't fit on your hardware:

ModelSizeWhy Not
MiniMax M2.5229BAPI only (minimax.io)
Kimi K2.5 Thinking1.1T MoEWay too large
DeepSeek V3.2 Exp685BNeeds 128GB+ even quantized
MiMo-V2-Flash310B600GB+ VRAM needed

Prompt Engineering for Local Models

Reasoning Budget

--reasoning-budget 1024           # Cap thinking tokens (quick tool calls)
--reasoning-budget 4096           # Complex reasoning
--reasoning-budget 0              # Disable thinking for pure tool-calling
--reasoning-budget -1             # Unlimited
Bug: --reasoning-budget has a regression in recent llama.cpp builds (issue #21487). Use build <= 8661 until PR #21594 merges.

Best Practices for Agentic Quantized Models

  1. Use Q5_K_M or Q6_K minimum for tool calling — Q4 degrades structured output reliability
  2. Always enable --jinja with models that have native tool-call template support
  3. For Qwen 3.5: override chat template if you see broken <think> tags
  4. Set --n-predict to 8192-16384 to prevent runaway generation
  5. For agentic loops: don't feed <think> blocks back into conversation history

Sources