Harness & Agent Frameworks

Three harnesses to test: Pi, OpenCode, and Gastown

The Three Candidates

Harness	Architecture	Ceiling	Risk	Local Model Support
Pi Coding Agent	Single-agent CLI, extensible	High	Dev created company	OpenAI-compatible API (llama.cpp server)
OpenCode	Needs investigation	Unknown	Less ecosystem	Needs testing
Gastown	Multi-agent orchestration (20-30 agents)	Highest	Even SOTA LLMs struggle	Coordinates multiple agent instances

1. Pi Coding Agent

github.com/badlogic/pi-mono — by Mario Zechner (@badlogic)

An open-source agentic coding CLI similar to Claude Code. Monorepo with packages for agent core, coding CLI, TUI, web UI, Slack bot, and GPU pod management.

Key Features

Granular hook/event system (session_fork, turn_start, tool_execution_update) — more than Claude Code
15+ extensions: UI customization, safety auditing (damage-control), task management
Multi-agent: subagents (/sub), teams (/team), chains (/chain)
TypeScript SDK for custom extensions
SKILL.md format for declarative workflows
Session sharing via Hugging Face

Local Model Setup

# Start llama.cpp server
llama-server -hf unsloth/Qwen3.5-9B-GGUF:Q5_K_M \
  -ngl 99 -fa --jinja --port 8080

# Configure Pi to use local model
# Set provider to OpenAI-compatible, point to localhost:8080

Ecosystem

735+ related repos on GitHub including MCP adapters, Emacs frontends, web access extensions. Comparison: pi-vs-claude-code. Awesome list: awesome-pi-agent.

Company risk

The developer created a company around Pi. This could mean better development resources, or it could mean the open-source version gets deprioritized in favor of a paid product. Watch for signs of enshittification.

Qwen 3.5 Compatibility

From community reports: Pi with Qwen 3.5 models is "unbeatable in daily use" — several hundred tool calls and agent loops per day. The key was fixing Jinja template / thinking tag issues specific to Qwen models. Extensions/fixes for Qwen tool call and thinking tags are available in the Pi ecosystem.

2. OpenCode

Needs investigation as an alternative to Pi. Key questions to answer:

Does it support OpenAI-compatible APIs (llama.cpp server)?
How mature is the tool calling / agentic loop?
Extension system? Can it match Pi's hooks?
How does it handle Qwen 3.5 thinking tags?

3. Gastown

github.com/steveyegge/gastown — by Steve Yegge

A multi-agent orchestration system that coordinates 20-30 AI coding agents (Claude Code, Copilot, Codex, Gemini) working on different tasks. The highest-ceiling option.

Architecture

Mayor — coordinator that assigns work
Rigs — project containers (isolated workspaces)
Polecats — worker agents that execute tasks
Convoys — work tracking and persistence

Uses git-backed "hooks" for work persistence across agent crashes. Requires: Go 1.25+, Git 2.25+, Dolt 1.82.4+.

Even SOTA LLMs struggle with Gastown

The multi-agent coordination pattern requires models to:

Maintain coherent state across 20-30 simultaneous agent sessions
Correctly parse and follow complex orchestration protocols
Handle git-based work handoffs without corruption
Recover from partial failures in distributed agent execution

This is the hardest agent coordination problem. Local models at 10 t/s will be slow and may struggle with the protocol complexity. But if it works, it is the most powerful architecture available.

Why Try It Anyway

If local models can drive Gastown, you have a 20-30x productivity multiplier
The git-backed persistence means crashed agents don't lose work
You could mix: SOTA cloud models for the Mayor + local models for Polecats
The architecture itself teaches patterns for building robust multi-agent systems

Other Frameworks Worth Knowing

Aider — Best for Code Editing

github.com/Aider-AI/aider

Best-in-class for code editing with local models
Supports Ollama and any OpenAI-compatible API
Features: codebase mapping, git integration, auto-commits, voice-to-code
Set context window in .aider.model.settings.yml: num_ctx: 65536

# With llama.cpp server on port 8080
aider --model openai/qwen3.5-9b --openai-api-base http://localhost:8080/v1

OpenHands — Autonomous SWE Agent

github.com/All-Hands-AI/OpenHands

77.6% on SWE-Bench, full autonomous agent
Explicitly tested on RTX 3060 12GB with Qwen3-Coder-30B-A3B
Minimum 22,000 token context, recommends 32,768
Configure: openai/<model-name> with base URL http://host.docker.internal:<port>/v1

Open Interpreter

github.com/openinterpreter/open-interpreter

Natural language to code execution (Python, JS, Shell)
Default local context: 3000 tokens (needs manual increase)

Goose (by Block/Square)

github.com/block/goose

Desktop app + CLI + API, built in Rust, Apache 2.0
70+ MCP extensions, 15+ providers including Ollama

Models That Can't Run Locally

These were mentioned in your notes but don't fit on your hardware:

Model	Size	Why Not
MiniMax M2.5	229B	API only (minimax.io)
Kimi K2.5 Thinking	1.1T MoE	Way too large
DeepSeek V3.2 Exp	685B	Needs 128GB+ even quantized
MiMo-V2-Flash	310B	600GB+ VRAM needed

Prompt Engineering for Local Models

Reasoning Budget

--reasoning-budget 1024           # Cap thinking tokens (quick tool calls)
--reasoning-budget 4096           # Complex reasoning
--reasoning-budget 0              # Disable thinking for pure tool-calling
--reasoning-budget -1             # Unlimited

Bug: --reasoning-budget has a regression in recent llama.cpp builds (issue #21487). Use build <= 8661 until PR #21594 merges.

Best Practices for Agentic Quantized Models

Use Q5_K_M or Q6_K minimum for tool calling — Q4 degrades structured output reliability
Always enable --jinja with models that have native tool-call template support
For Qwen 3.5: override chat template if you see broken <think> tags
Set --n-predict to 8192-16384 to prevent runaway generation
For agentic loops: don't feed <think> blocks back into conversation history

Sources

← Inference Engines Multimodal & OCR →