Harness & Agent Frameworks
Three harnesses to test: Pi, OpenCode, and Gastown
The Three Candidates
| Harness | Architecture | Ceiling | Risk | Local Model Support |
|---|---|---|---|---|
| Pi Coding Agent | Single-agent CLI, extensible | High | Dev created company | OpenAI-compatible API (llama.cpp server) |
| OpenCode | Needs investigation | Unknown | Less ecosystem | Needs testing |
| Gastown | Multi-agent orchestration (20-30 agents) | Highest | Even SOTA LLMs struggle | Coordinates multiple agent instances |
1. Pi Coding Agent
github.com/badlogic/pi-mono — by Mario Zechner (@badlogic)
An open-source agentic coding CLI similar to Claude Code. Monorepo with packages for agent core, coding CLI, TUI, web UI, Slack bot, and GPU pod management.
Key Features
- Granular hook/event system (
session_fork,turn_start,tool_execution_update) — more than Claude Code - 15+ extensions: UI customization, safety auditing (
damage-control), task management - Multi-agent: subagents (
/sub), teams (/team), chains (/chain) - TypeScript SDK for custom extensions
- SKILL.md format for declarative workflows
- Session sharing via Hugging Face
Local Model Setup
# Start llama.cpp server
llama-server -hf unsloth/Qwen3.5-9B-GGUF:Q5_K_M \
-ngl 99 -fa --jinja --port 8080
# Configure Pi to use local model
# Set provider to OpenAI-compatible, point to localhost:8080
Ecosystem
735+ related repos on GitHub including MCP adapters, Emacs frontends, web access extensions. Comparison: pi-vs-claude-code. Awesome list: awesome-pi-agent.
The developer created a company around Pi. This could mean better development resources, or it could mean the open-source version gets deprioritized in favor of a paid product. Watch for signs of enshittification.
Qwen 3.5 Compatibility
From community reports: Pi with Qwen 3.5 models is "unbeatable in daily use" — several hundred tool calls and agent loops per day. The key was fixing Jinja template / thinking tag issues specific to Qwen models. Extensions/fixes for Qwen tool call and thinking tags are available in the Pi ecosystem.
2. OpenCode
Needs investigation as an alternative to Pi. Key questions to answer:
- Does it support OpenAI-compatible APIs (llama.cpp server)?
- How mature is the tool calling / agentic loop?
- Extension system? Can it match Pi's hooks?
- How does it handle Qwen 3.5 thinking tags?
3. Gastown
github.com/steveyegge/gastown — by Steve Yegge
A multi-agent orchestration system that coordinates 20-30 AI coding agents (Claude Code, Copilot, Codex, Gemini) working on different tasks. The highest-ceiling option.
Architecture
- Mayor — coordinator that assigns work
- Rigs — project containers (isolated workspaces)
- Polecats — worker agents that execute tasks
- Convoys — work tracking and persistence
Uses git-backed "hooks" for work persistence across agent crashes. Requires: Go 1.25+, Git 2.25+, Dolt 1.82.4+.
The multi-agent coordination pattern requires models to:
- Maintain coherent state across 20-30 simultaneous agent sessions
- Correctly parse and follow complex orchestration protocols
- Handle git-based work handoffs without corruption
- Recover from partial failures in distributed agent execution
This is the hardest agent coordination problem. Local models at 10 t/s will be slow and may struggle with the protocol complexity. But if it works, it is the most powerful architecture available.
Why Try It Anyway
- If local models can drive Gastown, you have a 20-30x productivity multiplier
- The git-backed persistence means crashed agents don't lose work
- You could mix: SOTA cloud models for the Mayor + local models for Polecats
- The architecture itself teaches patterns for building robust multi-agent systems
Other Frameworks Worth Knowing
Aider — Best for Code Editing
- Best-in-class for code editing with local models
- Supports Ollama and any OpenAI-compatible API
- Features: codebase mapping, git integration, auto-commits, voice-to-code
- Set context window in
.aider.model.settings.yml:num_ctx: 65536
# With llama.cpp server on port 8080
aider --model openai/qwen3.5-9b --openai-api-base http://localhost:8080/v1
OpenHands — Autonomous SWE Agent
github.com/All-Hands-AI/OpenHands
- 77.6% on SWE-Bench, full autonomous agent
- Explicitly tested on RTX 3060 12GB with Qwen3-Coder-30B-A3B
- Minimum 22,000 token context, recommends 32,768
- Configure:
openai/<model-name>with base URLhttp://host.docker.internal:<port>/v1
Open Interpreter
github.com/openinterpreter/open-interpreter
- Natural language to code execution (Python, JS, Shell)
- Default local context: 3000 tokens (needs manual increase)
Goose (by Block/Square)
- Desktop app + CLI + API, built in Rust, Apache 2.0
- 70+ MCP extensions, 15+ providers including Ollama
Models That Can't Run Locally
These were mentioned in your notes but don't fit on your hardware:
| Model | Size | Why Not |
|---|---|---|
| MiniMax M2.5 | 229B | API only (minimax.io) |
| Kimi K2.5 Thinking | 1.1T MoE | Way too large |
| DeepSeek V3.2 Exp | 685B | Needs 128GB+ even quantized |
| MiMo-V2-Flash | 310B | 600GB+ VRAM needed |
Prompt Engineering for Local Models
Reasoning Budget
--reasoning-budget 1024 # Cap thinking tokens (quick tool calls)
--reasoning-budget 4096 # Complex reasoning
--reasoning-budget 0 # Disable thinking for pure tool-calling
--reasoning-budget -1 # Unlimited
--reasoning-budget has a regression in recent llama.cpp builds (issue #21487). Use build <= 8661 until PR #21594 merges.
Best Practices for Agentic Quantized Models
- Use Q5_K_M or Q6_K minimum for tool calling — Q4 degrades structured output reliability
- Always enable
--jinjawith models that have native tool-call template support - For Qwen 3.5: override chat template if you see broken
<think>tags - Set
--n-predictto 8192-16384 to prevent runaway generation - For agentic loops: don't feed
<think>blocks back into conversation history