SuperHarness: An AI Agent Development Framework Benchmarked Against Claude Code
SuperHarness is a production-grade AI agent framework built with Rust and Python, designed to match Claude Code and LangChain. Phase 0 is complete, with all 192 tests passing.
Tags: AI, Agent Framework, Rust, Python, Architecture
Introduction
In AI agent development, Claude Code CLI and LangChain represent two different paradigms:
Claude Code CLI: Terminal product, user-facing, ready to use
LangChain: Development framework, developer-facing, flexible but with a steep learning curve
SuperHarness aims to combine both: a ready-to-use terminal product plus a concise Python SDK that developers can customize.
Input validation, PII sanitization, access control
Layer 1: Core services (LLM client, config management, storage engine)
Layer 2: Session & memory (session management, memory system, state persistence)
Layer 3: Tool system (tool registration, executor, sandbox isolation)
Layer 4: Agent runtime (loop control, tool calling, error recovery)
Layer 5: Application layer (CLI, TUI, Python SDK)
Python SDK
Concise API
from superharness import Agent

# Create an Agent
agent = Agent(
    api_key="your-api-key",
    provider="anthropic",  # or "openai", "gemini", "custom"
    model="claude-sonnet-4-6",
)

# Simple conversation
response = agent.run("Hello, how are you?")
print(response)

# Streaming output
for chunk in agent.run_stream("Tell me a story"):
    print(chunk, end="", flush=True)
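The source only says the SDK is built on httpx, so how `run_stream` produces its chunks is not shown. The sketch below illustrates one plausible step, assuming the Anthropic provider. It pulls text chunks out of Server-Sent Events lines in the Anthropic Messages streaming format (`content_block_delta` events carrying `text_delta` payloads). In a real client the lines might come from httpx's `Response.iter_lines()`. Here canned lines stand in, and `iter_text_deltas` is a hypothetical name.

```python
import json
from collections.abc import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield text chunks from SSE lines in the Anthropic Messages streaming format."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip "event:" lines and blank separators
        payload = json.loads(line[len("data:"):].strip())
        delta = payload.get("delta", {})
        if payload.get("type") == "content_block_delta" and delta.get("type") == "text_delta":
            yield delta["text"]


# Canned lines standing in for a streamed HTTP response body.
SAMPLE = [
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "Once "}}',
    "",
    "event: content_block_delta",
    'data: {"type": "content_block_delta", "index": 0, "delta": {"type": "text_delta", "text": "upon a time"}}',
    "",
    "event: message_stop",
    'data: {"type": "message_stop"}',
]

for chunk in iter_text_deltas(SAMPLE):
    print(chunk, end="", flush=True)
print()
```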
Tool Registration
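# Register a Python function as a tool the agent can call
@agent.tool
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    # Demo only: eval() can execute arbitrary code, even with builtins removed.
    # Use a dedicated expression parser for untrusted input.
    return float(eval(expression, {"__builtins__": {}}, {}))

# Use the tool
response = agent.run("What is 123 * 456?")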
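Calling `eval()` on model-supplied text is risky, and the architecture lists sandbox isolation as a tool-system concern. The sketch below illustrates two ideas. First, a tool decorator could record a function's name, docstring, and type hints as a JSON-Schema-style description. Second, a calculator can walk the Python AST and allow only whitelisted operators instead of calling `eval()`. `ToolRegistry`, `JSON_TYPES`, `OPS`, and the schema layout are hypothetical; the document does not show how SuperHarness's `@agent.tool` works internally.

```python
import ast
import inspect
import operator
from collections.abc import Callable
from typing import Any, get_type_hints

# Map simple Python annotations to JSON Schema type names; anything else becomes "string".
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


class ToolRegistry:
    """Hypothetical stand-in for what a decorator such as @agent.tool might record."""

    def __init__(self) -> None:
        self.tools: dict[str, Callable[..., Any]] = {}
        self.schemas: dict[str, dict[str, Any]] = {}

    def tool(self, fn: Callable[..., Any]) -> Callable[..., Any]:
        hints = get_type_hints(fn)
        props = {
            name: {"type": JSON_TYPES.get(hints.get(name), "string")}
            for name in inspect.signature(fn).parameters
        }
        self.tools[fn.__name__] = fn
        self.schemas[fn.__name__] = {
            "name": fn.__name__,
            "description": inspect.getdoc(fn) or "",
            "parameters": {"type": "object", "properties": props, "required": list(props)},
        }
        return fn  # unchanged, so the function stays directly callable


# Whitelisted arithmetic operators (no ** to avoid huge-exponent blowups).
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
    ast.UAdd: operator.pos,
}


def _eval_node(node: ast.AST) -> float:
    """Recursively evaluate numbers and whitelisted operators; reject everything else."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_eval_node(node.operand))
    raise ValueError(f"unsupported expression element: {type(node).__name__}")


registry = ToolRegistry()


@registry.tool
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return float(_eval_node(ast.parse(expression, mode="eval").body))


print(registry.schemas["calculate"])
print(calculate("123 * 456"))
```

Inputs such as `__import__('os')` parse to a call node, which `_eval_node` rejects with `ValueError` rather than executing.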
CLI / TUI
Terminal Interaction
# Enter the TUI directly
superharness
# Show help
superharness --help
# Show version
superharness --version
TUI Features
Syntax highlighting
Streaming output display
Keyboard shortcuts
Session management
Tech Stack
Component | Technology
CLI/TUI | Rust + ratatui
Python SDK | Python 3.10+, httpx
Core | Rust (PyO3 bindings)
Configuration | TOML + environment variables
Development Progress
Phase 0: SDK/TUI Basic Features ✅ Complete
Module | Status | Tests
SDK: LLM calls, streaming output, tool calling, session management | ✅ | 82 passed
TUI: core features, code editor | ✅ | 110 passed
UI components | ✅ | 8/8 scenarios
Total: 192 tests passed (82 SDK + 110 TUI)
Phase 1: Production Features ⏳ In Progress
Feature | Status | Description
Complete Tool Chain | ⏳ | Bash/Read/Write/Edit/Grep
Agent Planning | ⏳ | Task decomposition, self-correction
Git Integration | ⏳ | diff/commit/PR
MCP Protocol | ⏳ | MCP client support
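The Edit tool and Git diff support are still in progress. The sketch below illustrates one common design and is not SuperHarness's planned behavior. The edit replaces a string only when it matches exactly once, and it reports the change as a unified diff using Python's `difflib`. `edit_file` is a hypothetical name.

```python
import difflib
import tempfile
from pathlib import Path


def edit_file(path: Path, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`; return a unified diff."""
    before = path.read_text()
    count = before.count(old)
    if count != 1:
        # Refuse ambiguous or missing matches rather than guessing which one to edit.
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    after = before.replace(old, new, 1)
    path.write_text(after)
    diff = difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"a/{path.name}",
        tofile=f"b/{path.name}",
    )
    return "".join(diff)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "greet.py"
    target.write_text('print("hello")\n')
    print(edit_file(target, '"hello"', '"hello, world"'))
```

Requiring a unique match keeps the model from silently editing the wrong occurrence. The diff gives the user a visible record of exactly what changed.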
Design Philosophy
Core Principles
No MVP/Demo - Build the complete product directly
Benchmark Against Claude Code/LangChain - Match or exceed them
Precise Tasks - Clear completion criteria and acceptance conditions
Real Verification - Verify with actual user workflows
Three Transparency Principles
No Hidden Behavior - All operations visible
State Visualization - Real-time progress display
Explainable Decisions - Explain tool selection and execution
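One way to enforce these principles is to route every tool call through a single function. That function announces the call, records the stated reason, and logs the outcome even when the tool fails. The sketch below is a hypothetical illustration (`run_visible` and the trace fields are invented for this example), not SuperHarness's implementation.

```python
import time
from collections.abc import Callable
from typing import Any


def run_visible(
    tools: dict[str, Callable[..., Any]],
    name: str,
    args: dict[str, Any],
    reason: str,
    trace: list[dict[str, Any]],
) -> Any:
    """Run a tool and record what ran, why, and how it ended; nothing runs unlogged."""
    entry: dict[str, Any] = {"tool": name, "args": args, "reason": reason}
    trace.append(entry)  # record before executing so failures are visible too
    print(f"[tool] {name}({args}) because: {reason}")
    start = time.perf_counter()
    try:
        entry["result"] = tools[name](**args)
        entry["status"] = "ok"
        return entry["result"]
    except Exception as exc:
        entry["status"] = "error"
        entry["error"] = repr(exc)
        raise
    finally:
        entry["duration_s"] = round(time.perf_counter() - start, 4)


trace: list[dict[str, Any]] = []
run_visible({"add": lambda a, b: a + b}, "add", {"a": 2, "b": 3}, "user asked for a sum", trace)
print(trace[0]["status"], trace[0]["result"])
```

Because the trace entry is appended before the tool runs, a crashing tool still leaves a record. A TUI could render the same list as a live progress view.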