SuperHarness: AI Agent Development Framework Benchmarking Claude Code

Introduction

In the AI Agent development field, Claude Code CLI and LangChain represent two different paradigms:

Claude Code CLI: Terminal product, user-facing, ready to use
LangChain: Development framework, developer-facing, flexible but steep learning curve

SuperHarness attempts to merge both: provide ready-to-use terminal product while offering concise Python SDK for developers to customize.

Project Positioning

Dual Product Strategy

Product	Positioning	Target Users
CLI/TUI	Terminal Agent product	End users, developers
Python SDK	Development framework	AI application developers

Benchmarking Analysis

Feature	SuperHarness	Claude Code	LangChain
Multi-LLM Support	✅	✅	✅
Streaming Output	✅	✅	✅
Tool Calling	✅	✅	✅
Session Management	✅	✅	✅
Code Editing	✅	✅	Needs implementation
Git Integration	⏳	✅	Needs implementation
MCP Protocol	⏳	✅	Needs implementation
Agent Planning	⏳	✅	✅

Six-Layer Architecture

┌─────────────────────────────────────────────┐
│ Layer 5: Application Layer (CLI/TUI)        │  ← User interaction
├─────────────────────────────────────────────┤
│ Layer 4: Agent Runtime                      │  ← Agent runtime
├─────────────────────────────────────────────┤
│ Layer 3: Tool System                        │  ← Tool system
├─────────────────────────────────────────────┤
│ Layer 2: Session & Memory                   │  ← Session & memory
├─────────────────────────────────────────────┤
│ Layer 1: Core Services                      │  ← Core services
├─────────────────────────────────────────────┤
│ Layer 0: Security Foundation                │  ← Security foundation
└─────────────────────────────────────────────┘

Layer Responsibilities

Layer	Responsibility	Key Components
Layer 0	Security foundation	Input validation, PII sanitization, access control
Layer 1	Core services	LLM client, config management, storage engine
Layer 2	Session & memory	Session management, memory system, state persistence
Layer 3	Tool system	Tool registration, executor, sandbox isolation
Layer 4	Agent runtime	Loop control, tool calling, error recovery
Layer 5	Application layer	CLI, TUI, Python SDK

Python SDK

Concise API

from superharness import Agent
 
# Create Agent
agent = Agent(
    api_key="your-api-key",
    provider="anthropic",  # or "openai", "gemini", "custom"
    model="claude-sonnet-4-6"
)
 
# Simple conversation
response = agent.run("Hello, how are you?")
print(response)
 
# Streaming output
for chunk in agent.run_stream("Tell me a story"):
    print(chunk, end="", flush=True)

Tool Registration

@agent.tool
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return eval(expression)
 
# Use tool
response = agent.run("What is 123 * 456?")

CLI / TUI

Terminal Interaction

# Enter TUI directly
superharness
 
# Show help
superharness --help
 
# Show version
superharness --version

TUI Features

Syntax highlighting
Streaming output display
Keyboard shortcuts
Session management

Tech Stack

Component	Technology
CLI/TUI	Rust + ratatui
Python SDK	Python 3.10+ + httpx
Core	Rust (PyO3 bindings)
Configuration	TOML + Environment variables

Development Progress

Phase 0: SDK/TUI Basic Features ✅ Complete

Module	Status	Tests
SDK LLM Calls	✅	82 passed
SDK Streaming Output	✅
SDK Tool Calling	✅
SDK Session Management	✅
TUI Core Features	✅	110 passed
TUI Code Editor	✅
UI Components	✅	8/8 scenarios

Total: 192 tests all passed

Phase 1: Production Features ⏳ In Progress

Feature	Status	Description
Complete Tool Chain	⏳	Bash/Read/Write/Edit/Grep
Agent Planning	⏳	Task decomposition, self-correction
Git Integration	⏳	diff/commit/PR
MCP Protocol	⏳	MCP client support

Design Philosophy

Core Principles

No MVP/Demo - Develop complete product directly
Benchmark Claude Code/LangChain - Match or exceed
Precise Tasks - Clear completion criteria and acceptance conditions
Real Verification - Verify with actual user workflows

Transparency Three Principles

No Hidden Behavior - All operations visible
State Visualization - Real-time progress display
Explainable Decisions - Explain tool selection and execution

Project Structure

superharness/
├── python/                  # Python SDK
│   └── superharness_sdk/
│       ├── agent/          # Agent runtime
│       ├── llm/            # LLM client
│       ├── config/         # Configuration system
│       ├── tools/          # Tool system
│       └── memory/         # Memory system
├── rust/                    # Rust core
│   ├── layer0/             # Security foundation
│   ├── layer1/             # Core services
│   ├── layer2/             # Session management
│   └── layer3/             # Tool system
├── cli/                     # CLI/TUI
│   └── src/
│       ├── cli/            # Command line parsing
│       ├── tui/            # TUI interface
│       └── agent/          # Agent client
├── docs/                    # Documentation
└── tests/                   # Tests

Current Status

Phase 0 complete, 192 tests all passed. Phase 1 in progress, aiming to complete production features benchmarking Claude Code's full capabilities.