Back to blog

SuperHarness: AI Agent Development Framework Benchmarking Claude Code

SuperHarness is a production-grade AI Agent framework built with Rust + Python, benchmarking Claude Code and LangChain. Phase 0 complete with 192 tests passed.

#AI#Agent Framework#Rust#Python#Architecture

Introduction

In the AI Agent development field, Claude Code CLI and LangChain represent two different paradigms:

  • Claude Code CLI: Terminal product, user-facing, ready to use
  • LangChain: Development framework, developer-facing, flexible but steep learning curve

SuperHarness attempts to merge both: provide ready-to-use terminal product while offering concise Python SDK for developers to customize.

Project Positioning

Dual Product Strategy

ProductPositioningTarget Users
CLI/TUITerminal Agent productEnd users, developers
Python SDKDevelopment frameworkAI application developers

Benchmarking Analysis

FeatureSuperHarnessClaude CodeLangChain
Multi-LLM Support
Streaming Output
Tool Calling
Session Management
Code EditingNeeds implementation
Git IntegrationNeeds implementation
MCP ProtocolNeeds implementation
Agent Planning

Six-Layer Architecture

┌─────────────────────────────────────────────┐
│ Layer 5: Application Layer (CLI/TUI)        │  ← User interaction
├─────────────────────────────────────────────┤
│ Layer 4: Agent Runtime                      │  ← Agent runtime
├─────────────────────────────────────────────┤
│ Layer 3: Tool System                        │  ← Tool system
├─────────────────────────────────────────────┤
│ Layer 2: Session & Memory                   │  ← Session & memory
├─────────────────────────────────────────────┤
│ Layer 1: Core Services                      │  ← Core services
├─────────────────────────────────────────────┤
│ Layer 0: Security Foundation                │  ← Security foundation
└─────────────────────────────────────────────┘

Layer Responsibilities

LayerResponsibilityKey Components
Layer 0Security foundationInput validation, PII sanitization, access control
Layer 1Core servicesLLM client, config management, storage engine
Layer 2Session & memorySession management, memory system, state persistence
Layer 3Tool systemTool registration, executor, sandbox isolation
Layer 4Agent runtimeLoop control, tool calling, error recovery
Layer 5Application layerCLI, TUI, Python SDK

Python SDK

Concise API

from superharness import Agent
 
# Create Agent
agent = Agent(
    api_key="your-api-key",
    provider="anthropic",  # or "openai", "gemini", "custom"
    model="claude-sonnet-4-6"
)
 
# Simple conversation
response = agent.run("Hello, how are you?")
print(response)
 
# Streaming output
for chunk in agent.run_stream("Tell me a story"):
    print(chunk, end="", flush=True)

Tool Registration

@agent.tool
def calculate(expression: str) -> float:
    """Evaluate a mathematical expression."""
    return eval(expression)
 
# Use tool
response = agent.run("What is 123 * 456?")

CLI / TUI

Terminal Interaction

# Enter TUI directly
superharness
 
# Show help
superharness --help
 
# Show version
superharness --version

TUI Features

  • Syntax highlighting
  • Streaming output display
  • Keyboard shortcuts
  • Session management

Tech Stack

ComponentTechnology
CLI/TUIRust + ratatui
Python SDKPython 3.10+ + httpx
CoreRust (PyO3 bindings)
ConfigurationTOML + Environment variables

Development Progress

Phase 0: SDK/TUI Basic Features ✅ Complete

ModuleStatusTests
SDK LLM Calls82 passed
SDK Streaming Output
SDK Tool Calling
SDK Session Management
TUI Core Features110 passed
TUI Code Editor
UI Components8/8 scenarios

Total: 192 tests all passed

Phase 1: Production Features ⏳ In Progress

FeatureStatusDescription
Complete Tool ChainBash/Read/Write/Edit/Grep
Agent PlanningTask decomposition, self-correction
Git Integrationdiff/commit/PR
MCP ProtocolMCP client support

Design Philosophy

Core Principles

  1. No MVP/Demo - Develop complete product directly
  2. Benchmark Claude Code/LangChain - Match or exceed
  3. Precise Tasks - Clear completion criteria and acceptance conditions
  4. Real Verification - Verify with actual user workflows

Transparency Three Principles

  • No Hidden Behavior - All operations visible
  • State Visualization - Real-time progress display
  • Explainable Decisions - Explain tool selection and execution

Project Structure

superharness/
├── python/                  # Python SDK
│   └── superharness_sdk/
│       ├── agent/          # Agent runtime
│       ├── llm/            # LLM client
│       ├── config/         # Configuration system
│       ├── tools/          # Tool system
│       └── memory/         # Memory system
├── rust/                    # Rust core
│   ├── layer0/             # Security foundation
│   ├── layer1/             # Core services
│   ├── layer2/             # Session management
│   └── layer3/             # Tool system
├── cli/                     # CLI/TUI
│   └── src/
│       ├── cli/            # Command line parsing
│       ├── tui/            # TUI interface
│       └── agent/          # Agent client
├── docs/                    # Documentation
└── tests/                   # Tests

Current Status

Phase 0 complete, 192 tests all passed. Phase 1 in progress, aiming to complete production features benchmarking Claude Code's full capabilities.


Related Links


Last updated: 2026-05-13