// CASE FILE 02 — CODENAME: PYM PROTOCOL — LOCAL-FIRST · ZERO-COST

GO SMALL FIRST, MACH1.

A local-first, multi-tool AI agent built on LangChain and LangGraph (StateGraph), running open-source LLMs through Ollama. A confidence-scored triage router keeps 65%+ of requests on the local model and escalates only the hard prompts to Claude via MCP. Fast, private, and zero-cost for everything that stays local. Small when small works; big only when it counts.

Inspired By: Ant-Man
Role: Creator
Inference: Local-First
Escalation: Claude via MCP


Triage · Local First · Tools · Memory · Escalate Only When Worthy

FILE 01 — THE MISSION

MOST PROMPTS DON'T NEED A FRONTIER MODEL

Sending every prompt to a frontier API is the engineering equivalent of calling in an Avenger to open a jar. Most requests are small, and a local model can handle them faster, cheaper, and completely privately.

Mach1's confidence-scored triage router makes that call on every request: 65%+ resolve locally on open-source LLMs through Ollama (Gemma); only genuinely hard prompts escalate to Claude via MCP. The result is an agent that's fast, private, and zero-cost for the bulk of its work — and still has a heavyweight on speed dial.

"Go small first. Escalate only when it counts."

FILE SNAPSHOT
Router — confidence-scored triage
Local — Ollama (Gemma), 65%+ of requests
Escalation — Claude via MCP, hard prompts only
Tools — files, shell, REPL, web & more
Memory — episodic + semantic ChromaDB
Stack — FastAPI · React + TypeScript (Vite)

FILE 02 — THE PIPELINE

TRIAGE → LOCAL → ESCALATE IF WORTHY

Triage

Every request hits the confidence-scored triage router first — it decides in-flight whether local inference is enough.
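A minimal sketch of that routing call, assuming the router already has a confidence score between 0 and 1 for the request. How Mach1 computes the score is not stated here, so the score is an input; `RouteDecision`, `route`, and the default threshold of 0.5 are hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class RouteDecision:
    target: str        # "local" or "escalate"
    confidence: float  # the score that produced the decision


def route(confidence: float, threshold: float = 0.5) -> RouteDecision:
    """Keep the request local when confidence clears the threshold.

    Both the function name and the 0.5 default are assumptions for this
    sketch, not values taken from the project.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    target = "local" if confidence >= threshold else "escalate"
    return RouteDecision(target=target, confidence=confidence)
```

Called as `route(0.9)`, the sketch keeps the request local; `route(0.2)` sends it up. Raising the threshold makes the agent escalate more eagerly.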

Answer Locally

Open-source LLMs through Ollama (Gemma) handle 65%+ of traffic on LangChain + LangGraph StateGraph — no API bill, no data leaving the machine.
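For a sense of what "no data leaving the machine" looks like on the wire, here is a hedged sketch that skips LangChain and LangGraph and talks to Ollama's local HTTP API directly. It assumes Ollama is listening on its default port (11434) and that a Gemma model has been pulled; the model tag `"gemma"` and the helper names `build_request` and `answer_locally` are assumptions, not the project's code.

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "gemma") -> urllib.request.Request:
    """Build a non-streaming generate request (model tag is an assumption)."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def answer_locally(prompt: str, model: str = "gemma") -> str:
    """Send the prompt to the local Ollama server and return its text reply."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]
```

`answer_locally("Summarize this file name: notes_final_v2.txt")` only works with an Ollama server running locally; the request never targets anything but `localhost`.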

Run the Tool Loop

A full multi-tool agent loop: file ops, shell, Python REPL, DuckDuckGo, Wikipedia, Stack Overflow, and URL browsing — orchestrated per request.
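The shape of such a loop can be sketched with a plain registry and dispatcher. This is a simplified stand-in under stated assumptions: in a real agent the model picks the next tool at each step, whereas here the plan is fixed up front, and the two toy tools (`word_count`, `shout`) are hypothetical placeholders for the real file, shell, and web tools. The loop also records which tools actually ran.

```python
from typing import Callable, Dict, List, Tuple

Tool = Callable[[str], str]


def word_count(text: str) -> str:
    """Toy tool: count whitespace-separated words."""
    return str(len(text.split()))


def shout(text: str) -> str:
    """Toy tool: upper-case the input."""
    return text.upper()


# Hypothetical registry; the real agent would register its seven tools here.
TOOLS: Dict[str, Tool] = {"word_count": word_count, "shout": shout}


def run_tool_loop(
    plan: List[Tuple[str, str]], tools: Dict[str, Tool] = TOOLS
) -> Tuple[List[str], List[str]]:
    """Run (tool_name, argument) steps in order.

    Returns the per-step results and the distinct tools that actually ran,
    in first-use order.
    """
    results: List[str] = []
    used: List[str] = []
    for name, arg in plan:
        tool = tools.get(name)
        if tool is None:
            results.append(f"unknown tool: {name}")
            continue
        results.append(tool(arg))
        if name not in used:
            used.append(name)
    return results, used
```

Unknown tool names are reported in the results rather than raised, so one bad step does not abort the whole request.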

Remember in Layers

Episodic and semantic vector stores in ChromaDB with top-k RAG recall, plus a periodic summarizer agent compacting history — served over a FastAPI backend.
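Top-k recall itself is a small idea, and a pure-Python sketch makes it concrete. This stands in for the vector store rather than using ChromaDB: embeddings are supplied as plain lists of floats, and cosine similarity is an assumed ranking metric, since the metric the project configures is not stated. `cosine` and `top_k` are hypothetical helper names.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float],
    memories: List[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[str]:
    """Return the texts of the k memories most similar to the query vector."""
    ranked = sorted(memories, key=lambda m: cosine(query, m[1]), reverse=True)
    return [text for text, _ in ranked[:k]]
```

The recalled texts would then be prepended to the prompt as context, which is the retrieval half of RAG.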

Escalate When Worthy

Hard prompts route to Claude via MCP — the giant-mode button, pressed only when the router's confidence says the local model is outmatched.

FILE 03 — THE CONTROL ROOM

A UI THAT SHOWS EVERY DECISION

🎚️

Live Triage Controls

Adjustable Threshold Slider

Live confidence and routing tags on every message, plus a threshold slider to tune how eagerly the agent escalates — in real time.
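To see what moving the slider does, here is a small sketch that computes the share of requests staying local at a given threshold. The confidence scores in the usage note are made-up illustration values, not measurements from Mach1, and `local_share` is a hypothetical helper.

```python
from typing import Sequence


def local_share(confidences: Sequence[float], threshold: float) -> float:
    """Fraction of requests whose confidence meets or exceeds the threshold."""
    if not confidences:
        return 0.0
    kept = sum(1 for c in confidences if c >= threshold)
    return kept / len(confidences)
```

With illustrative scores `[0.9, 0.8, 0.4, 0.7]`, a threshold of 0.5 keeps three of four requests local, while 0.85 keeps only one: a higher threshold means more escalation.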

Confidence Scores · Routing Tags

🧰

Per-Message Tool Badges

See What the Agent Used

Every response wears badges for the tools it actually invoked — shell, REPL, web, files — no black-box answers.

7 Tools · Transparent Runs

🗃️

Multi-Session Sidebar

Three-Mode React + TS (Vite) UI

Sessions, memory indicators, and mode switching in one desktop-grade interface built on React + TypeScript.

React + TS · Vite

🔐

Private by Default

Local Inference · Zero Cost

The 65%+ that stays local costs nothing and leaks nothing — privacy isn't a mode, it's the default route.

Ollama · Gemma · FastAPI

65%+ Requests Answered Locally

7 Tools in the Agent Loop

Zero Cost for Local Inference

FILE 04 — THE BLUEPRINT

SMALL CORE, GIANT REACH

React + TS UI

Sessions · Badges · Threshold Slider

▼ ▼ ▼

FastAPI Backend

Agent Loop Serving

Triage Router

Confidence-Scored Routing

▼ ▼ ▼

Local Agent

Ollama (Gemma) · LangGraph

Tool Loop

Files · Shell · REPL · Web

Claude via MCP

Hard Prompts Only

▼ ▼ ▼

ChromaDB Memory

Episodic + Semantic · Top-K RAG · Summarizer Agent

Local by default, escalation by evidence — the router decides, the UI shows its work

END OF FILE

MISSION LOGGED. RETURN TO BASE.
