// CASE FILE 02 — CODENAME: PYM PROTOCOL — LOCAL-FIRST · ZERO-COST

GO SMALL FIRST, MACH1.

A local-first multi-tool AI agent on LangChain and LangGraph (StateGraph), running open-source LLMs through Ollama — a confidence-scored triage router answers 65%+ of requests locally and escalates only the hard prompts to Claude via MCP. Fast, private, zero-cost. Small when small works; big only when it counts.

Inspired By: Ant-Man
Role: Creator
Inference: Local-First
Escalation: Claude via MCP
Triage · Local First · Tools · Memory · Escalate Only When Worthy
FILE 01 — THE MISSION

MOST PROMPTS DON'T NEED A FRONTIER MODEL

Sending every prompt to a frontier API is the engineering equivalent of calling in an Avenger to open a jar. Most requests are small, and a local model can handle them faster, cheaper, and completely privately.

Mach1's confidence-scored triage router makes that call on every request: 65%+ resolve locally on open-source LLMs through Ollama (Gemma); only genuinely hard prompts escalate to Claude via MCP. The result is an agent that's fast, private, and zero-cost for the bulk of its work — and still has a heavyweight on speed dial.

"Go small first. Escalate only when it counts."

FILE SNAPSHOT
  • Router: confidence-scored triage
  • Local: Ollama (Gemma), 65%+ of requests
  • Escalation: Claude via MCP, hard prompts only
  • Tools: files, shell, REPL, web & more
  • Memory: episodic + semantic ChromaDB
  • Stack: FastAPI · React + TypeScript (Vite)
FILE 02 — THE PIPELINE

TRIAGE → LOCAL → ESCALATE IF WORTHY

01 · Triage

Every request hits the confidence-scored triage router first — it decides in-flight whether local inference is enough.
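The routing call itself can be sketched in a few lines. This is a minimal illustration, not Mach1's actual router: it assumes the confidence score arrives as a number in [0, 1], and the names `RoutingDecision` and `triage` and the default threshold of 0.6 are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    route: str         # "local" or "escalate"
    confidence: float  # score that the local model is enough for this request
    threshold: float   # cut-off in effect when the decision was made


def triage(confidence: float, threshold: float = 0.6) -> RoutingDecision:
    """Stay local when confidence meets the threshold, otherwise escalate.

    The 0.6 default is a made-up value for illustration only.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be within [0, 1]")
    route = "local" if confidence >= threshold else "escalate"
    return RoutingDecision(route, confidence, threshold)


easy = triage(0.85)  # well above the cut-off, stays on the local model
hard = triage(0.30)  # below the cut-off, goes to the frontier model
```

Keeping the threshold inside the returned decision means a UI can show which cut-off was active when each message was routed.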

02 · Answer Locally

Open-source LLMs through Ollama (Gemma) handle 65%+ of traffic on LangChain + LangGraph StateGraph — no API bill, no data leaving the machine.

03 · Run the Tool Loop

A full multi-tool agent loop: file ops, shell, Python REPL, DuckDuckGo, Wikipedia, Stack Overflow, and URL browsing — orchestrated per request.
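A tool loop of this kind can be sketched as "ask the policy for the next action, run the tool, feed the result back, repeat". The sketch below is an assumption-laden illustration: `run_tool_loop`, the `decide` callback, and the two toy tools are hypothetical stand-ins, and the real agent's tools (shell, REPL, web search) are deliberately not reproduced here.

```python
from typing import Callable, Optional

Action = tuple[str, str]  # (tool name, tool input)


def run_tool_loop(
    decide: Callable[[list[str]], Optional[Action]],
    tools: dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> tuple[list[str], list[str]]:
    """Run tools until the policy returns None or the step budget runs out.

    Returns the observations gathered and the names of tools that actually ran.
    """
    observations: list[str] = []
    used: list[str] = []
    for _ in range(max_steps):
        action = decide(observations)
        if action is None:  # the policy has enough to answer
            break
        name, tool_input = action
        tool = tools.get(name)
        if tool is None:
            # Report the problem back to the policy instead of crashing.
            observations.append(f"error: unknown tool {name!r}")
            continue
        observations.append(tool(tool_input))
        used.append(name)
    return observations, used


# Toy tools standing in for real ones.
TOY_TOOLS = {
    "word_count": lambda text: str(len(text.split())),
    "upper": lambda text: text.upper(),
}


def scripted_policy(observations: list[str]) -> Optional[Action]:
    """Stand-in for the model: a fixed two-step plan, then stop."""
    plan = [("word_count", "go small first"), ("upper", "done")]
    return plan[len(observations)] if len(observations) < len(plan) else None


observations, used = run_tool_loop(scripted_policy, TOY_TOOLS)
```

The `max_steps` budget is the design choice worth noting: it bounds the loop even if the policy never decides it is finished.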

04 · Remember in Layers

Episodic and semantic vector stores in ChromaDB with top-k RAG recall, plus a periodic summarizer agent compacting history — served over a FastAPI backend.
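Top-k recall across two memory layers can be illustrated without a vector database. The sketch below is a pure-Python stand-in, not the ChromaDB API: the stores are plain lists of `(text, vector)` pairs, the vectors are hand-written toy embeddings, cosine similarity is an assumed ranking metric, and all function names are hypothetical.

```python
import math

Memory = tuple[str, list[float]]  # (remembered text, embedding vector)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_top_k(query: list[float], store: list[Memory], k: int = 3) -> list[str]:
    """Return the k stored texts most similar to the query vector."""
    ranked = sorted(store, key=lambda item: cosine(query, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def recall_layers(
    query: list[float], episodic: list[Memory], semantic: list[Memory], k: int = 2
) -> dict[str, list[str]]:
    """Query both layers separately so the caller can tell them apart."""
    return {
        "episodic": recall_top_k(query, episodic, k),
        "semantic": recall_top_k(query, semantic, k),
    }


# Toy 2-D embeddings, invented for the example.
episodic = [("user asked about Ollama setup", [1.0, 0.0]),
            ("user asked about Vite config", [0.0, 1.0])]
semantic = [("Ollama serves local models", [0.9, 0.1]),
            ("Vite is a frontend build tool", [0.1, 0.9])]

recalled = recall_layers([0.95, 0.05], episodic, semantic, k=1)
```

Recalled snippets from both layers would then be placed in the prompt before the model answers, which is the retrieval half of RAG.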

05 · Escalate When Worthy

Hard prompts route to Claude via MCP — the giant-mode button, pressed only when the router's confidence says the local model is outmatched.
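The triage, local, and escalate stages fit together as a small state graph. The sketch below mirrors that shape in plain Python rather than calling LangGraph, so it is an analogy and not the project's wiring: `build_pipeline`, the injected `score`, `local_model`, and `frontier_model` callables, and the 0.6 threshold are all hypothetical.

```python
from typing import Callable

END = "__end__"  # sentinel node name that stops the run


def build_pipeline(
    score: Callable[[str], float],
    local_model: Callable[[str], str],
    frontier_model: Callable[[str], str],
    threshold: float = 0.6,  # made-up default
) -> Callable[[str], dict]:
    """Wire triage -> (local | escalate) -> END and return a runner."""

    def triage_node(state: dict) -> dict:
        confidence = score(state["prompt"])
        route = "local" if confidence >= threshold else "escalate"
        return {**state, "confidence": confidence, "route": route}

    def local_node(state: dict) -> dict:
        return {**state, "answer": local_model(state["prompt"])}

    def escalate_node(state: dict) -> dict:
        return {**state, "answer": frontier_model(state["prompt"])}

    nodes = {"triage": triage_node, "local": local_node, "escalate": escalate_node}
    # Each edge function inspects the state and names the next node.
    edges = {
        "triage": lambda state: state["route"],
        "local": lambda state: END,
        "escalate": lambda state: END,
    }

    def run(prompt: str) -> dict:
        state, current = {"prompt": prompt}, "triage"
        while current != END:
            state = nodes[current](state)
            current = edges[current](state)
        return state

    return run


# Toy stand-ins: scoring by prompt length is purely for demonstration.
run = build_pipeline(
    score=lambda prompt: 0.9 if len(prompt) < 20 else 0.2,
    local_model=lambda prompt: f"local: {prompt}",
    frontier_model=lambda prompt: f"frontier: {prompt}",
)
small = run("what is 2 + 2?")
big = run("design a distributed consensus protocol with proofs")
```

Because the models are injected, the same graph can be exercised with stubs in tests and with real clients in production.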

FILE 03 — THE CONTROL ROOM

A UI THAT SHOWS EVERY DECISION

🎚️

Live Triage Controls

Adjustable Threshold Slider

Live confidence and routing tags on every message, plus a threshold slider to tune how eagerly the agent escalates — in real time.

Confidence Scores · Routing Tags
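What moving the slider does can be shown with simple arithmetic: raising the threshold sends more requests to escalation, lowering it keeps more local. In this sketch the confidence values are invented sample data, not measurements from Mach1, and `local_share` is a hypothetical helper.

```python
def local_share(confidences: list[float], threshold: float) -> float:
    """Fraction of requests whose confidence meets the threshold (stays local)."""
    if not confidences:
        return 0.0
    kept = sum(1 for c in confidences if c >= threshold)
    return kept / len(confidences)


# Invented confidence scores for ten requests, for illustration only.
SAMPLE_CONFIDENCES = [0.95, 0.80, 0.72, 0.65, 0.40, 0.30, 0.88, 0.55, 0.91, 0.20]

# Sweep the slider across three positions and record the local share at each.
sweep = {t: local_share(SAMPLE_CONFIDENCES, t) for t in (0.5, 0.7, 0.9)}
```

On this sample, the local share falls as the threshold rises, which is the trade-off the slider exposes: a stricter router escalates more often.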
🧰

Per-Message Tool Badges

See What the Agent Used

Every response wears badges for the tools it actually invoked — shell, REPL, web, files — no black-box answers.

7 Tools · Transparent Runs
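One way to derive badges is to collapse a message's tool calls into unique names in first-use order and ship them with the routing metadata. The payload shape and the names `tool_badges` and `message_payload` below are assumptions for illustration, not the project's actual API.

```python
def tool_badges(invocations: list[str]) -> list[str]:
    """Unique tool names in the order they were first invoked."""
    # dict preserves insertion order, so this de-duplicates without sorting.
    return list(dict.fromkeys(invocations))


def message_payload(text: str, invocations: list[str],
                    confidence: float, route: str) -> dict:
    """Hypothetical per-message record a UI could render as tags and badges."""
    return {
        "text": text,
        "route": route,
        "confidence": round(confidence, 2),
        "tools": tool_badges(invocations),
    }


payload = message_payload(
    text="Found 3 matching files.",
    invocations=["shell", "files", "shell"],  # shell was called twice
    confidence=0.8123,
    route="local",
)
```

Only tools that actually ran appear in `tools`, which keeps the badges honest about what produced the answer.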
🗃️

Multi-Session Sidebar

Three-Mode React + TS (Vite) UI

Sessions, memory indicators, and mode switching in one desktop-grade interface built on React + TypeScript.

React + TS · Vite
🔐

Private by Default

Local Inference · Zero Cost

The 65%+ that stays local costs nothing and leaks nothing — privacy isn't a mode, it's the default route.

Ollama · Gemma · FastAPI
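The cost claim reduces to one line of arithmetic: if local requests incur no API charge, average API spend per request is the escalated share times the API price. The sketch below counts API spend only, and the price is a hypothetical placeholder, not a real rate.

```python
def blended_api_cost(local_share: float, api_cost_per_request: float) -> float:
    """Average API spend per request when only escalated requests are billed."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be within [0, 1]")
    escalated_share = 1.0 - local_share
    return escalated_share * api_cost_per_request


# Hypothetical price, chosen only to make the arithmetic concrete.
HYPOTHETICAL_API_COST = 0.01  # currency units per escalated request

# With 65% of requests answered locally, 35% of requests carry the API price.
average = blended_api_cost(local_share=0.65,
                           api_cost_per_request=HYPOTHETICAL_API_COST)
```

At a 65% local share, the average works out to 35% of the per-request API price, whatever that price happens to be.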
Requests Answered Locally: 65%+
Tools in the Agent Loop: 7
Cost for Local Inference: 0
FILE 04 — THE BLUEPRINT

SMALL CORE, GIANT REACH

React + TS UI: Sessions · Badges · Threshold Slider
▼
FastAPI Backend: Agent Loop Serving
Triage Router: Confidence-Scored Routing
▼
Local Agent: Ollama (Gemma) · LangGraph
Tool Loop: Files · Shell · REPL · Web
Claude via MCP: Hard Prompts Only
▼
ChromaDB Memory: Episodic + Semantic · Top-K RAG · Summarizer Agent

Local by default, escalation by evidence — the router decides, the UI shows its work
END OF FILE

MISSION LOGGED. RETURN TO BASE.