GO SMALL FIRST, MACH1.
A local-first multi-tool AI agent on LangChain and LangGraph (StateGraph), running open-source LLMs through Ollama — a confidence-scored triage router answers 65%+ of requests locally and escalates only the hard prompts to Claude via MCP. Fast, private, zero-cost. Small when small works; big only when it counts.
MOST PROMPTS DON'T NEED A FRONTIER MODEL
Sending every prompt to a frontier API is the engineering equivalent of calling in an Avenger to open a jar. Most requests are small — a small, local model can handle them faster, cheaper, and completely privately.
Mach1's confidence-scored triage router makes that call on every request: 65%+ resolve locally on open-source LLMs through Ollama (Gemma); only genuinely hard prompts escalate to Claude via MCP. The result is an agent that's fast, private, and zero-cost for the bulk of its work — and still has a heavyweight on speed dial.
"Go small first. Escalate only when it counts."
- FILE SNAPSHOT
- Router — confidence-scored triage
- Local — Ollama (Gemma), 65%+ of requests
- Escalation — Claude via MCP, hard prompts only
- Tools — files, shell, REPL, web & more
- Memory — episodic + semantic ChromaDB
- Stack — FastAPI · React + TypeScript (Vite)
TRIAGE → LOCAL → ESCALATE IF WORTHY
Triage
Every request hits the confidence-scored triage router first — it decides in-flight whether local inference is enough.
Answer Locally
Open-source LLMs through Ollama (Gemma) handle 65%+ of traffic on LangChain + LangGraph StateGraph — no API bill, no data leaving the machine.
Run the Tool Loop
A full multi-tool agent loop: file ops, shell, Python REPL, DuckDuckGo, Wikipedia, Stack Overflow, and URL browsing — orchestrated per request.
Remember in Layers
Episodic and semantic vector stores in ChromaDB with top-k RAG recall, plus a periodic summarizer agent compacting history — served over a FastAPI backend.
Escalate When Worthy
Hard prompts route to Claude via MCP — the giant-mode button, pressed only when the router's confidence says the local model is outmatched.
A UI THAT SHOWS EVERY DECISION
Live Triage Controls
Live confidence and routing tags on every message, plus a threshold slider to tune how eagerly the agent escalates — in real time.
Per-Message Tool Badges
Every response wears badges for the tools it actually invoked — shell, REPL, web, files — no black-box answers.
Multi-Session Sidebar
Sessions, memory indicators, and mode switching in one desktop-grade interface built on React + TypeScript.
Private by Default
The 65%+ that stays local costs nothing and leaks nothing — privacy isn't a mode, it's the default route.
SMALL CORE, GIANT REACH
React + TS UI
Sessions · Badges · Threshold Slider
FastAPI Backend
Agent Loop Serving
Triage Router
Confidence-Scored Routing
Local Agent
Ollama (Gemma) · LangGraph
Tool Loop
Files · Shell · REPL · Web
Claude via MCP
Hard Prompts Only
ChromaDB Memory
Episodic + Semantic · Top-K RAG · Summarizer Agent