SOP Agent - Human-Governed Procedure Executor
Built an AI-powered SOP execution engine with document ingestion, hybrid RAG, an 8-node LangGraph workflow, human approval gates, FastAPI, Streamlit monitoring, MCP tools, and traceable Markdown reports.
8 graph nodes · 55 tests · 4 MCP tools
Problem
Standard Operating Procedures are usually static documents. Operators still have to interpret them, find the relevant steps, decide what evidence applies, and document what happened. That creates risk when procedures are long, compliance-sensitive, or require human approval at critical points.
SOP Agent turns SOP documents into an evidence-grounded execution workflow. It ingests procedure files, builds a plan, retrieves evidence for each step, recommends actions, verifies them independently, pauses for operator approval when needed, and produces a traceable final report.
My Role & Responsibilities
- Built the full SOP execution backend with FastAPI, LangGraph, Pydantic, SQLite, ChromaDB, and SSE streaming
- Implemented multi-format ingestion for PDF, DOCX, TXT, and Markdown with validation, parsing, structure-aware chunking, embeddings, and indexing
- Designed the 8-node LangGraph state graph: intake, planner, evidence router, executor, verifier, approval gate, replanner, and reporter
- Added hybrid retrieval using dense ChromaDB search plus SQLite FTS5 lexical search with Reciprocal Rank Fusion and source-diverse evidence packs
- Built a Streamlit operator console for upload, task entry, live execution monitoring, approval decisions, and report viewing
- Exposed MCP tools for AI IDE/agent integration: ingest SOP, run SOP, approve step, and get report
- Added Docker Compose deployment, environment-driven multi-provider LLM configuration, tests, and documentation
Architecture
Streamlit Operator Console
Upload -> Task -> Monitor -> Approve -> Report
|
FastAPI Backend
ingest, execute, intervene, report, sessions, MCP
|
LangGraph StateGraph
Intake -> Planner -> Evidence Router -> Executor
-> Verifier -> Approval Gate / Replanner -> Reporter
|
Retrieval + Storage
ChromaDB dense vectors + SQLite FTS5 lexical index
The graph uses interrupt_before on the approval gate, which keeps the AI workflow human-governed instead of fully autonomous. Operators can approve, override, skip, abort, or request a replan.
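The pause-and-resume behavior can be illustrated with a small framework-free model. This is a sketch only and does not use LangGraph's API: `Decision`, `Session`, `run_until_gate`, and `resume` are hypothetical names, and the way each decision maps to a next state is an assumption rather than the project's actual routing.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Decision(Enum):
    APPROVE = "approve"
    OVERRIDE = "override"
    SKIP = "skip"
    ABORT = "abort"
    REPLAN = "replan"


@dataclass
class Session:
    steps: list
    cursor: int = 0
    status: str = "running"  # running | awaiting_approval | replanning | aborted | done
    trace: list = field(default_factory=list)


def run_until_gate(session: Session, gated: set) -> Session:
    """Execute steps in order, stopping before any approval-sensitive step."""
    while session.cursor < len(session.steps):
        step = session.steps[session.cursor]
        if step in gated:
            session.status = "awaiting_approval"
            return session
        session.trace.append({"step": step, "outcome": "executed"})
        session.cursor += 1
    session.status = "done"
    return session


def resume(session: Session, decision: Decision, gated: set,
           override_action: Optional[str] = None) -> Session:
    """Apply the operator's decision to the paused step, then continue."""
    if session.status != "awaiting_approval":
        raise ValueError("session is not paused at an approval gate")
    entry = {"step": session.steps[session.cursor], "decision": decision.value}
    if decision in (Decision.ABORT, Decision.REPLAN):
        # Both decisions stop forward progress; a replanner would take over on REPLAN.
        session.status = "aborted" if decision is Decision.ABORT else "replanning"
        session.trace.append(entry)
        return session
    if decision is Decision.OVERRIDE:
        entry["action"] = override_action  # operator-supplied replacement action
    entry["outcome"] = "skipped" if decision is Decision.SKIP else "executed"
    session.trace.append(entry)
    session.cursor += 1
    session.status = "running"
    return run_until_gate(session, gated)
```

Every decision is appended to `trace`, so the record of who intervened and how stays alongside the automatically executed steps.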
Tech Stack
- Agent framework: LangGraph StateGraph with conditional routing and optional checkpointing
- Backend: FastAPI, Uvicorn, Pydantic v2, SSE endpoints, structured logging
- Frontend: Streamlit operator console with real-time execution monitoring
- Retrieval: ChromaDB dense vectors, SQLite FTS5 BM25 lexical search, Reciprocal Rank Fusion, evidence deduplication, source diversification
- Ingestion: pdfplumber, python-docx, text/Markdown parsing, MIME validation, structure-aware chunking
- LLM providers: Gemini, OpenAI, Anthropic, and Ollama via environment-driven factory configuration
- MCP: tool listing, tool calls, SSE keepalive, and four agent-facing tools
- Deployment: Dockerfile, Dockerfile.frontend, Docker Compose, .env configuration
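The environment-driven provider selection listed above could look roughly like the following. This is a minimal sketch under stated assumptions: the variable names `LLM_PROVIDER`, `LLM_MODEL`, and the per-provider API key names are illustrative, not taken from the project, and `LLMSettings` and `load_llm_settings` are hypothetical helpers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical mapping of provider name -> env var holding its API key.
# Ollama runs locally, so this sketch assumes it needs no key.
API_KEY_VARS = {
    "gemini": "GEMINI_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "ollama": None,
}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    model: str
    api_key: Optional[str]


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Validate provider settings from the environment before building a client."""
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    if provider not in API_KEY_VARS:
        raise ValueError(f"unsupported LLM provider: {provider!r}")
    model = env.get("LLM_MODEL", "").strip()
    if not model:
        raise ValueError("LLM_MODEL must be set")
    key_var = API_KEY_VARS[provider]
    api_key = env.get(key_var) if key_var else None
    if key_var and not api_key:
        raise ValueError(f"{key_var} is required for provider {provider!r}")
    return LLMSettings(provider=provider, model=model, api_key=api_key)
```

Passing the environment in as a mapping keeps the function easy to test, and failing fast on a missing key surfaces configuration mistakes at startup rather than mid-run.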
Key Features Delivered
Multi-Format SOP Ingestion
The ingestion pipeline validates uploads, parses supported document types, extracts section structure, chunks the content, embeds it, stores dense vectors in ChromaDB, and writes lexical chunks into SQLite FTS5.
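As an illustration of structure-aware chunking, the sketch below splits a Markdown SOP on headings and then on a size budget, so each chunk carries the section it came from. The `Chunk` type, `chunk_markdown` function, and the `max_chars` budget are assumptions for this example; the project's actual chunker may differ.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


@dataclass
class Chunk:
    section: str  # title of the nearest preceding heading
    text: str


def chunk_markdown(doc: str, max_chars: int = 800) -> list:
    """Split on headings first, then on a character budget within each section."""
    chunks = []
    section = "(preamble)"
    buffer = []

    def flush():
        text = "\n".join(buffer).strip()
        buffer.clear()
        if text:
            chunks.append(Chunk(section=section, text=text))

    for line in doc.splitlines():
        match = HEADING.match(line)
        if match:
            flush()  # close the chunk under the previous heading
            section = match.group(2).strip()
            continue
        # Start a new chunk if this line would push the buffer past the budget.
        if buffer and sum(len(l) + 1 for l in buffer) + len(line) > max_chars:
            flush()
        buffer.append(line)
    flush()
    return chunks
```

Keeping the section title on every chunk means a retrieved passage can later be cited by section, not only by file.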
Hybrid Evidence Retrieval
Each step gets evidence from dense and lexical retrieval. Reciprocal Rank Fusion merges candidates, near-duplicate chunks are removed, and the final evidence pack is diversified across source files.
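The fusion and pack-building steps can be sketched as follows. Reciprocal Rank Fusion scores each candidate as the sum of 1 / (k + rank) over the rankings it appears in; k = 60 is the commonly used constant, not necessarily the value used in this project. The Jaccard-based duplicate check, the per-source cap, and the function names are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Merge ranked id lists; returns (id, score) pairs, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by score descending, then by id for a deterministic order on ties.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def jaccard(a: str, b: str) -> float:
    """Token-set overlap, used here as a crude near-duplicate signal."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_pack(fused: list, chunks: dict, pack_size: int = 3,
               per_source_cap: int = 1, dup_threshold: float = 0.9) -> list:
    """Pick top chunks while skipping near-duplicates and over-represented files.

    `chunks` maps chunk id -> (source_file, text).
    """
    pack = []
    per_source = defaultdict(int)
    for chunk_id, _score in fused:
        source, text = chunks[chunk_id]
        if per_source[source] >= per_source_cap:
            continue
        if any(jaccard(text, chunks[kept][1]) >= dup_threshold for kept in pack):
            continue
        pack.append(chunk_id)
        per_source[source] += 1
        if len(pack) == pack_size:
            break
    return pack
```

Because RRF works on ranks rather than raw scores, the dense and lexical retrievers do not need comparable score scales.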
Human-on-the-Loop Execution
The agent graph pauses before approval-sensitive actions. The operator can approve, override, skip, abort, or request a replan, and those interventions are reflected in the execution trace.
Traceable Reports and MCP Integration
Completed runs produce Markdown reports with evidence citations and intervention history. The MCP server exposes the same core operations to external AI tools and IDE agents.
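A report of this kind could be rendered along the following lines. This is a hedged sketch: the `StepResult` fields, the heading layout, and the numbered-citation style are assumptions for illustration, not the project's actual report format.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class StepResult:
    title: str
    action: str
    citations: list = field(default_factory=list)  # e.g. "restart.md > Drain"
    intervention: Optional[str] = None  # operator decision, if any


def render_report(task: str, steps: list) -> str:
    """Render a Markdown report with numbered, de-duplicated evidence citations."""
    refs = []
    lines = ["# SOP Execution Report", "", f"**Task:** {task}", "", "## Steps"]
    for n, step in enumerate(steps, start=1):
        marks = []
        for cite in step.citations:
            if cite not in refs:
                refs.append(cite)  # first use assigns the citation number
            marks.append(f"[{refs.index(cite) + 1}]")
        lines += [
            "",
            f"### {n}. {step.title}",
            f"- Action: {step.action}",
            f"- Evidence: {' '.join(marks) or 'none'}",
        ]
        if step.intervention:
            lines.append(f"- Operator intervention: {step.intervention}")
    lines += ["", "## Evidence"]
    lines += [f"[{i}] {ref}" for i, ref in enumerate(refs, start=1)]
    return "\n".join(lines)
```

Numbering citations once and reusing the number keeps the evidence list short when several steps rely on the same SOP section.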
Results & Impact
- Implemented an 8-node SOP execution graph with independent verification and approval-gated control flow
- Shipped a complete FastAPI + Streamlit workflow covering upload, execution, monitoring, approval, and reporting
- Added hybrid RAG over ChromaDB and SQLite FTS5, with RRF fusion and evidence-pack construction
- Exposed 4 MCP tools for agentic integration
- Documented a 55-test suite spanning unit and integration coverage
Challenges & Lessons Learned
- Human governance matters: SOP execution can include high-stakes actions, so the system must pause at policy-sensitive points instead of pretending autonomy is always appropriate
- Evidence quality: retrieval needs both semantic and lexical signals because procedural language often contains exact names, thresholds, or compliance terms
- State management: resumable workflow execution requires a clear session model, event timeline, and graph state schema
- Provider portability: keeping LLM and embedding providers environment-driven made the app easier to run with cloud APIs or local Ollama
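To illustrate the evidence-quality point, the sketch below shows how an exact identifier or threshold can be matched with SQLite FTS5 from Python's standard library. It assumes the local SQLite build includes FTS5; the table layout, the helper names, and the ticket id `CR-1042` are made up for the example.

```python
import sqlite3


def build_index(rows: list) -> sqlite3.Connection:
    """Create an in-memory FTS5 index from (source, body) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, body)")
    conn.executemany("INSERT INTO chunks (source, body) VALUES (?, ?)", rows)
    return conn


def as_phrase(term: str) -> str:
    """Quote a term as an FTS5 phrase so characters like '-' are not parsed as operators."""
    return '"' + term.replace('"', '""') + '"'


def lexical_search(conn: sqlite3.Connection, term: str, limit: int = 5) -> list:
    """Return (source, body) rows matching the phrase, best BM25 rank first."""
    return conn.execute(
        "SELECT source, body FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (as_phrase(term), limit),
    ).fetchall()


index = build_index([
    ("restart.md", "Escalate under change ticket CR-1042 if latency exceeds 500 ms"),
    ("backup.md", "Escalate to the storage team when backups are delayed"),
])
hits = lexical_search(index, "CR-1042")
```

Here only the first chunk contains the ticket id, so a phrase match isolates it even though both chunks are about escalation, which is the kind of distinction a purely semantic search can blur.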
How AI/Agents Were Used
The application itself is an agentic workflow: intake, planner, evidence router, executor, verifier, approval gate, replanner, and reporter nodes cooperate through a LangGraph state machine. I also used AI-assisted development to iterate on architecture, API schemas, tests, and documentation, while keeping the final behavior constrained by explicit policy, retrieval evidence, and human approval.