SOP Agent - Human-Governed Procedure Executor

Problem

Standard Operating Procedures are usually static documents. Operators still have to interpret them, find the relevant steps, decide what evidence applies, and document what happened. That creates risk when procedures are long, compliance-sensitive, or require human approval at critical points.

SOP Agent turns SOP documents into an evidence-grounded execution workflow. It ingests procedure files, builds a plan, retrieves evidence for each step, recommends actions, verifies them independently, pauses for operator approval when needed, and produces a traceable final report.

My Role & Responsibilities

Built the full SOP execution backend with FastAPI, LangGraph, Pydantic, SQLite, ChromaDB, and SSE streaming
Implemented multi-format ingestion for PDF, DOCX, TXT, and Markdown with validation, parsing, structure-aware chunking, embeddings, and indexing
Designed the 8-node LangGraph state graph: intake, planner, evidence router, executor, verifier, approval gate, replanner, and reporter
Added hybrid retrieval using dense ChromaDB search plus SQLite FTS5 lexical search with Reciprocal Rank Fusion and source-diverse evidence packs
Built a Streamlit operator console for upload, task entry, live execution monitoring, approval decisions, and report viewing
Exposed MCP tools for AI IDE/agent integration: ingest SOP, run SOP, approve step, and get report
Added Docker Compose deployment, environment-driven multi-provider LLM configuration, tests, and documentation
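The four MCP tools above can be advertised to clients in the shape an MCP `tools/list` result uses: each tool has a `name`, a `description`, and a JSON Schema `inputSchema`. This is a minimal illustrative sketch. The tool names `ingest_sop`, `run_sop`, `approve_step`, and `get_report`, their argument fields, and the `missing_arguments` helper are hypothetical, not taken from the project.

```python
import json

# Hypothetical descriptors in the shape of an MCP tools/list result:
# a name, a human-readable description, and a JSON Schema for arguments.
TOOLS = [
    {
        "name": "ingest_sop",
        "description": "Ingest an SOP file (PDF, DOCX, TXT, or Markdown).",
        "inputSchema": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
    {
        "name": "run_sop",
        "description": "Start an SOP execution session for a task.",
        "inputSchema": {
            "type": "object",
            "properties": {"task": {"type": "string"}},
            "required": ["task"],
        },
    },
    {
        "name": "approve_step",
        "description": "Record an operator decision for a paused step.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "session_id": {"type": "string"},
                "decision": {
                    "type": "string",
                    "enum": ["approve", "override", "skip", "abort", "replan"],
                },
            },
            "required": ["session_id", "decision"],
        },
    },
    {
        "name": "get_report",
        "description": "Fetch the Markdown report for a finished session.",
        "inputSchema": {
            "type": "object",
            "properties": {"session_id": {"type": "string"}},
            "required": ["session_id"],
        },
    },
]


def list_tools() -> str:
    """Serialize the descriptors as a tools/list result payload."""
    return json.dumps({"tools": TOOLS}, indent=2)


def missing_arguments(tool_name: str, arguments: dict) -> list:
    """Return the required argument names absent from a tools/call request."""
    tool = next(t for t in TOOLS if t["name"] == tool_name)
    return [k for k in tool["inputSchema"]["required"] if k not in arguments]
```

Keeping the schemas declarative lets the same descriptors drive both tool listing and basic argument checks on incoming calls.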

Architecture

Streamlit Operator Console
  Upload -> Task -> Monitor -> Approve -> Report
        |
FastAPI Backend
  ingest, execute, intervene, report, sessions, MCP
        |
LangGraph StateGraph
  Intake -> Planner -> Evidence Router -> Executor
  -> Verifier -> Approval Gate / Replanner -> Reporter
        |
Retrieval + Storage
  ChromaDB dense vectors + SQLite FTS5 lexical index

The graph sets interrupt_before on the approval gate node, so execution pauses until an operator responds. This keeps the AI workflow human-governed instead of fully autonomous. Operators can approve, override, skip, abort, or request a replan.
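Once the paused graph resumes, the five operator decisions map naturally onto next-node choices. A minimal sketch, assuming the decision has already been written into graph state before resuming and that override and skip have been applied to the step by the approval gate itself; the `Decision` enum and `route_after_approval` name are hypothetical.

```python
from enum import Enum


class Decision(str, Enum):
    APPROVE = "approve"
    OVERRIDE = "override"
    SKIP = "skip"
    ABORT = "abort"
    REPLAN = "replan"


def route_after_approval(decision: Decision, has_more_steps: bool) -> str:
    """Map an operator decision to the next graph node (illustrative rules)."""
    if decision is Decision.ABORT:
        return "reporter"   # stop executing and report what happened so far
    if decision is Decision.REPLAN:
        return "replanner"
    # approve, override, and skip all continue; any override or skip is
    # assumed to be recorded in state by the approval gate node already
    return "evidence_router" if has_more_steps else "reporter"
```

Routing abort to the reporter, rather than ending silently, keeps aborted runs traceable.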

Tech Stack

Agent framework: LangGraph StateGraph with conditional routing and optional checkpointing
Backend: FastAPI, Uvicorn, Pydantic v2, SSE endpoints, structured logging
Frontend: Streamlit operator console with real-time execution monitoring
Retrieval: ChromaDB dense vectors, SQLite FTS5 BM25 lexical search, Reciprocal Rank Fusion, evidence deduplication, source diversification
Ingestion: pdfplumber, python-docx, text/Markdown parsing, MIME validation, structure-aware chunking
LLM providers: Gemini, OpenAI, Anthropic, and Ollama via environment-driven factory configuration
MCP: tool listing, tool calls, SSE keepalive, and four agent-facing tools
Deployment: Dockerfile, Dockerfile.frontend, Docker Compose, .env configuration
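The lexical half of retrieval can be illustrated with Python's built-in `sqlite3` module and an FTS5 virtual table ranked by FTS5's `bm25()` function, where lower scores mean better matches. This is a minimal sketch, not the project's schema: the table name `chunks`, its columns, and the OR-of-quoted-terms query strategy are assumptions, and it requires an SQLite build with FTS5 enabled (common, but not guaranteed).

```python
import sqlite3


def build_lexical_index(rows: list) -> sqlite3.Connection:
    """Create an in-memory FTS5 table from (chunk_id, source, body) tuples."""
    conn = sqlite3.connect(":memory:")
    # chunk_id and source are stored for citation but excluded from matching
    conn.execute(
        "CREATE VIRTUAL TABLE chunks USING fts5(chunk_id UNINDEXED, source UNINDEXED, body)"
    )
    conn.executemany("INSERT INTO chunks (chunk_id, source, body) VALUES (?, ?, ?)", rows)
    return conn


def to_fts_query(query: str) -> str:
    """Quote each term so punctuation in operator text can't break FTS5 syntax."""
    terms = [t.replace('"', '""') for t in query.split()]
    return " OR ".join(f'"{t}"' for t in terms)


def lexical_search(conn: sqlite3.Connection, query: str, k: int = 5) -> list:
    """Return chunk ids ordered by bm25 (ascending, i.e. best match first)."""
    if not query.split():
        return []
    rows = conn.execute(
        "SELECT chunk_id FROM chunks WHERE chunks MATCH ? ORDER BY bm25(chunks) LIMIT ?",
        (to_fts_query(query), k),
    ).fetchall()
    return [r[0] for r in rows]
```

Exact-term matching is what lets lexical search catch equipment names, thresholds, and compliance terms that dense embeddings can blur.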


Key Features Delivered

Multi-Format SOP Ingestion

The ingestion pipeline validates uploads, parses supported document types, extracts section structure, chunks the content, embeds it, stores dense vectors in ChromaDB, and writes lexical chunks into SQLite FTS5.
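Structure-aware chunking can be sketched for the Markdown case: split on headings, keep each chunk's heading path as metadata, and break oversized sections only on paragraph boundaries. This is a minimal sketch under assumptions. The `Chunk` type, the `max_chars` limit, and the " > " path format are hypothetical, and PDF or DOCX input would first need to be converted to text with headings (for example via pdfplumber or python-docx) before a splitter like this applies.

```python
import re
from dataclasses import dataclass

HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


@dataclass
class Chunk:
    section: str  # heading path, e.g. "Shutdown > Lockout"
    text: str


def chunk_markdown(doc: str, max_chars: int = 800) -> list:
    chunks = []
    path = []  # current heading stack, one entry per heading level seen
    buf = []   # body lines of the current section

    def flush() -> None:
        body = "\n".join(buf).strip()
        buf.clear()
        if not body:
            return
        section = " > ".join(path) or "(preamble)"
        # Pack paragraphs up to max_chars; a single longer paragraph stays whole.
        piece = ""
        for para in body.split("\n\n"):
            if piece and len(piece) + len(para) + 2 > max_chars:
                chunks.append(Chunk(section, piece))
                piece = para
            else:
                piece = f"{piece}\n\n{para}" if piece else para
        if piece:
            chunks.append(Chunk(section, piece))

    for line in doc.splitlines():
        m = HEADING.match(line)
        if m:
            flush()
            level = len(m.group(1))
            del path[level - 1:]  # drop headings at this level or deeper
            path.append(m.group(2))
        else:
            buf.append(line)
    flush()
    return chunks
```

Keeping the heading path on every chunk means a retrieved passage can be cited by section, not just by file.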

Hybrid Evidence Retrieval

Each step gets evidence from dense and lexical retrieval. Reciprocal Rank Fusion merges candidates, near-duplicate chunks are removed, and the final evidence pack is diversified across source files.
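Reciprocal Rank Fusion scores each candidate as the sum over rankings of 1 / (k + rank), so chunks ranked well by both retrievers rise to the top without calibrating dense and BM25 scores against each other. The sketch below uses k = 60, a commonly used default. The deduplication rule (token Jaccard similarity above a threshold) and the diversification rule (a per-source cap) are assumptions for illustration, since the project's exact rules aren't described here.

```python
from collections import defaultdict


def rrf_fuse(rankings: list, k: int = 60) -> list:
    """Fuse ranked id lists: score(d) = sum of 1 / (k + rank), rank starting at 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


def jaccard(a: str, b: str) -> float:
    """Token-set overlap used here as a cheap near-duplicate test."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0


def build_evidence_pack(fused, texts, sources, size=4, max_per_source=2, dup_threshold=0.9):
    """Walk the fused list, skipping near-duplicates and over-represented sources."""
    pack, per_source = [], defaultdict(int)
    for cid in fused:
        if per_source[sources[cid]] >= max_per_source:
            continue
        if any(jaccard(texts[cid], texts[p]) >= dup_threshold for p in pack):
            continue
        pack.append(cid)
        per_source[sources[cid]] += 1
        if len(pack) == size:
            break
    return pack
```

For example, fusing a dense ranking `["a", "b", "c"]` with a lexical ranking `["b", "d", "a"]` puts `b` first, because it ranks near the top of both lists.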

Human-in-the-Loop Execution

The agent graph pauses before approval-sensitive actions. The operator can approve, override, skip, abort, or request a replan, and those interventions are reflected in the execution trace.
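One way to reflect interventions in the execution trace is an append-only event log per session, where operator decisions are ordinary events with their own kind. A minimal sketch; the `TraceEvent` and `ExecutionTrace` types and the event kinds are hypothetical, not the project's session model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TraceEvent:
    step: int
    kind: str    # e.g. "recommendation", "verification", "intervention"
    detail: str
    at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


@dataclass
class ExecutionTrace:
    session_id: str
    events: list = field(default_factory=list)

    def record(self, step: int, kind: str, detail: str) -> None:
        """Append an event; existing events are never modified."""
        self.events.append(TraceEvent(step, kind, detail))

    def interventions(self) -> list:
        """Return operator decisions in the order they were made."""
        return [e for e in self.events if e.kind == "intervention"]
```

An append-only log also gives a resumed session its full history, since replaying the events reconstructs what happened before the pause.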

Traceable Reports and MCP Integration

Completed runs produce Markdown reports with evidence citations and intervention history. The MCP server exposes the same core operations to external AI tools and IDE agents.
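Rendering the final report is mostly string assembly once each step carries its citations and the trace carries its interventions. A minimal sketch, assuming steps are dicts with `action` and `citations` keys and interventions are `(step, decision)` pairs; the layout and the `render_report` name are hypothetical.

```python
def render_report(title: str, steps: list, interventions: list) -> str:
    """Render steps with inline citations, then the intervention history."""
    lines = [f"# {title}", "", "## Steps", ""]
    for i, step in enumerate(steps, start=1):
        cites = ", ".join(f"[{c}]" for c in step["citations"]) or "no evidence cited"
        lines.append(f"{i}. {step['action']} ({cites})")
    lines += ["", "## Interventions", ""]
    if interventions:
        lines += [f"- Step {s}: {d}" for s, d in interventions]
    else:
        lines.append("- None")
    return "\n".join(lines) + "\n"
```

Flagging uncited steps explicitly, rather than omitting the citation slot, makes evidence gaps visible to reviewers.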

Results & Impact

Implemented an 8-node SOP execution graph with independent verification and approval-gated control flow
Shipped a complete FastAPI + Streamlit workflow covering upload, execution, monitoring, approval, and reporting
Added hybrid RAG over ChromaDB and SQLite FTS5, with RRF fusion and evidence-pack construction
Exposed 4 MCP tools for agentic integration
Documented a 55-test suite spanning unit and integration tests

Challenges & Lessons Learned

Human governance matters: SOP execution can include high-stakes actions, so the system must pause at policy-sensitive points instead of pretending autonomy is always appropriate
Evidence quality: retrieval needs both semantic and lexical signals because procedural language often contains exact names, thresholds, or compliance terms
State management: resumable workflow execution requires a clear session model, event timeline, and graph state schema
Provider portability: keeping LLM and embedding providers environment-driven made the app easier to run with cloud APIs or local Ollama
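Environment-driven provider selection can be sketched as a function that reads and validates settings before any client is built. This is a minimal sketch, not the project's factory. The variable names `LLM_PROVIDER`, `LLM_MODEL`, and `OLLAMA_BASE_URL`, the API-key variable mapping, and the Ollama default are assumptions (`http://localhost:11434` is Ollama's usual local address). Constructing the actual provider client from the returned config is omitted.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed API-key variable per cloud provider; Ollama runs locally without one.
API_KEY_VARS = {
    "gemini": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
}
SUPPORTED = set(API_KEY_VARS) | {"ollama"}


@dataclass(frozen=True)
class ProviderConfig:
    provider: str
    model: str
    base_url: Optional[str] = None


def load_provider_config(env: Optional[Mapping[str, str]] = None) -> ProviderConfig:
    """Validate provider settings from the environment (or a supplied mapping)."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "ollama").strip().lower()
    if provider not in SUPPORTED:
        raise ValueError(f"Unsupported LLM_PROVIDER {provider!r}; expected one of {sorted(SUPPORTED)}")
    model = env.get("LLM_MODEL")
    if not model:
        raise ValueError("LLM_MODEL must be set")
    key_var = API_KEY_VARS.get(provider)
    if key_var and not env.get(key_var):
        raise ValueError(f"{key_var} must be set for provider {provider!r}")
    base_url = env.get("OLLAMA_BASE_URL", "http://localhost:11434") if provider == "ollama" else None
    return ProviderConfig(provider, model, base_url)
```

Failing fast at startup on a missing key or unknown provider is easier to diagnose than an error in the middle of an SOP run.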

How AI/Agents Were Used

The application itself is an agentic workflow: planner, evidence router, executor, verifier, approval gate, replanner, and reporter cooperate through a LangGraph state machine. I also used AI-assisted development to iterate on architecture, API schemas, tests, and documentation, while keeping the final behavior constrained by explicit policy, retrieval evidence, and human approval.