AI Agent Pipeline - Documentary Script Generation
History Tales Script Generator
A production-ready LangGraph agent that autonomously generates high-retention, evidence-led history documentary scripts. 18-node pipeline with dual-model architecture, deterministic validation gates, cross-run learning, and a full web interface.
Application Screenshot
Replace with actual screenshot
Problem
Creating a high-quality documentary script requires weeks of research, fact-checking, narrative structuring, and multiple editing passes. Content creators need to cross-reference primary sources, maintain factual accuracy, engineer viewer retention, and hit precise timing targets - all while writing in a compelling cinematic style.
Why It Matters
The YouTube documentary space generates millions of views per video, but the bottleneck isn't production - it's scriptwriting. A single 15-minute script can take 40+ hours of research and writing. This agent compresses that to minutes while maintaining source provenance, factual accuracy, and narrative quality that matches hand-written scripts.
Architecture Overview
An 18-node LangGraph workflow that separates concerns into distinct processing stages: research, analysis, writing, and quality assurance. Uses a dual-model architecture - creative tier (GPT-5) for writing nodes, fast tier (GPT-5.2) for analytical nodes. Deterministic validation gates between stages enforce structural constraints that no LLM hallucination can bypass. A feedback memory system learns from past runs and injects lessons into future prompts.
┌──────────────────────────────────────┐
│ Architecture Diagram │
│ Replace with actual diagram │
└──────────────────────────────────────┘
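The staged flow above, including the conditional retry edge back from a failed validation gate, can be sketched with a hand-rolled node/edge loop. This is a minimal stand-in for illustration only; the real project wires these stages as a LangGraph StateGraph, and the stub node bodies here are hypothetical.

```python
from typing import Callable, Dict, Optional

State = dict  # the real pipeline enforces a strict typed state schema

def research(state: State) -> State:
    state["claims"] = ["claim-1", "claim-2"]  # stub research output
    return state

def write_draft(state: State) -> State:
    state["draft"] = " ".join(state["claims"])
    state["attempts"] = state.get("attempts", 0) + 1
    return state

def hard_guardrails(state: State) -> State:
    # Deterministic gate: a toy stand-in for the real structural checks.
    state["passed"] = len(state["draft"].split()) >= 2
    return state

NODES: Dict[str, Callable[[State], State]] = {
    "Research": research,
    "ScriptGeneration": write_draft,
    "HardGuardrails": hard_guardrails,
}

def route(node: str, state: State) -> Optional[str]:
    # Conditional edges: a failed gate loops back, capped at 2 retries.
    if node == "Research":
        return "ScriptGeneration"
    if node == "ScriptGeneration":
        return "HardGuardrails"
    if node == "HardGuardrails" and not state["passed"] and state["attempts"] < 2:
        return "ScriptGeneration"
    return None

def run(state: State) -> State:
    node: Optional[str] = "Research"
    while node:
        state = NODES[node](state)
        node = route(node, state)
    return state
```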
Stack Breakdown
Stateful workflow orchestration with conditional edges and retry loops
Core pipeline logic, prompt engineering, and validation
Creative tier for writing, fast tier for structured extraction
REST API server with SSE streaming for real-time progress
Professional web interface with live pipeline tracking
Strict state schema validation across all 18 nodes
Evidence-led research from Library of Congress, National Archives, Europeana
41 deterministic validator tests + integration coverage
Technical Decisions
Dual-model architecture
Writing quality and analytical speed have different requirements. Creative nodes (Outline, ScriptGeneration, RetentionPass) use a high-quality model for nuanced prose. Analytical nodes (scoring, extraction, QC) use a faster model for structured JSON output. This cuts cost and latency by 60% without sacrificing script quality.
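The routing itself reduces to a small lookup: writing nodes go to the creative tier, everything else to the fast tier. A minimal sketch (the node names match the pipeline described above; the function name and signature are hypothetical):

```python
# Nodes that need nuanced prose; all other nodes emit structured JSON.
CREATIVE_NODES = {"Outline", "ScriptGeneration", "RetentionPass"}

def model_for(node: str, creative_model: str, fast_model: str) -> str:
    """Route writing nodes to the creative tier, analytical nodes to the fast tier."""
    return creative_model if node in CREATIVE_NODES else fast_model
```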
Deterministic validation gates
LLMs can hallucinate structural compliance. The HardGuardrailsNode and FactTightenNode use deterministic Python validators - word count, entity provenance, tension escalation, rehook cadence - that cannot be bypassed by model output. If validation fails, the pipeline loops back with specific feedback.
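The key property of these gates is that they are plain Python over the script text, so model output cannot talk its way past them. A simplified sketch of two of the checks named above, word count and entity provenance (the real validators are richer; the naive capitalized-word scan here is illustrative only):

```python
import re
from typing import List, Set

def validate(script: str, target_words: int, allowed_entities: Set[str],
             tolerance: float = 0.1) -> List[str]:
    """Return specific failure messages; an empty list means the gate passes."""
    issues = []
    n = len(script.split())
    if abs(n - target_words) > tolerance * target_words:
        issues.append(f"word count {n} outside tolerance of target {target_words}")
    # Naive proper-noun scan standing in for real entity extraction.
    for name in set(re.findall(r"\b[A-Z][a-z]+\b", script)) - allowed_entities:
        issues.append(f"entity '{name}' has no source provenance")
    return issues
```

On failure, the returned messages are exactly the "specific feedback" handed back to the writing node on the retry loop.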
Two-stage script generation (Draft → Fact-Tighten)
Stage A writes the creative draft. Stage B rewrites it with per-paragraph trace tags ([Beat B03 | Claims C001,C012]) that create an auditable link between every statement and its source evidence. Tags are stripped from the final script but remain available for verification.
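Stripping the tags while retaining the audit trail is a single regex pass. A sketch, assuming the tag format shown above (the helper name is hypothetical):

```python
import re
from typing import Dict, List, Tuple

# Matches tags like "[Beat B03 | Claims C001,C012]" plus trailing whitespace.
TAG = re.compile(r"\[Beat (B\d+) \| Claims ([C\d,]+)\]\s*")

def strip_tags(tagged: str) -> Tuple[str, Dict[str, List[str]]]:
    """Split a fact-tightened paragraph into (clean text, beat -> claim IDs)."""
    trace: Dict[str, List[str]] = {}
    for beat, claims in TAG.findall(tagged):
        trace.setdefault(beat, []).extend(claims.split(","))
    return TAG.sub("", tagged).strip(), trace
```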
Cross-run feedback memory
After every run, QC issues and recommendations are saved to .memory/. On the next run, distilled lessons are prepended to Outline and ScriptGeneration prompts. The agent learns to avoid recurring issues - pass rate improves over successive runs without any code changes.
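The mechanism is deliberately simple: persist issues as JSON per run, then distill the most recent ones into a prompt preamble. A sketch under the assumption that lessons are stored one file per run in .memory/ (function names and file layout are hypothetical):

```python
import json
from pathlib import Path
from typing import List

MEMORY = Path(".memory")  # per-run QC lessons live here

def save_lessons(run_id: str, issues: List[str]) -> None:
    """Persist this run's QC issues for future runs to learn from."""
    MEMORY.mkdir(exist_ok=True)
    (MEMORY / f"{run_id}.json").write_text(json.dumps(issues))

def lesson_preamble(limit: int = 5) -> str:
    """Distill recent QC issues into a prefix for Outline/ScriptGeneration prompts."""
    lessons: List[str] = []
    for f in sorted(MEMORY.glob("*.json")):
        lessons.extend(json.loads(f.read_text()))
    if not lessons:
        return ""
    bullets = "\n".join(f"- {l}" for l in lessons[-limit:])
    return f"Avoid these recurring issues from past runs:\n{bullets}\n\n"
```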
Surgery-only retention pass
The RetentionPassNode can only modify existing text - it cannot introduce new named entities or events. A deterministic guard checks for entity provenance: if the retention pass introduces a person not in the original script, the edit is rejected and the original is used instead.
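The guard is a set-containment check: the edited text's entities must be a subset of the original's. A sketch reusing a naive capitalized-word scan as a stand-in for real entity extraction (the function name is hypothetical):

```python
import re

def guard_retention_edit(original: str, edited: str) -> str:
    """Accept the edit only if it introduces no entities absent from the original."""
    def ents(text: str) -> set:
        # Naive proper-noun scan; the real guard uses richer entity extraction.
        return set(re.findall(r"\b[A-Z][a-z]+\b", text))
    return edited if ents(edited) <= ents(original) else original
```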
Tradeoffs
Claims capped at 5 sources × 10 claims (50 max) to prevent token bloat - may miss niche facts
QC retry loop limited to 2 iterations - prevents infinite loops but may produce slightly off-target word counts
Wikipedia as primary research source - broad coverage but lower credibility than academic databases
In-process state management - works for single-user runs; would need Redis/PostgreSQL state persistence for concurrent multi-user
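The claims cap from the first tradeoff above is a straightforward truncation over the research output. A sketch, assuming each source carries a "claims" list (the function name and data shape are hypothetical):

```python
from typing import List

def cap_claims(sources: List[dict], per_source: int = 10,
               max_sources: int = 5) -> List[str]:
    """Apply the 5-sources x 10-claims cap (50 claims max) to curb token bloat."""
    capped: List[str] = []
    for source in sources[:max_sources]:
        capped.extend(source["claims"][:per_source])
    return capped
```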
Scaling Considerations
Batch processing via async loop for bulk generation - wrap run_agent() for parallel pipelines
HTTP response caching to .cache/ reduces API costs on reruns by 70%+
LangGraph state can be persisted for resumption - enables long-running pipelines across server restarts
FastAPI with SSE streaming handles 50+ concurrent viewers; would need WebSocket upgrade for 500+
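The batch-processing point above amounts to fanning run_agent() out with asyncio.gather. A sketch with a stub coroutine standing in for the real entry point (its actual signature is assumed):

```python
import asyncio
from typing import List

async def run_agent(topic: str) -> str:
    """Stand-in for the real pipeline entry point (assumed signature)."""
    await asyncio.sleep(0)  # placeholder for the actual 18-node run
    return f"script for {topic}"

async def run_batch(topics: List[str]) -> List[str]:
    """Run several pipelines concurrently; results keep the input order."""
    return await asyncio.gather(*(run_agent(t) for t in topics))
```

In practice a semaphore around run_agent() would bound concurrency against API rate limits.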
Explore the code
The full source code, documentation, and architecture decisions are available on GitHub.