
AI Agent Pipeline - Documentary Script Generation

History Tales Script Generator

A production-ready LangGraph agent that autonomously generates high-retention, evidence-led history documentary scripts. 18-node pipeline with dual-model architecture, deterministic validation gates, cross-run learning, and a full web interface.


Problem

Creating a high-quality documentary script requires weeks of research, fact-checking, narrative structuring, and multiple editing passes. Content creators need to cross-reference primary sources, maintain factual accuracy, engineer viewer retention, and hit precise timing targets - all while writing in a compelling cinematic style.

Why It Matters

The YouTube documentary space generates millions of views per video, but the bottleneck isn't production - it's scriptwriting. A single 15-minute script can take 40+ hours of research and writing. This agent compresses that to minutes while maintaining source provenance, factual accuracy, and narrative quality that matches hand-written scripts.

Architecture Overview

The agent is an 18-node LangGraph workflow that separates concerns into distinct processing stages: research, analysis, writing, and quality assurance. A dual-model architecture routes work by node type: a creative tier (GPT-5) handles writing nodes, while a fast tier (GPT-5.2) handles analytical nodes. Deterministic validation gates between stages enforce structural constraints that no LLM hallucination can bypass, and a feedback memory system learns from past runs and injects lessons into future prompts.
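The stage → gate → retry pattern at the heart of the workflow can be sketched in plain Python. This is an illustrative reduction, not the project's real API: the state fields, node names, and retry cap are all hypothetical stand-ins for the actual LangGraph nodes and conditional edges.

```python
from dataclasses import dataclass, field

@dataclass
class PipelineState:
    draft: str = ""
    retries: int = 0
    feedback: list = field(default_factory=list)

def write_draft(state: PipelineState) -> PipelineState:
    # A writing node would call the creative-tier model here.
    state.draft = f"draft v{state.retries + 1}"
    return state

def gate(state: PipelineState, max_retries: int = 2) -> str:
    # Deterministic check standing in for the real structural validators.
    if state.draft or state.retries >= max_retries:
        return "advance"
    state.retries += 1
    state.feedback.append("empty draft")
    return "retry"

def run(state: PipelineState) -> PipelineState:
    # Conditional-edge loop: keep rewriting until the gate passes or the cap hits.
    while True:
        state = write_draft(state)
        if gate(state) == "advance":
            return state
```

In the real pipeline this loop is expressed as LangGraph conditional edges, with the gate's feedback injected into the next prompt.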

┌──────────────────────────────────────┐

│        Architecture Diagram        │

│     Replace with actual diagram     │

└──────────────────────────────────────┘

Stack Breakdown

LangGraph

Stateful workflow orchestration with conditional edges and retry loops

Python

Core pipeline logic, prompt engineering, and validation

OpenAI API (Dual-Model)

Creative tier for writing, fast tier for structured extraction

FastAPI

REST API server with SSE streaming for real-time progress

Next.js 14 + shadcn/ui

Professional web interface with live pipeline tracking

Pydantic

Strict state schema validation across all 18 nodes

Wikipedia + Archives API

Evidence-led research from Library of Congress, National Archives, Europeana

Vitest + Pytest

41 deterministic validator tests + integration coverage
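The SSE progress stream mentioned in the stack boils down to a simple wire format. A minimal sketch of an event formatter, independent of FastAPI itself (the function name and event payload shape are assumptions):

```python
import json

def sse_event(event: str, payload: dict) -> str:
    # Server-Sent Events wire format: a named event line, a JSON data line,
    # terminated by a blank line so the browser flushes the event.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"
```

A FastAPI endpoint would yield strings like these from a `StreamingResponse` with media type `text/event-stream`.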

Technical Decisions

01

Dual-model architecture

Writing quality and analytical speed have different requirements. Creative nodes (Outline, ScriptGeneration, RetentionPass) use a high-quality model for nuanced prose. Analytical nodes (scoring, extraction, QC) use a faster model for structured JSON output. This cuts cost and latency by 60% without sacrificing script quality.
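The routing itself can be as simple as a lookup keyed on node name. A minimal sketch with hypothetical tier labels (the real model IDs and full node list may differ):

```python
# Node names from the pipeline; tier labels are illustrative placeholders.
CREATIVE_NODES = {"Outline", "ScriptGeneration", "RetentionPass"}

def model_for(node: str) -> str:
    # Creative nodes get the high-quality model; everything else
    # (scoring, extraction, QC) gets the fast tier.
    return "creative-tier" if node in CREATIVE_NODES else "fast-tier"
```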

02

Deterministic validation gates

LLMs can hallucinate structural compliance. The HardGuardrailsNode and FactTightenNode use deterministic Python validators - word count, entity provenance, tension escalation, rehook cadence - that cannot be bypassed by model output. If validation fails, the pipeline loops back with specific feedback.
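A gate of this kind is plain Python, so its verdict cannot be talked around by model output. A sketch of one such check (the function name, tolerance, and failure-message format are illustrative, not the project's actual validators):

```python
def validate_word_count(script: str, target: int, tolerance: float = 0.1):
    # Deterministic: we count words ourselves rather than trusting the
    # model's claim of compliance.
    n = len(script.split())
    lo, hi = int(target * (1 - tolerance)), int(target * (1 + tolerance))
    if lo <= n <= hi:
        return True, ""
    return False, f"word count {n} outside [{lo}, {hi}]"
```

On failure, the returned message is exactly the "specific feedback" that gets fed back into the retry prompt.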

03

Two-stage script generation (Draft → Fact-Tighten)

Stage A writes the creative draft. Stage B rewrites with per-paragraph trace tags ([Beat B03 | Claims C001,C012]) that create an auditable link between every statement and its source evidence. Tags are stripped from the final script but available for verification.
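Trace tags in that format can be parsed for auditing and stripped for the final script with a small regex. A sketch assuming exactly the `[Beat B03 | Claims C001,C012]` shape shown above (the real tagger and claim IDs may vary):

```python
import re

TAG = re.compile(r"\[Beat (B\d+) \| Claims ([C\d,]+)\]\s*")

def extract_and_strip(text: str):
    # Collect (beat_id, [claim_ids]) pairs for verification, then remove
    # the tags so the reader-facing script is clean.
    tags = [(m.group(1), m.group(2).split(",")) for m in TAG.finditer(text)]
    return TAG.sub("", text), tags
```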

04

Cross-run feedback memory

After every run, QC issues and recommendations are saved to .memory/. On the next run, distilled lessons are prepended to Outline and ScriptGeneration prompts. The agent learns to avoid recurring issues - pass rate improves over successive runs without any code changes.
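One way such a memory can work is a JSON file of distilled lessons that gets prepended to the writing prompts. A sketch with an assumed file name (`lessons.json`) and prompt framing; the project's actual `.memory/` layout may differ:

```python
import json
from pathlib import Path

def save_lessons(lessons: list, memory_dir: str = ".memory") -> None:
    # Persist distilled QC lessons after a run.
    p = Path(memory_dir)
    p.mkdir(exist_ok=True)
    (p / "lessons.json").write_text(json.dumps(lessons))

def inject_lessons(base_prompt: str, memory_dir: str = ".memory") -> str:
    # Prepend prior lessons to the next run's prompt, if any exist.
    f = Path(memory_dir) / "lessons.json"
    if not f.exists():
        return base_prompt
    lessons = json.loads(f.read_text())
    bullets = "\n".join(f"- {l}" for l in lessons)
    return f"Lessons from previous runs:\n{bullets}\n\n{base_prompt}"
```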

05

Surgery-only retention pass

The RetentionPassNode can only modify existing text - it cannot introduce new named entities or events. A deterministic guard checks for entity provenance: if the retention pass introduces a person not in the original script, the edit is rejected and the original is used instead.
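The provenance guard can be sketched with a crude capitalized-token heuristic standing in for real named-entity extraction (the actual check is presumably more robust):

```python
import re

def new_entities(original: str, edited: str) -> set:
    # Heuristic NER stand-in: treat capitalized words as candidate entities.
    def ents(text):
        return set(re.findall(r"\b[A-Z][a-z]+\b", text))
    return ents(edited) - ents(original)

def apply_retention_edit(original: str, edited: str) -> str:
    # Surgery-only rule: reject any edit that introduces an entity
    # absent from the original script.
    return original if new_entities(original, edited) else edited
```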

Tradeoffs

Claims capped at 5 sources × 10 claims (50 max) to prevent token bloat - may miss niche facts

QC retry loop limited to 2 iterations - prevents infinite loops but may produce slightly off-target word counts

Wikipedia as primary research source - broad coverage but lower credibility than academic databases

In-process state management - works for single-user runs; would need Redis/PostgreSQL state persistence for concurrent multi-user deployments

Scaling Considerations

Batch processing via async loop for bulk generation - wrap run_agent() for parallel pipelines

HTTP response caching to .cache/ reduces API costs on reruns by 70%+

LangGraph state can be persisted for resumption - enables long-running pipelines across server restarts

FastAPI with SSE streaming handles 50+ concurrent viewers; would need WebSocket upgrade for 500+
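Wrapping `run_agent()` for bulk generation, as suggested above, might look like the following; the `run_agent` stub, its signature, and the concurrency cap are placeholders for the real entry point:

```python
import asyncio

async def run_agent(topic: str) -> str:
    # Stand-in for the real pipeline entry point (name from the doc,
    # signature assumed).
    await asyncio.sleep(0)
    return f"script for {topic}"

async def run_batch(topics: list, concurrency: int = 4) -> list:
    # Semaphore bounds how many pipelines run at once, so a large batch
    # doesn't blow past API rate limits.
    sem = asyncio.Semaphore(concurrency)

    async def one(topic):
        async with sem:
            return await run_agent(topic)

    # gather preserves input order regardless of completion order.
    return await asyncio.gather(*(one(t) for t in topics))
```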

Explore the code

The full source code, documentation, and architecture decisions are available on GitHub.