We Rebuilt Our Video Pipeline as an AI Agent
From brittle scripts to a self-correcting agent.
Our video generation pipeline was a mess. Dozens of scripts, manual handoffs, constant babysitting. One bad output and you’d restart from scratch.
So we rewrote it as an AI agent.
The Problem With Linear Pipelines
Traditional video automation looks like this: script → voice → video → done. Linear. Brittle. If the script is bad, you get a bad video. No feedback loops.
We kept hitting the same issues:
- Scripts that sounded robotic
- Facts that drifted from the source material
- Transitions that didn’t flow
- Sections that repeated the same points
The fix was always the same: human reviews the output, spots the problems, feeds corrections back in. We were the quality loop.
Making the AI Its Own Critic
The insight: don’t try to get it right the first time. Build in self-correction.
Our agent runs in phases:
- Research - Extract facts and evidence from source material
- Outline - Build narrative structure
- Draft - Write the full script
- Quality Loop - Analyze, critique, rewrite until it’s good
- Audio - Generate narration
- Video - Compose final output
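The phases above can be sketched as a simple state-passing pipeline. This is an illustrative skeleton, not the real implementation: the function names are hypothetical stand-ins for steps that actually call LLMs and media tools.

```python
# Hypothetical sketch of the phased agent: each phase takes the shared
# state dict and returns an updated one. Real phases call models/tools.
from typing import Callable

def research(source: str) -> dict:
    # Phase 1: extract facts/evidence from the source material.
    return {"facts": [line for line in source.splitlines() if line.strip()]}

def outline(state: dict) -> dict:
    # Phase 2: build narrative structure (placeholder structure here).
    state["outline"] = ["intro", "body", "conclusion"]
    return state

def draft(state: dict) -> dict:
    # Phase 3: write the full script from the outline.
    state["script"] = " ".join(state["outline"])
    return state

def quality_loop(state: dict) -> dict:
    # Phase 4: analyze, critique, rewrite until gates pass (stubbed).
    state["approved"] = True
    return state

PHASES: list[Callable[[dict], dict]] = [outline, draft, quality_loop]

def run(source: str) -> dict:
    state = research(source)
    for phase in PHASES:
        state = phase(state)
    return state
```

The point of the shape: every phase reads and writes one state object, which is what makes the checkpointing described later cheap to add.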
The magic is phase 4. Instead of hoping the first draft is good, we assume it won’t be. The agent analyzes its own work, flags issues, rewrites the weak sections, and checks again.
The Quality Loop
The quality phase runs multiple iterations. Each pass:
- Coherence check - Does the argument flow? Are transitions smooth? Is the voice consistent?
- Fact validation - Do claims match the source research?
- Issue flagging - Which sections need work?
- Targeted rewrites - Fix only the flagged parts
- Diminishing returns check - Are we still improving?
It keeps looping until quality gates pass or improvements plateau.
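The stopping logic is the interesting part: pass the gates, or detect that scores have plateaued. A minimal sketch, assuming hypothetical `critique` and `rewrite` callables and illustrative thresholds:

```python
# Sketch of the quality loop. critique(script) returns (score, flagged
# sections); rewrite(script, flagged) fixes only the flagged parts.
# Threshold values are illustrative, not the pipeline's real numbers.
MIN_SCORE = 0.8    # quality gate
MIN_DELTA = 0.02   # diminishing-returns threshold
MAX_PASSES = 5

def quality_loop(script, critique, rewrite):
    prev_score = 0.0
    for _ in range(MAX_PASSES):
        score, flagged = critique(script)
        if score >= MIN_SCORE:
            break  # quality gates pass
        if prev_score > 0 and score - prev_score < MIN_DELTA:
            break  # improvements have plateaued
        script = rewrite(script, flagged)  # targeted rewrites only
        prev_score = score
    return script
```

Note that `rewrite` only receives the flagged sections, which is why the loop stays cheap: most of the script is never regenerated.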
This sounds expensive. It’s not. Targeted rewrites are cheap. You’re not regenerating everything—just the weak sections. And catching problems early saves the real cost: your time reviewing garbage output.
What We Learned
1. Separate the Thinker From the Doer
The agent that writes isn’t the same as the agent that critiques. Different prompts, different roles. The critic is harsh. The writer takes feedback and improves.
This mirrors how good human teams work. Writers need editors. The same brain that created something is bad at finding its flaws.
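In code, the separation is just two system prompts wired to the same model. A hedged sketch, with `llm()` as a stand-in for whatever model API the pipeline actually uses (the prompt text here is invented for illustration):

```python
# Writer and critic are separate calls with separate system prompts.
CRITIC_PROMPT = "You are a harsh editor. List concrete flaws. Do not praise."
WRITER_PROMPT = "You are a script writer. Revise the script to fix the listed flaws."

def llm(system: str, user: str) -> str:
    # Stand-in for a real LLM API call.
    return f"[{system[:12]}...] response to: {user[:20]}"

def critique(script: str) -> str:
    return llm(CRITIC_PROMPT, script)

def revise(script: str, feedback: str) -> str:
    return llm(WRITER_PROMPT, f"SCRIPT:\n{script}\n\nFEEDBACK:\n{feedback}")
```

Keeping the roles in separate calls also means the critic never sees its own earlier praise, which keeps it harsh.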
2. Facts Drift Without Grounding
LLMs hallucinate. Everyone knows this. But they also drift—they’ll start with your source material and slowly wander toward generic takes.
The fix: explicit fact validation against the original research. Not “does this sound true” but “does this match what we extracted in phase 1.”
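A sketch of that grounding check. The real pipeline would presumably use an LLM entailment call per claim; here, crude token overlap stands in for "does this match what we extracted," just to show the shape:

```python
# Validate draft claims against the facts extracted in phase 1.
# Token overlap is a crude stand-in for a real entailment check.
def overlap(claim: str, fact: str) -> float:
    a, b = set(claim.lower().split()), set(fact.lower().split())
    return len(a & b) / max(len(a), 1)

def unsupported_claims(claims, facts, threshold=0.5):
    """Return claims that no extracted fact supports."""
    return [c for c in claims
            if not any(overlap(c, f) >= threshold for f in facts)]
```

The key design point survives the simplification: the reference set is the phase-1 extraction, not the model's general knowledge, so drift toward generic takes gets flagged.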
3. Quality Gates Beat Vibes
We used to eyeball outputs. “Yeah, that’s pretty good.” Now we have explicit criteria: argument flow score, evidence variety score, transition quality. Numbers.
The agent can measure itself. No more subjective “good enough.”
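Gates like these are easy to make explicit in code. A minimal sketch, with illustrative metric names and floors (the post names the metrics; the numbers here are assumptions):

```python
# Explicit quality gates: each score comes from a critic pass.
from dataclasses import dataclass

@dataclass
class QualityReport:
    argument_flow: float       # 0..1
    evidence_variety: float    # 0..1
    transition_quality: float  # 0..1

# Minimum acceptable score per metric (floors are illustrative).
GATES = {"argument_flow": 0.8,
         "evidence_variety": 0.7,
         "transition_quality": 0.75}

def passes(report: QualityReport) -> bool:
    return all(getattr(report, name) >= floor
               for name, floor in GATES.items())
```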
4. State Saves Everything
Long pipelines fail. Network issues, API limits, whatever. Our agent saves state after each phase. Crash at minute 45? Resume from where it stopped.
This was annoying to build. Worth it every time something breaks halfway through.
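The checkpointing pattern itself is simple; the annoying part is making every phase's state serializable. A sketch under those assumptions (file name and phase shape are illustrative):

```python
# Per-phase checkpointing: persist state after every phase and skip
# already-completed phases on resume. Path/name are illustrative.
import json
import os

def run_with_checkpoints(phases, state, path="pipeline_state.json"):
    """phases is a list of (name, fn) pairs; fn(state) -> state."""
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)  # resume from the last checkpoint
    done = set(state.get("done", []))
    for name, phase in phases:
        if name in done:
            continue  # completed before the crash; skip it
        state = phase(state)
        done.add(name)
        state["done"] = sorted(done)
        with open(path, "w") as f:
            json.dump(state, f)  # checkpoint after every phase
    return state
```

Re-running the same command after a crash is then a no-op for finished phases, which is exactly the "resume at minute 45" behavior.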
The Result
What used to take an afternoon of babysitting now runs unattended. Drop in a source document, come back to a finished video. The agent handles the quality loop we used to do manually.
Is every output perfect? No. But the floor is higher. Bad outputs are rare instead of common. And when something is off, it’s usually a judgment call, not an obvious mistake the agent should have caught.
The shift: from hoping AI gets it right to expecting it to self-correct. Build the feedback loop into the system. Let the agent be its own critic.
That’s the pattern. Works for video, works for anything where quality matters and first drafts are usually rough.