Long-form to short-form, automated
A Django, FastAPI, Celery, and RabbitMQ media pipeline that repurposes long-form video into near-ready short-form deliverables, using durable run/job/output records, artifact-first stage boundaries, PydanticAI-driven analysis, and Remotion on AWS Lambda for parallel rendering.
Manual social clipping took about four weeks per campaign. This pipeline turns a two-hour podcast into 30 to 40 branded social clips in roughly one hour.
Before this, enterprises were paying editors to sit through hours of recording, find the usable moments, cut them down, caption them, and export multiple aspect ratios by hand. The target here was not draft quality. It was near-ready output at production throughput.
The system now lands around 95% ready-to-post quality by pushing the work through twelve cooperating stages that can fail, resume, and reuse work without losing their place.
Multiple transcription providers produce timestamped text. Segment detection and scene analysis establish candidate boundaries. PydanticAI-driven LLM passes look for high-energy moments, speaker changes, topic completeness, hook strength, quotability, and likely engagement. Preparation and face tracking then shift to clip-local assets, with the heavier vision path deployed separately on Cloud Run. Composition writes the render index. Rendering fans out through Remotion on AWS Lambda, and the analysis boundary closes before delivery does.
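As a minimal sketch of one of those PydanticAI passes, the block below scores a candidate segment against the dimensions named above. The model id, field names, and prompt are illustrative rather than the production schema, and depending on the pydantic-ai release the result_type and .data names may instead be output_type and .output.

```python
from pydantic import BaseModel, Field
from pydantic_ai import Agent


class ClipScore(BaseModel):
    # Illustrative scoring schema; the real pipeline's fields are not published here.
    start_s: float
    end_s: float
    hook_strength: float = Field(ge=0, le=1)
    quotability: float = Field(ge=0, le=1)
    emotional_tone: str
    topic_complete: bool
    predicted_engagement: float = Field(ge=0, le=1)


scoring_agent = Agent(
    "openai:gpt-4o",                     # any model id the deployment supports
    result_type=list[ClipScore],         # structured output validated against the schema
    system_prompt="Score each candidate segment of the transcript for short-form potential.",
)

# transcript_window: timestamped text around one candidate boundary
# scores = scoring_agent.run_sync(transcript_window).data
```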
The run is the top-level control object. Each stage becomes a durable job. Each rendered short becomes its own output. The tight job state model — pending, in_progress, completed, failed — keeps orchestration predictable while still allowing retries, subtree resets, and reuse of completed work.
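A minimal sketch of those durable records as Django models follows. Table and field names are illustrative assumptions; only the four job states and the run/job/output split come from the design described above.

```python
from django.db import models


class Run(models.Model):
    source_asset_id = models.CharField(max_length=64)
    metadata = models.JSONField(default=dict)        # style pins, overrides, rerun flags
    created_at = models.DateTimeField(auto_now_add=True)


class Job(models.Model):
    class State(models.TextChoices):
        PENDING = "pending"
        IN_PROGRESS = "in_progress"
        COMPLETED = "completed"
        FAILED = "failed"

    run = models.ForeignKey(Run, on_delete=models.CASCADE, related_name="jobs")
    stage_type = models.CharField(max_length=64)      # e.g. "transcription", "composition"
    state = models.CharField(max_length=16, choices=State.choices, default=State.PENDING)
    artifact_uri = models.CharField(max_length=512, blank=True)   # where the stage published


class Output(models.Model):
    run = models.ForeignKey(Run, on_delete=models.CASCADE, related_name="outputs")
    external_render_id = models.CharField(max_length=128, blank=True)
    render_state = models.CharField(max_length=16, default="pending")
    media_uri = models.CharField(max_length=512, blank=True)
    updated_at = models.DateTimeField(auto_now=True)
```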
Each stage reads the artifacts it depends on, computes, writes new structured data or media to durable storage, and marks its job complete. The orchestrator triggers what comes next. Delegated work, retries, and reuse all fall out of that contract. Debugging becomes reading the directory.
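One way to express that contract is a Celery task like the sketch below, reusing the Job model above. The helpers read_artifact, write_artifact, upstream_jobs, compute_stage, and trigger_next_stages are hypothetical stand-ins for durable storage and the orchestrator.

```python
from celery import shared_task


@shared_task(bind=True, max_retries=3)
def run_stage(self, job_id: int):
    job = Job.objects.get(pk=job_id)
    job.state = Job.State.IN_PROGRESS
    job.save(update_fields=["state"])
    try:
        # Read upstream artifacts, compute, publish, then mark complete.
        inputs = [read_artifact(dep.artifact_uri) for dep in upstream_jobs(job)]
        result = compute_stage(job.stage_type, inputs)       # pure stage logic
        job.artifact_uri = write_artifact(job, result)       # publish before completing
        job.state = Job.State.COMPLETED
        job.save(update_fields=["artifact_uri", "state"])
        trigger_next_stages(job)                              # orchestrator decides what runs next
    except Exception as exc:
        job.state = Job.State.FAILED
        job.save(update_fields=["state"])
        raise self.retry(exc=exc)
```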
The composition index spawns one render job per short. Remotion packages client branding, captions, and aspect-ratio variants, then AWS Lambda fans those renders out in parallel. Each output completes on its own schedule, signalled either by an async webhook or by background polling. Finalization is modeled as a terminal-state transition so a missed callback is recovered by the poll, and a duplicate signal becomes a no-op.
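A sketch of that finalization step, assuming the Output model above: the webhook handler and the background poller both call the same conditional update, so whichever signal arrives second does nothing.

```python
from django.db import transaction

TERMINAL = {"completed", "failed"}


def finalize_output(output_id: int, outcome: str, media_uri: str = "") -> bool:
    """Apply a render outcome exactly once; return True if this call won the transition."""
    with transaction.atomic():
        updated = (
            Output.objects
            .filter(pk=output_id)
            .exclude(render_state__in=TERMINAL)   # only non-terminal rows can transition
            .update(render_state=outcome, media_uri=media_uri)
        )
    if updated:
        maybe_complete_delivery(output_id)        # hypothetical delivery rollup hook
    return bool(updated)
```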
The style and enhancement control plane is a separate subsystem. It authors and publishes immutable creative contracts: subtitle systems, overlays, audio treatment, and visual rules. The pipeline pins them by version at runtime. Creative quality control stays out of the production DAG.
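A sketch of how such a contract might be stored and pinned, assuming a StyleContract table published by the control plane; the unique (name, version) pair stands in for immutability, and the field names are illustrative.

```python
from django.db import models


class StyleContract(models.Model):
    name = models.CharField(max_length=64)        # e.g. "client-acme-shorts"
    version = models.PositiveIntegerField()
    spec = models.JSONField()                     # subtitle system, overlays, audio, visual rules
    published_at = models.DateTimeField(auto_now_add=True)

    class Meta:
        unique_together = [("name", "version")]   # published versions are never mutated


def pin_style(run, name: str) -> StyleContract:
    """Resolve the latest published version once and record the pin on the run."""
    contract = StyleContract.objects.filter(name=name).order_by("-version").first()
    run.metadata["style_contract"] = {"name": contract.name, "version": contract.version}
    run.save(update_fields=["metadata"])
    return contract
```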
The trigger layer can attach a new execution to compatible completed jobs for the same source asset and stage type. No separate cache. The orchestrator keys off completed contracts, not task invocations. Latency drops, cost drops, and the graph continues from the reused boundary.
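One way that reuse lookup could be keyed, again against the Job model and the run_stage task sketched above; the query and the attach_or_schedule helper are illustrative.

```python
def find_reusable(source_asset_id: str, stage_type: str):
    # A completed contract for the same source asset and stage type is reusable as-is.
    return (
        Job.objects
        .filter(
            run__source_asset_id=source_asset_id,
            stage_type=stage_type,
            state=Job.State.COMPLETED,
        )
        .exclude(artifact_uri="")
        .order_by("-id")
        .first()
    )


def attach_or_schedule(run, stage_type: str):
    prior = find_reusable(run.source_asset_id, stage_type)
    if prior:
        # Attach the published artifact; the graph continues from this boundary.
        Job.objects.create(run=run, stage_type=stage_type,
                           state=Job.State.COMPLETED, artifact_uri=prior.artifact_uri)
    else:
        job = Job.objects.create(run=run, stage_type=stage_type)
        run_stage.delay(job.pk)
```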
Style override reruns enter at the enhancement-planning boundary rather than at the start. The override lives in the run metadata, and downstream composition and rendering regenerate. The DAG is not just executable. It is partially replayable.
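A sketch of that entry-point selection, with an illustrative stage order; the pipeline has twelve stages, but this exact list is an assumption.

```python
FULL_ORDER = [
    "transcoding", "transcription", "segment_detection", "analysis",
    "preparation", "face_tracking", "enhancement_planning",
    "composition", "rendering", "delivery",
]


def stages_to_run(run) -> list[str]:
    # A style-only rerun enters at the enhancement-planning boundary and
    # replays only the creative stages downstream of it.
    entry = "enhancement_planning" if run.metadata.get("style_override") else FULL_ORDER[0]
    return FULL_ORDER[FULL_ORDER.index(entry):]
```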
Workflow completion means analysis and composition are done; delivery completion means every output has reached a terminal render outcome. Any failed required stage marks the workflow failed. Per-clip render failures roll up into delivery state, not workflow state. A sketch of those two rollups follows.
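The rollup below assumes the models sketched earlier; the required-stage set and the delivery state names are illustrative.

```python
REQUIRED_STAGES = {"analysis", "composition"}     # illustrative required set


def workflow_state(run) -> str:
    jobs = {j.stage_type: j.state for j in run.jobs.all()}
    if any(jobs.get(s) == Job.State.FAILED for s in REQUIRED_STAGES):
        return "failed"
    if all(jobs.get(s) == Job.State.COMPLETED for s in REQUIRED_STAGES):
        return "completed"
    return "in_progress"


def delivery_state(run) -> str:
    states = list(run.outputs.values_list("render_state", flat=True))
    if states and all(s in ("completed", "failed") for s in states):
        # Per-clip failures only degrade delivery, never the workflow.
        return "partial" if "failed" in states else "delivered"
    return "in_progress"
```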
Strange codecs, broken containers, missing audio. Transcoding has to be defensive enough that downstream stages can assume a stable input.
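A sketch of what that defensive normalization could look like with ffmpeg and ffprobe on the path; the codec and container choices are illustrative, not the production profile.

```python
import json
import subprocess


def probe(path: str) -> dict:
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_streams", "-of", "json", path],
        capture_output=True, text=True, check=True,
    )
    return json.loads(out.stdout)


def normalize(src: str, dst: str) -> None:
    has_audio = any(s["codec_type"] == "audio" for s in probe(src)["streams"])
    if has_audio:
        cmd = ["ffmpeg", "-y", "-i", src, "-map", "0:v:0", "-map", "0:a:0",
               "-c:v", "libx264", "-pix_fmt", "yuv420p",
               "-c:a", "aac", "-ar", "48000", "-movflags", "+faststart"]
    else:
        # Synthesize a silent track so later stages never special-case missing audio.
        cmd = ["ffmpeg", "-y", "-i", src,
               "-f", "lavfi", "-i", "anullsrc=channel_layout=stereo:sample_rate=48000",
               "-map", "0:v:0", "-map", "1:a:0", "-shortest",
               "-c:v", "libx264", "-pix_fmt", "yuv420p",
               "-c:a", "aac", "-movflags", "+faststart"]
    cmd.append(dst)
    subprocess.run(cmd, check=True)
```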
An hour of analysis cannot collapse on a render error. Resumability has to live at the artifact boundary, not at the task level.
Stitch Prepare and Rendering both depend on external systems. Polling is the safety net so a dropped webhook doesn't strand a run.
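That safety net can be sketched as a periodic Celery task like the one below, assuming a hypothetical render_client and the finalize_output transition shown earlier; field and attribute names are illustrative.

```python
from datetime import timedelta

from celery import shared_task
from django.utils import timezone


@shared_task
def reconcile_pending_renders():
    # Any output still pending past a threshold is reconciled against the
    # external render service, so a dropped webhook cannot strand the run.
    stale = timezone.now() - timedelta(minutes=5)
    for output in Output.objects.filter(render_state="pending", updated_at__lt=stale):
        status = render_client.get_status(output.external_render_id)   # hypothetical client
        if status.is_terminal:
            finalize_output(output.pk, status.outcome, status.media_uri)
```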
Plans can be technically valid but editorially uneven. Treating the plan as an explicit contract, with style binding and enhancement separation, keeps reruns cheap when only the creative layer needs to move.
What used to take a campaign-long editorial loop now compresses into a single processing run, because transcription, planning, preparation, and rendering all execute as resumable async stages.
Candidate clips are not just cut on timestamps. They are scored for hook strength, emotional tone, quotability, and completion, then rendered with client branding and captions already applied.
Parallel Lambda export turns per-clip rendering into fanout instead of a queue of serial renders, so dozens of outputs land in minutes rather than hours.
When stages publish to durable storage, retries, reuse, and delegated execution all become trivial. The orchestrator stops caring about who computed what.
Holding clip selection and creative treatment in different stages made enhancement-only reruns possible without rebuilding the whole graph.
Webhooks reduce time-to-finalize on the happy path; polling survives missed callbacks. Treat finalization as a terminal-state transition and the race resolves itself.
Style is not a render flag. It is a versioned contract published from a separate control plane. The pipeline pins it for consistency, auditability, and safer reruns.