Video intelligence platform
A video AI product for searching footage, asking questions against a library, and turning responses into edits people can actually work with.
A video upload triggers a parallel processing pipeline that produces preview assets, video derivatives, transcription, and indexed scene intelligence.
The system separates fast local work (metadata, thumbnails, scrubs) from heavy GPU-gated work (transcoding, AI analysis) using dedicated background workers. What leaves the pipeline is a structured, searchable asset, not just a stored file.
The engineering challenge was keeping these parallel pipelines observable, failure-isolated, and composable while preserving the ability to route to different analysis paths and turn branches on or off without touching the core upload path.
The upload finalizer fans out into independent background tasks. Local processing runs immediately. The GPU-gated AI path and optional branches run in parallel, and none of them block the upload response.
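A minimal sketch of that fanout. The write-up doesn't name the task runner; Celery is assumed here, and the task names are illustrative.

```python
from celery import Celery, group

app = Celery("pipeline", broker="redis://localhost:6379/0")

@app.task
def extract_technical_metadata(video_id): ...

@app.task
def generate_thumbnails(video_id): ...

@app.task
def run_ai_pipeline(video_id): ...

def finalize_upload(video_id: str) -> None:
    # Local work and the GPU-gated path go onto the queue together;
    # the upload response returns as soon as they're enqueued.
    group(
        extract_technical_metadata.si(video_id),
        generate_thumbnails.si(video_id),
        run_ai_pipeline.si(video_id),
    ).apply_async()
```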
The thumbnail generator samples frames across the usable middle of the video, scores candidates by visual quality, and picks the strongest one. The goal is a representative frame that avoids the weak openings and endings common in uploaded footage.
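An illustrative version of the sampling-and-scoring loop, using OpenCV with a sharpness-plus-exposure heuristic as a stand-in for the real scoring.

```python
import cv2

def pick_thumbnail_frame(path: str, samples: int = 12):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # Sample only the middle 70% of the video, skipping weak openings/endings.
    start, end = int(total * 0.15), int(total * 0.85)
    step = max((end - start) // samples, 1)

    best_frame, best_score = None, -1.0
    for idx in range(start, end, step):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        # Penalize near-black or blown-out frames.
        exposure_penalty = abs(gray.mean() - 128) / 128
        score = sharpness * (1.0 - 0.5 * exposure_penalty)
        if score > best_score:
            best_frame, best_score = frame, score
    cap.release()
    return best_frame
```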
Evenly distributed frames are assembled into contact sheets and hover-preview sprites so the library can show visual timeline previews without loading the full video. Very short videos can skip the heavier preview outputs.
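A small sketch of sprite-sheet assembly from already-sampled frames, with illustrative tile and grid sizes.

```python
from PIL import Image

def build_sprite_sheet(frames: list[Image.Image], cols: int = 10,
                       tile_w: int = 160, tile_h: int = 90) -> Image.Image:
    # Tile evenly sampled frames left-to-right, top-to-bottom so the player
    # can show hover previews without loading the full video.
    rows = (len(frames) + cols - 1) // cols
    sheet = Image.new("RGB", (cols * tile_w, rows * tile_h))
    for i, frame in enumerate(frames):
        tile = frame.resize((tile_w, tile_h))
        sheet.paste(tile, ((i % cols) * tile_w, (i // cols) * tile_h))
    return sheet
```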
Technical metadata is extracted on first download so duration, dimensions, frame rate, and codec details appear immediately. That makes the asset usable in the library before any heavy downstream processing completes.
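A sketch of that first-pass extraction, assuming ffprobe is available on the worker.

```python
import json
import subprocess

def probe_video(path: str) -> dict:
    out = subprocess.run(
        ["ffprobe", "-v", "quiet", "-print_format", "json",
         "-show_format", "-show_streams", path],
        capture_output=True, text=True, check=True,
    ).stdout
    data = json.loads(out)
    video = next(s for s in data["streams"] if s["codec_type"] == "video")
    num, den = video["r_frame_rate"].split("/")
    return {
        "duration_s": float(data["format"]["duration"]),
        "width": video["width"],
        "height": video["height"],
        "frame_rate": float(num) / float(den),
        "codec": video["codec_name"],
    }
```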
The orchestration task runs an integrity check before starting transcription, face detection, or segment detection. If the file is corrupted, the video is marked and the AI pipeline is skipped. No partial job state to clean up.
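The exact check isn't described; a full decode pass through ffmpeg's null muxer is a common stand-in and is what this sketch assumes. The persistence helper is hypothetical.

```python
import subprocess

def is_decodable(path: str) -> bool:
    result = subprocess.run(
        ["ffmpeg", "-v", "error", "-i", path, "-f", "null", "-"],
        capture_output=True, text=True,
    )
    # Any decode error (or a non-zero exit) marks the file as corrupted.
    return result.returncode == 0 and not result.stderr.strip()

def orchestrate_ai_analysis(video_id: str, local_path: str) -> None:
    if not is_decodable(local_path):
        mark_video_corrupted(video_id)   # hypothetical persistence helper
        return                           # no AI jobs are ever created
    # ... fan out into transcription, face detection, segment detection ...
```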
When transcoding runs, downstream analysis switches to a normalized working copy instead of the raw upload. That keeps later stages consistent and reduces edge cases in chunked and GPU-heavy processing.
Runtime configuration decides which analysis branches the orchestration task creates. Organizations can be on different combinations without code changes or upload-path rewrites.
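One way to express that routing is a table from per-organization flags to branch tasks, read at fanout time. The flag and task names here are assumptions, not the product's real configuration keys.

```python
OPTIONAL_BRANCHES = {
    "ai_pipeline_enabled": "pipeline.heavy.run_ai_pipeline",
    "face_detection_enabled": "pipeline.heavy.fan_out_face_chunks",
    "content_categorization_enabled": "pipeline.local.categorize_content",
}

def enqueue_optional_branches(app, org_config: dict, video_id: str) -> None:
    for flag, task_name in OPTIONAL_BRANCHES.items():
        if org_config.get(flag, False):
            # send_task dispatches by name, so no branch code is imported here
            # and turning a branch off never touches the upload path.
            app.send_task(task_name, args=[video_id])
```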
After download and transcoding, the orchestration task fans out into four independent jobs: transcription, voice-activity detection, chunked face detection, and full-video segment detection. Scene assembly starts only after both transcription and segment detection complete.
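Sticking with the Celery assumption, the fanout plus the scene-assembly join could look like this, with illustrative task names.

```python
from celery import chord, group

def fan_out_analysis(video_id: str) -> None:
    # Independent branches: nothing downstream waits on these two.
    group(
        detect_voice_activity.si(video_id),
        fan_out_face_chunks.si(video_id),
    ).apply_async()

    # Scene assembly is the join: it runs only after BOTH transcription
    # and segment detection reach a terminal state.
    chord(
        [transcribe.si(video_id), detect_segments.si(video_id)]
    )(assemble_scenes.si(video_id))
```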
Each background step creates a durable job record with its type, status, progress context, and failure details. Active work is queryable, and completion updates the record rather than disappearing into the task runner.
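A sketch of what such a record might carry; the field names are illustrative, not the actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional

class JobStatus(str, Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class ProcessingJob:
    video_id: str
    job_type: str                      # "transcription", "face_detection", ...
    status: JobStatus = JobStatus.PENDING
    progress: float = 0.0              # 0.0 .. 1.0, monotonically increasing
    context: dict = field(default_factory=dict)   # e.g. chunk range, provider ids
    error: Optional[str] = None        # populated on failure, queryable later
    parent_id: Optional[int] = None    # set for chunk jobs under a parent
    updated_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```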
Face detection fans out into independent time-based chunks so long videos can be processed in parallel. Each chunk carries its own tracking record, and the parent job only resolves when all chunk work completes cleanly.
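A sketch of the chunk fanout under the same Celery assumption, with hypothetical job-record helpers and an arbitrary chunk length. The duration would come from the metadata probe above.

```python
from celery import chord

CHUNK_SECONDS = 120  # illustrative chunk length

def fan_out_face_chunks(video_id: str, duration_s: float) -> None:
    parent = create_job(video_id, "face_detection")   # hypothetical helper
    chunks, start = [], 0.0
    while start < duration_s:
        end = min(start + CHUNK_SECONDS, duration_s)
        child = create_job(video_id, "face_detection_chunk",
                           parent_id=parent.id)
        chunks.append(detect_faces_chunk.si(video_id, start, end, child.id))
        start = end

    # The parent job resolves only when every chunk task has completed;
    # a failed chunk surfaces as a chord error recorded against the parent.
    chord(chunks)(resolve_face_parent.si(parent.id))
```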
If the video has no audio track, no transcription request is sent. Instead, the pipeline creates placeholder scene structure so downstream assembly can still complete without failing on a missing dependency.
Scene assembly aligns transcript-derived scene boundaries to shot-detection boundaries, fills any gaps so the full video is covered, cuts each scene into its own clip, and runs per-scene structured extraction. The result is indexed for search and persisted into relational detection tables.
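A simplified version of the boundary alignment and gap filling, with an assumed snap tolerance.

```python
def align_scenes(transcript_bounds: list[float], shot_bounds: list[float],
                 duration: float, tolerance: float = 1.0) -> list[tuple[float, float]]:
    def snap(t: float) -> float:
        # Snap to the nearest shot boundary when one is close enough.
        nearest = min(shot_bounds, key=lambda s: abs(s - t), default=t)
        return nearest if abs(nearest - t) <= tolerance else t

    cuts = sorted({0.0, duration, *(snap(t) for t in transcript_bounds)})
    # Consecutive cut points become scenes; the full video is covered because
    # 0.0 and the total duration are always included.
    return [(a, b) for a, b in zip(cuts, cuts[1:]) if b - a > 0]
```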
The transcription processor identifies large silence gaps and uses them as scene chunk boundaries. This produces semantically coherent transcript scenes before shot detection is available, so the two inputs to scene assembly need careful alignment.
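A minimal sketch of that silence-gap split, assuming (start, end, text) transcript segments and an arbitrary gap threshold.

```python
def split_on_silence(segments: list[tuple[float, float, str]],
                     min_gap: float = 2.0) -> list[list[tuple[float, float, str]]]:
    scenes, current = [], []
    for seg in segments:
        if current and seg[0] - current[-1][1] >= min_gap:
            # A long silence between segments starts a new transcript scene.
            scenes.append(current)
            current = []
        current.append(seg)
    if current:
        scenes.append(current)
    return scenes
```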
Transcription completion also schedules subtitle generation as a side effect. Caption artifacts are persisted for downstream use without blocking the scene-assembly join.
The transcription completion path also triggers local content categorization. A frame grid is generated, classified, and written back to the video record. This runs independently of the main GPU path and needs no separate orchestration branch.
Transcription, transcoding, and face detection cannot sit on the upload request. The finalizer returns before any background task completes, and all heavy work moves to the queue.
Face detection dispatches multiple independent chunk jobs per video. Without parent-child job tracking, completion becomes ambiguous. Any chunk failure needs to propagate to the parent without losing the others.
Transcription and segment detection run independently. Scene assembly starts only when both complete. Without proper completion semantics, a failure in either would block scene assembly indefinitely.
Parallel progress reporters can race. The persisted progress model has to prevent later updates from making the asset look less complete than it already is.
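One way to make progress writes monotonic is to push the comparison into the update itself. The table and column names are illustrative, assuming Postgres-style SQL.

```python
def report_progress(conn, job_id: int, progress: float) -> None:
    # GREATEST() ensures a late, smaller update can never make the job
    # look less complete than it already is.
    with conn.cursor() as cur:
        cur.execute(
            """
            UPDATE processing_jobs
               SET progress = GREATEST(progress, %s),
                   updated_at = NOW()
             WHERE id = %s
            """,
            (progress, job_id),
        )
    conn.commit()
```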
Local processing and heavy analysis run in separate worker pools. A slow transcode does not starve lightweight preview work.
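A sketch of that split using Celery task routing, with illustrative queue names and module paths.

```python
from celery import Celery

app = Celery("pipeline", broker="redis://localhost:6379/0")

# Route fast local work and GPU-gated work to separate queues so dedicated
# worker pools can consume them independently.
app.conf.task_routes = {
    "pipeline.local.*": {"queue": "local"},   # metadata, thumbnails, scrubs
    "pipeline.heavy.*": {"queue": "gpu"},     # transcoding, AI analysis
}

# Example worker invocations, one pool per queue:
#   celery -A pipeline worker -Q local --concurrency=8
#   celery -A pipeline worker -Q gpu   --concurrency=1
```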
Integrity validation happens before any AI job is created. A bad file fails cheaply without spawning orphaned work across multiple services.
Chunked analysis uses a parent-child job model with explicit completion semantics. The parent advances only when all chunk work resolves.
Videos without an audio track still produce placeholder scene structure. Scene assembly runs on the same code path with no special cases.
The main AI path is only added to the task list when the environment is configured for it. Routing decisions happen at fanout, not inside tasks.
Newer and older analysis approaches overlap conceptually. The current upload path is clear, but the codebase still shows its evolutionary history.
Transcription and segment detection must both complete before scene assembly starts. If one is slow, assembly waits. No partial scene output.
Some supporting processors exist in the codebase without being part of the default library pipeline. Scope decisions left them disconnected rather than removed.
Thumbnail, scrub, and technical metadata are available seconds after upload. The library shows a usable asset while the GPU path runs in the background.
Scene assembly produces structured, indexed scene payloads for every video that completes the AI path, supporting search, tagging, comments, and repurpose workflows.
Feature flags let organizations opt into different pipeline branches independently. New analysis paths ship without touching the upload path or affecting organizations that don't need them.
Routing local processing and GPU work to different queues is not overengineering. Without it, a single slow transcode job can delay thumbnail generation for every other upload.
A corruption check that skips the entire AI pipeline on a bad file is one of the highest-leverage things in the orchestration task. The alternative is orphaned jobs across four providers and no clean terminal state.
Reading flags when tasks are enqueued, not inside the task, keeps worker logic simple and makes the active pipeline visible from the scheduling call alone.
Scene assembly waiting on two independent async results is only tractable because each result has a clear terminal state. Without explicit job records, the join becomes a polling loop against unstable state.
Multi-track timeline in the browser