No.001 · Made in Cyprus · Updated today

Long video in.
Watchable clips out.

Not another OpusClip. This one sees the video — gemma4-31b reads every frame, whisper-large-v3 catches every word, the crop tracks the speaker's face across every shot. Six-axis score, per-platform virality read, refuses to ship a mediocre clip.

* Takes 3–5 min per 10 min of source. Ranked by score.
6
scoring axes. Hook, emotion, clarity, payoff, retention, virality. Every clip earns its slot — nothing ships on a whim.
03.

How it thinks

Three ideas set this apart from every OpusClip-shaped tool.

01

It sees the video.

Every 2.5 seconds a frame is sent to the vision model. Face bounding box, shot type, gesture, emotion — all fed back into the crop so the speaker never drifts out of frame.

02

It scores by audience.

Six intrinsic axes plus a platform read — TikTok, Shorts, Reels get different scores because they behave differently. The pipeline tells you where the clip actually belongs.

03

It refuses mediocre clips.

Reject-first prompt. Rather ship one strong clip than five mid ones. Intros, filler, sponsor reads, throat-clearing — all trimmed automatically before you see the result.

Rather return 1 great clip than 5 mid ones. — analyst system prompt
04.

The stack

  1. 01Ingestyt-dlp · 1080p mp4 + isolated audio track
  2. 02Transcribewhisper-large-v3 · segment timestamps
  3. 03Analysegemma4-31b · reject-first prompt · 6 axes
  4. 04Seegemma4-31b vision · face & shot per 2.5s frame
  5. 05DirectLLM plans camera moves per clip
  6. 06Renderffmpeg 9:16 · head-room-aware crop
  7. 07SubtitleASS burn-in · max 3 lines · keyword accent
  8. 08Verifyvision QA · 0–100 composition score