Turn reference cinema into executable production grammar.
yt2ctx turns a video into timed context, representative stills, and cinematic instructions that downstream coding and generation agents can actually use.
The pipeline
A run downloads the source, extracts audio, transcribes speech, samples frames, scores visual salience, selects representative stills, and compiles reusable cinematic grammar.
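The still-selection step can be illustrated with a minimal sketch. It assumes each sampled frame already has a salience score, and it uses a simple greedy rule: take the highest-scoring frames while keeping them a minimum number of seconds apart. The `ScoredFrame` shape, the `selectStills` function, and the spacing rule are all hypothetical; the project's actual scoring and selection logic may differ.

```typescript
// Hypothetical shape: the real project may track more per-frame data.
interface ScoredFrame {
  timeSec: number; // timestamp of the sampled frame
  salience: number; // higher means more visually notable (assumed scale)
}

// Greedy pick: consider the highest-scoring frames first, skipping any frame
// that falls within minGapSec of a frame that was already chosen.
function selectStills(
  frames: ScoredFrame[],
  maxStills: number,
  minGapSec: number,
): ScoredFrame[] {
  const byScore = frames.slice().sort((a, b) => b.salience - a.salience);
  const chosen: ScoredFrame[] = [];
  for (const frame of byScore) {
    if (chosen.length >= maxStills) break;
    const tooClose = chosen.some(
      (c) => Math.abs(c.timeSec - frame.timeSec) < minGapSec,
    );
    if (!tooClose) chosen.push(frame);
  }
  // Return in chronological order so stills line up with the transcript.
  return chosen.sort((a, b) => a.timeSec - b.timeSec);
}

const sampledFrames: ScoredFrame[] = [
  { timeSec: 0, salience: 0.2 },
  { timeSec: 2, salience: 0.9 },
  { timeSec: 3, salience: 0.8 },
  { timeSec: 10, salience: 0.7 },
  { timeSec: 20, salience: 0.1 },
];
const pickedStills = selectStills(sampledFrames, 2, 5);
console.log(pickedStills.map((f) => f.timeSec));
```

The spacing constraint is one plausible way to keep the stills representative of the whole video rather than clustered around a single high-salience moment.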
The output is intentionally practical: Markdown for humans and agents, JSON for systems, frame images for visual grounding, and a ZIP bundle for handoff.
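As a rough sketch of how one result could feed both the Markdown and JSON outputs, the example below renders a small run object two ways. The `RunResult` and `Segment` types, their field names, and the Markdown layout are assumptions made for illustration, not the project's actual schema.

```typescript
// Hypothetical result shape; the real JSON schema is not documented here.
interface Segment {
  startSec: number; // when the spoken segment begins
  text: string; // transcribed speech
}

interface RunResult {
  title: string;
  segments: Segment[];
  stills: string[]; // relative paths to frame images
}

// Format seconds as mm:ss with zero padding.
function formatTimestamp(sec: number): string {
  const pad = (n: number): string => (n < 10 ? "0" + n : String(n));
  return pad(Math.floor(sec / 60)) + ":" + pad(Math.floor(sec % 60));
}

// Render the human- and agent-readable Markdown view of a run.
function toMarkdown(run: RunResult): string {
  const lines: string[] = ["# " + run.title, ""];
  for (const seg of run.segments) {
    lines.push("- [" + formatTimestamp(seg.startSec) + "] " + seg.text);
  }
  if (run.stills.length > 0) {
    lines.push("", "Stills:");
    for (const path of run.stills) lines.push("- " + path);
  }
  return lines.join("\n");
}

const exampleRun: RunResult = {
  title: "Example run",
  segments: [
    { startSec: 0, text: "Opening wide shot." },
    { startSec: 75, text: "Cut to close-up." },
  ],
  stills: ["frames/0001.jpg"],
};

const exampleMarkdown = toMarkdown(exampleRun); // for humans and agents
const exampleJson = JSON.stringify(exampleRun, null, 2); // for systems
console.log(exampleMarkdown);
```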
One core, four surfaces
The web app, CLI, MCP server, and HTTP API all call the same TypeScript core. That keeps behavior consistent whether the user is reviewing a run, scripting a batch, or letting an MCP client request context directly.
- Web: Interactive review and artifact download.
- CLI: Local automation and long-running jobs.
- MCP: Agent-native access through watch_youtube.
- API: Buffered JSON and streaming NDJSON for integrations.
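For integrations that consume the streaming NDJSON response, a client has to handle chunks that end in the middle of a line. The sketch below shows one way to buffer and parse such a stream. The `createNdjsonParser` helper and the `stage` field in the sample events are hypothetical; the actual event schema and endpoint are not specified here.

```typescript
// Incremental NDJSON parser: feed it text chunks as they arrive and it
// emits one parsed value per complete line.
function createNdjsonParser(onEvent: (event: unknown) => void) {
  let buffer = "";
  return {
    // Append a chunk and emit every complete line it finishes.
    push(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        const line = buffer.slice(0, newline).trim();
        buffer = buffer.slice(newline + 1);
        if (line.length > 0) onEvent(JSON.parse(line));
        newline = buffer.indexOf("\n");
      }
    },
    // Flush a final line that was not terminated by a newline.
    end(): void {
      const line = buffer.trim();
      buffer = "";
      if (line.length > 0) onEvent(JSON.parse(line));
    },
  };
}

// Simulated stream: the second event is split across two chunks.
const receivedEvents: unknown[] = [];
const ndjsonParser = createNdjsonParser((e) => receivedEvents.push(e));
ndjsonParser.push('{"stage":"download"}\n{"sta');
ndjsonParser.push('ge":"transcribe"}\n{"stage":"done"}');
ndjsonParser.end();
console.log(receivedEvents.length);
```

In a real client the chunks would come from the response body of the HTTP API; the buffered JSON mode needs none of this and can be parsed in one call.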
Status
This is an MIT-licensed project. Public deployments should add authentication and usage limits before exposing the analyzer, because each run can incur OpenAI usage.
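One possible form of a usage limit is a fixed-window counter keyed by API key or client identifier, checked before a run starts. The `RunLimiter` class below is a hypothetical sketch rather than part of the project; a production deployment would more likely rely on shared storage and an established authentication layer.

```typescript
// Fixed-window limiter: each key may start at most maxRuns runs per window.
interface WindowState {
  start: number; // window start time in milliseconds
  count: number; // runs started within the window
}

class RunLimiter {
  // Null-prototype object so arbitrary key strings are safe to use.
  private windows: Record<string, WindowState> = Object.create(null);

  constructor(private maxRuns: number, private windowMs: number) {}

  // Returns true if the run may proceed; `now` is injectable for testing.
  tryAcquire(key: string, now: number = Date.now()): boolean {
    const current = this.windows[key];
    if (!current || now - current.start >= this.windowMs) {
      this.windows[key] = { start: now, count: 1 };
      return true;
    }
    if (current.count < this.maxRuns) {
      current.count += 1;
      return true;
    }
    return false;
  }
}

// Allow two runs per key per second.
const runLimiter = new RunLimiter(2, 1000);
const limiterResults = [
  runLimiter.tryAcquire("key-a", 0), // first run in the window
  runLimiter.tryAcquire("key-a", 10), // second run in the window
  runLimiter.tryAcquire("key-a", 20), // over the limit
  runLimiter.tryAcquire("key-b", 20), // separate key, separate budget
  runLimiter.tryAcquire("key-a", 1000), // new window for key-a
];
console.log(limiterResults);
```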