Description

A large number of interviews are still pending transcription. We will use the local ztranscribe command (an alias for ~/.config/zsh/recipes/yt-transcribe) to download and transcribe these videos directly from YouTube.

Plan

  1. Discovery: Find all video_assets missing a transcript_id that have a valid youtube platform ID.
  2. Transcription: For each video, run the local yt-transcribe pipeline: ~/.config/zsh/recipes/yt-transcribe https://www.youtube.com/watch?v=<id>.
  3. Ingestion: Move the resulting .txt transcripts from ~/Downloads/transcripts/ into a staging area and use the project’s ./bin/transcripts pipeline to ingest them into _data/transcripts/.
  4. Audit: Run the transcript-conversational-audit skill (using rake audit:prepare[slug] and rake audit:ingest[slug]) to clean each transcript, separate speakers, and generate durable insights and SEO metadata.
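
The discovery step can be sketched roughly as follows. This is only an illustration: the VideoAsset struct and its field names (platform, platform_id, transcript_id) are hypothetical stand-ins, since the real video_assets schema is not shown here, and the 11-character ID pattern is an assumed check based on how YouTube IDs conventionally look.

```ruby
# Hypothetical record shape; the project's actual video_assets fields may differ.
VideoAsset = Struct.new(:id, :platform, :platform_id, :transcript_id, keyword_init: true)

# Assumption: a "valid" YouTube ID is 11 characters from A-Z, a-z, 0-9, "-" and "_".
YOUTUBE_ID = /\A[A-Za-z0-9_-]{11}\z/

# Keep only assets that have no transcript yet and carry a plausible YouTube ID.
def pending_youtube_assets(assets)
  assets.select do |asset|
    asset.transcript_id.to_s.empty? &&
      asset.platform == "youtube" &&
      YOUTUBE_ID.match?(asset.platform_id.to_s)
  end
end

# Build the watch URL that the transcription step expects.
def watch_url(asset)
  "https://www.youtube.com/watch?v=#{asset.platform_id}"
end
```

Rejecting malformed IDs up front keeps obviously bad URLs out of the transcription run, where each failure would cost a download attempt.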

Acceptance Criteria

Implementation Plan

Plan (Updated for async capabilities)

  1. Discovery: Find all video_assets missing a transcript_id with a valid youtube platform ID.
  2. Enqueue Jobs: For each video, use zdots-ctx enqueue transcription '{"url": "https://www.youtube.com/watch?v=<id>"}' to queue the download and transcription asynchronously.
  3. Background Worker: Run the zdots-ctx worker in a background process or terminal pane to process the queued transcriptions without blocking the main workflow.
  4. Ingestion & Auditing: As jobs complete and write their output to ~/Downloads/transcripts/, stage the files in tmp/transcript-id-staging/ using their video_asset_id and run ./bin/transcripts ingest. Finally, run the transcript-conversational-audit skill to generate insights and metadata.
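
A minimal sketch of the enqueue step, assuming only what the plan states: the zdots-ctx enqueue transcription command and a JSON payload with a url key. The helper names enqueue_argv and enqueue_all are hypothetical, and the sketch defaults to a dry run that prints the commands instead of executing them.

```ruby
require "json"
require "shellwords"

# Build the argv for one enqueue call. The command, subcommand, and "url" key
# come from the plan; no other zdots-ctx options are assumed.
def enqueue_argv(youtube_id)
  payload = JSON.generate(url: "https://www.youtube.com/watch?v=#{youtube_id}")
  ["zdots-ctx", "enqueue", "transcription", payload]
end

# Print each command by default; only execute when dry_run is false.
def enqueue_all(youtube_ids, dry_run: true)
  youtube_ids.map do |id|
    argv = enqueue_argv(id)
    if dry_run
      puts Shellwords.join(argv) # shell-escaped form, safe to copy and paste
      true
    else
      system(*argv) # argv is passed directly, so no shell quoting is involved
    end
  end
end
```

Generating the payload with JSON.generate rather than string interpolation avoids quoting mistakes when the command is built for dozens of videos.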

Implementation Notes

The background worker is currently processing the queue of 41 videos. Wrote bin/stage_completed_transcripts.rb to automatically stage any finished .txt files the worker produces. Successfully staged, ingested, and ran a canonical audit on the first completed transcript (david-heinemeier-hansson-dhh-railsconf-2014). The worker will keep processing the remaining ~40 videos in the background over the next few hours. Once it finishes, the staging and ingestion scripts can be run again to process the rest of the batch.
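
For illustration, a re-runnable staging pass might look like the sketch below. It is not the contents of bin/stage_completed_transcripts.rb, which are not reproduced in these notes. The ids_by_stem lookup (transcript filename to video_asset_id) and the <video_asset_id>.txt target naming are assumptions.

```ruby
require "fileutils"

# Copy finished transcripts into the staging directory, one file per asset.
# ids_by_stem maps a transcript's base filename to its video_asset_id; how the
# real script derives that mapping is not described here (assumption).
def stage_completed(source_dir, staging_dir, ids_by_stem)
  FileUtils.mkdir_p(staging_dir)
  Dir.glob(File.join(source_dir, "*.txt")).sort.filter_map do |path|
    asset_id = ids_by_stem[File.basename(path, ".txt")]
    next unless asset_id # unknown file: leave it untouched

    target = File.join(staging_dir, "#{asset_id}.txt")
    next if File.exist?(target) # already staged on an earlier run

    FileUtils.cp(path, target)
    target # report what was newly staged
  end
end
```

Skipping files that are already staged makes the pass safe to repeat while the worker is still producing output, which matches the plan to rerun staging once the queue drains.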

Definition of Done