Description

A large number of interviews are still pending transcription. We will use the local system’s ztranscribe capability (alias for ~/.config/zsh/recipes/yt-transcribe) to download and transcribe these videos directly from YouTube.

Plan

Discovery: Find all video_assets missing a transcript_id that have a valid youtube platform ID.
Transcription: For each video, run the local yt-transcribe pipeline as ~/.config/zsh/recipes/yt-transcribe https://www.youtube.com/watch?v=<id>, substituting the video's YouTube platform ID for <id>.
Ingestion: Move the resulting .txt transcripts from ~/Downloads/transcripts/ into a staging area and use the project’s ./bin/transcripts pipeline to ingest them into _data/transcripts/.
Audit: Run the transcript-conversational-audit skill (using rake audit:prepare[slug] and rake audit:ingest[slug]) to clean the transcript, separate speakers, and generate the durable insights and SEO metadata.
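The Discovery step can be sketched as a filter over video asset records. This is a minimal sketch, not the project's actual query: the record shape (hashes with "platform", "platform_id", and "transcript_id" keys) is an assumption, since the video_assets schema is not shown in this task, and the 11-character ID pattern is a common convention for YouTube IDs rather than a documented guarantee.

```ruby
# Hypothetical record shape: the real video_assets schema is not shown here.
# Assumed convention: YouTube video IDs are 11 characters of [A-Za-z0-9_-].
YOUTUBE_ID = /\A[A-Za-z0-9_-]{11}\z/

# Returns the assets that still need a transcript and can be fetched from YouTube.
def pending_youtube_assets(assets)
  assets.select do |asset|
    asset["platform"] == "youtube" &&
      asset["transcript_id"].to_s.empty? &&
      YOUTUBE_ID.match?(asset["platform_id"].to_s)
  end
end

# Builds the watch URL that the Transcription step passes to yt-transcribe.
def watch_url(asset)
  "https://www.youtube.com/watch?v=#{asset.fetch("platform_id")}"
end
```

Keeping the filter separate from the URL builder lets the same list of pending assets feed either the direct yt-transcribe run or a queue.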

Acceptance Criteria

#1 Identify all pending video_assets with a valid youtube platform ID
#2 Use yt-transcribe to download and transcribe the audio
#3 Import the resulting transcripts into the project using the transcript pipeline
#4 Run transcript-conversational-audit on the new transcripts to generate insights and metadata

Implementation Plan

Plan (Updated for async capabilities)

Discovery: Find all video_assets missing a transcript_id with a valid youtube platform ID.
Enqueue Jobs: For each video, use zdots-ctx enqueue transcription '{"url": "https://www.youtube.com/watch?v=<id>"}' to queue the download and transcription asynchronously.
Background Worker: Run the zdots-ctx worker in a background process or terminal pane to process the queued transcriptions without blocking the main workflow.
Ingestion & Auditing: As jobs complete and output to ~/Downloads/transcripts/, stage them in tmp/transcript-id-staging/ using their video_asset_id and run ./bin/transcripts ingest. Finally, run the transcript-conversational-audit skill to generate insights and metadata.
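The Enqueue Jobs step could be scripted along these lines. This is a sketch under stated assumptions: the zdots-ctx enqueue transcription invocation and its {"url": ...} payload are taken from the plan above, while the helper names (enqueue_command, enqueue_all) and the dry_run flag are hypothetical. Passing the command as an argument array avoids shell quoting problems with the JSON payload.

```ruby
require "json"

# Builds the argv for one enqueue call, mirroring the command in the plan:
#   zdots-ctx enqueue transcription '{"url": "https://www.youtube.com/watch?v=<id>"}'
def enqueue_command(youtube_id)
  payload = JSON.generate("url" => "https://www.youtube.com/watch?v=#{youtube_id}")
  ["zdots-ctx", "enqueue", "transcription", payload]
end

# Enqueues every ID and returns the commands that were built.
# dry_run (hypothetical flag) defaults to true so nothing runs by accident.
def enqueue_all(youtube_ids, dry_run: true)
  youtube_ids.map do |id|
    argv = enqueue_command(id)
    system(*argv) unless dry_run # argv form: no shell, so no quoting issues
    argv
  end
end
```

Calling enqueue_all(ids) first prints nothing and runs nothing, so the returned commands can be inspected before rerunning with dry_run: false.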

Implementation Notes

The background worker is currently processing the queue of 41 videos. Wrote a script, bin/stage_completed_transcripts.rb, that automatically stages any finished .txt files produced by the worker. Successfully staged, ingested, and ran a canonical audit on the first completed transcript (david-heinemeier-hansson-dhh-railsconf-2014). The worker will keep processing the remaining ~40 videos in the background over the next few hours. Once it finishes, the staging and ingestion scripts can be run again to process the rest of the batch.
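For reference, a staging pass of this kind might look like the sketch below. It is not the contents of bin/stage_completed_transcripts.rb, which is not shown in this task. It assumes, hypothetically, that the caller supplies a mapping from each transcript's file basename to its video_asset_id, and that staged files are named after that ID. Skipping files that are already staged is what makes it safe to rerun as the worker finishes more videos.

```ruby
require "fileutils"

# Copies finished transcripts into the staging directory, named by video_asset_id.
# asset_ids_by_basename is a hypothetical lookup, e.g. {"some-talk" => 42}.
# Returns the paths staged during this run.
def stage_completed(source_dir:, staging_dir:, asset_ids_by_basename:)
  FileUtils.mkdir_p(staging_dir)
  Dir.glob(File.join(source_dir, "*.txt")).sort.filter_map do |path|
    asset_id = asset_ids_by_basename[File.basename(path, ".txt")]
    next if asset_id.nil? # no known video asset for this file; leave it alone

    target = File.join(staging_dir, "#{asset_id}.txt")
    next if File.exist?(target) # already staged on an earlier run

    FileUtils.cp(path, target)
    target
  end
end
```

In this workflow source_dir would be the expanded ~/Downloads/transcripts/ path and staging_dir would be tmp/transcript-id-staging/, followed by ./bin/transcripts ingest.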

Definition of Done

#1 All acceptance criteria are met and the change has been verified