Transcript Import
Canonical Model
- Transcript files live in _data/transcripts/*.yml.
- Video assets reference transcripts via transcript_id in _data/video_assets.yml.
- _data/transcripts.yml is legacy and is not used for active content.
Commands
- Audit the integrity of transcripts currently in the repository:
./bin/transcripts audit
- Build ID-suffixed staging files (recommended for ambiguous filenames):
./bin/transcripts prepare --source-dir /Volumes/Dock_1TB/vimeo/outbox --output-dir tmp/transcript-id-staging --min-confidence 0.8 --clean-output
- Run import in dry-run mode:
./bin/transcripts dry-run --source-dir tmp/transcript-id-staging --min-confidence 0.9
- Review output reports:
tmp/transcript-import-report.json
tmp/transcript-import-report.md
- Apply high-confidence mappings:
./bin/transcripts ingest --source-dir tmp/transcript-id-staging --min-confidence 0.9
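The staged sequence above can be wrapped in two shell functions so that a failed step stops the run before anything is ingested. This is a minimal sketch, not something shipped in the repository: the function names stage_and_dry_run and ingest_staged and the TRANSCRIPTS and STAGING_DIR variables are hypothetical, and the sketch uses only the subcommands and flags listed above.

```shell
#!/usr/bin/env bash
# Sketch of the staged import workflow. Function and variable names are
# hypothetical; only the documented subcommands and flags are used.

# Path to the CLI; override it to point at a different checkout.
TRANSCRIPTS="${TRANSCRIPTS:-./bin/transcripts}"
# Staging directory shared by the prepare, dry-run and ingest steps.
STAGING_DIR="${STAGING_DIR:-tmp/transcript-id-staging}"

# Audit, build ID-suffixed staging files, then dry-run against them.
# Each step must succeed before the next one runs.
stage_and_dry_run() {
  local source_dir="$1"
  "$TRANSCRIPTS" audit || return
  "$TRANSCRIPTS" prepare --source-dir "$source_dir" --output-dir "$STAGING_DIR" \
    --min-confidence 0.8 --clean-output || return
  "$TRANSCRIPTS" dry-run --source-dir "$STAGING_DIR" --min-confidence 0.9 || return
  echo "Review tmp/transcript-import-report.json and tmp/transcript-import-report.md before ingesting."
}

# Apply high-confidence mappings from the staging directory.
ingest_staged() {
  "$TRANSCRIPTS" ingest --source-dir "$STAGING_DIR" --min-confidence 0.9
}
```

Typical use: load the functions into a shell, run stage_and_dry_run /Volumes/Dock_1TB/vimeo/outbox, read the reports, and only then run ingest_staged. Keeping ingest in a separate function leaves the human review step between the dry run and the write.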
Direct Import Mode
If filenames already include explicit IDs and do not need staging:
./bin/transcripts dry-run --source-dir /Volumes/Dock_1TB/vimeo/outbox --min-confidence 0.9
./bin/transcripts ingest --source-dir /Volumes/Dock_1TB/vimeo/outbox --min-confidence 0.9
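In direct mode the dry run and the ingest read the same source directory, so a small wrapper can show the summary report and ask for confirmation before writing anything. A minimal sketch, assuming the dry run writes tmp/transcript-import-report.md as described under Report Files; the function name direct_import and the TRANSCRIPTS and REPORT_MD variables are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical review gate for direct import mode. Only the documented
# dry-run and ingest subcommands and flags are used.

TRANSCRIPTS="${TRANSCRIPTS:-./bin/transcripts}"
REPORT_MD="${REPORT_MD:-tmp/transcript-import-report.md}"

direct_import() {
  local source_dir="$1" answer
  "$TRANSCRIPTS" dry-run --source-dir "$source_dir" --min-confidence 0.9 || return
  # Print the human-readable summary from the dry run, if it exists.
  [ -f "$REPORT_MD" ] && cat "$REPORT_MD"
  # Ingest only on an explicit yes; anything else (or EOF) skips it.
  read -r -p "Ingest these mappings? [y/N] " answer
  case "$answer" in
    [yY]) "$TRANSCRIPTS" ingest --source-dir "$source_dir" --min-confidence 0.9 ;;
    *) echo "Skipped ingest." ;;
  esac
}
```

Example: direct_import /Volumes/Dock_1TB/vimeo/outbox runs the dry run, prints the summary, and waits for y before ingesting.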
Report Files
- Mapping report:
tmp/transcript-import-report.json
- Human-readable summary:
tmp/transcript-import-report.md
Legacy Sequence (Kept for Reference)
- Run import in dry-run mode:
./bin/transcripts dry-run --source-dir /Volumes/Dock_1TB/vimeo/outbox --min-confidence 0.9
- Review output reports:
tmp/transcript-import-report.json
tmp/transcript-import-report.md
- Apply high-confidence mappings:
./bin/transcripts ingest --source-dir /Volumes/Dock_1TB/vimeo/outbox --min-confidence 0.9
- Re-run pipeline validation:
./bin/transcripts validate
One-Command Batch Mode
- Ingest + audit + validate + commit:
./bin/transcripts ingest --source-dir /Volumes/Dock_1TB/vimeo/outbox --auto-commit
- Ingest + audit + validate + commit + push:
./bin/transcripts ingest --source-dir /Volumes/Dock_1TB/vimeo/outbox --auto-commit --auto-push
Notes
- Supported source file formats: .txt, .md, .srt, .vtt.
- Existing transcript files are not overwritten unless --force is supplied.
- Low-confidence mappings are never auto-applied; review them in the report first.
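Before a dry run, it can help to see which files in a source directory carry one of the supported extensions. A minimal sketch using find; the function name list_supported_transcripts is hypothetical. This section does not say whether the importer scans subdirectories or accepts upper-case extensions, so the sketch assumes top-level files only (-maxdepth 1) and exact lower-case matches (-name).

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check: list top-level files in a source directory
# whose extension is one of the supported formats (.txt, .md, .srt, .vtt).
list_supported_transcripts() {
  local source_dir="$1"
  find "$source_dir" -maxdepth 1 -type f \
    \( -name '*.txt' -o -name '*.md' -o -name '*.srt' -o -name '*.vtt' \) \
    | sort
}
```

For example, list_supported_transcripts /Volumes/Dock_1TB/vimeo/outbox | wc -l gives a rough count of candidate files. Swap -name for -iname if upper-case extensions should also be listed.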