Tasks

This task list is prioritized to protect pipeline correctness first, then data integrity, then non-visual SEO improvements.

Historical Context

Theme 1: Build and CI reliability foundation

Theme 2: Data integrity and generation quality

Theme 3: SEO/semantic policy hardening

Theme 4: Build-time last-modified workflow

Retrospective Process

Critical Constraint

Current Operating Plan

Phase 1: Configuration + Pipeline Correctness

Phase 2: Structural Integrity of Data

Phase 3: Technical SEO Optimizations (Non-Visual)

Logged Recommendations (Current)

  1. Keep plan text synchronized with implementation in the same commit series.
  2. Preserve sitemap-driven smoke scope and avoid reintroducing hardcoded route lists.
  3. Keep SITEMAP_MAX_URLS=5000 unless explicit publication-scope expansion is approved.
  4. If the publication model changes, define the canonical/robots policy first, then implement validation.
  5. Keep JSON-LD required-field rules in lockstep with template/schema changes.
  6. Keep semantic graph artifacts and documentation snapshots synchronized when schema contracts change.
  7. Keep agent/skill usage explicit: use registered skills for CI/smoke/security tasks; add a dedicated SEO semantic skill only if repeated workflows become too custom.
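
Recommendations 2 and 3 can be illustrated with a minimal sketch, assuming the smoke check derives its URL list from the generated sitemap and fails when the list exceeds the cap. The function names (sitemap_urls, smoke_scope) are hypothetical and are not taken from bin/pipeline; only the SITEMAP_MAX_URLS value of 5000 comes from this plan.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Cap taken from this plan. How the real pipeline reads it is an assumption.
SITEMAP_MAX_URLS = 5000


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every non-empty <loc> value from a sitemap, in document order."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iter(f"{SITEMAP_NS}loc")
        if loc.text and loc.text.strip()
    ]


def smoke_scope(sitemap_xml: str, max_urls: int = SITEMAP_MAX_URLS) -> list[str]:
    """Derive the smoke-test scope from the sitemap instead of a hardcoded list.

    Raises ValueError when the sitemap exceeds the cap, so a scope expansion
    has to be an explicit decision rather than a silent side effect.
    """
    urls = sitemap_urls(sitemap_xml)
    if len(urls) > max_urls:
        raise ValueError(
            f"sitemap lists {len(urls)} URLs, above the cap of {max_urls}"
        )
    return urls


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/interviews/</loc></url>"
        "</urlset>"
    )
    for url in smoke_scope(sample):
        print(url)
```

Failing loudly on an oversized sitemap, rather than truncating it, keeps the cap a reviewable policy choice in line with recommendation 3.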

Agent/Skill Coverage

Short-Term Backlog

  1. Transcript Coverage Expansion
    • Status: deferred (partial work in place; not blocking).
    • Notes: transcript onboarding should continue as new source transcript files are produced; avoid blocking other metadata/SEO work on transcript completeness.
  2. Metadata Completion at Scale
    • Status: active candidate.
    • Fill missing video_assets fields (description, topic) in prioritized batches.
    • Fill missing interviews topic values where conference/community context is known.
    • Keep topic/description conventions consistent with canonical slugs and transcript-derived phrasing.
  3. SEO Metadata Quality Cleanup
    • Status: implemented (2026-02-12 pass), monitor and tune.
    • Added global head-level title/description normalization for minimum and maximum lengths.
    • Improved generated interview/video metadata copy to avoid thin descriptions and low-information titles.
    • Ongoing: monitor tmp/seo-metadata-report.json for regressions as new content lands.
  4. Data Model Documentation Alignment
    • Status: implemented, keep synchronized.
    • Update docs to reflect per-file transcripts in _data/transcripts/*.yml as canonical transcript storage.
    • Clarify that _data/transcripts.yml is a legacy placeholder, not the active content source.
    • Keep docs synchronized with generators/templates in the same commit series.
  5. Structured Data Object Model Expansion
    • Status: implemented (first complete pass), monitor and tune.
    • Interview schema now encodes richer entity relationships (Interview, Person, Event, Organization, linked VideoObject).
    • Resume schema now enforces Person + Occupation + career ItemList consistency for ATS/search use.
    • Added semantic graph snapshot docs workflow (./bin/pipeline semantic-snapshot) for reviewability.
  6. Ongoing Maintenance
    • Status: continuous.
    • Harden validators periodically, and only where reports indicate drift.
    • Keep command/documentation grammar aligned to bin/pipeline subcommands.
    • Track retrospective follow-through as first-class process work, not post-hoc cleanup.
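
For backlog item 2, a gap report helps when choosing which batch to fill first. The following is a minimal sketch, assuming video_assets records have already been loaded as a list of dictionaries. The "id" key and the function names are hypothetical; only the description and topic field names come from this plan.

```python
from collections import Counter

# Fields named in backlog item 2 for video_assets.
REQUIRED_VIDEO_FIELDS = ("description", "topic")


def is_blank(value) -> bool:
    """Treat None and whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_fields(record: dict, required=REQUIRED_VIDEO_FIELDS) -> list[str]:
    """List the required fields that are absent or blank in one record."""
    return [name for name in required if is_blank(record.get(name))]


def completion_report(records: list[dict], required=REQUIRED_VIDEO_FIELDS) -> dict:
    """Summarize gaps per record and per field across all records."""
    gaps = {}
    totals = Counter()
    for record in records:
        missing = missing_fields(record, required)
        if missing:
            # "id" is a hypothetical key; substitute the real record identifier.
            gaps[record.get("id", "<no id>")] = missing
            totals.update(missing)
    return {"records_with_gaps": gaps, "missing_by_field": dict(totals)}


if __name__ == "__main__":
    sample = [
        {"id": "v1", "description": "Conference talk", "topic": "ci"},
        {"id": "v2", "description": "  ", "topic": None},
        {"id": "v3", "topic": "seo"},
    ]
    print(completion_report(sample))
```

The same function can be pointed at interviews records by passing required=("topic",), which matches the second bullet of item 2.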