Tasks
This task list is prioritized to protect pipeline correctness first, then data integrity, then non-visual SEO improvements.
Historical Context
Theme 1: Build and CI reliability foundation
- Standardized the pipeline around bin/pipeline as the canonical entrypoint.
- Removed deprecated wrapper scripts and aligned docs/CI to the canonical commands.
- Kept runtime parity checks, resume artifact checks, and deterministic CI validation as hard gates.
Theme 2: Data integrity and generation quality
- Added strong data validation for uniqueness and required fields.
- Normalized generated interview/video metadata to remove duplication and low-quality copy.
- Kept data checks in the CI path to fail fast on structural regressions.
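The uniqueness and required-field checks above can be sketched as a small validator that returns every problem it finds, so a CI step can fail fast on structural regressions. This is a minimal Python sketch, not the repository's actual validator: the field names (slug, title), the function name validate_records, and loading records as a list of dicts are all assumptions for illustration.

```python
from collections import Counter
from typing import Any

# Hypothetical required fields; the real _data schema may differ.
REQUIRED_FIELDS = ("slug", "title")


def validate_records(records: list[dict[str, Any]], key: str = "slug") -> list[str]:
    """Return human-readable errors; an empty list means the data passes."""
    errors: list[str] = []
    for index, record in enumerate(records):
        for field in REQUIRED_FIELDS:
            value = record.get(field)
            # Treat missing keys, None, and blank strings as missing.
            if value is None or (isinstance(value, str) and not value.strip()):
                errors.append(f"record {index}: missing required field '{field}'")
    # Count each key value to detect duplicates across the whole file.
    counts = Counter(r.get(key) for r in records if r.get(key) is not None)
    for value, count in counts.items():
        if count > 1:
            errors.append(f"duplicate {key} '{value}' appears {count} times")
    return errors
```

Returning a list instead of raising on the first problem lets one CI run report every duplicate and gap at once; a wrapper would exit non-zero whenever the list is non-empty.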
Theme 3: SEO/semantic policy hardening
- Enforced explicit canonical semantics (/ as the root resume route, /home/ as the homepage route).
- Added output validators for canonical/indexability and semantic/schema coverage.
- Added CI artifacts for observability (seo-metadata-report, schema-coverage-report).
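A canonical validator for the two root routes can be sketched by parsing each rendered page and comparing its single canonical link against the expected path. This Python sketch assumes each route is self-canonical (/ points to /, /home/ points to /home/); the EXPECTED_CANONICAL table and the function names are hypothetical, and the real validators may also check robots/indexability signals that are omitted here.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical policy table: route -> expected canonical path.
EXPECTED_CANONICAL = {"/": "/", "/home/": "/home/"}


class CanonicalFinder(HTMLParser):
    """Collect href values of <link rel="canonical"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.canonicals: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr_map = dict(attrs)
        rels = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rels and attr_map.get("href"):
            self.canonicals.append(attr_map["href"])


def check_canonical(html: str, expected_path: str) -> list[str]:
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if len(finder.canonicals) != 1:
        return [f"expected exactly one canonical link, found {len(finder.canonicals)}"]
    # An absolute URL with no path component canonicalizes to "/".
    path = urlparse(finder.canonicals[0]).path or "/"
    if path != expected_path:
        return [f"canonical path {path!r} != expected {expected_path!r}"]
    return []


def check_root_routes(pages: dict[str, str]) -> list[str]:
    """pages maps a route to its rendered HTML."""
    errors: list[str] = []
    for route, expected in EXPECTED_CANONICAL.items():
        if route not in pages:
            errors.append(f"{route}: page not rendered")
            continue
        errors += [f"{route}: {e}" for e in check_canonical(pages[route], expected)]
    return errors
```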
Theme 4: Build-time last-modified workflow
- Implemented build-time last-modified generation and graduated it from experimental to default behavior.
- Added verification checks to ensure rendered dateModified parity across article pages.
- Established a template/data contract to keep metadata generation and rendered output in sync.
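A dateModified parity check can be sketched by extracting every JSON-LD block from a rendered article and comparing each dateModified value with the value the build-time last-modified step produced. In this Python sketch the expected value is passed in directly; how the pipeline actually computes it, and the class and function names, are assumptions.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of <script type="application/ld+json"> blocks."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: list[str] = []
        self._in_jsonld = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        # Script content may arrive in several chunks; accumulate it.
        if self._in_jsonld:
            self.blocks[-1] += data


def find_values(node, key):
    """Yield every value stored under key anywhere in a JSON structure."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield v
            yield from find_values(v, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


def check_date_modified(html: str, expected: str) -> list[str]:
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    values = [v for block in collector.blocks for v in find_values(json.loads(block), "dateModified")]
    if not values:
        return ["no dateModified found in JSON-LD"]
    return [f"dateModified {v!r} != expected {expected!r}" for v in values if v != expected]
```

Walking the whole JSON tree means the check still works when the article node sits inside an @graph array.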
Retrospective Process
- Use the standard retrospective format in docs/retrospectives/index.md for each meaningful implementation cycle.
- Every retrospective must explicitly capture:
- what worked
- what did not work
- what went well
- what could be better
- process improvements with owners, checkpoints, and validation methods
- The next retrospective must evaluate each prior improvement with a status (upheld, improved, maintained, regressed, dropped) and supporting evidence.
Critical Constraint
- Root route semantics: / is the default root route and resume page; /home/ is homepage content.
- Resume artifacts (/, /resume.txt, /resume.md) must render correctly on every build.
- Any task affecting resume or homepage routing/canonical behavior must include automated pass/fail criteria.
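The resume-artifact guardrail lends itself to a simple automated pass/fail check over the build output. This Python sketch assumes a Jekyll-style output directory (_site/) where / is served from index.html; the directory, the route-to-file mapping, and check_resume_artifacts are hypothetical, and "renders correctly" is reduced here to "exists and is non-empty".

```python
from pathlib import Path

# Hypothetical output directory; Jekyll's default is _site/.
SITE_DIR = Path("_site")

# Route -> built file that must exist and be non-empty.
RESUME_ARTIFACTS = {
    "/": "index.html",
    "/resume.txt": "resume.txt",
    "/resume.md": "resume.md",
}


def check_resume_artifacts(site_dir: Path = SITE_DIR) -> list[str]:
    """Return one failure message per missing or empty resume artifact."""
    failures: list[str] = []
    for route, relative in RESUME_ARTIFACTS.items():
        path = site_dir / relative
        if not path.is_file():
            failures.append(f"{route}: missing {path}")
        elif path.stat().st_size == 0:
            failures.append(f"{route}: {path} is empty")
    return failures
```

A CI wrapper would print each failure and exit non-zero when the returned list is non-empty, which gives the blocking pass/fail signal the constraint asks for.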
Current Operating Plan
Phase 1: Configuration + Pipeline Correctness
- Status: complete and stable.
- Ongoing expectation: keep runtime parity and resume guardrails as blocking CI checks.
Phase 2: Structural Integrity of Data
- Status: complete and stable.
- Ongoing expectation: keep uniqueness/integrity checks in the default CI path.
Phase 3: Technical SEO Optimizations (Non-Visual)
- Status: implemented, actively tuned with metric-driven follow-up.
- Ongoing expectation: preserve canonical/schema contracts and artifact-based observability.
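Preserving schema contracts can be enforced with per-type required-property rules applied to every JSON-LD node, including members of an @graph. The rule table in this Python sketch is hypothetical and only illustrative; the repository's actual required-field rules are not stated here and would need to stay in lockstep with the templates.

```python
from typing import Any

# Hypothetical rule table: @type -> properties that must be present and non-empty.
REQUIRED_BY_TYPE: dict[str, tuple[str, ...]] = {
    "Person": ("name",),
    "Occupation": ("name",),
    "VideoObject": ("name", "thumbnailUrl", "uploadDate"),
}


def iter_nodes(node: Any):
    """Yield every dict in a JSON-LD document, including @graph members."""
    if isinstance(node, dict):
        yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def check_required_fields(doc: Any) -> list[str]:
    errors: list[str] = []
    for node in iter_nodes(doc):
        types = node.get("@type")
        # @type may be a single string or a list of strings.
        type_list = types if isinstance(types, list) else [types]
        for schema_type in type_list:
            for prop in REQUIRED_BY_TYPE.get(schema_type, ()):
                if node.get(prop) in (None, "", [], {}):
                    errors.append(f"{schema_type} missing {prop}")
    return errors
```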
Logged Recommendations (Current)
- Keep plan text synchronized with implementation in the same commit series.
- Preserve sitemap-driven smoke scope and avoid reintroducing hardcoded route lists.
- Keep SITEMAP_MAX_URLS=5000 unless an explicit publication-scope expansion is approved.
- If the publication model changes, define the canonical/robots policy first, then implement validation.
- Keep JSON-LD required-field rules in lockstep with template/schema changes.
- Keep semantic graph artifacts and documentation snapshots synchronized when schema contracts change.
- Keep agent/skill usage explicit: use registered skills for CI/smoke/security tasks; add a dedicated SEO semantic skill only if repeated workflows become too custom.
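Sitemap-driven smoke scope can be sketched by deriving the route list from the sitemap's <loc> entries and refusing to proceed when the count exceeds the cap. This Python sketch assumes SITEMAP_MAX_URLS is read from the environment (defaulting to 5000) and that the sitemap uses the standard sitemaps.org 0.9 namespace; smoke_routes is a hypothetical name.

```python
import os
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlparse

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def smoke_routes(sitemap_xml: str, max_urls: Optional[int] = None) -> list[str]:
    """Derive smoke-test paths from sitemap <loc> entries instead of a hardcoded list."""
    if max_urls is None:
        # Assumption: the cap is an environment variable defaulting to 5000.
        max_urls = int(os.environ.get("SITEMAP_MAX_URLS", "5000"))
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", SITEMAP_NS) if el.text]
    if len(locs) > max_urls:
        raise ValueError(f"sitemap lists {len(locs)} URLs, above SITEMAP_MAX_URLS={max_urls}")
    # Smoke tests hit paths, so strip scheme and host.
    return [urlparse(loc).path or "/" for loc in locs]
```

Failing loudly on an oversized sitemap keeps any publication-scope expansion an explicit decision rather than a silent side effect of new content.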
Agent/Skill Coverage
- Sub-agents: not required for the current repository workflow. The existing pipeline + script boundaries are sufficient for deterministic SEO/schema hardening.
- Registered skills: sufficient for current needs (gh-fix-ci, gh-address-comments, playwright, screenshot, security skills).
- Gap to consider later: a dedicated SEO/semantic skill could reduce repetitive analysis steps, but this is optimization work, not a blocker.
Short-Term Backlog
- Transcript Coverage Expansion
- Status: deferred (work in progress).
- Notes: transcript onboarding should continue as new source transcript files are produced; avoid blocking other metadata/SEO work on transcript completeness.
- Metadata Completion at Scale (Active Candidate)
- Status: active candidate.
- Fill missing video_assets fields (description, topic) in prioritized batches.
- Fill missing topic values in interviews where conference/community context is known.
- Keep topic/description conventions consistent with canonical slugs and transcript-derived phrasing.
- SEO Metadata Quality Cleanup
- Status: implemented (2026-02-12 pass), monitor and tune.
- Added global head-level title/description normalization for minimum and maximum lengths.
- Improved generated interview/video metadata copy to avoid thin descriptions and low-information titles.
- Ongoing: monitor tmp/seo-metadata-report.json for regressions as new content lands.
- Data Model Documentation Alignment
- Status: implemented, keep synchronized.
- Update docs to reflect per-file transcripts in _data/transcripts/*.yml as the canonical transcript storage.
- Clarify that _data/transcripts.yml is a legacy placeholder and not the active content source.
- Keep docs synchronized with generators/templates in the same commit series.
- Structured Data Object Model Expansion
- Status: implemented (first complete pass), monitor and tune.
- Interview schema now encodes richer entity relationships (Interview, Person, Event, Organization, linked VideoObject).
- Resume schema now enforces Person + Occupation + career ItemList consistency for ATS/search use.
- Added a semantic graph snapshot docs workflow (./bin/pipeline semantic-snapshot) for reviewability.
- Ongoing Maintenance
- Status: continuous.
- Continue periodic validator hardening only where reports indicate drift.
- Keep command/documentation grammar aligned to bin/pipeline subcommands.
- Track retrospective follow-through as first-class process work, not post-hoc cleanup.
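The head-level title/description normalization noted under SEO Metadata Quality Cleanup can be sketched as whitespace collapsing, padding text that is too short with a fallback, and truncating text that is too long on a word boundary. The length bounds, the fallback and separator convention, and the function names in this Python sketch are hypothetical; the plan does not state the limits actually used.

```python
import re

# Hypothetical bounds; the plan does not state the actual limits.
TITLE_MIN, TITLE_MAX = 30, 60
DESC_MIN, DESC_MAX = 70, 160


def truncate_at_word(text: str, limit: int) -> str:
    """Cut to at most limit characters on a word boundary, adding an ellipsis."""
    if len(text) <= limit:
        return text
    # Reserve one character for the ellipsis, then drop the partial last word.
    cut = text[: limit - 1].rsplit(" ", 1)[0].rstrip(" ,;:-")
    return cut + "…"


def normalize(text: str, fallback: str, min_len: int, max_len: int, sep: str = " | ") -> str:
    """Collapse whitespace, pad short text with a fallback, then cap the length."""
    text = re.sub(r"\s+", " ", text).strip()
    if len(text) < min_len and fallback:
        text = f"{text}{sep}{fallback}" if text else fallback
    return truncate_at_word(text, max_len)
```

Padding before truncating means a short title that gains a suffix is still held to the same maximum length.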