
Documentation & Issue Review Summary

Date: 2025-12-28
Branch: chore/docs-review
Author: Claude (with Sean)

Executive Summary

This review analyzes the current state of plans, documentation, GitHub issues, and Notion to identify:

- Outdated or stale information
- Completed work not yet archived
- Items remaining to work on
- Prioritized next steps


1. Active Plans Analysis

Plans Requiring Action

| Plan | Status | Action Needed |
|------|--------|---------------|
| 2025-12-26-hcp-terraform-migration-design.md | COMPLETED | Archive - migration complete per commits #436-#449 |
| 2025-12-26-hcp-terraform-migration-implementation.md | COMPLETED | Archive - all tasks done, Windmill Terraform removed |
| 2025-12-26-k8s-oidc-access-design.md | NOT STARTED | Keep active - issue #398 still open |
| 2025-12-26-k8s-vault-pki-access-implementation.md | NOT STARTED | Keep active - design approved, implementation not yet begun (#398) |
| 2025-12-25-devcontainer-claude-code-design.md | PARTIALLY COMPLETE | Keep active - issue #360 still open |
| 2025-12-25-devcontainer-claude-code-implementation.md | PARTIALLY COMPLETE | Keep active - devcontainer exists, but not all designed features are implemented |

Archived Plans (in docs/plans/archive/)

36 archived plans are properly stored, including:

- Authentik IaC migration (complete)
- Grafana OIDC (complete)
- k3s upgrade system (complete)
- Windmill migration (complete)
- Teleport deployment design (on hold - issue #300 open)
- Router-hosts Ansible role (complete)

Recommendation: Archive the HCP Terraform migration plans to docs/plans/archive/migrations/ because:

  1. The tf/hcp-terraform/ module exists and is fully populated
  2. Commit #449 explicitly removed the Windmill Terraform integration
  3. docs/hcp-terraform.md is up to date
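The archive step recommended above can be sketched as a small script. This is only an illustration, not the procedure actually used: it assumes active plans live directly under docs/plans/ (inferred from the docs/plans/archive/ path), and the function name `archive_plans` is hypothetical. In a Git checkout, `git mv` would normally be preferred so that history follows the files.

```python
from pathlib import Path
import shutil

# The two plans this review marks as COMPLETED.
COMPLETED_PLANS = [
    "2025-12-26-hcp-terraform-migration-design.md",
    "2025-12-26-hcp-terraform-migration-implementation.md",
]


def archive_plans(plans_dir: Path, archive_subdir: str, filenames: list[str]) -> list[Path]:
    """Move the named plan files into plans_dir/archive/<archive_subdir>/.

    Returns the new paths. Missing files are skipped instead of raising,
    so re-running after a partial move is harmless.
    """
    target = plans_dir / "archive" / archive_subdir
    target.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in filenames:
        src = plans_dir / name
        if not src.is_file():
            continue  # already archived, or never present
        dest = target / name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


# Example call (assumed layout):
# archive_plans(Path("docs/plans"), "migrations", COMPLETED_PLANS)
```

Skipping missing files keeps the sketch idempotent, which suits a review session where some plans may already have been moved by hand.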


2. GitHub Issues Analysis

Closed Issues (Last 30 Days) - 22 total

HCP Terraform Migration (COMPLETED):

- #397 - Evaluate HCP Terraform Runners ✅
- #150 - Set up alerting for workflows ✅
- #149 - Configure Terraform plugin caching ✅
- #148 - Create Grafana dashboards for Windmill ✅
- #147 - Update documentation ✅
- #146 - Remove unused RBAC resources ✅

k3s Refactoring (COMPLETED):

- #320 - Per-node block device config for Longhorn ✅
- #319 - Add kube-vip for k8s API endpoint VIP ✅
- #317 - Create Velero backup before k3sup migration ✅
- #316 - Remove k3sup role after migration ✅
- #315 - Update k3s-playbook.yml to use new roles ✅
- #314 - Add Longhorn prerequisites ✅
- #313 - Create calico Ansible role ✅
- #308-312 - Create k3s-server/agent/common/storage roles ✅

Windmill/Discord (COMPLETED):

- #338 - Deploy Grafana MCP server ✅
- #337 - Grafana dashboards not displaying metrics ✅
- #297, #296 - Discord approve/reject fixes ✅

Open Issues - 29 total

High Priority (Blocking or Critical)

| # | Title | Labels | Notes |
|---|-------|--------|-------|
| 401 | Narrow core-services Vault policy scope | - | Security improvement, quick win |
| 398 | K8s OIDC access via Vault PKI | enhancement, infrastructure | Has approved design, needs implementation |
| 341 | Migrate to ephemeral Vault secrets (Grafana) | enhancement | Waiting on Grafana provider support |

Medium Priority (Feature Work)

| # | Title | Labels | Notes |
|---|-------|--------|-------|
| 360 | Enable devcontainers for Claude Code | enhancement | Has design doc, partially implemented |
| 386 | LLM inference on RK1 cluster with k8sgpt | enhancement, infrastructure | Research/exploration |
| 300 | Deploy Teleport zero-trust access | enhancement | Has design doc, on hold |
| ~~298~~ | ~~Windmill: Store TF plan/apply as assets~~ | ~~enhancement, windmill~~ | ✅ Closed - moot after HCP TF migration |
| 245 | Velero Phase 2: Migrate kubectl to Chainguard | enhancement | Incremental improvement |

Low Priority (Router-Hosts Backlog)

| # | Title | Notes |
|---|-------|-------|
| 352 | Router-hosts: certificate hot-reload | |
| 351 | Router-hosts: Add molecule tests | |
| 350 | Router-hosts: Add backup strategy | |
| 349 | Router-hosts: Add monitoring/alerting | |
| 348 | Router-hosts: Troubleshooting docs | |
| 347 | Router-hosts: AppRole/PKI validation preflight | |
| 346 | Router-hosts: Retry logic for Vault ops | |
| 345 | Router-hosts: Switch to community.docker | |

~~Low Priority (Windmill Cleanup)~~ - CLOSED

| # | Title | Notes |
|---|-------|-------|
| ~~158~~ | ~~Document Windmill workflow variables~~ | ✅ Closed - Terraform flows removed |
| ~~157~~ | ~~Add TypedDict runtime validation~~ | ✅ Closed - Terraform flows removed |
| ~~156~~ | ~~Replace hardcoded URLs/magic numbers~~ | ✅ Closed - Terraform flows removed |
| ~~155~~ | ~~Add path validation/URL encoding~~ | ✅ Closed - Terraform flows removed |

Other

| # | Title | Notes |
|---|-------|-------|
| 336 | Increase postgres cluster memory | Quick fix |
| 335 | Disable Grafana stats for Alloy | Quick fix |
| 291 | Standardize GitHub Actions versions | Chore |

3. Documentation Status

Up-to-Date Docs

| Document | Status | Notes |
|----------|--------|-------|
| docs/hcp-terraform.md | ✅ Current | Reflects completed migration |
| docs/windmill.md | ✅ Current | Updated with deprecation notice |
| docs/vault.md | ✅ Current | Core reference |
| docs/cloudflare.md | ✅ Current | Includes Worker setup |
| docs/kubernetes-access.md | ✅ Current | Covers Vault PKI access; implementation is tracked in #398, which is still open |

Missing Documentation

| Topic | Notes |
|-------|-------|
| Devcontainer quick-start | Would improve onboarding experience |

4. Notion Documentation Comparison

✅ RESOLVED: Notion API token was rotated and is now working.

The following items were verified against repository reality:

4.1 Services Catalog Verification

Cross-reference argocd/app-configs/ against Notion Services Catalog:

| Service | Namespace | Expected in Notion | Verify |
|---------|-----------|--------------------|--------|
| HCP Terraform Operator | hcp-terraform | ADDED | |
| Windmill | windmill | Update: Note TF flows removed | |
| Authentik | authentik | Should exist | |
| Vault | vault | Should exist | |
| Grafana | grafana | Should exist | |
| Grafana MCP | grafana | NEW - Must add | |
| CNPG (CloudNative PG) | cnpg-system | Should exist | |
| Velero | velero | Should exist | |
| Traefik | traefik | Should exist | |
| MetalLB | metallb-system | Should exist | |
| Cert-Manager | cert-manager | Should exist | |
| Cloudflared | cloudflared | Should exist | |
| ARC (Actions Runner Controller) | arc-systems | Should exist | |
| Mealie | mealie | Should exist | |
| Monitoring (Alloy/Loki/Prometheus) | monitoring | Should exist | |
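The cross-reference described above amounts to a set difference between what is deployed and what is cataloged. The sketch below is a rough illustration under two assumptions that this review does not confirm: that each immediate subdirectory of argocd/app-configs/ corresponds to one service, and that the Notion Services Catalog names have already been exported into a plain set of strings using the same naming. The function name `catalog_gaps` is hypothetical, and fetching the entries from Notion is deliberately left out.

```python
from pathlib import Path


def catalog_gaps(app_configs_dir: Path, catalog_names: set[str]) -> dict[str, list[str]]:
    """Compare app-config directories with Services Catalog entry names.

    Assumes one subdirectory per service; plain files are ignored.
    """
    deployed = {p.name for p in app_configs_dir.iterdir() if p.is_dir()}
    return {
        # Present in the repository but absent from the catalog.
        "missing_from_catalog": sorted(deployed - catalog_names),
        # Listed in the catalog but no longer present in the repository.
        "missing_from_repo": sorted(catalog_names - deployed),
    }


# Example call (assumed layout and hand-exported catalog names):
# catalog_gaps(Path("argocd/app-configs"), {"vault", "grafana", "windmill"})
```

Reporting both directions matters here: the first list catches newly deployed services such as the HCP Terraform Operator, while the second catches catalog entries that have gone stale.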

HCP Terraform Operator Service Entry (if missing):

| Field | Value |
|-------|-------|
| Name | HCP Terraform Operator |
| Category | Infrastructure |
| Hostname | N/A (internal operator) |
| Ingress Type | None |
| Auth Method | Token (Vault ExternalSecret) |
| Vault Path | secret/fzymgc-house/cluster/hcp-terraform |
| Namespace | hcp-terraform |
| Status | Operational |

4.2 Tech References Verification

| Technology | Category | Expected in Notion | Verify |
|------------|----------|--------------------|--------|
| HCP Terraform | GitOps | ADDED | |
| HCP Terraform Operator | Kubernetes | ADDED | |
| Windmill | Infrastructure | Update: Note limited to non-TF flows | |
| Vault Dynamic Credentials | Security | NEW - Must add | |
| Cloudflare Workers | Infrastructure | Should exist | |

HCP Terraform Tech Reference Entry (if missing):

| Field | Value |
|-------|-------|
| Technology | HCP Terraform |
| Category | GitOps |
| Docs URL | https://developer.hashicorp.com/terraform/cloud-docs |
| Version | Free tier (self-hosted agents) |

4.3 HCP Terraform Integration Clarity Checklist

Verify the following is clear to a new user reading Notion documentation:

| Question | Answer Location | Clear? |
|----------|-----------------|--------|
| How do I run Terraform? | docs/hcp-terraform.md | |
| Where do runs execute? | HCP TF → Agent Pod in K8s | |
| How does authentication work? | Vault OIDC/JWT dynamic credentials | |
| What triggers a run? | PR (speculative plan) or merge to main (apply) | |
| Where do I see run status? | HCP TF console or Discord notifications | |
| What about Windmill? | Only for non-Terraform automation now | |
| Which workspaces run locally? | cluster-bootstrap, hcp-terraform (circular deps) | |

4.4 Operations Guide Updates Needed

| Page | Update Needed |
|------|---------------|
| Terraform Operations | Document HCP TF workflow, link to docs/hcp-terraform.md |
| Secrets Management | Note Vault dynamic credentials for HCP TF |
| Troubleshooting | Add HCP TF agent debugging section |

4.5 Documentation Gaps Identified

| Gap | Priority | Notes |
|-----|----------|-------|
| ~~Notion Services Catalog missing HCP TF~~ | ~~High~~ | RESOLVED - Added HCP Terraform Operator |
| ~~Notion Tech References missing HCP TF~~ | ~~High~~ | RESOLVED - Added HCP Terraform + Operator |
| Windmill entries need update | Medium | Scope reduced (TF removed) |
| Grafana MCP not documented | Low | Already in Services Catalog (Grafana MCP Server) |

5. Prioritized Recommendations

Immediate Actions (This Week)

  1. ~~Archive HCP Terraform plans~~ ✅ DONE
    • Moved 2025-12-26-hcp-terraform-migration-design.md to archive
    • Moved 2025-12-26-hcp-terraform-migration-implementation.md to archive

  2. ~~Close stale Windmill issues~~ ✅ DONE
    • #298, #158, #157, #156, #155 closed with explanatory comments

  3. ~~Fix Notion API token~~ ✅ DONE
    • Token rotated in Vault at secret/fzymgc-house/cluster/claude-code
    • Notion API now functional

  4. ~~Update Notion documentation~~ ✅ DONE (partial)
    • ✅ Added HCP Terraform Operator to Services Catalog
    • ✅ Added HCP Terraform to Tech References
    • ✅ Added HCP Terraform Operator to Tech References
    • ☐ Update Windmill entries (TF flows removed) - pending
    • ✅ Grafana MCP already exists in Services Catalog

Short-Term (Next 2 Weeks)

  1. Implement #401 - Narrow core-services Vault policy scope
    • Security improvement
    • Low effort, high value

  2. Implement #398 - K8s OIDC access via Vault PKI
    • Design is approved
    • Implementation plan exists
    • Enables identity-based cluster access

  3. Quick fixes - #336 (postgres memory), #335 (Grafana stats)

Medium-Term (Next Month)

  1. Complete #360 - Devcontainer Claude Code setup
    • Devcontainer exists, but not all features from the design are implemented

  2. Evaluate #300 - Teleport deployment
    • Decide: implement or close as won't-do

  3. Router-hosts backlog - Incremental improvements
    • Prioritize #347 (validation preflight) and #346 (retry logic)

6. Summary Metrics

| Category | Count | Notes |
|----------|-------|-------|
| Active plans | 4 | K8s OIDC, Vault PKI, Devcontainer (×2) |
| Plans archived this session | 2 | HCP TF migration design + implementation |
| Open GitHub issues | 24 | Reduced from 29 (5 closed) |
| Quick wins remaining | 2 | #336, #335 |
| Major features pending | 3 | #398, #360, #300 |
| Notion entries added | 3 | ✅ HCP TF Operator (Services), HCP TF (Tech Refs), HCP TF Operator (Tech Refs) |
| Notion entries to update | 1 | Windmill (scope reduced) |

7. Next Steps

Completed This Session ✅

  1. ~~Archive completed HCP Terraform plans~~
  2. ~~Close stale Windmill issues (#298, #158, #157, #156, #155)~~
  3. ~~Rotate Notion API token~~
  4. ~~Add HCP Terraform entries to Notion~~
    • HCP Terraform Operator → Services Catalog
    • HCP Terraform → Tech References
    • HCP Terraform Operator → Tech References

Immediate Next Actions

  1. Push this branch and create PR
    • Branch: chore/docs-review
    • Includes: archived plans, updated review document

  2. Update Windmill Notion entries (optional)
    • Note that Terraform flows have been removed

Future Work

  1. Implement #401 (Vault policy scope)
  2. Begin #398 implementation (K8s OIDC access)