Documentation & Issue Review Summary
Date: 2025-12-28
Branch: chore/docs-review
Author: Claude (with Sean)
Executive Summary
This review analyzes the current state of plans, documentation, GitHub issues, and Notion to identify:
- Outdated or stale information
- Completed work not yet archived
- Items remaining to work on
- Prioritized next steps
1. Active Plans Analysis
Plans Requiring Action
| Plan | Status | Action Needed |
|---|---|---|
| `2025-12-26-hcp-terraform-migration-design.md` | COMPLETED | Archive - migration complete per commits #436-#449 |
| `2025-12-26-hcp-terraform-migration-implementation.md` | COMPLETED | Archive - all tasks done, Windmill Terraform removed |
| `2025-12-26-k8s-oidc-access-design.md` | NOT STARTED | Keep active - issue #398 still open |
| `2025-12-26-k8s-vault-pki-access-implementation.md` | NOT STARTED | Keep active - design approved, implementation pending (#398) |
| `2025-12-25-devcontainer-claude-code-design.md` | PARTIALLY COMPLETE | Keep active - issue #360 still open |
| `2025-12-25-devcontainer-claude-code-implementation.md` | PARTIALLY COMPLETE | Keep active - devcontainer exists, but not all designed features are implemented |
Archived Plans (in docs/plans/archive/)
36 archived plans are properly stored, including:
- Authentik IaC migration (complete)
- Grafana OIDC (complete)
- k3s upgrade system (complete)
- Windmill migration (complete)
- Teleport deployment design (on hold - issue #300 open)
- Router-hosts Ansible role (complete)

Recommendation: Archive the HCP Terraform migration plans to docs/plans/archive/migrations/ since:
1. tf/hcp-terraform/ module exists and is fully populated
2. Commit #449 explicitly removed Windmill Terraform integration
3. docs/hcp-terraform.md documentation is up-to-date
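
The archive step recommended above can be sketched as a small helper. This is a minimal illustration, not the procedure actually used in this session: the function name `archive_plans` is hypothetical, and the demo runs against a temporary directory rather than the real `docs/plans/` tree.

```python
import shutil
import tempfile
from pathlib import Path


def archive_plans(plans_dir: Path, archive_dir: Path, names: list[str]) -> list[Path]:
    """Move each named plan from plans_dir into archive_dir and return the new paths."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    moved = []
    for name in names:
        src = plans_dir / name
        if not src.is_file():
            # Already archived or never present: skip instead of failing.
            continue
        dest = archive_dir / name
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved


if __name__ == "__main__":
    # Demo in a throwaway directory that mirrors the layout named in this review.
    with tempfile.TemporaryDirectory() as tmp:
        plans = Path(tmp) / "docs" / "plans"
        plans.mkdir(parents=True)
        names = [
            "2025-12-26-hcp-terraform-migration-design.md",
            "2025-12-26-hcp-terraform-migration-implementation.md",
        ]
        for name in names:
            (plans / name).write_text("# plan\n")
        for path in archive_plans(plans, plans / "archive" / "migrations", names):
            print(path.relative_to(tmp))
```

In a real checkout, `git mv` is usually preferable to a plain file move because it stages the rename in the same step.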
2. GitHub Issues Analysis
Closed Issues (Last 30 Days) - 22 total
HCP Terraform Migration (COMPLETED):
- #397 - Evaluate HCP Terraform Runners ✅
- #150 - Set up alerting for workflows ✅
- #149 - Configure Terraform plugin caching ✅
- #148 - Create Grafana dashboards for Windmill ✅
- #147 - Update documentation ✅
- #146 - Remove unused RBAC resources ✅

k3s Refactoring (COMPLETED):
- #320 - Per-node block device config for Longhorn ✅
- #319 - Add kube-vip for k8s API endpoint VIP ✅
- #317 - Create Velero backup before k3sup migration ✅
- #316 - Remove k3sup role after migration ✅
- #315 - Update k3s-playbook.yml to use new roles ✅
- #314 - Add Longhorn prerequisites ✅
- #313 - Create calico Ansible role ✅
- #308-312 - Create k3s-server/agent/common/storage roles ✅

Windmill/Discord (COMPLETED):
- #338 - Deploy Grafana MCP server ✅
- #337 - Grafana dashboards not displaying metrics ✅
- #297, #296 - Discord approve/reject fixes ✅

Open Issues - 29 at start of review (24 after this session's closures)
High Priority (Blocking or Critical)
| # | Title | Labels | Notes |
|---|---|---|---|
| 401 | Narrow core-services Vault policy scope | - | Security improvement, quick win |
| 398 | K8s OIDC access via Vault PKI | enhancement, infrastructure | Has approved design, needs implementation |
| 341 | Migrate to ephemeral Vault secrets (Grafana) | enhancement | Waiting on Grafana provider support |
Medium Priority (Feature Work)
| # | Title | Labels | Notes |
|---|---|---|---|
| 360 | Enable devcontainers for Claude Code | enhancement | Has design doc, partially implemented |
| 386 | LLM inference on RK1 cluster with k8sgpt | enhancement, infrastructure | Research/exploration |
| 300 | Deploy Teleport zero-trust access | enhancement | Has design doc, on hold |
| ~~298~~ | ~~Windmill: Store TF plan/apply as assets~~ | ~~enhancement, windmill~~ | ✅ Closed - moot after HCP TF migration |
| 245 | Velero Phase 2: Migrate kubectl to Chainguard | enhancement | Incremental improvement |
Low Priority (Router-Hosts Backlog)
| # | Title | Notes |
|---|---|---|
| 352 | Router-hosts: certificate hot-reload | |
| 351 | Router-hosts: Add molecule tests | |
| 350 | Router-hosts: Add backup strategy | |
| 349 | Router-hosts: Add monitoring/alerting | |
| 348 | Router-hosts: Troubleshooting docs | |
| 347 | Router-hosts: AppRole/PKI validation preflight | |
| 346 | Router-hosts: Retry logic for Vault ops | |
| 345 | Router-hosts: Switch to community.docker | |
~~Low Priority (Windmill Cleanup)~~ - CLOSED
| # | Title | Notes |
|---|---|---|
| ~~158~~ | ~~Document Windmill workflow variables~~ | ✅ Closed - Terraform flows removed |
| ~~157~~ | ~~Add TypedDict runtime validation~~ | ✅ Closed - Terraform flows removed |
| ~~156~~ | ~~Replace hardcoded URLs/magic numbers~~ | ✅ Closed - Terraform flows removed |
| ~~155~~ | ~~Add path validation/URL encoding~~ | ✅ Closed - Terraform flows removed |
Other
| # | Title | Notes |
|---|---|---|
| 336 | Increase postgres cluster memory | Quick fix |
| 335 | Disable Grafana stats for Alloy | Quick fix |
| 291 | Standardize GitHub Actions versions | Chore |
3. Documentation Status
Up-to-Date Docs
| Document | Status | Notes |
|---|---|---|
| `docs/hcp-terraform.md` | ✅ Current | Reflects completed migration |
| `docs/windmill.md` | ✅ Current | Updated with deprecation notice |
| `docs/vault.md` | ✅ Current | Core reference |
| `docs/cloudflare.md` | ✅ Current | Includes Worker setup |
| `docs/kubernetes-access.md` | ✅ Current | Covers Vault PKI access per #398 (issue still open, implementation pending) |
Missing Documentation
| Topic | Notes |
|---|---|
| Devcontainer quick-start | Would improve onboarding experience |
4. Notion Documentation Comparison
✅ RESOLVED: Notion API token was rotated and is now working.
The following items are being verified against the current state of the repository (☑ verified, ☐ pending):
4.1 Services Catalog Verification
Cross-reference argocd/app-configs/ against Notion Services Catalog:
| Service | Namespace | Expected in Notion | Verify |
|---|---|---|---|
| HCP Terraform Operator | `hcp-terraform` | ✅ ADDED | ☑ |
| Windmill | `windmill` | Update: Note TF flows removed | ☐ |
| Authentik | `authentik` | Should exist | ☐ |
| Vault | `vault` | Should exist | ☐ |
| Grafana | `grafana` | Should exist | ☐ |
| Grafana MCP | `grafana` | Already exists (as Grafana MCP Server) | ☑ |
| CNPG (CloudNative PG) | `cnpg-system` | Should exist | ☐ |
| Velero | `velero` | Should exist | ☐ |
| Traefik | `traefik` | Should exist | ☐ |
| MetalLB | `metallb-system` | Should exist | ☐ |
| Cert-Manager | `cert-manager` | Should exist | ☐ |
| Cloudflared | `cloudflared` | Should exist | ☐ |
| ARC (Actions Runner Controller) | `arc-systems` | Should exist | ☐ |
| Mealie | `mealie` | Should exist | ☐ |
| Monitoring (Alloy/Loki/Prometheus) | `monitoring` | Should exist | ☐ |
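
The cross-reference above amounts to a set comparison between what is deployed and what is documented. The sketch below assumes two things that this review does not confirm: that each service is a subdirectory of `argocd/app-configs/`, and that Notion Services Catalog entries can be exported as names matching those directory names. `list_app_dirs` and `catalog_gaps` are hypothetical helper names, and no Notion API call is shown.

```python
from pathlib import Path


def list_app_dirs(app_configs_dir: Path) -> set[str]:
    """Return the names of the immediate subdirectories (assumed: one per service)."""
    return {p.name for p in app_configs_dir.iterdir() if p.is_dir()}


def catalog_gaps(deployed: set[str], documented: set[str]) -> tuple[list[str], list[str]]:
    """Compare deployed service names with documented ones.

    Returns (missing_from_catalog, stale_in_catalog), each sorted for stable output.
    """
    missing_from_catalog = sorted(deployed - documented)
    stale_in_catalog = sorted(documented - deployed)
    return missing_from_catalog, stale_in_catalog


if __name__ == "__main__":
    # Illustrative names only; real input would come from the repo and a Notion export.
    deployed = {"vault", "grafana", "hcp-terraform"}
    documented = {"vault", "grafana"}
    missing, stale = catalog_gaps(deployed, documented)
    print("Missing from catalog:", missing)
    print("Stale in catalog:", stale)
```

Entries reported as missing correspond to "Must add" rows, and entries reported as stale are candidates for removal or an update note.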
HCP Terraform Operator Service Entry (if missing):
| Field | Value |
|---|---|
| Name | HCP Terraform Operator |
| Category | Infrastructure |
| Hostname | N/A (internal operator) |
| Ingress Type | None |
| Auth Method | Token (Vault ExternalSecret) |
| Vault Path | secret/fzymgc-house/cluster/hcp-terraform |
| Namespace | hcp-terraform |
| Status | Operational |
4.2 Tech References Verification
| Technology | Category | Expected in Notion | Verify |
|---|---|---|---|
| HCP Terraform | GitOps | ✅ ADDED | ☑ |
| HCP Terraform Operator | Kubernetes | ✅ ADDED | ☑ |
| Windmill | Infrastructure | Update: Note limited to non-TF flows | ☐ |
| Vault Dynamic Credentials | Security | NEW - Must add | ☐ |
| Cloudflare Workers | Infrastructure | Should exist | ☐ |
HCP Terraform Tech Reference Entry (if missing):
| Field | Value |
|---|---|
| Technology | HCP Terraform |
| Category | GitOps |
| Docs URL | https://developer.hashicorp.com/terraform/cloud-docs |
| Version | Free tier (self-hosted agents) |
4.3 HCP Terraform Integration Clarity Checklist
Verify the following is clear to a new user reading Notion documentation:
| Question | Answer Location | Clear? |
|---|---|---|
| How do I run Terraform? | `docs/hcp-terraform.md` | ☐ |
| Where do runs execute? | HCP TF → Agent Pod in K8s | ☐ |
| How does authentication work? | Vault OIDC/JWT dynamic credentials | ☐ |
| What triggers a run? | PR (speculative plan) or merge to main (apply) | ☐ |
| Where do I see run status? | HCP TF console or Discord notifications | ☐ |
| What about Windmill? | Only for non-Terraform automation now | ☐ |
| Which workspaces run locally? | `cluster-bootstrap`, `hcp-terraform` (circular deps) | ☐ |
4.4 Operations Guide Updates Needed
| Page | Update Needed |
|---|---|
| Terraform Operations | Document HCP TF workflow, link to docs/hcp-terraform.md |
| Secrets Management | Note Vault dynamic credentials for HCP TF |
| Troubleshooting | Add HCP TF agent debugging section |
4.5 Documentation Gaps Identified
| Gap | Priority | Notes |
|---|---|---|
| ~~Notion Services Catalog missing HCP TF~~ | ~~High~~ | ✅ RESOLVED - Added HCP Terraform Operator |
| ~~Notion Tech References missing HCP TF~~ | ~~High~~ | ✅ RESOLVED - Added HCP Terraform + Operator |
| Windmill entries need update | Medium | Scope reduced (TF removed) |
| ~~Grafana MCP not documented~~ | ~~Low~~ | ✅ RESOLVED - Already in Services Catalog (Grafana MCP Server) |
5. Prioritized Recommendations
Immediate Actions (This Week)
- ~~Archive HCP Terraform plans~~ ✅ DONE
  - Moved `2025-12-26-hcp-terraform-migration-design.md` to archive
  - Moved `2025-12-26-hcp-terraform-migration-implementation.md` to archive
- ~~Close stale Windmill issues~~ ✅ DONE
  - #298, #158, #157, #156, #155 closed with explanatory comments
- ~~Fix Notion API token~~ ✅ DONE
  - Token rotated in Vault at `secret/fzymgc-house/cluster/claude-code`
  - Notion API now functional
- ~~Update Notion documentation~~ ✅ DONE (partial)
  - ✅ Added HCP Terraform Operator to Services Catalog
  - ✅ Added HCP Terraform to Tech References
  - ✅ Added HCP Terraform Operator to Tech References
  - ☐ Update Windmill entries (TF flows removed) - pending
  - ✅ Grafana MCP already exists in Services Catalog

Short-Term (Next 2 Weeks)
- Implement #401 - Narrow core-services Vault policy scope
  - Security improvement
  - Low effort, high value
- Implement #398 - K8s OIDC access via Vault PKI
  - Design is approved
  - Implementation plan exists
  - Enables identity-based cluster access
- Quick fixes - #336 (postgres memory), #335 (Grafana stats)

Medium-Term (Next Month)
- Complete #360 - Devcontainer Claude Code setup
  - Devcontainer exists, but not all features from the design are implemented
- Evaluate #300 - Teleport deployment
  - Decide: implement or close as won't-do
- Router-hosts backlog - Incremental improvements
  - Prioritize #347 (validation preflight) and #346 (retry logic)

6. Summary Metrics
| Category | Count | Notes |
|---|---|---|
| Active plans | 4 | K8s OIDC, Vault PKI, Devcontainer (×2) |
| Plans archived this session | 2 | HCP TF migration design + implementation |
| Open GitHub issues | 24 | Reduced from 29 (5 closed) |
| Quick wins remaining | 2 | #336, #335 |
| Major features pending | 3 | #398, #360, #300 |
| Notion entries added | 3 | ✅ HCP TF Operator (Services), HCP TF (Tech Refs), HCP TF Operator (Tech Refs) |
| Notion entries to update | 1 | Windmill (scope reduced) |
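
The counts in this table can be re-derived from figures reported elsewhere in this review. The snippet below is only an arithmetic cross-check; the variable names are illustrative, and the inputs are the six plans listed in section 1, the two plans archived, the 29 issues open at the start, and the five issues closed this session.

```python
# Figures taken from this review (sections 1, 2, and 5).
plans_listed_in_section_1 = 6
plans_archived_this_session = 2
open_issues_at_start = 29
issues_closed_this_session = [298, 158, 157, 156, 155]

# Derived metrics, as reported in the Summary Metrics table.
active_plans_after = plans_listed_in_section_1 - plans_archived_this_session
open_issues_after = open_issues_at_start - len(issues_closed_this_session)

print(f"Active plans: {active_plans_after}")
print(f"Open GitHub issues: {open_issues_after}")
```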
7. Next Steps
Completed This Session ✅
- ~~Archive completed HCP Terraform plans~~
- ~~Close stale Windmill issues (#298, #158, #157, #156, #155)~~
- ~~Rotate Notion API token~~
- ~~Add HCP Terraform entries to Notion~~
  - HCP Terraform Operator → Services Catalog
  - HCP Terraform → Tech References
  - HCP Terraform Operator → Tech References

Immediate Next Actions
- Push this branch and create PR
  - Branch: `chore/docs-review`
  - Includes: archived plans, updated review document
- Update Windmill Notion entries (optional)
  - Note that Terraform flows have been removed

Future Work
- Implement #401 (Vault policy scope)
- Begin #398 implementation (K8s OIDC access)