# Router Hosts Operations

Operational guide for the router-hosts DNS management system in the fzymgc-house cluster.

## Quick Reference
| Property | Value |
|---|---|
| Server Address | 192.168.20.1:50051 |
| Auth Method | mTLS (Vault PKI) |
| PKI Role | router-hosts-client |
| Terraform Module | tf/router-hosts/ |
| K8s Operator | router-hosts-operator namespace |
## Architecture
| Layer | Manages | Location |
|---|---|---|
| Terraform | Infrastructure hosts (nodes, NAS, VIPs) | tf/router-hosts/hosts.tf |
| K8s Operator | Service hosts via IngressRoute annotations | argocd/app-configs/*/ingress*.yaml |
| CLI | Manual operations and verification | Local workstation |
## Router Boot Hooks and Backups

The Firewalla router runs startup hooks from `~/.firewalla/config/post_main.d`.
Key hooks managed by Ansible include:

- `0000-a-remount-root-for-more-space.sh`
- `0000-mount-extdata.sh`
- `0050-start-docker.sh`
- `0100-install-docker-compose-v2.sh`
- `0125-start-alloy.sh`
- `0150-start-tailscale.sh`
- `0151-start-router-hosts.sh`
- `0200-start-vault-unseal.sh`

Kopia backups run via the `kopia-backup.service` and `kopia-backup.timer` units.
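To check that the backup timer is scheduled and to review the last run, standard systemd commands work on the router (a quick sketch using the unit names above):

```shell
# Show the next scheduled run of the backup timer
ssh pi@192.168.20.1 "systemctl list-timers kopia-backup.timer"

# Inspect logs from the most recent backup run
ssh pi@192.168.20.1 "journalctl -u kopia-backup.service -n 50"
```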
## IngressRoute DNS Sync (Kubernetes)

The router-hosts-operator watches Traefik IngressRoute and IngressRouteTCP resources with the opt-in annotation and automatically creates DNS entries pointing to the Traefik LoadBalancer IP.

### Enabling DNS Sync for an IngressRoute

Add the `router-hosts.fzymgc.house/enabled` annotation to your IngressRoute:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: myservice
  namespace: myservice-namespace
  annotations:
    router-hosts.fzymgc.house/enabled: "true"  # Opt-in to DNS sync
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`myservice.fzymgc.house`)
      kind: Rule
      services:
        - name: myservice
          port: 8080
  tls:
    secretName: myservice-tls
```
The operator will:

- Extract hostnames from `Host()` matchers in the routes
- Resolve the target IP from the Traefik service's LoadBalancer status (`192.168.20.145`)
- Create DNS entries on the Firewalla router via gRPC
### IngressRouteTCP Support

TCP routes using `HostSNI()` matching are also supported:
```yaml
apiVersion: traefik.io/v1alpha1
kind: IngressRouteTCP
metadata:
  name: mydb
  namespace: postgres
  annotations:
    router-hosts.fzymgc.house/enabled: "true"
spec:
  entryPoints:
    - postgres
  routes:
    - match: HostSNI(`mydb.fzymgc.house`)
      services:
        - name: mydb
          port: 5432
  tls:
    passthrough: true
```
### Why Opt-in?
The operator uses an explicit opt-in pattern because:
- Safety: DNS entries affect external systems (Firewalla router)
- Control: Not all IngressRoutes need public DNS (e.g., cluster-internal services)
- Auditability: Easy to grep for which services have DNS sync enabled
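The auditability point can be exercised directly. For example, assuming ingress manifests live under `argocd/app-configs/` as shown in the Architecture table:

```shell
# Find every manifest that opts in to DNS sync
grep -rl 'router-hosts.fzymgc.house/enabled' argocd/app-configs/
```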
### Operator Authentication
The operator authenticates to the gRPC server using mTLS certificates from Vault PKI:
```yaml
# VaultDynamicSecret issues certificates automatically
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: router-hosts-client-cert
spec:
  mount: fzymgc-house/v1/ica1/v1
  path: issue/router-hosts-client
  destination:
    name: router-hosts-client-cert
    create: true
  renewalPercent: 67
```
### Verifying DNS Sync
```bash
# List all synced hosts
router-hosts host list

# Check operator logs
kubectl logs -n router-hosts-operator -l app.kubernetes.io/name=router-hosts-operator

# Verify DNS resolution
dig @192.168.20.1 myservice.fzymgc.house +short
```
## CLI Installation

### Automated (Recommended)
This installs the CLI via Homebrew, generates a client certificate from Vault PKI,
and writes the config to ~/.config/router-hosts/. Safe to re-run (only renews
certs expiring within 24 hours). Force renewal with -e force_cert_renewal=true.
### Manual

Prerequisites: the `jq` and `vault` CLIs must be installed, and you must be authenticated to Vault.

#### Certificate Setup

##### Bash/Zsh
```bash
# Create config directory
mkdir -p ~/.config/router-hosts

# Request certificate from Vault (single command to get all parts)
CERT_DATA=$(vault write -format=json fzymgc-house/v1/ica1/v1/issue/router-hosts-client \
  common_name="cli-$(whoami)" ttl=720h)
echo "$CERT_DATA" | jq -r '.data.certificate' > ~/.config/router-hosts/tls.crt
echo "$CERT_DATA" | jq -r '.data.private_key' > ~/.config/router-hosts/tls.key
echo "$CERT_DATA" | jq -r '.data.issuing_ca' > ~/.config/router-hosts/ca.crt

# Create config file (uses $HOME for absolute paths)
cat > ~/.config/router-hosts/config.toml << EOF
[server]
address = "192.168.20.1:50051"

[tls]
cert_path = "$HOME/.config/router-hosts/tls.crt"
key_path = "$HOME/.config/router-hosts/tls.key"
ca_cert_path = "$HOME/.config/router-hosts/ca.crt"
EOF
```
##### Fish
```fish
# Create config directory
mkdir -p ~/.config/router-hosts

# Request certificate from Vault
set CERT_DATA (vault write -format=json fzymgc-house/v1/ica1/v1/issue/router-hosts-client \
  common_name="cli-"(whoami) ttl=720h)
echo $CERT_DATA | jq -r '.data.certificate' > ~/.config/router-hosts/tls.crt
echo $CERT_DATA | jq -r '.data.private_key' > ~/.config/router-hosts/tls.key
echo $CERT_DATA | jq -r '.data.issuing_ca' > ~/.config/router-hosts/ca.crt

# Create config file (uses $HOME for absolute paths)
printf '[server]
address = "192.168.20.1:50051"

[tls]
cert_path = "%s/.config/router-hosts/tls.crt"
key_path = "%s/.config/router-hosts/tls.key"
ca_cert_path = "%s/.config/router-hosts/ca.crt"
' $HOME $HOME $HOME > ~/.config/router-hosts/config.toml
```
## Common Operations

### List Hosts
```bash
# List all hosts
router-hosts host list

# Filter by tag
router-hosts host list --tag terraform
router-hosts host list --tag kubernetes
```
### Get Host Details
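The same subcommand used later in the Troubleshooting section fetches a single entry:

```shell
# Show details for one host (replace <hostname> with the entry's name)
router-hosts host get <hostname>
```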
### Add Host (CLI only, prefer Terraform/annotation)
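A sketch of a manual add; the positional arguments here are assumptions (verify with `router-hosts host add --help`), and entries added this way are not tracked by Terraform or the operator:

```shell
# Hypothetical invocation -- argument shape is an assumption, check --help
router-hosts host add myhost.fzymgc.house 192.168.20.50
```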
### Delete Host
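The delete subcommand (shown again under Rollback) removes a single entry:

```shell
# Remove one host entry (replace <hostname> with the entry's name)
router-hosts host delete <hostname>
```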
## Verification

### After Deployment
```bash
# Verify gRPC server is running
grpcurl 192.168.20.1:50051 list

# List all managed hosts
router-hosts host list

# Test DNS resolution for infrastructure hosts
dig @192.168.20.1 tpi-alpha-1.fzymgc.house +short
dig @192.168.20.1 vault.fzymgc.house +short

# Verify hosts file on router
ssh pi@192.168.20.1 "cat /extdata/router-hosts/hosts/main"
```
### Expected Outputs

| Check | Expected Result |
|---|---|
| `grpcurl list` | Shows `router_hosts.v1.HostsService` |
| `router-hosts host list` | Lists all configured hosts with IPs |
| `dig` queries | Returns configured IP addresses |
| Hosts file | Contains entries in `/etc/hosts` format |
## Troubleshooting

### Pre-Deployment Validation Errors

The Ansible playbook runs preflight checks before deployment. These validate Vault configuration.

#### AppRole Does Not Exist
```
TASK [router_hosts : Preflight - Validate Vault AppRole exists] ***************
fatal: [fw-gold]: FAILED! => {"msg": "Vault AppRole 'router-hosts-agent' does not exist"}
```
Cause: AppRole not created in Vault.
Resolution:
```bash
# Verify AppRole exists
vault read auth/approle/role/router-hosts-agent/role-id

# If missing, run Terraform
cd tf/vault && terraform apply -target=vault_approle_auth_backend_role.router_hosts_agent
```
#### PKI Role Does Not Exist

```
TASK [router_hosts : Preflight - Validate Vault PKI role exists] **************
fatal: [fw-gold]: FAILED! => {"msg": "Vault PKI role 'router-hosts-server' does not exist at path 'fzymgc-house/v1/ica1/v1'"}
```
Cause: PKI role not created in Vault.
Resolution:
```bash
# Verify PKI role exists
vault read fzymgc-house/v1/ica1/v1/roles/router-hosts-server

# If missing, run Terraform
cd tf/vault && terraform apply -target=vault_pki_secret_backend_role.router_hosts_server
```
#### VAULT_TOKEN Not Set

```
TASK [router_hosts : Preflight - Verify VAULT_TOKEN is set] *******************
fatal: [fw-gold]: FAILED! => {"msg": "VAULT_TOKEN environment variable not set"}
```
Cause: Ansible runs preflight checks on localhost, which needs Vault auth.
Resolution:
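A minimal fix, assuming you authenticate to Vault interactively (the auth method shown here, OIDC, is an assumption; use whichever method this cluster is configured for):

```shell
# Authenticate and export the token for the preflight checks
export VAULT_ADDR=https://vault.fzymgc.house:8200
export VAULT_TOKEN=$(vault login -method=oidc -token-only)
```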
#### Permission Denied Reading Vault

```
TASK [router_hosts : Preflight - Validate Vault AppRole exists] ***************
fatal: [fw-gold]: FAILED! => {"msg": "Permission denied reading Vault AppRole"}
```
Cause: Your Vault token lacks read permission on auth/approle/role/*.
Resolution: Use a token with appropriate permissions or authenticate as an admin.
### Server-Side Issues

#### Vault Agent Certificate Timeout

```
TASK [router_hosts : Wait for Vault Agent to generate certificates] ***********
fatal: [fw-gold]: FAILED! => Timeout waiting for file /extdata/router-hosts/certs/bundle.pem
```
Cause: Vault Agent cannot authenticate or issue certificates.
Debug:
```bash
# SSH to router
ssh pi@192.168.20.1

# Check Vault Agent logs
docker logs router-hosts-vault-agent --tail=50

# Verify AppRole credentials
ls -la /extdata/router-hosts/vault-approle/
cat /extdata/router-hosts/vault-approle/role_id

# Test Vault connectivity from container
docker exec router-hosts-vault-agent wget -qO- https://vault.fzymgc.house:8200/v1/sys/health
```
Common causes:

- AppRole secret_id was rotated but not updated on the router
- Vault unreachable from the Docker container (DNS issue)
- PKI backend returning errors
Resolution:
```bash
# Force new secret_id generation
ansible-playbook -i inventory/hosts.yml router-hosts-playbook.yml \
  -e router_hosts_force_new_secret_id=true
```
#### Container Won't Start

```
TASK [router_hosts : Wait for router-hosts container to be running] ***********
fatal: [fw-gold]: FAILED! => Timeout waiting for container
```
Cause: Container depends on Vault Agent being healthy first.
Debug:
```bash
ssh pi@192.168.20.1

# Check all container states
docker ps -a --filter "name=router-hosts"

# Check Vault Agent health
docker inspect router-hosts-vault-agent --format='{{.State.Health.Status}}'

# View container logs
docker logs router-hosts-vault-agent
docker logs router-hosts-server
```
#### gRPC Port Not Available

```
TASK [router_hosts : Wait for gRPC port to be available] **********************
fatal: [fw-gold]: FAILED! => Timeout waiting for 192.168.20.1:50051
```
Cause: Server container not running or not binding correctly.
Debug:
```bash
ssh pi@192.168.20.1

# Check if container is running
docker ps -f "name=router-hosts-server"

# Check server logs
docker logs router-hosts-server --tail=50

# Verify bind address in config
grep bind /extdata/router-hosts/config/server.toml
```
### Client-Side Issues

#### Connection Refused
```bash
# Test gRPC connectivity
grpcurl -cacert ~/.config/router-hosts/ca.crt \
  -cert ~/.config/router-hosts/tls.crt \
  -key ~/.config/router-hosts/tls.key \
  192.168.20.1:50051 list

# Check certificate validity
openssl x509 -in ~/.config/router-hosts/tls.crt -text -noout | grep -A2 Validity
```
#### Certificate Expired

```bash
# Re-issue certificate (same commands as setup)
CERT_DATA=$(vault write -format=json fzymgc-house/v1/ica1/v1/issue/router-hosts-client \
  common_name="cli-$(whoami)" ttl=720h)
echo "$CERT_DATA" | jq -r '.data.certificate' > ~/.config/router-hosts/tls.crt
echo "$CERT_DATA" | jq -r '.data.private_key' > ~/.config/router-hosts/tls.key
echo "$CERT_DATA" | jq -r '.data.issuing_ca' > ~/.config/router-hosts/ca.crt
```
### DNS Issues

#### DNS Not Resolving
```bash
# Verify host exists in router-hosts
router-hosts host get <hostname>

# Check hosts file on router
ssh pi@192.168.20.1 "cat /extdata/router-hosts/hosts/main"

# Check dnsmasq is reading hosts file
ssh pi@192.168.20.1 "cat /data/router_hosts/hosts"

# Force dnsmasq reload
ssh pi@192.168.20.1 "killall -HUP firerouter_dns"
```
Note: Firewalla uses `firerouter_dns`, not standard dnsmasq.
#### IngressRoute Not Syncing
```bash
# Verify annotation is present
kubectl get ingressroute -A -o json | jq '.items[] | select(.metadata.annotations["router-hosts.fzymgc.house/enabled"] == "true") | .metadata.name'

# Check operator logs for errors
kubectl logs -n router-hosts-operator -l app.kubernetes.io/name=router-hosts-operator --tail=50

# Verify RouterHostsConfig exists
kubectl get routerhostsconfig -n router-hosts-operator
```
## Key Files Reference

| File | Purpose |
|---|---|
| `/extdata/router-hosts/config/vault-agent.hcl` | Vault Agent configuration |
| `/extdata/router-hosts/config/server.toml` | gRPC server configuration |
| `/extdata/router-hosts/certs/` | Certificates (server.crt, server.key, ca.crt) |
| `/extdata/router-hosts/vault-approle/` | AppRole credentials (mode 0700) |
| `/extdata/router-hosts/hosts/main` | Generated hosts file |
| `/home/pi/.firewalla/run/docker/router-hosts/docker-compose.yml` | Container definition |
## Log Locations

```bash
# Vault Agent logs
docker logs router-hosts-vault-agent

# gRPC server logs
docker logs router-hosts-server

# Systemd service logs
journalctl -u docker-compose@router-hosts -n 50
```
## Rollback

### Remove Individual Host Entry
```bash
# Via CLI
router-hosts host delete <hostname>

# Via annotation removal (remove annotation from IngressRoute, ArgoCD syncs)

# Or via Terraform (remove from hosts.tf, then apply)
terraform apply
```
### Full Rollback to Manual Hosts

If router-hosts needs to be completely disabled:

1. Stop the gRPC server.
2. Restore the manual dnsmasq config.
3. Disable the Terraform workspace (in the HCP Terraform UI): navigate to the `main-cluster-router-hosts` workspace, then Settings → Destruction and Deletion → Queue destroy plan.
4. Disable the K8s operator.
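Stopping the gRPC server can be sketched as follows, assuming the compose file path from the Key Files Reference table; the exact teardown command is not specified in this guide:

```shell
# Stop the server and Vault Agent containers on the router (assumed command)
ssh pi@192.168.20.1 \
  "docker compose -f /home/pi/.firewalla/run/docker/router-hosts/docker-compose.yml down"
```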
### Recovery After Rollback

To re-enable router-hosts after a rollback:

1. Run the Ansible playbook: `ansible-playbook -i inventory/hosts.yml router-hosts-playbook.yml`
2. Verify the gRPC server: `grpcurl 192.168.20.1:50051 list`
3. Apply Terraform: `terraform -chdir=tf/router-hosts apply`
## Related Resources
| Resource | Location |
|---|---|
| PKI Role Definition | tf/vault/pki-router-hosts.tf |
| Vault Policy | tf/vault/policy-external-secrets-operator.tf |
| K8s Operator Helm | argocd/cluster-app/templates/router-hosts-operator.yaml |
| Design Document | docs/plans/2025-12-30-router-hosts-integration-design.md |