Router Hosts Operations

Operational guide for the router-hosts DNS management system in the fzymgc-house cluster.

Quick Reference

Property          Value
Server Address    192.168.20.1:50051
Auth Method       mTLS (Vault PKI)
PKI Role          router-hosts-client
Terraform Module  tf/router-hosts/
K8s Operator      router-hosts-operator namespace

Architecture

Layer         Manages                                     Location
Terraform     Infrastructure hosts (nodes, NAS, VIPs)     tf/router-hosts/hosts.tf
K8s Operator  Service hosts via IngressRoute annotations  argocd/app-configs/*/ingress*.yaml
CLI           Manual operations and verification          Local workstation

Router Boot Hooks and Backups

The Firewalla router runs startup hooks from ~/.firewalla/config/post_main.d. Key hooks managed by Ansible include:

  • 0000-a-remount-root-for-more-space.sh
  • 0000-mount-extdata.sh
  • 0050-start-docker.sh
  • 0100-install-docker-compose-v2.sh
  • 0125-start-alloy.sh
  • 0150-start-tailscale.sh
  • 0151-start-router-hosts.sh
  • 0200-start-vault-unseal.sh

Kopia backups run via the kopia-backup.service and kopia-backup.timer units.

IngressRoute DNS Sync (Kubernetes)

The router-hosts-operator watches Traefik IngressRoute and IngressRouteTCP resources with the opt-in annotation and automatically creates DNS entries pointing to the Traefik LoadBalancer IP.

Enabling DNS Sync for an IngressRoute

Add the router-hosts.fzymgc.house/enabled annotation to your IngressRoute:

apiVersion: traefik.io/v1alpha1
kind: IngressRoute
metadata:
  name: myservice
  namespace: myservice-namespace
  annotations:
    router-hosts.fzymgc.house/enabled: "true"  # Opt-in to DNS sync
spec:
  entryPoints:
    - websecure
  routes:
    - match: Host(`myservice.fzymgc.house`)
      kind: Rule
      services:
        - name: myservice
          port: 8080
  tls:
    secretName: myservice-tls

The operator will:

  1. Extract hostnames from Host() matchers in the routes
  2. Resolve the target IP from the Traefik service's LoadBalancer status (192.168.20.145)
  3. Create DNS entries on the Firewalla router via gRPC
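The hostname extraction in step 1 can be sketched in shell. This is only an illustration of the idea, not the operator's actual parser: the function name extract_route_hosts is hypothetical, and the pattern assumes hostnames appear as backtick-quoted literals inside Host() or HostSNI() matchers.

```shell
# Hypothetical helper (not part of router-hosts): print each hostname found in
# Host()/HostSNI() matchers of a Traefik rule string, one per line.
# Assumption: hostnames are backtick-quoted literals, as in the examples in this guide.
extract_route_hosts() {
  printf '%s\n' "$1" \
    | grep -oE 'Host(SNI)?\([^)]*\)' \
    | grep -oE '`[^`]+`' \
    | tr -d '`'
}
```

For example, passing the rule Host(`myservice.fzymgc.house`) && PathPrefix(`/api`) as a single-quoted argument prints myservice.fzymgc.house and ignores the PathPrefix() matcher.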

IngressRouteTCP Support

TCP routes using HostSNI() matching are also supported:

apiVersion: traefik.io/v1alpha1
kind: IngressRouteTCP
metadata:
  name: mydb
  namespace: postgres
  annotations:
    router-hosts.fzymgc.house/enabled: "true"
spec:
  entryPoints:
    - postgres
  routes:
    - match: HostSNI(`mydb.fzymgc.house`)
      services:
        - name: mydb
          port: 5432
  tls:
    passthrough: true

Why Opt-in?

The operator uses an explicit opt-in pattern because:

  • Safety: DNS entries affect external systems (Firewalla router)
  • Control: Not all IngressRoutes need public DNS (e.g., cluster-internal services)
  • Auditability: Easy to grep for which services have DNS sync enabled
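The auditability point can be made concrete with a short sketch. It assumes the manifests live under argocd/app-configs (the location given in the Architecture table) and that the annotation is written on a single line as in the examples above. The function name list_dns_sync_manifests is hypothetical.

```shell
# Hypothetical helper: list manifest files under a directory that opt in to DNS sync.
# Assumption: the annotation key and value sit on one line, e.g.
#   router-hosts.fzymgc.house/enabled: "true"
list_dns_sync_manifests() {
  dir="${1:-argocd/app-configs}"
  grep -rlE 'router-hosts\.fzymgc\.house/enabled:[[:space:]]*"?true"?' "$dir" | sort
}
```

Run from the repository root, list_dns_sync_manifests with no arguments prints the path of every manifest that carries the opt-in annotation.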

Operator Authentication

The operator authenticates to the gRPC server using mTLS certificates from Vault PKI:

# VaultDynamicSecret issues certificates automatically
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultDynamicSecret
metadata:
  name: router-hosts-client-cert
spec:
  mount: fzymgc-house/v1/ica1/v1
  path: issue/router-hosts-client
  destination:
    name: router-hosts-client-cert
    create: true
  renewalPercent: 67
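As a rough worked example, assume renewalPercent is interpreted as the percentage of the certificate's lifetime after which renewal is attempted. The certificate TTL used by the operator is not stated in this guide, so the 720-hour figure below is borrowed from the CLI examples purely for illustration, and any jitter applied by the Vault Secrets Operator is ignored. The function name renewal_after_hours is hypothetical.

```shell
# Hypothetical helper: hours after issuance at which renewal would be attempted,
# given a TTL in hours and a renewal percentage (integer arithmetic, rounds down).
renewal_after_hours() {
  ttl_hours="$1"
  percent="${2:-67}"
  echo $(( ttl_hours * percent / 100 ))
}
```

With these assumptions, renewal_after_hours 720 prints 482, meaning a 720-hour certificate would be renewed after roughly 482 hours, leaving about 10 days of validity as headroom.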

Verifying DNS Sync

# List all synced hosts
router-hosts host list

# Check operator logs
kubectl logs -n router-hosts-operator -l app.kubernetes.io/name=router-hosts-operator

# Verify DNS resolution
dig @192.168.20.1 myservice.fzymgc.house +short

CLI Installation

ansible-playbook ansible/router-hosts-client-setup.yml

This installs the CLI via Homebrew, generates a client certificate from Vault PKI, and writes the config to ~/.config/router-hosts/. Safe to re-run (only renews certs expiring within 24 hours). Force renewal with -e force_cert_renewal=true.
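The 24-hour renewal window can be approximated by hand with openssl. The sketch below is not the playbook's implementation: cert_needs_renewal is a hypothetical helper name, and the default window of 86400 seconds simply mirrors the 24 hours mentioned above.

```shell
# Hypothetical helper: exit 0 when the certificate should be renewed, i.e. when
# the file is missing, empty, unreadable as a certificate, or expires within the
# window (default 86400 s = 24 h). Exit 1 when it is still valid past the window.
cert_needs_renewal() {
  cert="$1"
  window="${2:-86400}"
  [ -s "$cert" ] || return 0
  # `openssl x509 -checkend N` exits 0 if the cert is still valid N seconds from now.
  if openssl x509 -in "$cert" -noout -checkend "$window" >/dev/null 2>&1; then
    return 1
  fi
  return 0
}
```

A manual check could then look like cert_needs_renewal ~/.config/router-hosts/tls.crt && echo "renewal needed".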

Manual

brew install fzymgc-house/tap/router-hosts

Prerequisites: jq and vault CLI must be installed and Vault authenticated.

Certificate Setup

Bash/Zsh

# Create config directory
mkdir -p ~/.config/router-hosts

# Request certificate from Vault (single command to get all parts)
CERT_DATA=$(vault write -format=json fzymgc-house/v1/ica1/v1/issue/router-hosts-client \
  common_name="cli-$(whoami)" ttl=720h)

echo "$CERT_DATA" | jq -r '.data.certificate' > ~/.config/router-hosts/tls.crt
echo "$CERT_DATA" | jq -r '.data.private_key' > ~/.config/router-hosts/tls.key
echo "$CERT_DATA" | jq -r '.data.issuing_ca' > ~/.config/router-hosts/ca.crt

# Restrict the private key to the current user
chmod 600 ~/.config/router-hosts/tls.key

# Create config file (uses $HOME for absolute paths)
cat > ~/.config/router-hosts/config.toml << EOF
[server]
address = "192.168.20.1:50051"

[tls]
cert_path = "$HOME/.config/router-hosts/tls.crt"
key_path = "$HOME/.config/router-hosts/tls.key"
ca_cert_path = "$HOME/.config/router-hosts/ca.crt"
EOF

Fish

# Create config directory
mkdir -p ~/.config/router-hosts

# Request certificate from Vault
set CERT_DATA (vault write -format=json fzymgc-house/v1/ica1/v1/issue/router-hosts-client \
  common_name="cli-"(whoami) ttl=720h)

echo $CERT_DATA | jq -r '.data.certificate' > ~/.config/router-hosts/tls.crt
echo $CERT_DATA | jq -r '.data.private_key' > ~/.config/router-hosts/tls.key
echo $CERT_DATA | jq -r '.data.issuing_ca' > ~/.config/router-hosts/ca.crt

# Restrict the private key to the current user
chmod 600 ~/.config/router-hosts/tls.key

# Create config file (uses $HOME for absolute paths)
printf '[server]
address = "192.168.20.1:50051"

[tls]
cert_path = "%s/.config/router-hosts/tls.crt"
key_path = "%s/.config/router-hosts/tls.key"
ca_cert_path = "%s/.config/router-hosts/ca.crt"
' $HOME $HOME $HOME > ~/.config/router-hosts/config.toml

Common Operations

List Hosts

# List all hosts
router-hosts host list

# Filter by tag
router-hosts host list --tag terraform
router-hosts host list --tag kubernetes

Get Host Details

router-hosts host get vault
router-hosts host get tpi-alpha-1

Add Host (CLI only, prefer Terraform/annotation)

router-hosts host add test-entry 192.168.20.99 --comment "Test entry"

Delete Host

router-hosts host delete test-entry

Verification

After Deployment

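# Verify gRPC server is running (the server requires mTLS, so pass the client certificate)
grpcurl -cacert ~/.config/router-hosts/ca.crt \
  -cert ~/.config/router-hosts/tls.crt \
  -key ~/.config/router-hosts/tls.key \
  192.168.20.1:50051 list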

# List all managed hosts
router-hosts host list

# Test DNS resolution for infrastructure hosts
dig @192.168.20.1 tpi-alpha-1.fzymgc.house +short
dig @192.168.20.1 vault.fzymgc.house +short

# Verify hosts file on router
ssh pi@192.168.20.1 "cat /extdata/router-hosts/hosts/main"

Expected Outputs

Check                   Expected Result
grpcurl list            Shows router_hosts.v1.HostsService
router-hosts host list  Lists all configured hosts with IPs
dig queries             Returns configured IP addresses
Hosts file              Contains entries in /etc/hosts format
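The hosts-file check can be scripted rather than read by eye. The sketch below assumes only what the table states, namely that the file uses /etc/hosts format (an IP followed by one or more names, with optional # comments). The helper name hosts_lookup is hypothetical.

```shell
# Hypothetical helper: read /etc/hosts-style text on stdin and print the IP
# mapped to the given name. Exits non-zero when the name is not present.
hosts_lookup() {
  awk -v name="$1" '
    { sub(/#.*/, "") }   # strip comments before matching
    { for (i = 2; i <= NF; i++) if ($i == name) { print $1; found = 1; exit } }
    END { exit found ? 0 : 1 }
  '
}
```

One way to use it is to pipe the router's file through the function, for example ssh pi@192.168.20.1 "cat /extdata/router-hosts/hosts/main" | hosts_lookup vault.fzymgc.house, and compare the printed IP with the dig result.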

Troubleshooting

Pre-Deployment Validation Errors

The Ansible playbook runs preflight checks before deployment. These validate Vault configuration.

AppRole Does Not Exist

TASK [router_hosts : Preflight - Validate Vault AppRole exists] ***************
fatal: [fw-gold]: FAILED! => {"msg": "Vault AppRole 'router-hosts-agent' does not exist"}

Cause: AppRole not created in Vault.

Resolution:

# Verify AppRole exists
vault read auth/approle/role/router-hosts-agent/role-id

# If missing, run Terraform
cd tf/vault && terraform apply -target=vault_approle_auth_backend_role.router_hosts_agent

PKI Role Does Not Exist

TASK [router_hosts : Preflight - Validate Vault PKI role exists] **************
fatal: [fw-gold]: FAILED! => {"msg": "Vault PKI role 'router-hosts-server' does not exist at path 'fzymgc-house/v1/ica1/v1'"}

Cause: PKI role not created in Vault.

Resolution:

# Verify PKI role exists
vault read fzymgc-house/v1/ica1/v1/roles/router-hosts-server

# If missing, run Terraform
cd tf/vault && terraform apply -target=vault_pki_secret_backend_role.router_hosts_server

VAULT_TOKEN Not Set

TASK [router_hosts : Preflight - Verify VAULT_TOKEN is set] *******************
fatal: [fw-gold]: FAILED! => {"msg": "VAULT_TOKEN environment variable not set"}

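Cause: Ansible runs the preflight checks on localhost, so the shell that launches the playbook must export a valid Vault token.

Resolution:

# Log in first if there is no active Vault session
vault login

# Export the current token so Ansible can read it
export VAULT_TOKEN=$(vault print token)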

Permission Denied Reading Vault

TASK [router_hosts : Preflight - Validate Vault AppRole exists] ***************
fatal: [fw-gold]: FAILED! => {"msg": "Permission denied reading Vault AppRole"}

Cause: Your Vault token lacks read permission on auth/approle/role/*.

Resolution: Use a token with appropriate permissions or authenticate as an admin.


Server-Side Issues

Vault Agent Certificate Timeout

TASK [router_hosts : Wait for Vault Agent to generate certificates] ***********
fatal: [fw-gold]: FAILED! => Timeout waiting for file /extdata/router-hosts/certs/bundle.pem

Cause: Vault Agent cannot authenticate or issue certificates.

Debug:

# SSH to router
ssh pi@192.168.20.1

# Check Vault Agent logs
docker logs router-hosts-vault-agent --tail=50

# Verify AppRole credentials
ls -la /extdata/router-hosts/vault-approle/
cat /extdata/router-hosts/vault-approle/role_id

# Test Vault connectivity from container
docker exec router-hosts-vault-agent wget -qO- https://vault.fzymgc.house:8200/v1/sys/health

Common causes:

  • AppRole secret_id was rotated but not updated on router
  • Vault unreachable from Docker container (DNS issue)
  • PKI backend returning errors

Resolution:

# Force new secret_id generation
ansible-playbook -i inventory/hosts.yml router-hosts-playbook.yml \
  -e router_hosts_force_new_secret_id=true

Container Won't Start

TASK [router_hosts : Wait for router-hosts container to be running] ***********
fatal: [fw-gold]: FAILED! => Timeout waiting for container

Cause: Container depends on Vault Agent being healthy first.

Debug:

ssh pi@192.168.20.1

# Check all container states
docker ps -a --filter "name=router-hosts"

# Check Vault Agent health
docker inspect router-hosts-vault-agent --format='{{.State.Health.Status}}'

# View container logs
docker logs router-hosts-vault-agent
docker logs router-hosts-server
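Because the server container only starts once Vault Agent is healthy, it can help to poll the health status instead of re-running docker inspect by hand. The sketch below is a generic retry loop, not something shipped with router-hosts. The names wait_until and vault_agent_healthy are hypothetical, and the docker inspect call is the one shown above.

```shell
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the final one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds only when the Vault Agent container reports "healthy".
vault_agent_healthy() {
  status="$(docker inspect router-hosts-vault-agent --format='{{.State.Health.Status}}' 2>/dev/null)"
  [ "$status" = "healthy" ]
}
```

On the router, wait_until 30 2 vault_agent_healthy || docker logs router-hosts-vault-agent --tail=50 would wait up to about a minute and print the recent agent logs if the container never becomes healthy.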

gRPC Port Not Available

TASK [router_hosts : Wait for gRPC port to be available] **********************
fatal: [fw-gold]: FAILED! => Timeout waiting for 192.168.20.1:50051

Cause: Server container not running or not binding correctly.

Debug:

ssh pi@192.168.20.1

# Check if container is running
docker ps -f "name=router-hosts-server"

# Check server logs
docker logs router-hosts-server --tail=50

# Verify bind address in config
grep bind /extdata/router-hosts/config/server.toml


Client-Side Issues

Connection Refused

# Test gRPC connectivity
grpcurl -cacert ~/.config/router-hosts/ca.crt \
  -cert ~/.config/router-hosts/tls.crt \
  -key ~/.config/router-hosts/tls.key \
  192.168.20.1:50051 list

# Check certificate validity
openssl x509 -in ~/.config/router-hosts/tls.crt -text -noout | grep -A2 Validity

Certificate Expired

# Re-issue certificate (same commands as setup)
CERT_DATA=$(vault write -format=json fzymgc-house/v1/ica1/v1/issue/router-hosts-client \
  common_name="cli-$(whoami)" ttl=720h)
echo "$CERT_DATA" | jq -r '.data.certificate' > ~/.config/router-hosts/tls.crt
echo "$CERT_DATA" | jq -r '.data.private_key' > ~/.config/router-hosts/tls.key
echo "$CERT_DATA" | jq -r '.data.issuing_ca' > ~/.config/router-hosts/ca.crt
chmod 600 ~/.config/router-hosts/tls.key

DNS Issues

DNS Not Resolving

# Verify host exists in router-hosts
router-hosts host get <hostname>

# Check hosts file on router
ssh pi@192.168.20.1 "cat /extdata/router-hosts/hosts/main"

# Check dnsmasq is reading hosts file
ssh pi@192.168.20.1 "cat /data/router_hosts/hosts"

# Force dnsmasq reload
ssh pi@192.168.20.1 "killall -HUP firerouter_dns"

Note: Firewalla uses firerouter_dns, not standard dnsmasq.

IngressRoute Not Syncing

# Verify annotation is present
kubectl get ingressroute -A -o json | jq '.items[] | select(.metadata.annotations["router-hosts.fzymgc.house/enabled"] == "true") | .metadata.name'

# Check operator logs for errors
kubectl logs -n router-hosts-operator -l app.kubernetes.io/name=router-hosts-operator --tail=50

# Verify RouterHostsConfig exists
kubectl get routerhostsconfig -n router-hosts-operator

Key Files Reference

File                                                            Purpose
/extdata/router-hosts/config/vault-agent.hcl                    Vault Agent configuration
/extdata/router-hosts/config/server.toml                        gRPC server configuration
/extdata/router-hosts/certs/                                    Certificates (server.crt, server.key, ca.crt)
/extdata/router-hosts/vault-approle/                            AppRole credentials (mode 0700)
/extdata/router-hosts/hosts/main                                Generated hosts file
/home/pi/.firewalla/run/docker/router-hosts/docker-compose.yml  Container definition

Log Locations

# Vault Agent logs
docker logs router-hosts-vault-agent

# gRPC server logs
docker logs router-hosts-server

# Systemd service logs
journalctl -u docker-compose@router-hosts -n 50

Rollback

Remove Individual Host Entry

# Via CLI
router-hosts host delete <hostname>

# Via annotation removal: remove the annotation from the IngressRoute and let ArgoCD sync

# Via Terraform: remove the entry from hosts.tf, then apply
terraform -chdir=tf/router-hosts apply

Full Rollback to Manual Hosts

If router-hosts needs to be completely disabled:

  1. Stop the gRPC server:

    ssh pi@192.168.20.1 "systemctl stop docker-compose@router-hosts"

  2. Restore manual dnsmasq config:

    ssh pi@192.168.20.1 "rm /home/pi/.firewalla/config/dnsmasq_local/local-hosts.conf"
    ssh pi@192.168.20.1 "killall -HUP firerouter_dns"

  3. Disable Terraform workspace (in HCP Terraform UI):

     • Navigate to main-cluster-router-hosts workspace
     • Settings → Destruction and Deletion → Queue destroy plan

  4. Disable K8s operator:

    kubectl delete -n router-hosts-operator helmrelease router-hosts-operator

Recovery After Rollback

To re-enable router-hosts after a rollback:

  1. Run Ansible playbook: ansible-playbook -i inventory/hosts.yml router-hosts-playbook.yml
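  2. Verify gRPC server: grpcurl 192.168.20.1:50051 list (add the -cacert, -cert, and -key flags shown under Connection Refused, since the server requires mTLS)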
  3. Apply Terraform: terraform -chdir=tf/router-hosts apply

Resource             Location
PKI Role Definition  tf/vault/pki-router-hosts.tf
Vault Policy         tf/vault/policy-external-secrets-operator.tf
K8s Operator Helm    argocd/cluster-app/templates/router-hosts-operator.yaml
Design Document      docs/plans/2025-12-30-router-hosts-integration-design.md