# Home Lab Incident Reports

---

## 2026-04-03 - Immich public outage via VPS OOM

### What happened

- Immich still worked internally in the homelab
- Public access failed first with HTTP 500, later with 502
- Pangolin had been OOM-killed on the VPS
- After that, Traefik could no longer resolve `pangolin` via Docker internal DNS, so it could not fetch dynamic config

### Evidence

```bash
# Check for OOM events
sudo dmesg -T | grep -i -E 'oom|out of memory|killed process'

# Pangolin log showed SIGKILL
docker logs pangolin

# Traefik log showed:
#   Get "http://pangolin:3001/api/v1/traefik-config"
#   lookup pangolin on 127.0.0.11:53
#   read: connection refused
```

### Root cause

- **Primary cause:** VPS memory exhaustion
- **Secondary cause:** broken Docker service discovery / network state after the OOM event

### Fix

```bash
docker compose down
docker compose up -d
```

### Follow-up actions

- [ ] Add swap to VPS to prevent OOM cascade
- [ ] Add memory monitoring/alerting on VPS
- [ ] Consider adding Traefik health-check/config-refresh cron job as a resilience measure
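
### Sketch: memory check for the VPS

For the memory monitoring follow-up, a minimal sketch of a check that could run from cron on the VPS. It is not what is deployed. The function names `mem_available_pct` and `check_memory`, the `THRESHOLD_PCT` variable, and the 10% default are all assumptions made for illustration. It reads `MemTotal` and `MemAvailable` from `/proc/meminfo` and returns non-zero when available memory drops below the threshold.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for a low-memory warning; names and threshold are assumptions.

# Print MemAvailable as an integer percentage of MemTotal.
# Takes an optional path to a /proc/meminfo-format file (defaults to /proc/meminfo).
mem_available_pct() {
    local meminfo="${1:-/proc/meminfo}"
    awk '
        /^MemTotal:/     { total = $2 }
        /^MemAvailable:/ { avail = $2 }
        END {
            if (total > 0) printf "%d\n", (avail * 100) / total
            else exit 1
        }
    ' "$meminfo"
}

# Assumed default: warn when less than 10% of memory is available.
THRESHOLD_PCT="${THRESHOLD_PCT:-10}"

# Return 0 if memory is fine, 1 if below the threshold, 2 if meminfo could not be parsed.
check_memory() {
    local pct
    pct="$(mem_available_pct "${1:-/proc/meminfo}")" || return 2
    if (( pct < THRESHOLD_PCT )); then
        echo "WARN: only ${pct}% memory available" >&2
        return 1
    fi
    return 0
}
```

One way to use it would be to source the file from a small cron wrapper and send a notification when `check_memory` returns 1. The alert channel is left open here because none has been chosen yet.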