diff --git a/02_Projects/Homelab Todo List.md b/02_Projects/Homelab Todo List.md
index 9339450..aabbfc5 100644
--- a/02_Projects/Homelab Todo List.md
+++ b/02_Projects/Homelab Todo List.md
@@ -1,84 +1,205 @@
 # Homelab Todo List
 
-Prioritized list of things Claudio wants to do with his homelab. Last updated: 2026-04-01.
+Curated list of open homelab work across task history, memory, and second-brain notes. Last updated: 2026-04-15.
 
-## Backup & Restore
+## Current operating priorities
 
-- [ ] Buy a 4-bay NAS for backup at parents' place ← **NEW 2026-04-03**
-- [ ] Regular backup for NAS at parents' place
-- [ ] Proxmox backup
-- [ ] Paperless backup (and public access)
-- [ ] Backup test script — verify restores actually work
-- [ ] Kopia/Time Machine backup for Claudio's + Alena's machines (dotfiles, etc.)
-- [ ] Backup system across entire lab (321 rule: 3 copies, 2 media, 1 offsite)
+1. **Backup foundation first**
+2. **Simplify hosting and homelab boundaries**
+3. **Stabilize access and edge architecture**
+4. **Migrate or delete deliberately, not ad hoc**
+5. **Document every meaningful infra change**
 
-## Hosting & Apps
+## Now
 
-- [ ] Immich: test thoroughly and validate for production use (see [[Immich Testing Plan]])
-  - [ ] Automatic phone backup (iOS)
-  - [ ] Immich library + database backup/restore
-  - [ ] Public sharing guest experience
-  - [ ] 1-week stability run
+### 1. Define and implement the backup foundation
+- [ ] Define the backup policy for each critical system: Synology, Proxmox, VPS, Gitea, Joplin, Immich, Paperless, config files
+  - Next action: create one backup matrix with source, method, frequency, retention, restore target, and off-site destination
+- [ ] Set up Proxmox backup server
+  - Candidate target: **goodolddell**
+  - Next action: decide whether goodolddell should host Proxmox Backup Server or remain a generic restic/utility host
+- [ ] Decide where Proxmox backups should land first
+  - Options currently implied by notes: goodolddell, Synology, or both
+  - Next action: choose primary landing zone and secondary/off-site path
+- [ ] Set up lab-wide backup strategy following 3-2-1
+  - Next action: map current copies vs missing copies for each critical service
+- [ ] Add backup verification, not just backup jobs
+  - Next action: define one monthly restore drill and one automated verification check
+- [ ] Write a backup test script / restore validation workflow
+  - Next action: start with one service, likely Gitea or Immich
+- [ ] Buy a 4-bay NAS for backup at parents' place
+  - Blocker: hardware purchase decision
+- [ ] Define regular backup flow to parents' NAS
+  - Depends on: backup matrix + parents' NAS target design
+- [ ] Set up Kopia/Time Machine backup for Claudio's and Alena's machines
+  - Next action: choose destination and retention policy
 
-## Infrastructure Cleanup
+### 2. Simplify the homelab and hosting architecture
+- [ ] Simplify hosting and homelab structure because too many things are mixed together
+  - Goal: each service should have one clear host, one clear access path, one clear backup path, and one clear reason to exist where it is
+  - Next action: create a service inventory table with columns: service, host, purpose, audience, access path, backup path, migration status
+- [ ] Decide what belongs on VPS vs Proxmox vs Synology vs goodolddell
+  - Next action: classify each service as edge/public, production internal, backup/infra, or experimental
+- [ ] Review whether Proxmox should become the central app platform
+  - Existing concern: avoid turning it into an unclear catch-all host
+- [ ] Keep the VPS minimal
+  - Existing note direction: public edge and only truly necessary public components
+- [ ] Keep goodolddell focused
+  - Candidate role: backup and always-on infra, not random leftover app host
+- [ ] Give Orik passwordless access to its own machine only
+  - Goal: Orik should be able to operate its own host without interactive password prompts
+  - Constraint: do **not** grant write-capable access to other machines on the network
+  - Next action: design a least-privilege access model for the local host vs all remote hosts before changing SSH/sudo setup
+- [ ] Ensure Orik does not have write access to other machines on the network
+  - Next action: separate local-machine automation privileges from remote-machine credentials and confirm remote access should be read-only or absent by default
 
-- [ ] Move Gitea + Joplin from VPS to Proxmox ← **NEW 2026-04-03**
-- [ ] Find out if network traffic is limited/throttled through VPS ← **NEW 2026-04-03**
-- [ ] Buy a second VPS instance? ← **NEW 2026-04-03**
-- [ ] Evaluate: Pangolin + Authentik vs Cloudflare Access (free tier) — do we need both or is Cloudflare enough?
-- [ ] Clean up VPS — consolidate from many reverse proxies (pangolin, nginx, caddy, traefik, dokku, cloudflare?) to one proven stack
-- [ ] Version control VPS setup (docker files + config files in git)
-- [ ] Fix SSH keys: use single key or few keys instead of many
-- [ ] Setup `info@frusetik.com` email account + SMTP for all self-hosted apps (Immich, etc.)
+### 3. Finish the remote access simplification
+- [ ] Adopt one default admin lane
+  - Recommended target from existing notes: **Tailscale**
+  - Next action: confirm Tailscale is the default admin path and mark ZeroTier as deprecated unless proven needed
+- [ ] Adopt one default user/app access lane
+  - Recommended target from existing notes: **Cloudflare Tunnel / reverse proxy**
+  - Next action: list which services are user-facing vs admin-only
+- [ ] Evaluate Pangolin + Authentik vs Cloudflare Access free tier
+  - Next action: write down what problem Authentik is solving today that Cloudflare alone does not
+- [ ] Remove overlapping access paths from the critical path
+  - Next action: document one primary access path per service
+- [ ] Invite family/friends only after service access model is clear
+  - Depends on: service inventory + access policy
 
-## Monitoring & Documentation
+### 4. Stabilize the VPS and edge stack
+- [ ] Check whether VPS network traffic is limited or throttled
+  - Next action: inspect Netcup plan limits and current usage
+- [ ] Decide whether a second VPS is actually needed
+  - Next action: answer only after traffic/memory constraints are measured
+- [ ] Clean up the VPS reverse-proxy sprawl
+  - Goal: converge from multiple overlapping edge tools to one proven stack
+  - Next action: inventory all currently running proxy/edge/auth components on the VPS
+- [ ] Version-control VPS setup
+  - Next action: put docker compose files, env templates, and key config under git
+- [ ] Add swap to VPS to reduce OOM risk
+  - Source: Immich public outage incident on 2026-04-03
+- [ ] Add memory monitoring and alerting on VPS
+  - Source: Immich public outage incident on 2026-04-03
+- [ ] Consider Traefik health-check / config-refresh resilience measure
+  - Source: Immich public outage incident on 2026-04-03
+- [ ] Fix SSH key sprawl
+  - Next action: reduce to one primary key or a very small set
+- [ ] Set up `info@frusetik.com` + SMTP for self-hosted apps
+  - Next action: decide provider and which apps should send mail first
 
-- [ ] Glance / Uptime Kuma page showing all hosted services status
-- [ ] Documentation for everything hosted
-- [ ] Monthly maintenance reminder + checklist
+## Next
 
-## Access & Networking
+### 5. Migrate or remove services deliberately
+- [ ] Move Gitea from VPS to Proxmox
+  - Preconditions: backup, restore plan, target host decision, access path
+- [ ] Move Joplin from VPS to Proxmox
+  - Preconditions: backup, restore plan, target host decision, access path
+- [ ] Delete Immich from **goodolddell**
+  - Intent: remove outdated or misplaced deployment from the Dell machine
+  - Preconditions: confirm no required data or active path still depends on it
+  - Next action: verify whether Immich on goodolddell is unused/stale, then remove container, volumes, and residual config deliberately
+- [ ] Review whether Cloudflare Tunnel management should move off VPS
+  - Next action: decide if VPS remains public edge only, or if edge shifts elsewhere
 
-- [ ] One admin VPN network (evaluate: ZeroTier vs Tailscale vs Pangolin private)
-- [ ] Invite people (family, friends) to appropriate services
+### 6. Validate Immich before declaring it production
+- [ ] Test automatic phone backup on Claudio's iPhone
+- [ ] Test Immich library + database backup and restore
+- [ ] Test public sharing guest experience
+- [ ] Run a 1-week stability check
+- [ ] Keep Immich **not protected by Pangolin auth** if mobile app backup depends on direct access
+- [ ] Verify off-site destination for Immich backups
+  - Likely target: parents' NAS or equivalent off-site storage
 
-## Network Infrastructure
+### 7. Cover the missing app backup gaps
+- [ ] Paperless backup
+  - Next action: document data path, DB path, and restore steps
+- [ ] Decide Paperless public access policy
+  - Next action: determine whether it should be public at all or Tailscale-only
+- [ ] Define backup rotation for PostgreSQL-backed services
+  - Existing note source: Self-Hosting backup section
+- [ ] Define config-file backup for infrastructure
+  - Includes compose files, tunnel/proxy config, auth config, DNS-related config
 
-- [ ] Define IP ranges properly (e.g., 10.0.0.0/24 for lab, 10.0.1.0/24 for prod, 10.0.2.0/24 for DMZ)
-- [ ] Set up VLANs: separate prod, dev/staging, IoT, guests
-- [ ] Document VLAN/subnet map and which services live where
-- [ ] Firewall rules between VLANs (default deny, explicit allow)
+## Later
 
-## Automation & Maintenance
+### 8. Monitoring, maintenance, and observability
+- [ ] Set up Uptime Kuma or similar monitoring tool
+  - Requirement from Claudio: add **read access for Orik**
+  - Next action: choose tool and hosting location, then define how Orik should access it safely
+- [ ] Build a single service status page
+  - Candidate: Glance or Uptime Kuma
+- [ ] Add automated health checks + alerts
+- [ ] Build the monthly maintenance checklist
+- [ ] Set the monthly maintenance reminder
+- [ ] Keep maintenance under 1 hour/month through automation where possible
 
-- [ ] Max 1h/month maintenance target — automate as much as possible
-- [ ] Monthly maintenance reminder + checklist (Orik helps build)
-- [ ] Automated backup verification (not just "ran", but "actually restoreable")
-- [ ] Automated health checks + alerts
+### 9. Network architecture cleanup
+- [ ] Define IP ranges properly
+- [ ] Set up VLANs for prod, dev/staging, IoT, and guests
+- [ ] Document VLAN/subnet map
+- [ ] Add inter-VLAN firewall rules with default deny and explicit allow
 
-## Environments
+### 10. Environment clarity
+- [ ] Define what is production, testing, and staging today
+- [ ] Keep dev/staging separate from production
+- [ ] Establish naming conventions for hosts, services, and environments
 
-- [ ] Proper distinction between production, development, and staging
-- [ ] Dev/staging on separate VLAN from production
-- [ ] Clear naming conventions for which services are which environment
+### 11. Documentation discipline
+- [ ] Document everything hosted
+- [ ] Keep service inventory current with host, access path, backup method, and owner
+- [ ] Record architecture changes in the second brain as they happen
 
-## Notes
+## Suggested sequencing
 
-### Priority direction
-Backup foundation first, then hosting apps, then cleanup and monitoring.
+### Phase A, make the platform safe
+- backup matrix
+- decide where Proxmox Backup Server lives
+- Proxmox backup
+- VPS stabilization (swap, monitoring, proxy inventory)
+- one primary access path per service
 
-### VPN evaluation criteria
-- Ease of setup + maintenance
-- Works on all devices (Claudio's + Alena's)
-- integrates with existing Cloudflare/Pangolin setup
-- performance on mobile
+### Phase B, reduce ambiguity
+- service inventory with host + purpose + access path + backup path
+- confirm Tailscale for admin
+- confirm Cloudflare for user-facing apps
+- deprecate ZeroTier unless needed
+- decide Pangolin/Auth vs Cloudflare role clearly
+- decide what goodolddell is for, and remove misplaced services
 
-### Monthly maintenance checklist (to build)
-- Verify backups ran successfully
-- Check disk usage on all nodes
-- Review logs for errors
-- Test at least one restore
-- Update docker images / system packages
-- Check SSL certs expiration
-- Verify VPN connectivity
-- Review access logs for anomalies
+### Phase C, migrate deliberately
+- Gitea migration
+- Joplin migration
+- Immich production validation
+- Paperless backup/access decision
+- monitoring deployment
+
+### Phase D, operational polish
+- status page
+- maintenance checklist and reminder
+- VLAN and firewall cleanup
+- full documentation coverage
+
+## Open decisions Claudio still needs to make
+- Should **goodolddell** become the Proxmox Backup Server, or stay a simpler backup/restic host?
+- Which backup target should be primary for Proxmox and service backups: goodolddell, Synology, or both?
+- Buy the parents' 4-bay NAS now or later?
+- Is Proxmox meant to be the main long-term app host, or mainly test/transition infrastructure?
+- Does Authentik stay in the near-term critical path, or should Cloudflare carry more of the access burden for now?
+- Is a second VPS actually needed, or is the current VPS just under-instrumented?
+- Where should Uptime Kuma live, and what exact read-only access should Orik have?
+
+## Source consolidation notes
+This list was curated from:
+- `TASKS.md` homelab task state
+- `02_Projects/Home Lab.md`
+- `02_Projects/Home Lab Plan.md`
+- `04_Topics/Self-Hosting.md`
+- `05_Resources/Home Lab Architecture.md`
+- `05_Resources/Home Lab Inventory.md`
+- `05_Resources/Home Lab Incidents.md`
+- `06_Decisions/Home Lab Principles.md`
+- `06_Decisions/Remote Access V1 Proposal.md`
+- `02_Projects/Immich Testing Plan.md`
+- workspace memory notes referencing homelab follow-ups
+- Claudio additions on 2026-04-15: delete Immich from goodolddell, set up Proxmox Backup Server, set up Uptime Kuma or equivalent with read access for Orik, simplify hosting/homelab boundaries