Update Homelab Todo List

2026-04-15 09:55:28 +00:00
parent c74a3213a6
commit eb9962ca9a

@@ -1,84 +1,205 @@
# Homelab Todo List
Prioritized list of things Claudio wants to do with his homelab. Last updated: 2026-04-01.
Curated list of open homelab work across task history, memory, and second-brain notes. Last updated: 2026-04-15.
## Backup & Restore
## Current operating priorities
- [ ] Buy a 4-bay NAS for backup at parents' place ← **NEW 2026-04-03**
- [ ] Regular backup for NAS at parents' place
- [ ] Proxmox backup
- [ ] Paperless backup (and public access)
- [ ] Backup test script — verify restores actually work
- [ ] Kopia/Time Machine backup for Claudio's + Alena's machines (dotfiles, etc.)
- [ ] Backup system across entire lab (3-2-1 rule: 3 copies, 2 media, 1 offsite)
1. **Backup foundation first**
2. **Simplify hosting and homelab boundaries**
3. **Stabilize access and edge architecture**
4. **Migrate or delete deliberately, not ad hoc**
5. **Document every meaningful infra change**
## Hosting & Apps
## Now
- [ ] Immich: test thoroughly and validate for production use (see [[Immich Testing Plan]])
- [ ] Automatic phone backup (iOS)
- [ ] Immich library + database backup/restore
- [ ] Public sharing guest experience
- [ ] 1-week stability run
### 1. Define and implement the backup foundation
- [ ] Define the backup policy for each critical system: Synology, Proxmox, VPS, Gitea, Joplin, Immich, Paperless, config files
- Next action: create one backup matrix with source, method, frequency, retention, restore target, and off-site destination
- [ ] Set up Proxmox backup server
- Candidate target: **goodolddell**
- Next action: decide whether goodolddell should host Proxmox Backup Server or remain a generic restic/utility host
- [ ] Decide where Proxmox backups should land first
- Options currently implied by notes: goodolddell, Synology, or both
- Next action: choose primary landing zone and secondary/off-site path
- [ ] Set up lab-wide backup strategy following 3-2-1
- Next action: map current copies vs missing copies for each critical service
- [ ] Add backup verification, not just backup jobs
- Next action: define one monthly restore drill and one automated verification check
- [ ] Write a backup test script / restore validation workflow
- Next action: start with one service, likely Gitea or Immich
- [ ] Buy a 4-bay NAS for backup at parents' place
- Blocker: hardware purchase decision
- [ ] Define regular backup flow to parents' NAS
- Depends on: backup matrix + parents' NAS target design
- [ ] Set up Kopia/Time Machine backup for Claudio's and Alena's machines
- Next action: choose destination and retention policy
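
The backup matrix and the 3-2-1 mapping described above could be prototyped as plain data before any tooling is chosen. The following is a minimal sketch, not an agreed design: the class names, the `media` labels, and the Gitea example values (method, frequency, retention, copy locations) are hypothetical placeholders, and the live data is assumed to count as one of the three copies.

```python
from dataclasses import dataclass, field


@dataclass
class BackupCopy:
    location: str          # where this copy lives (hypothetical example values below)
    media: str             # storage type label, e.g. "vps-disk", "local-disk", "nas"
    offsite: bool = False  # True if the copy is outside the home network


@dataclass
class BackupEntry:
    # One row of the backup matrix: source, method, frequency,
    # retention, restore target, plus the copies that exist today.
    source: str
    method: str
    frequency: str
    retention: str
    restore_target: str
    copies: list[BackupCopy] = field(default_factory=list)


def missing_for_321(entry: BackupEntry) -> list[str]:
    """Return the 3-2-1 requirements this entry does not yet meet."""
    gaps = []
    if len(entry.copies) < 3:
        gaps.append(f"needs 3 copies, has {len(entry.copies)}")
    if len({copy.media for copy in entry.copies}) < 2:
        gaps.append("needs 2 different media types")
    if not any(copy.offsite for copy in entry.copies):
        gaps.append("needs 1 off-site copy")
    return gaps


# Hypothetical row: values are illustrative, not the current state of the lab.
gitea = BackupEntry(
    source="Gitea",
    method="restic",
    frequency="daily",
    retention="30 days",
    restore_target="Proxmox VM",
    copies=[
        BackupCopy("VPS (live data)", "vps-disk"),
        BackupCopy("goodolddell", "local-disk"),
    ],
)

for gap in missing_for_321(gitea):
    print(f"{gitea.source}: {gap}")
```

Running the same check over every row gives the "current copies vs missing copies" view per critical service, and the list of gaps can feed directly into the off-site and parents' NAS decisions.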
## Infrastructure Cleanup
### 2. Simplify the homelab and hosting architecture
- [ ] Simplify hosting and homelab structure because too many things are mixed together
- Goal: each service should have one clear host, one clear access path, one clear backup path, and one clear reason to exist where it is
- Next action: create a service inventory table with columns: service, host, purpose, audience, access path, backup path, migration status
- [ ] Decide what belongs on VPS vs Proxmox vs Synology vs goodolddell
- Next action: classify each service as edge/public, production internal, backup/infra, or experimental
- [ ] Review whether Proxmox should become the central app platform
- Existing concern: avoid turning it into an unclear catch-all host
- [ ] Keep the VPS minimal
- Existing note direction: public edge and only truly necessary public components
- [ ] Keep goodolddell focused
- Candidate role: backup and always-on infra, not random leftover app host
- [ ] Give Orik passwordless access to its own machine only
- Goal: Orik should be able to operate its own host without interactive password prompts
- Constraint: do **not** grant write-capable access to other machines on the network
- Next action: design a least-privilege access model for the local host vs all remote hosts before changing SSH/sudo setup
- [ ] Ensure Orik does not have write access to other machines on the network
- Next action: separate local-machine automation privileges from remote-machine credentials and confirm remote access should be read-only or absent by default
- [ ] Move Gitea + Joplin from VPS to Proxmox ← **NEW 2026-04-03**
- [ ] Find out if network traffic is limited/throttled through VPS ← **NEW 2026-04-03**
- [ ] Buy a second VPS instance? ← **NEW 2026-04-03**
- [ ] Evaluate: Pangolin + Authentik vs Cloudflare Access (free tier) — do we need both or is Cloudflare enough?
- [ ] Clean up VPS — consolidate from many reverse proxies (pangolin, nginx, caddy, traefik, dokku, cloudflare?) to one proven stack
- [ ] Version control VPS setup (docker files + config files in git)
- [ ] Fix SSH keys: use single key or few keys instead of many
- [ ] Set up `info@frusetik.com` email account + SMTP for all self-hosted apps (Immich, etc.)
### 3. Finish the remote access simplification
- [ ] Adopt one default admin lane
- Recommended target from existing notes: **Tailscale**
- Next action: confirm Tailscale is the default admin path and mark ZeroTier as deprecated unless proven needed
- [ ] Adopt one default user/app access lane
- Recommended target from existing notes: **Cloudflare Tunnel / reverse proxy**
- Next action: list which services are user-facing vs admin-only
- [ ] Evaluate Pangolin + Authentik vs Cloudflare Access free tier
- Next action: write down what problem Authentik is solving today that Cloudflare alone does not
- [ ] Remove overlapping access paths from the critical path
- Next action: document one primary access path per service
- [ ] Invite family/friends only after service access model is clear
- Depends on: service inventory + access policy
## Monitoring & Documentation
### 4. Stabilize the VPS and edge stack
- [ ] Check whether VPS network traffic is limited or throttled
- Next action: inspect Netcup plan limits and current usage
- [ ] Decide whether a second VPS is actually needed
- Next action: answer only after traffic/memory constraints are measured
- [ ] Clean up the VPS reverse-proxy sprawl
- Goal: converge from multiple overlapping edge tools to one proven stack
- Next action: inventory all currently running proxy/edge/auth components on the VPS
- [ ] Version-control VPS setup
- Next action: put docker compose files, env templates, and key config under git
- [ ] Add swap to VPS to reduce OOM risk
- Source: Immich public outage incident on 2026-04-03
- [ ] Add memory monitoring and alerting on VPS
- Source: Immich public outage incident on 2026-04-03
- [ ] Consider Traefik health-check / config-refresh resilience measure
- Source: Immich public outage incident on 2026-04-03
- [ ] Fix SSH key sprawl
- Next action: reduce to one primary key or a very small set
- [ ] Set up `info@frusetik.com` + SMTP for self-hosted apps
- Next action: decide provider and which apps should send mail first
- [ ] Glance / Uptime Kuma page showing all hosted services status
- [ ] Documentation for everything hosted
- [ ] Monthly maintenance reminder + checklist
## Next
## Access & Networking
### 5. Migrate or remove services deliberately
- [ ] Move Gitea from VPS to Proxmox
- Preconditions: backup, restore plan, target host decision, access path
- [ ] Move Joplin from VPS to Proxmox
- Preconditions: backup, restore plan, target host decision, access path
- [ ] Delete Immich from **goodolddell**
- Intent: remove outdated or misplaced deployment from the Dell machine
- Preconditions: confirm no required data or active path still depends on it
- Next action: verify whether Immich on goodolddell is unused/stale, then remove container, volumes, and residual config deliberately
- [ ] Review whether Cloudflare Tunnel management should move off VPS
- Next action: decide if VPS remains public edge only, or if edge shifts elsewhere
- [ ] One admin VPN network (evaluate: ZeroTier vs Tailscale vs Pangolin private)
- [ ] Invite people (family, friends) to appropriate services
### 6. Validate Immich before declaring it production
- [ ] Test automatic phone backup on Claudio's iPhone
- [ ] Test Immich library + database backup and restore
- [ ] Test public sharing guest experience
- [ ] Run a 1-week stability check
- [ ] Keep Immich **outside Pangolin auth** if mobile app backup depends on direct access
- [ ] Verify off-site destination for Immich backups
- Likely target: parents' NAS or equivalent off-site storage
## Network Infrastructure
### 7. Cover the missing app backup gaps
- [ ] Paperless backup
- Next action: document data path, DB path, and restore steps
- [ ] Decide Paperless public access policy
- Next action: determine whether it should be public at all or Tailscale-only
- [ ] Define backup rotation for PostgreSQL-backed services
- Existing note source: Self-Hosting backup section
- [ ] Define config-file backup for infrastructure
- Includes compose files, tunnel/proxy config, auth config, DNS-related config
- [ ] Define IP ranges properly (e.g., 10.0.0.0/24 for lab, 10.0.1.0/24 for prod, 10.0.2.0/24 for DMZ)
- [ ] Set up VLANs: separate prod, dev/staging, IoT, guests
- [ ] Document VLAN/subnet map and which services live where
- [ ] Firewall rules between VLANs (default deny, explicit allow)
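
The example ranges in this list can be sanity-checked with Python's standard `ipaddress` module before they are committed to router or VLAN config. This is a small sketch assuming the three example subnets are the whole plan; the zone names and the helper functions `overlapping_pairs` and `zone_of` are illustrative, not existing tooling.

```python
import ipaddress
from itertools import combinations

# Example ranges from the list above; the final plan may differ.
SUBNETS = {
    "lab": ipaddress.ip_network("10.0.0.0/24"),
    "prod": ipaddress.ip_network("10.0.1.0/24"),
    "dmz": ipaddress.ip_network("10.0.2.0/24"),
}


def overlapping_pairs(subnets):
    """Return the names of every pair of subnets whose ranges overlap."""
    return [
        (name_a, name_b)
        for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2)
        if net_a.overlaps(net_b)
    ]


def zone_of(address, subnets):
    """Return the zone name containing the address, or None if unassigned."""
    ip = ipaddress.ip_address(address)
    for name, net in subnets.items():
        if ip in net:
            return name
    return None


print(overlapping_pairs(SUBNETS))      # an empty list means the plan is clean
print(zone_of("10.0.1.25", SUBNETS))   # which zone a given host falls into
```

The same `zone_of` lookup could later back the VLAN/subnet map by tagging each service's IP with its zone.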
## Later
## Automation & Maintenance
### 8. Monitoring, maintenance, and observability
- [ ] Set up Uptime Kuma or similar monitoring tool
- Requirement from Claudio: add **read access for Orik**
- Next action: choose tool and hosting location, then define how Orik should access it safely
- [ ] Build a single service status page
- Candidate: Glance or Uptime Kuma
- [ ] Add automated health checks + alerts
- [ ] Build the monthly maintenance checklist
- [ ] Set the monthly maintenance reminder
- [ ] Keep maintenance under 1 hour/month through automation where possible
- [ ] Max 1h/month maintenance target — automate as much as possible
- [ ] Monthly maintenance reminder + checklist (Orik helps build)
- [ ] Automated backup verification (not just "ran", but "actually restorable")
- [ ] Automated health checks + alerts
### 9. Network architecture cleanup
- [ ] Define IP ranges properly
- [ ] Set up VLANs for prod, dev/staging, IoT, and guests
- [ ] Document VLAN/subnet map
- [ ] Add inter-VLAN firewall rules with default deny and explicit allow
## Environments
### 10. Environment clarity
- [ ] Define what is production, testing, and staging today
- [ ] Keep dev/staging separate from production
- [ ] Establish naming conventions for hosts, services, and environments
- [ ] Proper distinction between production, development, and staging
- [ ] Dev/staging on separate VLAN from production
- [ ] Clear naming conventions for which services are which environment
### 11. Documentation discipline
- [ ] Document everything hosted
- [ ] Keep service inventory current with host, access path, backup method, and owner
- [ ] Record architecture changes in the second brain as they happen
## Notes
## Suggested sequencing
### Priority direction
Backup foundation first, then hosting apps, then cleanup and monitoring.
### Phase A, make the platform safe
- backup matrix
- decide where Proxmox Backup Server lives
- Proxmox backup
- VPS stabilization (swap, monitoring, proxy inventory)
- one primary access path per service
### VPN evaluation criteria
- Ease of setup + maintenance
- Works on all devices (Claudio's + Alena's)
- Integrates with existing Cloudflare/Pangolin setup
- Performance on mobile
### Phase B, reduce ambiguity
- service inventory with host + purpose + access path + backup path
- confirm Tailscale for admin
- confirm Cloudflare for user-facing apps
- deprecate ZeroTier unless needed
- decide the roles of Pangolin/Authentik vs Cloudflare clearly
- decide what goodolddell is for, and remove misplaced services
### Monthly maintenance checklist (to build)
- Verify backups ran successfully
- Check disk usage on all nodes
- Review logs for errors
- Test at least one restore
- Update docker images / system packages
- Check SSL certs expiration
- Verify VPN connectivity
- Review access logs for anomalies
### Phase C, migrate deliberately
- Gitea migration
- Joplin migration
- Immich production validation
- Paperless backup/access decision
- monitoring deployment
### Phase D, operational polish
- status page
- maintenance checklist and reminder
- VLAN and firewall cleanup
- full documentation coverage
## Open decisions Claudio still needs to make
- Should **goodolddell** become the Proxmox Backup Server, or stay a simpler backup/restic host?
- Which backup target should be primary for Proxmox and service backups: goodolddell, Synology, or both?
- Buy the parents' 4-bay NAS now or later?
- Is Proxmox meant to be the main long-term app host, or mainly test/transition infrastructure?
- Does Authentik stay in the near-term critical path, or should Cloudflare carry more of the access burden for now?
- Is a second VPS actually needed, or is the current VPS just under-instrumented?
- Where should Uptime Kuma live, and what exact read-only access should Orik have?
## Source consolidation notes
This list was curated from:
- `TASKS.md` homelab task state
- `02_Projects/Home Lab.md`
- `02_Projects/Home Lab Plan.md`
- `04_Topics/Self-Hosting.md`
- `05_Resources/Home Lab Architecture.md`
- `05_Resources/Home Lab Inventory.md`
- `05_Resources/Home Lab Incidents.md`
- `06_Decisions/Home Lab Principles.md`
- `06_Decisions/Remote Access V1 Proposal.md`
- `02_Projects/Immich Testing Plan.md`
- workspace memory notes referencing homelab follow-ups
- Claudio additions on 2026-04-15: delete Immich from goodolddell, set up Proxmox Backup Server, set up Uptime Kuma or equivalent with read access for Orik, simplify hosting/homelab boundaries