# Homelab Todo List
Curated list of open homelab work across task history, memory, and second-brain notes. Last updated: 2026-04-15.
## Current operating priorities
1. **Backup foundation first**
2. **Simplify hosting and homelab boundaries**
3. **Stabilize access and edge architecture**
4. **Migrate or delete deliberately, not ad hoc**
5. **Document every meaningful infra change**
## Now
### 1. Define and implement the backup foundation
- [ ] Define the backup policy for each critical system: Synology, Proxmox, VPS, Gitea, Joplin, Immich, Paperless, config files
  - Next action: create one backup matrix with source, method, frequency, retention, restore target, and off-site destination
- [ ] Set up Proxmox Backup Server
  - Candidate target: **goodolddell**
  - Next action: decide whether goodolddell should host Proxmox Backup Server or remain a generic restic/utility host
- [ ] Decide where Proxmox backups should land first
  - Options currently implied by notes: goodolddell, Synology, or both
  - Next action: choose primary landing zone and secondary/off-site path
- [ ] Set up lab-wide backup strategy following 3-2-1
  - Next action: map current copies vs missing copies for each critical service
- [ ] Add backup verification, not just backup jobs
  - Next action: define one monthly restore drill and one automated verification check
- [ ] Write a backup test script / restore validation workflow
  - Next action: start with one service, likely Gitea or Immich
- [ ] Buy a 4-bay NAS for backup at parents' place
  - Blocker: hardware purchase decision
- [ ] Define regular backup flow to parents' NAS
  - Depends on: backup matrix + parents' NAS target design
- [ ] Set up Kopia/Time Machine backup for Claudio's and Alena's machines
  - Next action: choose destination and retention policy
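
The backup matrix and the 3-2-1 gap mapping from this section could be kept as structured data instead of free text. The following is a minimal Python sketch, not a chosen tool: the names `Copy`, `BackupRow`, and `gaps_3_2_1` are hypothetical, and the sample row uses placeholder values rather than the real state of the lab. It assumes 3-2-1 means three copies, two distinct media, and one copy off-site.

```python
from dataclasses import dataclass, field


@dataclass
class Copy:
    location: str          # where this copy lives
    medium: str            # device or storage class, used to count distinct media
    offsite: bool = False  # True if the copy sits at a different site than the live data


@dataclass
class BackupRow:
    source: str
    method: str
    frequency: str
    retention: str
    restore_target: str
    copies: list[Copy] = field(default_factory=list)  # includes the live copy


def gaps_3_2_1(row: BackupRow) -> list[str]:
    """List the 3-2-1 requirements that this row does not yet meet."""
    gaps = []
    if len(row.copies) < 3:
        gaps.append(f"{len(row.copies)}/3 copies")
    if len({c.medium for c in row.copies}) < 2:
        gaps.append("fewer than 2 distinct media")
    if not any(c.offsite for c in row.copies):
        gaps.append("no off-site copy")
    return gaps


# Placeholder row: every value here is illustrative, not an inventory fact.
gitea = BackupRow(
    source="Gitea",
    method="repository dump",
    frequency="daily",
    retention="30 days",
    restore_target="test VM",
    copies=[
        Copy("live host", "host disk"),
        Copy("local backup target", "NAS disk"),
    ],
)

print(gitea.source, gaps_3_2_1(gitea))
```

Running `gaps_3_2_1` over every row gives the "current copies vs missing copies" view in one pass, and the same rows can later feed the parents' NAS flow once that target exists.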
### 2. Simplify the homelab and hosting architecture
- [ ] Simplify hosting and homelab structure because too many things are mixed together
  - Goal: each service should have one clear host, one clear access path, one clear backup path, and one clear reason to exist where it is
  - Next action: create a service inventory table with columns: service, host, purpose, audience, access path, backup path, migration status
- [ ] Decide what belongs on VPS vs Proxmox vs Synology vs goodolddell
  - Next action: classify each service as edge/public, production internal, backup/infra, or experimental
- [ ] Review whether Proxmox should become the central app platform
  - Existing concern: avoid turning it into an unclear catch-all host
- [ ] Keep the VPS minimal
  - Existing note direction: public edge and only truly necessary public components
- [ ] Keep goodolddell focused
  - Candidate role: backup and always-on infra, not random leftover app host
- [ ] Give Orik passwordless access to its own machine only
  - Goal: Orik should be able to operate its own host without interactive password prompts
  - Constraint: do **not** grant write-capable access to other machines on the network
  - Next action: design a least-privilege access model for the local host vs all remote hosts before changing SSH/sudo setup
- [ ] Ensure Orik does not have write access to other machines on the network
  - Next action: separate local-machine automation privileges from remote-machine credentials and confirm remote access should be read-only or absent by default
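
The service inventory table named above (service, host, purpose, audience, access path, backup path, migration status) could be generated from a small data file so that empty cells are easy to spot. This is a sketch under assumptions: `Service`, `missing_fields`, and `to_markdown` are hypothetical names, and the Gitea row only reuses what this note states (currently on the VPS, planned move to Proxmox), with the audience filled in as a placeholder.

```python
from dataclasses import dataclass, fields


@dataclass
class Service:
    service: str
    host: str
    purpose: str
    audience: str
    access_path: str
    backup_path: str
    migration_status: str


def missing_fields(s: Service) -> list[str]:
    """Names of inventory columns that are still empty for this service."""
    return [f.name for f in fields(s) if not getattr(s, f.name).strip()]


def to_markdown(rows: list[Service]) -> str:
    """Render the inventory as a Markdown table, marking empty cells with '?'."""
    names = [f.name for f in fields(Service)]
    header = "| " + " | ".join(n.replace("_", " ") for n in names) + " |"
    divider = "|" + "|".join(" --- " for _ in names) + "|"
    body = [
        "| " + " | ".join(getattr(r, n) or "?" for n in names) + " |"
        for r in rows
    ]
    return "\n".join([header, divider, *body])


# Placeholder row: access and backup paths are deliberately left empty.
gitea = Service("Gitea", "VPS", "git hosting", "admin", "", "", "move to Proxmox")

print(to_markdown([gitea]))
print("still undecided:", missing_fields(gitea))
```

A service with any entry in `missing_fields` does not yet meet the "one clear host, one clear access path, one clear backup path" goal, which makes the list a natural work queue.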
### 3. Finish the remote access simplification
- [ ] Adopt one default admin lane
  - Recommended target from existing notes: **Tailscale**
  - Next action: confirm Tailscale is the default admin path and mark ZeroTier as deprecated unless proven needed
- [ ] Adopt one default user/app access lane
  - Recommended target from existing notes: **Cloudflare Tunnel / reverse proxy**
  - Next action: list which services are user-facing vs admin-only
- [ ] Evaluate Pangolin + Authentik vs Cloudflare Access free tier
  - Next action: write down what problem Authentik is solving today that Cloudflare alone does not
- [ ] Remove overlapping access paths from the critical path
  - Next action: document one primary access path per service
- [ ] Invite family/friends only after service access model is clear
  - Depends on: service inventory + access policy
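
Once each service is labelled admin-only or user-facing, the "one primary access path per service" rule can be checked mechanically. The sketch below assumes the two default lanes recommended in this section (Tailscale for admin, Cloudflare Tunnel for users). The lane identifiers, `DEFAULT_LANE`, and `access_findings` are hypothetical, and the sample inputs are illustrative only.

```python
# Assumed default lane per audience, taken from the recommendations above.
DEFAULT_LANE = {"admin": "tailscale", "user": "cloudflare-tunnel"}


def access_findings(audience: str, paths: list[str]) -> list[str]:
    """Describe how a service's access paths deviate from the one-lane rule."""
    findings = []
    expected = DEFAULT_LANE.get(audience)
    if expected is None:
        return [f"unknown audience: {audience}"]
    if len(paths) != 1:
        findings.append(f"expected 1 access path, found {len(paths)}")
    extra = [p for p in paths if p != expected]
    if extra:
        findings.append("non-default paths: " + ", ".join(extra))
    return findings


# Illustrative inputs, not the real configuration of any host.
print(access_findings("admin", ["tailscale", "zerotier"]))
print(access_findings("user", ["cloudflare-tunnel"]))
```

An empty result means the service already sits on its default lane. Documented exceptions would need an explicit allow list rather than a silent pass.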
### 4. Stabilize the VPS and edge stack
- [ ] Check whether VPS network traffic is limited or throttled
  - Next action: inspect Netcup plan limits and current usage
- [ ] Decide whether a second VPS is actually needed
  - Next action: answer only after traffic/memory constraints are measured
- [ ] Clean up the VPS reverse-proxy sprawl
  - Goal: converge from multiple overlapping edge tools to one proven stack
  - Next action: inventory all currently running proxy/edge/auth components on the VPS
- [ ] Version-control VPS setup
  - Next action: put docker compose files, env templates, and key config under git
- [ ] Add swap to VPS to reduce OOM risk
  - Source: Immich public outage incident on 2026-04-03
- [ ] Add memory monitoring and alerting on VPS
  - Source: Immich public outage incident on 2026-04-03
- [ ] Consider a Traefik health-check / config-refresh resilience measure
  - Source: Immich public outage incident on 2026-04-03
- [ ] Fix SSH key sprawl
  - Next action: reduce to one primary key or a very small set
- [ ] Set up `info@frusetik.com` + SMTP for self-hosted apps
  - Next action: decide provider and which apps should send mail first
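
For the swap and memory-monitoring items, a very small check can run from cron until a proper monitoring tool is chosen. The sketch below assumes a Linux VPS where `/proc/meminfo` exposes `MemTotal`, `MemAvailable`, and `SwapTotal`. The 10% threshold and the function names are assumptions, and delivering the alert (mail, webhook) is left out.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo content into {field name: numeric value}."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[key.strip()] = int(parts[0])
    return info


def memory_alerts(info: dict[str, int], min_available_pct: float = 10.0) -> list[str]:
    """Return alert messages for low available memory or missing swap."""
    alerts = []
    available_pct = 100.0 * info["MemAvailable"] / info["MemTotal"]
    if available_pct < min_available_pct:
        alerts.append(f"low memory: {available_pct:.1f}% available")
    if info.get("SwapTotal", 0) == 0:
        alerts.append("no swap configured")
    return alerts


def check_host(path: str = "/proc/meminfo") -> list[str]:
    """Read the live meminfo file (Linux only) and return current alerts."""
    with open(path) as fh:
        return memory_alerts(parse_meminfo(fh.read()))


# Synthetic sample so the logic can be tried on any machine.
sample = "MemTotal: 2000000 kB\nMemAvailable: 100000 kB\nSwapTotal: 0 kB\n"
print(memory_alerts(parse_meminfo(sample)))
```

On the VPS itself, `check_host()` would replace the synthetic sample. Whatever threshold is used should come from measured usage, which also informs the second-VPS decision.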
## Next
### 5. Migrate or remove services deliberately
- [ ] Move Gitea from VPS to Proxmox
  - Preconditions: backup, restore plan, target host decision, access path
- [ ] Move Joplin from VPS to Proxmox
  - Preconditions: backup, restore plan, target host decision, access path
- [ ] Delete Immich from **goodolddell**
  - Intent: remove outdated or misplaced deployment from the Dell machine
  - Preconditions: confirm no required data or active path still depends on it
  - Next action: verify whether Immich on goodolddell is unused/stale, then remove container, volumes, and residual config deliberately
- [ ] Review whether Cloudflare Tunnel management should move off VPS
  - Next action: decide if VPS remains public edge only, or if edge shifts elsewhere
### 6. Validate Immich before declaring it production
- [ ] Test automatic phone backup on Claudio's iPhone
- [ ] Test Immich library + database backup and restore
- [ ] Test public sharing guest experience
- [ ] Run a 1-week stability check
- [ ] Keep Immich **not protected by Pangolin auth** if mobile app backup depends on direct access
- [ ] Verify off-site destination for Immich backups
  - Likely target: parents' NAS or equivalent off-site storage
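
The library half of the backup-and-restore test can be validated by comparing file checksums between the original tree and a restored copy. This is a generic sketch, not an Immich feature: `sha256_file`, `tree_digest`, and `restore_diff` are hypothetical names, and the database restore still needs its own check (for example, loading the dump into a scratch instance).

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large photos and videos do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        while block := fh.read(chunk_size):
            digest.update(block)
    return digest.hexdigest()


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_diff(original: Path, restored: Path) -> dict[str, list[str]]:
    """Report files that are missing, unexpected, or changed after a restore."""
    a, b = tree_digest(original), tree_digest(restored)
    return {
        "missing": sorted(a.keys() - b.keys()),
        "unexpected": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }
```

A restore drill passes this check when all three lists are empty. Calling `restore_diff(Path("/path/to/library"), Path("/path/to/restored"))` with real paths is left to the drill itself.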
### 7. Cover the missing app backup gaps
- [ ] Paperless backup
  - Next action: document data path, DB path, and restore steps
- [ ] Decide Paperless public access policy
  - Next action: determine whether it should be public at all or Tailscale-only
- [ ] Define backup rotation for PostgreSQL-backed services
  - Existing note source: Self-Hosting backup section
- [ ] Define config-file backup for infrastructure
  - Includes compose files, tunnel/proxy config, auth config, DNS-related config
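
One common way to define rotation for database dumps is a daily/weekly/monthly scheme. The sketch below is an assumption about what such a rotation could look like, not the scheme from the Self-Hosting note: the counts (7 daily, 4 weekly, 6 monthly) and the name `dumps_to_keep` are hypothetical. It decides which dump dates to keep; everything else would be eligible for pruning.

```python
from datetime import date, timedelta


def dumps_to_keep(dump_dates, daily=7, weekly=4, monthly=6):
    """Pick which dump dates to keep under a daily/weekly/monthly rotation."""
    newest_first = sorted(set(dump_dates), reverse=True)
    keep = set(newest_first[:daily])  # the most recent daily dumps

    def newest_per_bucket(bucket_of, limit):
        # Keep the newest dump in each of the `limit` most recent buckets.
        seen = set()
        for d in newest_first:
            bucket = bucket_of(d)
            if bucket in seen:
                continue
            if len(seen) == limit:
                break
            seen.add(bucket)
            keep.add(d)

    newest_per_bucket(lambda d: d.isocalendar()[:2], weekly)  # (ISO year, ISO week)
    newest_per_bucket(lambda d: (d.year, d.month), monthly)
    return keep


# Illustrative input: 60 consecutive daily dumps ending on 2026-04-15.
dumps = [date(2026, 4, 15) - timedelta(days=i) for i in range(60)]
kept = dumps_to_keep(dumps)
print(sorted(kept))
```

Tools such as restic and Proxmox Backup Server offer their own keep-daily/weekly/monthly style retention settings, so this logic is mainly useful for plain `pg_dump` files managed by a script.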
## Later
### 8. Monitoring, maintenance, and observability
- [ ] Set up Uptime Kuma or similar monitoring tool
  - Requirement from Claudio: add **read access for Orik**
  - Next action: choose tool and hosting location, then define how Orik should access it safely
- [ ] Build a single service status page
  - Candidate: Glance or Uptime Kuma
- [ ] Add automated health checks + alerts
- [ ] Build the monthly maintenance checklist
- [ ] Set the monthly maintenance reminder
- [ ] Keep maintenance under 1 hour/month through automation where possible
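
Until a monitoring tool is in place, the "automated health checks + alerts" item can be approximated with a short HTTP probe. This is a stopgap sketch using only the Python standard library. The function names and the endpoint map are hypothetical, and a tool such as Uptime Kuma would replace it rather than build on it.

```python
import urllib.error
import urllib.request


def check(url: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Return (healthy, detail) for one HTTP endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return (200 <= resp.status < 400, f"HTTP {resp.status}")
    except urllib.error.HTTPError as exc:  # server answered with 4xx/5xx
        return (False, f"HTTP {exc.code}")
    except OSError as exc:  # DNS failure, refused connection, timeout
        return (False, f"unreachable: {exc}")


def failing(endpoints: dict[str, str], checker=check) -> list[str]:
    """Run the checker over all endpoints and return one line per failure."""
    alerts = []
    for name, url in endpoints.items():
        healthy, detail = checker(url)
        if not healthy:
            alerts.append(f"{name}: {detail}")
    return alerts
```

Passing `checker` as a parameter keeps the alert logic testable without network access. A call such as `failing({"gitea": "https://git.example.invalid/"})` uses a placeholder URL and would need the real service addresses.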
### 9. Network architecture cleanup
- [ ] Define IP ranges properly
- [ ] Set up VLANs for prod, dev/staging, IoT, and guests
- [ ] Document VLAN/subnet map
- [ ] Add inter-VLAN firewall rules with default deny and explicit allow
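
The VLAN/subnet map and the default-deny rule can be prototyped before touching any router. In the sketch below, every address range and the single allow rule are placeholders rather than decided values, and `overlapping`, `vlan_of`, and `is_allowed` are hypothetical helpers built on Python's standard `ipaddress` module.

```python
import ipaddress
from itertools import combinations

# Hypothetical address plan: ranges are placeholders, not decided values.
VLANS = {
    "prod": ipaddress.ip_network("10.0.10.0/24"),
    "dev": ipaddress.ip_network("10.0.20.0/24"),
    "iot": ipaddress.ip_network("10.0.30.0/24"),
    "guest": ipaddress.ip_network("10.0.40.0/24"),
}

# Explicit allow list of (source VLAN, destination VLAN); all else is denied.
ALLOW = {("prod", "iot")}  # placeholder rule for illustration


def overlapping(vlans):
    """Return pairs of VLAN names whose subnets overlap."""
    return [
        (a, b)
        for (a, net_a), (b, net_b) in combinations(vlans.items(), 2)
        if net_a.overlaps(net_b)
    ]


def vlan_of(addr, vlans=VLANS):
    """Name of the VLAN containing addr, or None if it is outside the plan."""
    ip = ipaddress.ip_address(addr)
    for name, net in vlans.items():
        if ip in net:
            return name
    return None


def is_allowed(src, dst, vlans=VLANS, allow=ALLOW):
    """Default deny: permit only same-VLAN traffic or explicitly allowed pairs."""
    s, d = vlan_of(src, vlans), vlan_of(dst, vlans)
    if s is None or d is None:
        return False
    return s == d or (s, d) in allow


print(overlapping(VLANS))
print(is_allowed("10.0.10.5", "10.0.30.7"))
```

Keeping the plan in one file also covers the "document VLAN/subnet map" item, since the same data can be rendered into the note.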
### 10. Environment clarity
- [ ] Define what is production, testing, and staging today
- [ ] Keep dev/staging separate from production
- [ ] Establish naming conventions for hosts, services, and environments
### 11. Documentation discipline
- [ ] Document everything hosted
- [ ] Keep service inventory current with host, access path, backup method, and owner
- [ ] Record architecture changes in the second brain as they happen
## Suggested sequencing
### Phase A, make the platform safe
- backup matrix
- decide where Proxmox Backup Server lives
- Proxmox backup
- VPS stabilization (swap, monitoring, proxy inventory)
- one primary access path per service
### Phase B, reduce ambiguity
- service inventory with host + purpose + access path + backup path
- confirm Tailscale for admin
- confirm Cloudflare for user-facing apps
- deprecate ZeroTier unless needed
- decide the Pangolin + Authentik vs Cloudflare role clearly
- decide what goodolddell is for, and remove misplaced services
### Phase C, migrate deliberately
- Gitea migration
- Joplin migration
- Immich production validation
- Paperless backup/access decision
- monitoring deployment
### Phase D, operational polish
- status page
- maintenance checklist and reminder
- VLAN and firewall cleanup
- full documentation coverage
## Open decisions Claudio still needs to make
- Should **goodolddell** become the Proxmox Backup Server, or stay a simpler backup/restic host?
- Which backup target should be primary for Proxmox and service backups: goodolddell, Synology, or both?
- Buy the parents' 4-bay NAS now or later?
- Is Proxmox meant to be the main long-term app host, or mainly test/transition infrastructure?
- Does Authentik stay in the near-term critical path, or should Cloudflare carry more of the access burden for now?
- Is a second VPS actually needed, or is the current VPS just under-instrumented?
- Where should Uptime Kuma live, and what exact read-only access should Orik have?
## Source consolidation notes
This list was curated from:
- `TASKS.md` homelab task state
- `02_Projects/Home Lab.md`
- `02_Projects/Home Lab Plan.md`
- `04_Topics/Self-Hosting.md`
- `05_Resources/Home Lab Architecture.md`
- `05_Resources/Home Lab Inventory.md`
- `05_Resources/Home Lab Incidents.md`
- `06_Decisions/Home Lab Principles.md`
- `06_Decisions/Remote Access V1 Proposal.md`
- `02_Projects/Immich Testing Plan.md`
- workspace memory notes referencing homelab follow-ups
- Claudio additions on 2026-04-15: delete Immich from goodolddell, set up Proxmox Backup Server, set up Uptime Kuma or equivalent with read access for Orik, simplify hosting/homelab boundaries