Business Continuity & Disaster Recovery
What happens when the primary region falls over.
Recovery objectives
| Objective | Target |
| --- | --- |
| Recovery Time Objective (RTO) | 4 hours |
| Recovery Point Objective (RPO) | 1 hour |
| Maximum Tolerable Downtime | 8 hours |
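
As a rough illustration of how these targets might be applied, the sketch below compares a measured recovery against the three objectives. The function name and result keys are hypothetical, and it assumes downtime is measured from outage start to restored service, and the data-loss window from the last recoverable write to the outage.

```python
from datetime import timedelta

# Targets copied from the recovery objectives table.
RTO = timedelta(hours=4)
RPO = timedelta(hours=1)
MTD = timedelta(hours=8)  # Maximum Tolerable Downtime


def evaluate_recovery(downtime: timedelta, data_loss_window: timedelta) -> dict[str, bool]:
    """Report which objectives a measured recovery met (hypothetical helper)."""
    return {
        "rto_met": downtime <= RTO,
        "rpo_met": data_loss_window <= RPO,
        "within_mtd": downtime <= MTD,
    }


# Example: a 5-hour outage that lost 20 minutes of writes.
result = evaluate_recovery(timedelta(hours=5), timedelta(minutes=20))
print(result)  # {'rto_met': False, 'rpo_met': True, 'within_mtd': True}
```

In this example the recovery misses the RTO but stays inside the Maximum Tolerable Downtime, which is the gap the 8-hour figure is meant to cover.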
Multi-region posture
- Primary — EU-Central-1 (Frankfurt). Postgres, edge functions, file storage.
- Failover — EU-West-1 (Dublin). Read-replica promoted on outage.
- Compute — Vercel auto-routes to the nearest healthy region. No regional pinning required.
Backups
- Supabase PITR — point-in-time recovery to any second within the trailing 7 days.
- Daily logical backups — pg_dump to S3-EU (Frankfurt). Retention 30 days.
- Source code — GitHub primary, with a secondary mirror pushed weekly to a cold S3-EU bucket.
- Migrations — versioned in supabase/migrations/ and replay-safe against an empty cluster.
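
The 30-day retention on daily logical backups could be enforced with something like the sketch below. It only selects which dumps have aged out; it does not talk to S3. The key layout (`pg_dump/<date>.dump`) and both function names are assumptions for illustration, not the actual bucket structure.

```python
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention for daily logical backups, per the policy above


def backup_key(day: date) -> str:
    """Hypothetical object key layout: one dump per calendar day."""
    return f"pg_dump/{day.isoformat()}.dump"


def expired_keys(backup_days: list[date], today: date) -> list[str]:
    """Return keys of dumps taken more than RETENTION_DAYS before today."""
    cutoff = today - timedelta(days=RETENTION_DAYS)
    return [backup_key(d) for d in sorted(backup_days) if d < cutoff]


# Example: with today = 2024-03-31 the cutoff is 2024-03-01,
# so only the February dump is past retention.
days = [date(2024, 3, 30), date(2024, 2, 28), date(2024, 3, 1)]
print(expired_keys(days, today=date(2024, 3, 31)))  # ['pg_dump/2024-02-28.dump']
```

A dump taken exactly 30 days ago is kept here; whether the boundary day counts as expired is a policy choice this sketch makes arbitrarily.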
Quarterly DR test runbook
- Restore the latest daily logical backup to a clean Supabase project.
- Run the migration replay to bring schema to head.
- Verify row counts on the 6 highest-traffic tables against the snapshot.
- Deploy a Vercel preview pointing at the restored project.
- Smoke-test 5 critical paths: sign-in, leaderboard, journal write, alert dispatch, packet generation.
- Record the measured RTO and RPO; file the results in the compliance Notion.
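
The row-count verification step could be sketched as below. It assumes the counts have already been collected into plain dictionaries (for example from `SELECT count(*)` per table on each side); the table names shown are hypothetical placeholders, not the real list of six tables.

```python
def compare_row_counts(expected: dict[str, int], restored: dict[str, int]) -> list[str]:
    """Return a list of mismatch descriptions; an empty list means the check passed."""
    problems = []
    for table, want in expected.items():
        got = restored.get(table)
        if got is None:
            problems.append(f"{table}: missing from restored project")
        elif got != want:
            problems.append(f"{table}: expected {want}, got {got}")
    return problems


# Hypothetical table names and counts, for illustration only.
snapshot = {"journal_entries": 1200, "alerts": 340, "leaderboard_scores": 58}
restored = {"journal_entries": 1200, "alerts": 338}

for line in compare_row_counts(snapshot, restored):
    print(line)
# alerts: expected 340, got 338
# leaderboard_scores: missing from restored project
```

Requiring exact equality only makes sense when the counts are taken at the same point as the backup; a comparison against live tables would need a tolerance instead.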
Incident response
- P0 (full outage): on-call paged within 5 min.
- P1 (degraded): 30-min SLA to triage.
- Customers notified within 1 hour via /status + email.
- Post-mortem published within 5 business days.
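
The timelines above can be turned into concrete deadlines with a small helper like the one below. This is a sketch under stated assumptions: every clock starts at detection time, business days are Monday to Friday with no holiday calendar, and the function and key names are invented for illustration.

```python
from datetime import datetime, timedelta


def add_business_days(start: datetime, days: int) -> datetime:
    """Advance by the given number of Mon-Fri days (public holidays are ignored)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


def incident_deadlines(severity: str, detected_at: datetime) -> dict[str, datetime]:
    """Compute response deadlines for a P0 or P1 incident (hypothetical helper)."""
    if severity == "P0":
        first_response = timedelta(minutes=5)   # on-call paged
    elif severity == "P1":
        first_response = timedelta(minutes=30)  # triage SLA
    else:
        raise ValueError(f"unknown severity: {severity}")
    return {
        "first_response_by": detected_at + first_response,
        "customers_notified_by": detected_at + timedelta(hours=1),
        "post_mortem_by": add_business_days(detected_at, 5),
    }


# Example: a P0 detected on Friday 2024-03-01 at 09:00.
for name, due in incident_deadlines("P0", datetime(2024, 3, 1, 9, 0)).items():
    print(name, due.isoformat())
# first_response_by 2024-03-01T09:05:00
# customers_notified_by 2024-03-01T10:00:00
# post_mortem_by 2024-03-08T09:00:00
```

If the post-mortem clock is meant to start at resolution rather than detection, pass the resolution time to `add_business_days` instead.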