Operations

How to run the platform in production. Read this section if you are on call or do any production work.

In this section

Deployment — how code goes from PR to production
Terraform State & Accounts — state backend and the account-per-env assume-role model
DNS & Certificates — Cloudflare DNS and ACM cert validation
Environments — dev, staging, production differences
Monitoring & Alerts — CloudWatch dashboards, alert routing
Runbooks — step-by-step responses to specific incidents
Backup & Restore — RDS backups, point-in-time recovery
Incident Response — what to do when things break
Tenant Onboarding — provisioning a new customer

Severity	Response time	Examples
P0 — Critical	15 min	Platform down, data loss, security breach
P1 — High	1 hour	Module broken, sync failing, auth issues
P2 — Medium	Next business day	Performance degradation, minor feature broken
P3 — Low	Next sprint	UX issues, cosmetic bugs
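
For tooling that needs the severity table in machine-readable form (for example, an alert router), the rows above could be encoded roughly as follows. This is only a sketch: the `Severity` class, the `SEVERITIES` mapping, and the `has_fixed_window` helper are hypothetical names for illustration, not part of the platform. P2 and P3 are calendar-based targets, so they carry no fixed duration here.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Severity:
    """One row of the severity table (hypothetical structure)."""
    label: str
    # Fixed response window, or None when the target is calendar-based.
    response_within: Optional[timedelta]
    # Response time exactly as written in the table.
    response_text: str


# Values copied from the severity table; the keys are the severity codes.
SEVERITIES = {
    "P0": Severity("Critical", timedelta(minutes=15), "15 min"),
    "P1": Severity("High", timedelta(hours=1), "1 hour"),
    "P2": Severity("Medium", None, "Next business day"),
    "P3": Severity("Low", None, "Next sprint"),
}


def has_fixed_window(level: str) -> bool:
    """Return True when the severity has a clock-time response window."""
    return SEVERITIES[level].response_within is not None
```

Under this encoding, `has_fixed_window("P0")` and `has_fixed_window("P1")` are true, which is one way a router might decide whether to page someone immediately or file a ticket instead.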

If you're new to on-call, read Incident Response first.