Architecture Design Document
Version: 1.0 · Date: May 2025 · Status: Approved
This document defines the technical architecture for the Construo platform — a configurable, multi-tenant construction project management SaaS.
1. Executive Summary
The platform is designed to be sold to multiple construction companies, each with different data capture requirements, workflows, and identity providers.
Core priorities:
- Maintainability for a small team starting at 2 engineers
- Security and compliance from day one (ISO 27001, SOC 2, Cyber Essentials)
- Offline-first capability for construction sites with poor connectivity
- Configurability — tenants define their own schemas, modules, and branding
- A clear growth path from 4 tenants at launch to 100+ at scale
Technology decisions at a glance
| Layer | Decision |
|---|---|
| Backend | FastAPI (Python 3.12) |
| Frontend | React + TypeScript (Vite) |
| Database | PostgreSQL on AWS RDS — schema-per-tenant |
| Auth | AWS Cognito + Entra ID SAML federation |
| Offline sync | PowerSync (managed sync engine) |
| Hosting | AWS — ECS Fargate, S3, CloudFront |
| IaC | Terraform |
| Regions | eu-west-2 (London) at launch; eu-central-1 (Frankfurt) from Phase 2 |
| V1 scope | Manual data entry, all core construction modules |
| V2+ scope | AI features — Smart Import, Licence Scanning |
2. System Architecture
High-level layers
| Layer | Components | Responsibility |
|---|---|---|
| Edge | CloudFront + WAF + Route 53 | CDN, DDoS, tenant subdomain routing, TLS termination |
| Frontend | React SPA on S3 + CloudFront | UI, offline-capable PWA, service worker, local cache |
| Sync | PowerSync Cloud | Offline-first data sync between client and backend |
| API | FastAPI on ECS Fargate | Business logic, REST API, tenant context, auth middleware |
| Async | SQS + Lambda | Background jobs, imports, notifications, scheduled tasks |
| Data | RDS PostgreSQL (schema-per-tenant) | Persistent structured data with tenant isolation |
| Cache | ElastiCache (Redis) | Session data, rate limiting, short-lived computations |
| Files | S3 (per-tenant prefixes) | Documents, images, exports, licence scan uploads |
| Identity | Cognito + Entra ID | Authentication, federation, token issuance |
| Secrets | AWS Secrets Manager | DB credentials, API keys, tenant config |
| Observability | CloudWatch + X-Ray + Sentry | Logs, tracing, alerts, application errors |
Request flow
A typical authenticated API request:
- User browser or mobile app requests acme.construo.io
- Route 53 resolves the wildcard subdomain to CloudFront
- CloudFront applies WAF rules, forwards to ALB
- ALB routes to ECS Fargate task running FastAPI
- FastAPI middleware extracts tenant from subdomain, loads tenant context from Redis (fallback to DB)
- JWT validated against Cognito. Tenant membership and RBAC role attached to request context
- Business logic executes against the tenant's PostgreSQL schema
- Response returned. Async side-effects (audit logs, notifications) dispatched to SQS
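The tenant-extraction step in the flow above can be sketched as a small, framework-free helper. This is a minimal illustration, not the platform's middleware: the function name `tenant_slug_from_host` and the `BASE_DOMAIN` constant are assumptions, and the slug pattern simply follows DNS label rules and the 63-character slug limit in the tenant registry. The returned slug would still need to be looked up in the registry and checked against the tenant in the verified JWT.

```python
import re
from typing import Optional

# Assumption: base domain taken from the acme.construo.io example in this document.
BASE_DOMAIN = "construo.io"

# A single DNS label: lowercase letters, digits, hyphens, 1 to 63 characters.
_SLUG_RE = re.compile(r"^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$")


def tenant_slug_from_host(host: str, base_domain: str = BASE_DOMAIN) -> Optional[str]:
    """Return the slug for '<slug>.<base_domain>', or None if the host does not match."""
    # Drop an optional port, normalise case, and remove a trailing dot.
    hostname = host.split(":", 1)[0].strip().lower().rstrip(".")
    suffix = "." + base_domain
    if not hostname.endswith(suffix):
        return None
    slug = hostname[: -len(suffix)]
    # Nested subdomains (a.b.construo.io) and malformed labels are rejected.
    return slug if _SLUG_RE.match(slug) else None
```

In a FastAPI middleware, the `Host` header value would be passed to this helper before the Redis lookup of the tenant context.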
Why FastAPI over Next.js
The original recommendation was Next.js for developer speed. This was revised because the platform commits to Python for V2 AI features (Smart Import, Licence Scanning). Maintaining two backend languages — JavaScript API routes and Python AI services — would create unnecessary complexity. FastAPI is the right single-language choice.
OpenAPI as the contract: FastAPI auto-generates an OpenAPI spec.
The frontend uses openapi-typescript to generate TypeScript types from it as a pre-build step.
Any breaking API change fails the frontend build immediately — no manual type maintenance.
3. AWS Infrastructure
Service map
| AWS Service | Configuration | Purpose |
|---|---|---|
| Route 53 | Wildcard *.construo.io ALIAS to CloudFront | Tenant subdomain DNS |
| CloudFront | WAF attached, S3 + ALB origins, custom SSL | CDN, edge security, TLS |
| AWS WAF | OWASP rule set, rate limiting, geo-blocking | Application firewall |
| ACM | Wildcard cert *.construo.io per region | TLS certificates |
| S3 (frontend) | Static hosting, versioned, OAC | React SPA assets |
| S3 (files) | Per-tenant prefix, versioning, lifecycle rules | Tenant documents, images |
| ECS Fargate | 2 tasks min, autoscaling, 2 AZs | FastAPI containers |
| ECR | Private registry, image scanning enabled | Docker images |
| ALB | Path-based routing, health checks, access logs | Load balancing |
| RDS PostgreSQL | Multi-AZ, db.t3.medium → db.r6g.large at scale, encrypted | Primary data store |
| ElastiCache Redis | Cluster mode, 2 AZs, encrypted in-transit | Sessions, rate limiting, cache |
| SQS | Standard queues per job type, DLQ on each | Async task queue |
| Lambda | Python 3.12, VPC-attached for DB access | Background job workers |
| Cognito | User Pool per env, Entra ID IdP federation | Identity and token issuance |
| Secrets Manager | Automatic rotation, VPC endpoint | Credentials and secrets |
| KMS | CMKs for RDS, S3, SQS encryption | Encryption key management |
| CloudWatch | Log groups per service, metric alarms, dashboards | Monitoring |
| CloudTrail | All regions, S3 storage, 7-year retention | Audit logging |
| VPC | 3 AZs, public/private/data subnets, NAT GW | Network isolation |
Network architecture
Three subnet tiers:
- Public — ALB, NAT Gateway only. Nothing else is directly internet-accessible.
- Private — ECS Fargate tasks, Lambda functions. Outbound via NAT Gateway.
- Data — RDS, ElastiCache. No outbound internet. Accessible only from private subnets.
Multi-region
Phase 1 deploys to eu-west-2 (London) only.
EU data residency tenants are added to eu-central-1 (Frankfurt) from Phase 2.
The tenant registry stores each tenant's home region; the application routes accordingly.
Environments
| Environment | AWS Account | Data |
|---|---|---|
| development | dev account | Synthetic only |
| staging | non-prod account | Anonymised copy of prod |
| production | prod account | Real tenant data |
Separate accounts, not separate VPCs
Use separate AWS accounts per environment. This supports the environment-separation controls expected under SOC 2 and ISO 27001, and it prevents staging credentials from accidentally accessing production resources.
4. Multi-Tenancy and Data Isolation
Strategy: schema-per-tenant
Each tenant gets a dedicated PostgreSQL schema within the shared RDS instance.
All tenant tables live under their schema (e.g. acme.projects, acme.personnel).
A shared public schema contains the tenant registry and global configuration.
| Approach | Notes |
|---|---|
| Schema-per-tenant (chosen) | Strong isolation, simple GDPR deletion (DROP SCHEMA), per-tenant backup/restore, no row-level filtering bugs, good to ~200 tenants |
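| Shared tables with row-level filtering | Highest risk: one missing WHERE clause leaks cross-tenant data unless database-enforced row-level security policies back up the application filter |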
| Database-per-tenant | Maximum isolation but operationally expensive at launch scale |
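To make the schema-per-tenant approach concrete, here is a minimal sketch of how a request could be pinned to one tenant's schema. The helper name `search_path_sql` and the allowed-name pattern are assumptions; the schema name is expected to come from the tenant registry, never from the request. `SET LOCAL` only lasts until the end of the current transaction, which is one way to avoid a setting leaking between tenants when connections are pooled.

```python
import re

# Assumption: schema names are restricted to lowercase identifiers of at most
# 63 characters, matching the VARCHAR(63) registry column.
_SCHEMA_RE = re.compile(r"^[a-z_][a-z0-9_]{0,62}$")


def search_path_sql(schema_name: str) -> str:
    """Build the statement that scopes the current transaction to one tenant schema."""
    if not _SCHEMA_RE.match(schema_name):
        # Identifiers cannot be bound as query parameters, so validate strictly.
        raise ValueError(f"invalid schema name: {schema_name!r}")
    # 'public' stays on the path so shared tables such as public.tenants resolve.
    return f'SET LOCAL search_path TO "{schema_name}", public'
```

The statement would be executed at the start of each transaction, before any tenant query runs.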
Tenant registry
The public.tenants table is the first lookup on every request:
| Column | Type | Description |
|---|---|---|
| id | UUID | Tenant identifier |
| slug | VARCHAR(63) UNIQUE | Subdomain slug — acme for acme.construo.io |
| schema_name | VARCHAR(63) | PostgreSQL schema name |
| region | VARCHAR(20) | AWS region for data residency |
| plan | ENUM | starter / professional / enterprise |
| status | ENUM | active / suspended / trial / offboarded |
| idp_type | ENUM NULL | none / entra_id / okta / google |
| idp_config | JSONB NULL | SAML/OIDC metadata |
| retention_days | INTEGER | Configurable data retention period |
| modules_enabled | TEXT[] | List of enabled module identifiers |
Tenant provisioning
When a new tenant is onboarded:
- Insert row into public.tenants
- CREATE SCHEMA {schema_name}
- Run Alembic migrations targeting the new schema
- Seed default config (field definitions, module settings, roles)
- Provision Cognito User Pool App Client
- Configure Entra ID federation if applicable
- Create S3 bucket prefix: s3://construo-files-prod/{tenant_id}/
- Set up CloudWatch log group and metric filters
- Provision tenant subdomain DNS record via Route 53
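The provisioning steps above run in a fixed order, so one way to sketch them is an ordered runner that stops at the first failure and reports how far it got. The document does not specify failure handling; the `run_provisioning` function, the `ProvisioningResult` type, and the step names used with them are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class ProvisioningResult:
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None


def run_provisioning(steps: Sequence[Tuple[str, Callable[[], None]]]) -> ProvisioningResult:
    """Run named steps in order; stop at the first failure and record where it happened."""
    result = ProvisioningResult()
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # the caller decides whether to retry or roll back
            result.failed_step = name
            result.error = str(exc)
            break
        result.completed.append(name)
    return result
```

Each step from the list (registry insert, schema creation, migrations, and so on) would be passed as a `(name, callable)` pair, which makes a partially provisioned tenant easy to diagnose and resume.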
5. Authentication and Access Control
Architecture
AWS Cognito is the central identity broker. All authentication methods produce a consistent JWT regardless of the sign-in method.
| Auth method | Use case |
|---|---|
| Username + password | Tenants without a corporate IDP |
| Entra ID (SAML 2.0) | Primary enterprise federation — Microsoft 365 orgs |
| OIDC federation | Other IDPs (Okta, Google Workspace, Ping) |
| API keys | ERP integrations, machine-to-machine |
RBAC roles
| Role | Scope | Key permissions |
|---|---|---|
| Platform Admin | Global | Tenant management, platform config — Construo staff only |
| Tenant Admin | Tenant | User management, module config, field schema builder, IDP config |
| Project Manager | Tenant | Full CRUD on assigned projects and sites |
| Site Foreman | Project | Create/edit site diary, personnel, plant on assigned sites |
| Site Operative | Site | View site data, log own attendance, sign inductions |
| Viewer | Project or Tenant | Read-only |
| Integration | Tenant | API key role — scoped access for ERP sync |
Permission enforcement
Three layers — defence in depth, required for SOC 2 and ISO 27001:
- FastAPI dependency injection: every route declares required permissions via Depends(require_permission('sites:write'))
- Database row-level checks: schema_name is always part of query context. Parameterised queries only.
- Frontend route guards: React Router guards for UX only; API is the authoritative enforcement point.
Never trust client-supplied tenant IDs
Tenant context must always be derived from the verified JWT on the server side. Trusting a tenant ID supplied by the client (in a request body, query string, or header) is one of the most common multi-tenancy vulnerabilities.
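The permission check behind `require_permission` can be sketched without any framework code. This is an illustration under stated assumptions: the role keys and the permission strings other than `sites:write` are hypothetical, and in FastAPI the inner `check` would be a dependency that reads roles from the verified JWT rather than taking them as an argument.

```python
from typing import Callable, Dict, FrozenSet, Iterable

# Hypothetical role-to-permission map; only 'sites:write' appears in this document.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "tenant_admin": frozenset({"sites:read", "sites:write", "users:manage"}),
    "site_foreman": frozenset({"sites:read", "sites:write"}),
    "viewer": frozenset({"sites:read"}),
}


class PermissionDenied(Exception):
    """Raised when none of the caller's roles grants the required permission."""


def require_permission(permission: str) -> Callable[[Iterable[str]], None]:
    """Return a checker that raises PermissionDenied unless a role grants `permission`."""

    def check(roles: Iterable[str]) -> None:
        granted = set()
        for role in roles:
            # Unknown roles grant nothing rather than raising.
            granted |= ROLE_PERMISSIONS.get(role, frozenset())
        if permission not in granted:
            raise PermissionDenied(permission)

    return check
```

Keeping the role map on the server means a tampered client can change what the UI shows but not what the API allows.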
JWT custom claims
Cognito-issued JWTs contain these custom claims added via a Lambda trigger:
- tenant_id: UUID of the tenant
- tenant_slug: subdomain slug for routing
- platform_roles: list of RBAC roles
- project_access: list of project UUIDs (empty = all within tenant)
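A sketch of the Lambda trigger that adds these claims, assuming the V1 event format of Cognito's pre token generation trigger, where added claims go under `claimsOverrideDetails.claimsToAddOrOverride` and values are strings (so the list claims are JSON-encoded here). The `lookup_membership` function and its return values are hypothetical placeholders for a tenant registry query.

```python
import json


def lookup_membership(user_sub: str) -> dict:
    """Hypothetical lookup; a real trigger would query the tenant registry."""
    return {
        "tenant_id": "00000000-0000-0000-0000-000000000001",
        "tenant_slug": "acme",
        "platform_roles": ["site_foreman"],
        "project_access": [],  # empty = all projects within the tenant
    }


def handler(event: dict, context: object) -> dict:
    """Pre token generation trigger: attach tenant and role claims to the token."""
    user_sub = event["request"]["userAttributes"]["sub"]
    membership = lookup_membership(user_sub)
    event["response"]["claimsOverrideDetails"] = {
        "claimsToAddOrOverride": {
            "tenant_id": membership["tenant_id"],
            "tenant_slug": membership["tenant_slug"],
            # List values are serialised because claim values are strings here.
            "platform_roles": json.dumps(membership["platform_roles"]),
            "project_access": json.dumps(membership["project_access"]),
        }
    }
    return event
```

The API side would then parse the two JSON-encoded claims back into lists after validating the token.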
6. Offline Sync Architecture
Why this is non-trivial
Construction sites frequently have no mobile data coverage. Users must be able to record site diary entries, log personnel, capture plant movements, and raise incidents without internet. When connectivity returns, changes must sync reliably without data loss or corruption.
Building a reliable sync engine from scratch is a multi-month engineering effort with subtle failure modes. PowerSync is a managed service that solves this problem.
PowerSync integration
| Component | Technology | Role |
|---|---|---|
| Backend connector | PowerSync Python SDK + FastAPI webhook | Publishes data changes to PowerSync Cloud |
| Sync rules | PowerSync YAML schema | Defines which tables/rows sync to which users |
| Client SDK | PowerSync React SDK (web) / React Native SDK (mobile) | Local SQLite, sync engine |
| Conflict resolution | Last-write-wins with server authority | Server is always authoritative |
Sync scope
Not all data syncs to all clients:
- Site Foreman: assigned sites, site diary entries (last 90 days), personnel list, plant register, pending forms
- Project Manager: all sites within assigned projects, aggregated view data
- Large binaries (documents, images) — fetched on-demand from S3 via signed URLs, cached by service worker
Conflict resolution
V1 uses last-write-wins with server authority.
Each record has updated_at. When a client syncs, conflicts resolve in favour of the
most recent server timestamp. Client changes are queued in a local SQLite upload queue
and replayed in order when connectivity returns.
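One reading of last-write-wins with server authority can be sketched as follows: queued client changes are replayed in order, each accepted change is stamped with the server's clock rather than the device's, and the last change to reach the server determines the final value. This is not PowerSync's API; the `Record`, `QueuedChange`, and `replay_upload_queue` names are hypothetical, and merging only the changed fields is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Iterable


@dataclass
class Record:
    data: dict
    updated_at: datetime


@dataclass
class QueuedChange:
    record_id: str
    data: dict  # fields the client changed while offline


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


def replay_upload_queue(
    store: Dict[str, Record],
    queue: Iterable[QueuedChange],
    server_now: Callable[[], datetime] = _utc_now,
) -> None:
    """Apply queued changes in order; later changes overwrite earlier ones."""
    for change in queue:
        existing = store.get(change.record_id)
        base = existing.data if existing else {}
        merged = {**base, **change.data}
        # The server assigns updated_at, so device clock skew cannot win a conflict.
        store[change.record_id] = Record(data=merged, updated_at=server_now())
```

Stamping on the server side matters on construction sites, where device clocks can drift or be set manually.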
7. Core Data Model
Design principles
- Core fields: fixed columns for universally required attributes (IDs, timestamps, relationships)
- Custom fields: a JSONB custom_fields column on each entity stores tenant-defined attributes
- Audit trail: every table has a corresponding _audit table capturing before/after state, user, and timestamp
Universal columns (all tables)
| Column | Type |
|---|---|
| id | UUID DEFAULT gen_random_uuid() |
| created_at | TIMESTAMPTZ DEFAULT NOW() |
| updated_at | TIMESTAMPTZ DEFAULT NOW() |
| created_by | UUID FK → users.id |
| updated_by | UUID FK → users.id |
| deleted_at | TIMESTAMPTZ NULL (soft delete) |
| custom_fields | JSONB DEFAULT '{}' |
Core entities
Key entities: projects, sites, site_diary_entries, personnel,
site_attendance, plant_equipment, incidents, documents,
deliveries, subcontractors, inductions, field_definitions.
Full column-level schemas are documented in Data Model and will be expanded as each module is built.
8. Configurability Engine
Three levels
- Field Schema — tenants add custom fields to any entity
- Module Configuration — tenants enable/disable platform modules
- White-Label — branding, subdomain, theme colours
Field definitions table
| Column | Type | Description |
|---|---|---|
| entity_type | VARCHAR(50) | e.g. site_diary_entry, personnel |
| field_key | VARCHAR(100) | Snake_case key in custom_fields JSONB |
| label | VARCHAR(255) | Display label shown in UI |
| field_type | ENUM | text / number / date / boolean / select / multi_select / file / user_ref |
| is_required | BOOLEAN | Validation: field must be present |
| options | JSONB NULL | For select types: array of {value, label} |
| display_order | INTEGER | Sort order in forms |
Field definitions are Redis-cached (5-minute TTL), invalidated when a tenant admin saves changes.
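To show how the field definitions could drive validation of the `custom_fields` JSONB, here is a minimal sketch. The function name `validate_custom_fields`, the error message format, and the assumptions that dates arrive as ISO 8601 strings and that option values are strings are all illustrative, not taken from the platform.

```python
from datetime import date
from typing import Any, Dict, List


def _type_ok(definition: dict, value: Any) -> bool:
    field_type = definition["field_type"]
    allowed = {opt["value"] for opt in definition.get("options") or []}
    if field_type in ("text", "file", "user_ref"):
        return isinstance(value, str)
    if field_type == "number":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    if field_type == "boolean":
        return isinstance(value, bool)
    if field_type == "date":
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            return False
        return True
    if field_type == "select":
        return isinstance(value, str) and value in allowed
    if field_type == "multi_select":
        return isinstance(value, list) and all(
            isinstance(v, str) and v in allowed for v in value
        )
    return False


def validate_custom_fields(definitions: List[dict], custom_fields: Dict[str, Any]) -> List[str]:
    """Return a list of error strings; an empty list means the payload is valid."""
    errors = []
    known = {d["field_key"] for d in definitions}
    for key in custom_fields:
        if key not in known:
            errors.append(f"{key}: unknown field")
    for definition in definitions:
        key = definition["field_key"]
        value = custom_fields.get(key)
        if value is None:
            if definition.get("is_required"):
                errors.append(f"{key}: required")
            continue
        if not _type_ok(definition, value):
            errors.append(f"{key}: expected {definition['field_type']}")
    return errors
```

Running the same check in the API on every write keeps the JSONB column consistent even when a client holds a stale cached copy of the definitions.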
V1 Modules
| Module key | Description | V1 |
|---|---|---|
| site_diary | Daily site diary entries | ✅ |
| personnel | Worker register and attendance | ✅ |
| plant | Plant and equipment register | ✅ |
| documents | Document register with version control | ✅ |
| incidents | Incident and near-miss reporting | ✅ |
| inductions | Site induction tracking and sign-off | ✅ |
| deliveries | Materials delivery log | ✅ |
| subcontractors | Subcontractor company management | ✅ |
| timesheets | Daily hours per worker per site | V2 |
| rams | Risk assessments and method statements | V2 |
| snag_lists | Inspections and punch lists | V2 |
| reporting | Scheduled reports and dashboards | V2 |
| licence_scan | AI-powered licence OCR scanning | V2 |
| smart_import | AI-assisted spreadsheet import | V2 |
9. Integration Architecture
Generic ERP integration layer
Rather than building bespoke connectors for each ERP system, the integration model uses a Transform Pipeline approach:
- Platform API — versioned REST API, all data accessible via authenticated API key
- Transform service — a small Python Lambda per ERP integration containing the field mapping logic
- ERP API / file — REST (Procore, Autodesk), file export (Sage, Viewpoint), webhook (Oracle)
- Scheduler — EventBridge triggers the transform Lambda on a defined cadence
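The field mapping logic inside a transform Lambda could look like the sketch below. The mapping table, the ERP field names, and the `transform` function are hypothetical; no real ERP schema is implied.

```python
from typing import Any, Callable, Dict, Tuple

# ERP field name -> (platform field name, converter)
FieldMapping = Dict[str, Tuple[str, Callable[[Any], Any]]]

# Hypothetical mapping for an unnamed ERP system.
EXAMPLE_MAPPING: FieldMapping = {
    "JobNumber": ("project_code", str),
    "HoursWorked": ("hours", float),
    "WorkerName": ("full_name", lambda v: v.strip().title()),
}


def transform(record: Dict[str, Any], mapping: FieldMapping) -> Dict[str, Any]:
    """Convert one platform record into the shape the ERP expects."""
    out: Dict[str, Any] = {}
    for erp_field, (platform_field, convert) in mapping.items():
        # Fields absent from the platform record are skipped, not defaulted.
        if platform_field in record:
            out[erp_field] = convert(record[platform_field])
    return out
```

Keeping the mapping as data rather than code means a new ERP integration is mostly a new mapping table plus a delivery mechanism (REST call, file export, or webhook).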
Platform API principles
- Versioned from day one: /api/v1/. Never break existing consumers.
- Cursor-based pagination on all list endpoints
- Webhook support: tenants register URLs for entity events
- Rate limiting: per API key, enforced at ALB via WAF (1,000 req/min default)
- Idempotency keys on write operations
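Cursor-based pagination from the list above can be illustrated with an opaque cursor that encodes the sort key of the last row returned. This is an in-memory sketch under assumptions: rows are ordered by `(created_at, id)`, timestamps are ISO 8601 strings in a single format so they sort lexicographically, and the function names are hypothetical. Against PostgreSQL, the same idea is typically a row comparison such as `WHERE (created_at, id) > (%s, %s) ORDER BY created_at, id LIMIT n`.

```python
import base64
import json
from typing import List, Optional, Tuple


def encode_cursor(created_at: str, record_id: str) -> str:
    raw = json.dumps([created_at, record_id]).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> Tuple[str, str]:
    created_at, record_id = json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))
    return created_at, record_id


def paginate(
    rows: List[dict], limit: int, cursor: Optional[str] = None
) -> Tuple[List[dict], Optional[str]]:
    """Return one page plus the cursor for the next page (None on the last page)."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]))
    if cursor is not None:
        after = decode_cursor(cursor)
        # Keep only rows strictly after the last row the client has seen.
        ordered = [r for r in ordered if (r["created_at"], r["id"]) > after]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = encode_cursor(page[-1]["created_at"], page[-1]["id"]) if has_more else None
    return page, next_cursor
```

Unlike offset pagination, a cursor stays stable when rows are inserted while a client is paging, which suits scheduled ERP sync jobs.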
V2 — Smart Import
- Upload CSV/XLSX → S3 → SQS queue
- Lambda parses headers and sample rows
- Claude API interprets columns, proposes field mapping JSON
- User confirms/adjusts the mapping
- FastAPI validates and bulk-inserts with error report on partial failures
V2 — Licence Scanning
- Mobile camera → upload to S3 via presigned URL
- Lambda invokes AWS Textract
- Claude API interprets raw Textract output → name, licence number, expiry, categories
- User confirms before saving to personnel.licences JSONB
10. Security and Compliance
Target frameworks
| Framework | Target date | Status |
|---|---|---|
| Cyber Essentials | Phase 3 (Week 38) | Planned |
| Cyber Essentials Plus | Month 10 | Planned |
| SOC 2 Type I | Month 12 | Planned |
| ISO 27001 | Month 18 | Planned |
| SOC 2 Type II | Month 24 | Planned |
Key controls
| Control | Implementation |
|---|---|
| Encryption in transit | TLS 1.2+ everywhere; HTTP → HTTPS redirect |
| Encryption at rest | RDS, S3, EBS encrypted via KMS CMK |
| Network segmentation | 3-tier VPC; SGs allow minimum required traffic |
| WAF | CloudFront WAF with AWS Managed Rules (OWASP Core, Known Bad Inputs) |
| Secrets management | All secrets in AWS Secrets Manager; automatic rotation |
| Patch management | ECR image scanning; Dependabot; Fargate managed patches |
| Audit logging | Immutable audit trail on all entity writes; CloudTrail; 7-year retention |
| Backup | RDS automated backups (7-day); S3 versioning; PITR tested quarterly |
| MFA | Enforced for all admin roles via Cognito |
| Pen testing | Annual external test; findings tracked to closure |
GDPR
- UK tenants: eu-west-2. EU tenants: eu-central-1. No transfer outside UK/EU.
- Tenant data export: JSON export of all data for a named person (right of access)
- Tenant data deletion: right to erasure — triggers anonymisation workflow
- Data retention: configurable per tenant; nightly Lambda applies retention policies
- DPA: Data Processing Agreement template, signed before any tenant goes live
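The nightly retention job boils down to computing a cutoff from each tenant's `retention_days` and selecting the records older than it. The sketch below assumes retention is measured from `created_at` and leaves open what happens to the selected records; both function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def retention_cutoff(now: datetime, retention_days: int) -> datetime:
    """Records with a timestamp before the returned value are past retention."""
    if retention_days < 0:
        raise ValueError("retention_days must not be negative")
    return now - timedelta(days=retention_days)


def expired_ids(records: Iterable[dict], now: datetime, retention_days: int) -> List[str]:
    """Return the ids of records older than the tenant's retention period."""
    cutoff = retention_cutoff(now, retention_days)
    # Assumption: retention is measured from created_at.
    return [r["id"] for r in records if r["created_at"] < cutoff]
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the job deterministic and easy to test.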
11. Phased Delivery Plan (Summary)
| Phase | Weeks | Goal |
|---|---|---|
| Phase 0 — Foundation | 1–6 | CI/CD, AWS infrastructure, auth end-to-end, tenant provisioning |
| Phase 1 — Core Modules | 7–22 | All 8 V1 modules, offline sync, tested by product owner |
| Phase 2 — Production Ready | 23–32 | White-label, ERP integration, security hardening, pilot tenants |
| Phase 3 — Launch | 33–40 | Pilot feedback, mobile, Cyber Essentials, commercial launch |
For the full sprint-by-sprint breakdown, see Project Plan.
12. Monorepo Structure
platform/
├── apps/
│ ├── api/ # FastAPI backend
│ │ ├── src/
│ │ │ ├── core/ # Auth, tenant context, db, audit
│ │ │ ├── modules/ # Sites, personnel, plant, etc.
│ │ │ └── main.py
│ │ ├── tests/
│ │ ├── alembic/
│ │ └── Dockerfile
│ └── web/ # React frontend
│ ├── src/
│ │ ├── core/
│ │ └── modules/
│ └── vite.config.ts
├── packages/
│ ├── types/ # Auto-generated TypeScript from OpenAPI spec
│ ├── db/ # SQLAlchemy models, Alembic migrations
│ └── shared/ # Constants, enums
├── infra/
│ └── terraform/ # All AWS infrastructure
│ ├── modules/
│ └── environments/ # dev, staging, prod
├── .github/
│ └── workflows/ # CI/CD pipelines
└── docs/ # This documentation site
13. Key Risks
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Junior accepts AI code without understanding it | High | Critical | Teach-back rule; weekly code walk-through; AI development rules |
| Offline sync conflicts cause data corruption | Medium | High | PowerSync handles at engine level; soft-delete only |
| Schema-per-tenant hits RDS connection limits | High | Medium | PgBouncer from day one; RDS Proxy at scale |
| Tenant data leakage via query bug | Low | Critical | Schema isolation; cross-tenant tests in CI; parameterised queries only |
| Entra ID federation misconfiguration | Low | High | Test in staging before prod; second review of attribute mapping |
| AWS costs spike due to misconfiguration | Medium | Medium | Budgets at 80%/100%; junior cannot provision without approval |
| Compliance reveals architectural rework | Low | High | Audit logging and isolation built into Phase 0 |