Azure Infrastructure Details
Complete reference for S5's Azure resources and configuration.
Resource Overview
Preview Environment (rg-slidefactory-preview)
| Resource | Type | Purpose | Configuration |
|---|---|---|---|
| slidefactory-web-preview | Container App | Web application | FastAPI on port 8000, autoscaling 1-3 instances |
| slidefactory-worker-preview | Container App | Background tasks | Celery worker, autoscaling 1-2 instances |
| postgres-preview | PostgreSQL Flexible Server | Database | v15 with pgvector extension |
| redis-preview | Azure Cache for Redis | Caching & Celery broker | Standard tier, 1GB |
| slidefactorypreview | Storage Account | File storage | Blob storage with containers |
| s5slidefactory | Container Registry | Docker images | Shared with production |
Production Environment (rg-slidefactory-prod)
| Resource | Type | Purpose | Configuration |
|---|---|---|---|
| slidefactory-web-prod | Container App | Web application | FastAPI on port 8000, autoscaling 2-5 instances |
| slidefactory-worker-prod | Container App | Background tasks | Celery worker, autoscaling 1-3 instances |
| postgres-prod | PostgreSQL Flexible Server | Database | v15 with pgvector extension |
| redis-prod | Azure Cache for Redis | Caching & Celery broker | Standard tier, 2.5GB |
| slidefactoryprod | Storage Account | File storage | Blob storage with containers |
| s5slidefactory | Container Registry | Docker images | Shared with preview |
Container Apps
Web Service Configuration
Image: s5slidefactory.azurecr.io/slidefactory:{preview|prod}-{sha}

Environment Variables:
- DATABASE_URL - PostgreSQL connection string
- REDIS_HOST, REDIS_PORT, REDIS_PASSWORD - Redis configuration
- AZURE_STORAGE_* - Blob storage credentials
- AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET - Azure AD auth
- OPENAI_API_KEY / AZURE_OPENAI_* - AI provider credentials
- N8N_API_URL, N8N_API_KEY - N8N integration
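As an illustration of how the Redis variables above might be consumed, the following minimal sketch reads them from the environment and builds a TLS broker URL. The `RedisSettings` class and `load_redis_settings` function are hypothetical names, not part of the documented codebase, and the way the real application assembles its connection string is an assumption.

```python
import os
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class RedisSettings:
    """Hypothetical holder for the three Redis variables."""
    host: str
    port: int
    password: str

    def broker_url(self, db: int = 0) -> str:
        # "rediss://" (double s) selects TLS, which the Redis section lists as required.
        # The password is percent-encoded because access keys can contain "/", "+" or "=".
        secret = quote(self.password, safe="")
        return f"rediss://:{secret}@{self.host}:{self.port}/{db}"


def load_redis_settings(env=os.environ) -> RedisSettings:
    """Read REDIS_HOST, REDIS_PORT and REDIS_PASSWORD; fail early if any is missing."""
    required = ("REDIS_HOST", "REDIS_PORT", "REDIS_PASSWORD")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"missing environment variables: {', '.join(missing)}")
    return RedisSettings(env["REDIS_HOST"], int(env["REDIS_PORT"]), env["REDIS_PASSWORD"])
```

Failing at startup when a variable is absent tends to surface configuration drift between the web and worker apps sooner than a connection error at first use would.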
Health Probes:
- Liveness: GET /health (30s interval)
- Readiness: GET /health (10s interval)
- Startup: GET /health (5s interval, 60s timeout)

Scaling:
- Preview: 1-3 instances (CPU > 75% triggers scale-up)
- Production: 2-5 instances (CPU > 70% triggers scale-up)

Resources:
- Preview: 0.5 CPU, 1GB memory per instance
- Production: 1.0 CPU, 2GB memory per instance
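To make the interaction between the CPU thresholds and the instance bounds concrete, here is a small sketch. It assumes a proportional rule of the kind used by the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)); the exact algorithm applied by the platform is not stated in this document, so treat the formula as an assumption. `WEB_SCALING` and `desired_replicas` are hypothetical names.

```python
import math

# Bounds and CPU thresholds copied from the web service scaling list above.
WEB_SCALING = {
    "preview": {"min": 1, "max": 3, "cpu_target": 75},
    "production": {"min": 2, "max": 5, "cpu_target": 70},
}


def desired_replicas(env: str, current: int, avg_cpu_percent: float) -> int:
    """Estimate the replica count for an environment, clamped to its bounds."""
    rule = WEB_SCALING[env]
    # Proportional rule (assumption): scale by observed / target utilisation.
    wanted = math.ceil(current * avg_cpu_percent / rule["cpu_target"])
    return max(rule["min"], min(rule["max"], wanted))
```

With these numbers, two production instances averaging 95% CPU would grow to three, while the preview environment never exceeds three instances regardless of load.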
Worker Service Configuration
Image: Same as web service

Command Override: /code/scripts/start-worker-azure.sh

Environment Variables: Same as web service (shares configuration)

Scaling:
- Preview: 1-2 instances (CPU > 80% triggers scale-up)
- Production: 1-3 instances (CPU > 75% triggers scale-up)

Resources:
- Preview: 0.5 CPU, 1GB memory per instance
- Production: 1.0 CPU, 2GB memory per instance
Database (PostgreSQL)
Configuration
Version: PostgreSQL 15

Extensions: pgvector (for vector similarity search)

Tier:
- Preview: Burstable (B1ms, 1 vCore, 2GB RAM)
- Production: General Purpose (D2s_v3, 2 vCores, 8GB RAM)

Backup:
- Retention: 7 days (preview), 30 days (production)
- Geo-redundant: No (preview), Yes (production)

Networking:
- Public access: Disabled
- Private endpoint: Connected to Container Apps virtual network
- Firewall: Allow Azure services
Database Schema
Managed by Alembic migrations in slidefactory-core package.

Key tables:
- users - User accounts
- api_keys - API authentication
- templates - PowerPoint templates
- n8n_processes - N8N workflow executions
- context_documents, context_chunks - Document storage with vector embeddings

Maintenance
Automatic:
- Minor version updates: Enabled
- Maintenance window: Sunday 02:00-06:00 UTC

Manual:
- Major version upgrades: Requires manual planning
- Index maintenance: Automatic via PostgreSQL autovacuum
Redis Cache
Configuration
Version: Redis 6.x

Tier:
- Preview: Standard C1 (1GB)
- Production: Standard C2 (2.5GB)

Usage:
- Session storage (user sessions)
- Celery broker (task queue)
- Celery result backend (task results)
- Application caching

Networking:
- Public access: Disabled
- Private endpoint: Connected to Container Apps virtual network
- TLS: Required

Eviction Policy: allkeys-lru (least recently used)

Persistence: RDB snapshots (every 1 hour)
Blob Storage
Configuration
Redundancy:
- Preview: LRS (Locally Redundant Storage)
- Production: GRS (Geo-Redundant Storage)

Access Tier: Hot (frequently accessed)

Networking:
- Public access: Enabled (with SAS tokens for downloads)
- Firewall: Allow Azure services + specific IPs
Containers
| Container | Purpose | Access Level |
|---|---|---|
| presentations | Generated presentation files | Private (SAS URLs for downloads) |
| templates | PowerPoint template files | Private |
| documents | Uploaded documents for context/RAG | Private |
| static | S5 branding static assets | Public read |
Lifecycle Policies
Presentations Container:
- Move to cool tier after 90 days
- Delete after 1 year (preview), 2 years (production)

Templates Container:
- No automatic deletion
- Manual cleanup required
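The presentations rules above could be expressed as an Azure Storage lifecycle management policy document. The sketch below generates that JSON structure per environment. The rule name `presentations-retention` is hypothetical, and counting the 90 days from last modification (`daysAfterModificationGreaterThan`) is an assumption, since the document does not say which timestamp the policy uses.

```python
import json


def lifecycle_rule(name, container, cool_after_days, delete_after_days=None):
    """Build one lifecycle rule scoped to a single container."""
    actions = {"tierToCool": {"daysAfterModificationGreaterThan": cool_after_days}}
    if delete_after_days is not None:
        actions["delete"] = {"daysAfterModificationGreaterThan": delete_after_days}
    return {
        "enabled": True,
        "name": name,  # hypothetical rule name
        "type": "Lifecycle",
        "definition": {
            "actions": {"baseBlob": actions},
            # prefixMatch starts with the container name.
            "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [f"{container}/"]},
        },
    }


def presentations_policy(environment):
    """Cool after 90 days; delete after 1 year (preview) or 2 years (production)."""
    delete_after = {"preview": 365, "production": 730}[environment]
    rule = lifecycle_rule("presentations-retention", "presentations", 90, delete_after)
    return {"rules": [rule]}


if __name__ == "__main__":
    print(json.dumps(presentations_policy("production"), indent=2))
```

No rule is generated for the templates container, which matches the "no automatic deletion" policy stated above.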
Container Registry
Configuration
SKU: Standard

Admin Access: Disabled (use service principal)

Geo-replication: No

Images:
- slidefactory:preview-{sha} - Preview builds
- slidefactory:prod-{sha} - Production builds

Retention: Keep last 10 images per tag prefix

Scanning: Enabled (Azure Defender for container registries)
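The retention rule above (keep the last 10 images per tag prefix) can be sketched as a selection function. How the cleanup is actually implemented is not described here, so `images_to_prune` is a hypothetical helper and the input shape (tag plus push timestamp) is an assumption.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable, List, Tuple


def images_to_prune(images: Iterable[Tuple[str, datetime]], keep: int = 10) -> List[str]:
    """Return the tags that fall outside the newest `keep` images of each prefix.

    `images` holds (tag, pushed_at) pairs such as ("preview-3f2a1bc", datetime(...)).
    """
    by_prefix = defaultdict(list)
    for tag, pushed_at in images:
        prefix = tag.split("-", 1)[0]  # "preview" or "prod"
        by_prefix[prefix].append((pushed_at, tag))

    prune = []
    for entries in by_prefix.values():
        entries.sort(reverse=True)  # newest first
        prune.extend(tag for _, tag in entries[keep:])
    return sorted(prune)
```

Grouping by prefix before counting keeps a burst of preview builds from pushing production images out of the retained set.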
Networking
Virtual Network Integration
Container Apps Environment:
- Virtual network: vnet-slidefactory (10.0.0.0/16)
- Subnet: snet-apps (10.0.1.0/24)

Private Endpoints:
- PostgreSQL: 10.0.2.4
- Redis: 10.0.2.5
- Storage Account: 10.0.2.6

DNS:
- Private DNS zones for private endpoints
- Azure-provided DNS for public endpoints

Firewall Rules
PostgreSQL:
- Allow: Azure services
- Allow: Container Apps subnet (10.0.1.0/24)

Redis:
- Allow: Container Apps subnet (10.0.1.0/24)

Storage Account:
- Allow: Azure services
- Allow: Container Apps subnet (10.0.1.0/24)
- Allow: Office IPs (for management)
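The address plan and subnet-based rules above lend themselves to a quick consistency check. This sketch uses Python's `ipaddress` module to test whether a source address falls inside the Container Apps subnet and whether the private endpoint addresses sit inside the virtual network but outside that subnet. The expectation that endpoints live outside snet-apps is an assumption drawn from the listed addresses; the function names are hypothetical.

```python
import ipaddress

VNET = ipaddress.ip_network("10.0.0.0/16")         # vnet-slidefactory
APPS_SUBNET = ipaddress.ip_network("10.0.1.0/24")  # snet-apps
PRIVATE_ENDPOINTS = {
    "postgresql": "10.0.2.4",
    "redis": "10.0.2.5",
    "storage": "10.0.2.6",
}


def is_allowed_source(ip: str) -> bool:
    """True when the address belongs to the Container Apps subnet."""
    return ipaddress.ip_address(ip) in APPS_SUBNET


def check_layout() -> list:
    """Return a list of human-readable problems; empty means the plan is consistent."""
    problems = []
    if not APPS_SUBNET.subnet_of(VNET):
        problems.append("snet-apps is not inside vnet-slidefactory")
    for name, ip in PRIVATE_ENDPOINTS.items():
        addr = ipaddress.ip_address(ip)
        if addr not in VNET:
            problems.append(f"{name} endpoint {ip} is outside the virtual network")
        if addr in APPS_SUBNET:  # assumption: endpoints use a separate subnet
            problems.append(f"{name} endpoint {ip} overlaps snet-apps")
    return problems
```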
Monitoring & Observability
Application Insights
Resource: appi-slidefactory-{preview|prod}

Telemetry:
- HTTP requests and responses
- Dependencies (database, Redis, storage)
- Exceptions and errors
- Custom events and metrics

Retention: 90 days

Log Analytics
Resource: log-slidefactory-{preview|prod}

Data Sources:
- Container Apps logs (stdout/stderr)
- Application Insights telemetry
- Azure resource logs (database, Redis, storage)

Retention: 30 days (preview), 90 days (production)

Alerts
Configured Alerts:
- Container App unhealthy (health probe failures)
- High CPU usage (> 80% for 5 minutes)
- High memory usage (> 85% for 5 minutes)
- Database connection failures
- Redis connection failures
- Storage account throttling

Notification: Email to ops team
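The "above threshold for 5 minutes" conditions can be illustrated with a small evaluator. Azure Monitor evaluates an aggregation over the alert window; the sketch below instead uses a stricter every-sample rule purely for illustration, and `sustained_breach` is a hypothetical helper rather than anything the alerting stack exposes.

```python
from datetime import datetime, timedelta


def sustained_breach(samples, threshold, window=timedelta(minutes=5)):
    """Return True when every sample in the trailing window exceeds the threshold.

    `samples` is a list of (timestamp, percent) pairs sorted oldest first.
    The history must span the whole window, otherwise the answer is False.
    """
    if not samples:
        return False
    end = samples[-1][0]
    start = end - window
    if samples[0][0] > start:
        return False  # not enough history to cover the window
    recent = [value for ts, value in samples if ts >= start]
    return all(value > threshold for value in recent)
```

For the alerts above, CPU would be checked with `threshold=80` and memory with `threshold=85`; a single dip below the threshold inside the window resets the condition.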
Security
Identity & Access
Authentication:
- Azure AD integration for user login
- API keys for programmatic access

Service Principal:
- GitHub Actions deployment: Contributor on resource groups
- Container Apps: Identity assigned for accessing other Azure resources

Secrets Management:
- Environment variables in Container Apps configuration (encrypted at rest)
- Azure Key Vault not used (secrets in GitHub Secrets and Container Apps)

Network Security
TLS:
- HTTPS enforced for web traffic (Container Apps automatic TLS)
- TLS required for Redis
- TLS required for PostgreSQL

Firewall:
- Database and Redis only accessible from Container Apps subnet
- Storage account accessible from Container Apps + management IPs

Data Protection
Encryption:
- At rest: All Azure services use Microsoft-managed keys
- In transit: TLS 1.2+ required for all connections
Backups:
- PostgreSQL: Automated backups (7-day retention in preview, 30-day retention in production)
- Blob Storage: Soft delete enabled (14 days)
- Container images: Retained in registry
Cost Optimization
Current Costs (Approximate Monthly)
Preview Environment: ~$200/month
- Container Apps: ~$50
- PostgreSQL: ~$30
- Redis: ~$20
- Blob Storage: ~$10
- Other: ~$90

Production Environment: ~$500/month
- Container Apps: ~$150
- PostgreSQL: ~$100
- Redis: ~$50
- Blob Storage: ~$30
- Other: ~$170
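As a quick arithmetic check on the figures above, the line items can be summed and expressed as shares of each environment's total. The numbers are the approximate values from this section; `COSTS`, `monthly_total` and `share` are hypothetical names.

```python
# Approximate monthly costs in USD, copied from the lists above.
COSTS = {
    "preview": {"Container Apps": 50, "PostgreSQL": 30, "Redis": 20,
                "Blob Storage": 10, "Other": 90},
    "production": {"Container Apps": 150, "PostgreSQL": 100, "Redis": 50,
                   "Blob Storage": 30, "Other": 170},
}


def monthly_total(env: str) -> int:
    """Sum of all line items for one environment."""
    return sum(COSTS[env].values())


def share(env: str, item: str) -> float:
    """Percentage of the environment's total spent on one line item."""
    return round(100 * COSTS[env][item] / monthly_total(env), 1)
```

The line items add up to the stated totals, and "Other" is the largest single entry in both environments, which suggests it is worth breaking down further when looking for savings.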
Cost Saving Measures
- Auto-shutdown: Preview environment can be scaled to 0 instances outside business hours
- Storage Lifecycle: Automated lifecycle policies move old data to cool tier
- Right-sizing: Monitor resource usage and adjust instance sizes
- Reserved Instances: Consider reserved capacity for production database
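The auto-shutdown measure above can be sketched as a function that picks the minimum replica count for the preview environment from the current time. The business-hours window (Monday to Friday, 07:00 to 19:00 UTC) is an assumption, since this document does not define it, and `preview_min_replicas` is a hypothetical helper.

```python
from datetime import datetime

BUSINESS_DAYS = range(0, 5)    # Monday-Friday (assumption)
BUSINESS_HOURS = range(7, 19)  # 07:00-18:59 UTC (assumption)


def preview_min_replicas(now: datetime) -> int:
    """Return 1 during business hours and 0 otherwise."""
    in_hours = now.weekday() in BUSINESS_DAYS and now.hour in BUSINESS_HOURS
    return 1 if in_hours else 0
```

A scheduled job could call this and pass the result to whatever mechanism sets the replica floor for the preview apps.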
Disaster Recovery
Backup Strategy
Database:
- Automated backups every hour
- Point-in-time restore within the retention window (7 days in preview, 30 days in production)
- Geo-redundant backups (production only)

Blob Storage:
- Soft delete enabled (14 days)
- Geo-redundant (production only)
- Manual export to offline storage quarterly

Container Images:
- Retained in registry (last 10 per tag prefix)
- Backed up to secondary registry (manual)
Recovery Procedures
Database Restore:
az postgres flexible-server restore \
  --resource-group rg-slidefactory-prod \
  --name postgres-prod-restored \
  --source-server postgres-prod \
  --restore-time "2025-01-15T10:00:00Z"
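Before running the restore command above, it can help to confirm that the requested restore time falls inside the backup retention window. The sketch below builds the same argument list for either environment and rejects timestamps outside the 7-day (preview) or 30-day (production) window. `restore_command` is a hypothetical helper, and the `-restored` suffix for the preview server is an assumption extrapolated from the production example.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

RETENTION_DAYS = {"preview": 7, "prod": 30}  # from the backup retention settings


def restore_command(env: str, restore_time: datetime,
                    now: Optional[datetime] = None) -> List[str]:
    """Build the az CLI argument list; both datetimes must be timezone-aware."""
    now = now or datetime.now(timezone.utc)
    earliest = now - timedelta(days=RETENTION_DAYS[env])
    if not (earliest <= restore_time <= now):
        raise ValueError(f"restore time must lie between {earliest} and {now}")
    stamp = restore_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        "az", "postgres", "flexible-server", "restore",
        "--resource-group", f"rg-slidefactory-{env}",
        "--name", f"postgres-{env}-restored",
        "--source-server", f"postgres-{env}",
        "--restore-time", stamp,
    ]
```

The returned list can be reviewed or logged first and then handed to `subprocess.run`, which avoids shell quoting issues around the timestamp.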
Container App Rollback: See Deployment - Rollback
Blob Storage Recovery: Undelete soft-deleted blobs via Azure Portal
RTO/RPO
- Recovery Time Objective (RTO): 1 hour
- Recovery Point Objective (RPO): 1 hour (database), 24 hours (blobs)
Maintenance Windows
Preferred Maintenance Schedule:
- Database: Sunday 02:00-06:00 UTC (low usage period)
- Deployments: Any day, after testing on preview
- Redis updates: Automatically managed by Azure (minimal downtime)

Communication:
- Preview: No advance notice required
- Production: Notify users 24 hours before planned maintenance