Azure Infrastructure Details

Complete reference for S5's Azure resources and configuration.

Resource Overview

Preview Environment (rg-slidefactory-preview)

| Resource | Type | Purpose | Configuration |
|---|---|---|---|
| slidefactory-web-preview | Container App | Web application | FastAPI on port 8000, autoscaling 1-3 instances |
| slidefactory-worker-preview | Container App | Background tasks | Celery worker, autoscaling 1-2 instances |
| postgres-preview | PostgreSQL Flexible Server | Database | v15 with pgvector extension |
| redis-preview | Azure Cache for Redis | Caching & Celery broker | Standard tier, 1GB |
| slidefactorypreview | Storage Account | File storage | Blob storage with containers |
| s5slidefactory | Container Registry | Docker images | Shared with production |

Production Environment (rg-slidefactory-prod)

| Resource | Type | Purpose | Configuration |
|---|---|---|---|
| slidefactory-web-prod | Container App | Web application | FastAPI on port 8000, autoscaling 2-5 instances |
| slidefactory-worker-prod | Container App | Background tasks | Celery worker, autoscaling 1-3 instances |
| postgres-prod | PostgreSQL Flexible Server | Database | v15 with pgvector extension |
| redis-prod | Azure Cache for Redis | Caching & Celery broker | Standard tier, 2.5GB |
| slidefactoryprod | Storage Account | File storage | Blob storage with containers |
| s5slidefactory | Container Registry | Docker images | Shared with preview |

Container Apps

Web Service Configuration

Image: s5slidefactory.azurecr.io/slidefactory:{preview|prod}-{sha}
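
As an illustration of the tag format above, a small helper could assemble the full image reference. This is only a sketch: the function name `image_reference` and the check on the SHA shape are assumptions made for the example, and this page does not say whether the pipeline uses short or full commit SHAs.

```python
import re

REGISTRY = "s5slidefactory.azurecr.io"  # registry host from the image line above
REPOSITORY = "slidefactory"
ENVIRONMENTS = ("preview", "prod")


def image_reference(environment: str, sha: str) -> str:
    """Return registry/repository:{environment}-{sha} for a deployment."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    # Assumption: the tag suffix is a hex Git commit SHA (7 to 40 characters).
    if not re.fullmatch(r"[0-9a-f]{7,40}", sha):
        raise ValueError(f"not a hex commit SHA: {sha!r}")
    return f"{REGISTRY}/{REPOSITORY}:{environment}-{sha}"
```

Rejecting unknown environments early keeps a typo such as `staging` from producing a tag that does not exist in the registry.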

Environment Variables:
- DATABASE_URL - PostgreSQL connection string
- REDIS_HOST, REDIS_PORT, REDIS_PASSWORD - Redis configuration
- AZURE_STORAGE_* - Blob storage credentials
- AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET - Azure AD auth
- OPENAI_API_KEY / AZURE_OPENAI_* - AI provider credentials
- N8N_API_URL, N8N_API_KEY - N8N integration

Health Probes:
- Liveness: GET /health (30s interval)
- Readiness: GET /health (10s interval)
- Startup: GET /health (5s interval, 60s timeout)
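
The startup probe timing can be mimicked in a post-deployment smoke check. The sketch below polls a caller-supplied `check` function and reads the 60s value as a total startup budget, which is an assumption (Container Apps also has a per-request timeout and a failure threshold). The name `wait_until_healthy` and the injectable `sleep` parameter are hypothetical and exist only to make the example testable.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    interval: float = 5.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() every `interval` seconds until it passes or the budget runs out."""
    attempts = int(timeout // interval)  # 12 attempts with the startup values
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(interval)  # wait before the next attempt
    return False
```

In practice `check` would issue a GET request against /health and return whether the status code is 200.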

Scaling:
- Preview: 1-3 instances (CPU > 75% triggers scale-up)
- Production: 2-5 instances (CPU > 70% triggers scale-up)
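
To get a feel for how the CPU thresholds and instance limits interact, the following sketch applies the proportional rule used by Kubernetes-style autoscalers and clamps the result to the configured range. It is a simplification: the real Container Apps scaler also applies polling intervals and cool-down behaviour that are not modelled here, and the function name is hypothetical.

```python
import math


def desired_replicas(current: int, cpu_percent: float, target_percent: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Approximate replica count for a CPU utilization scale rule."""
    # Proportional rule: desired = ceil(current * observed / target).
    raw = math.ceil(current * cpu_percent / target_percent)
    # Clamp to the configured minimum and maximum instance counts.
    return max(min_replicas, min(max_replicas, raw))
```

For example, production web at 2 instances and 90% CPU against the 70% threshold gives `desired_replicas(2, 90, 70, 2, 5)`, which evaluates to 3.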

Resources:
- Preview: 0.5 CPU, 1GB memory per instance
- Production: 1.0 CPU, 2GB memory per instance

Worker Service Configuration

Image: Same as web service

Command Override: /code/scripts/start-worker-azure.sh

Environment Variables: Same as web service (shares configuration)

Scaling:
- Preview: 1-2 instances (CPU > 80% triggers scale-up)
- Production: 1-3 instances (CPU > 75% triggers scale-up)

Resources:
- Preview: 0.5 CPU, 1GB memory per instance
- Production: 1.0 CPU, 2GB memory per instance

Database (PostgreSQL)

Configuration

Version: PostgreSQL 15

Extensions: pgvector (for vector similarity search)

Tier:
- Preview: Burstable (B1ms, 1 vCore, 2GB RAM)
- Production: General Purpose (D2s_v3, 2 vCores, 8GB RAM)

Backup:
- Retention: 7 days (preview), 30 days (production)
- Geo-redundant: No (preview), Yes (production)

Networking:
- Public access: Disabled
- Private endpoint: Connected to Container Apps virtual network
- Firewall: Allow Azure services

Database Schema

Managed by Alembic migrations in the slidefactory-core package.

Key tables:
- users - User accounts
- api_keys - API authentication
- templates - PowerPoint templates
- n8n_processes - N8N workflow executions
- context_documents, context_chunks - Document storage with vector embeddings
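
For readers unfamiliar with vector similarity search, the sketch below shows in plain Python what a cosine-distance lookup over stored embeddings computes. pgvector exposes cosine distance through its `<=>` operator; whether the application uses that operator or another distance is not stated here, and the function names and `(chunk_id, embedding)` input shape are assumptions for illustration.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def nearest_chunks(query, chunks, k=3):
    """Rank (chunk_id, embedding) pairs by distance to the query embedding."""
    ranked = sorted(chunks, key=lambda item: cosine_distance(query, item[1]))
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

In the database the same ranking would be done by an ORDER BY on the distance expression, so the embeddings never leave PostgreSQL.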

Maintenance

Automatic:
- Minor version updates: Enabled
- Maintenance window: Sunday 02:00-06:00 UTC

Manual:
- Major version upgrades: Requires manual planning
- Index maintenance: Automatic via PostgreSQL autovacuum

Redis Cache

Configuration

Version: Redis 6.x

Tier:
- Preview: Standard C1 (1GB)
- Production: Standard C2 (2.5GB)

Usage:
- Session storage (user sessions)
- Celery broker (task queue)
- Celery result backend (task results)
- Application caching

Networking:
- Public access: Disabled
- Private endpoint: Connected to Container Apps virtual network
- TLS: Required
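
Because TLS is required, a Celery broker or result backend URL has to use the `rediss://` scheme. The sketch below shows one way such a URL could be assembled from the REDIS_HOST, REDIS_PORT and REDIS_PASSWORD values. How the application actually builds its URL is not documented on this page, so the function name, the database index 0 and the example hostname in the test are assumptions. Port 6380 is the TLS port of Azure Cache for Redis.

```python
from urllib.parse import quote


def celery_redis_url(host: str, password: str, port: int = 6380, db: int = 0) -> str:
    """Build a TLS (rediss://) URL for Celery from the REDIS_* settings."""
    # Access keys can contain '+', '/' and '=', so percent-encode them.
    encoded = quote(password, safe="")
    # Celery's Redis documentation describes an ssl_cert_reqs query parameter
    # for rediss:// URLs; CERT_REQUIRED asks for certificate verification.
    return f"rediss://:{encoded}@{host}:{port}/{db}?ssl_cert_reqs=CERT_REQUIRED"
```

Without the percent-encoding step, a key containing `/` or `+` would be misparsed as part of the URL path or decoded incorrectly.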

Eviction Policy: allkeys-lru (least recently used)

Persistence: RDB snapshots (every 1 hour)

Blob Storage

Configuration

Redundancy:
- Preview: LRS (Locally Redundant Storage)
- Production: GRS (Geo-Redundant Storage)

Access Tier: Hot (frequently accessed)

Networking:
- Public access: Enabled (with SAS tokens for downloads)
- Firewall: Allow Azure services + specific IPs

Containers

| Container | Purpose | Access Level |
|---|---|---|
| presentations | Generated presentation files | Private (SAS URLs for downloads) |
| templates | PowerPoint template files | Private |
| documents | Uploaded documents for context/RAG | Private |
| static | S5 branding static assets | Public read |

Lifecycle Policies

Presentations Container:
- Move to cool tier after 90 days
- Delete after 1 year (preview), 2 years (production)
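
The presentations policy can be expressed as a small decision function, which is handy for checking what should happen to a blob of a given age. This is a sketch only: the real rules live in the storage account's lifecycle management policy, and treating 1 year as 365 days, 2 years as 730 days, and "after" as strictly greater than are assumptions.

```python
# Delete thresholds in days, taken from the policy above (1 year / 2 years).
DELETE_AFTER_DAYS = {"preview": 365, "production": 730}
COOL_AFTER_DAYS = 90


def lifecycle_action(age_days: int, environment: str) -> str:
    """Return "keep", "cool" or "delete" for a blob of the given age."""
    if age_days > DELETE_AFTER_DAYS[environment]:
        return "delete"
    if age_days > COOL_AFTER_DAYS:
        return "cool"
    return "keep"
```

A 400-day-old presentation is therefore deleted in preview but only moved to the cool tier in production.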

Templates Container:
- No automatic deletion
- Manual cleanup required

Container Registry

Configuration

SKU: Standard

Admin Access: Disabled (use service principal)

Geo-replication: No

Images:
- slidefactory:preview-{sha} - Preview builds
- slidefactory:prod-{sha} - Production builds

Retention: Keep last 10 images per tag prefix
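
The retention rule can be sketched as a function that, given tags and their push times, returns the tags that fall outside the newest 10 per prefix. How retention is actually enforced (a registry task, a pipeline step, or manual cleanup) is not described here, so `tags_to_purge` and its input shape are hypothetical.

```python
from collections import defaultdict
from datetime import datetime
from typing import Iterable


def tags_to_purge(images: Iterable[tuple[str, datetime]], keep: int = 10) -> list[str]:
    """Return tags beyond the newest `keep` per prefix (text before the first '-')."""
    by_prefix: dict[str, list[tuple[datetime, str]]] = defaultdict(list)
    for tag, pushed_at in images:
        prefix = tag.split("-", 1)[0]  # "preview-abc1234" -> "preview"
        by_prefix[prefix].append((pushed_at, tag))
    purge: list[str] = []
    for entries in by_prefix.values():
        entries.sort(reverse=True)  # newest first
        purge.extend(tag for _, tag in entries[keep:])
    return sorted(purge)
```

Grouping by prefix matters: a burst of preview builds must never push production images out of the retained set.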

Scanning: Enabled (Azure Defender for container registries)

Networking

Virtual Network Integration

Container Apps Environment:
- Virtual network: vnet-slidefactory (10.0.0.0/16)
- Subnet: snet-apps (10.0.1.0/24)

Private Endpoints:
- PostgreSQL: 10.0.2.4
- Redis: 10.0.2.5
- Storage Account: 10.0.2.6
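
The address plan can be checked with Python's standard `ipaddress` module. The sketch below classifies each private endpoint address against the ranges listed above. It shows that the endpoints sit inside the virtual network but outside snet-apps, which implies a separate subnet for private endpoints (its name is not given on this page). The `placement` helper is hypothetical.

```python
import ipaddress

VNET = ipaddress.ip_network("10.0.0.0/16")         # vnet-slidefactory
APPS_SUBNET = ipaddress.ip_network("10.0.1.0/24")  # snet-apps
PRIVATE_ENDPOINTS = {
    "PostgreSQL": "10.0.2.4",
    "Redis": "10.0.2.5",
    "Storage Account": "10.0.2.6",
}


def placement(address: str) -> str:
    """Classify an address as apps subnet, elsewhere in the VNet, or outside."""
    ip = ipaddress.ip_address(address)
    if ip in APPS_SUBNET:
        return "snet-apps"
    if ip in VNET:
        return "vnet (other subnet)"
    return "outside vnet"


for name, address in PRIVATE_ENDPOINTS.items():
    print(f"{name}: {placement(address)}")
```

All three endpoints print `vnet (other subnet)`.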

DNS:
- Private DNS zones for private endpoints
- Azure-provided DNS for public endpoints

Firewall Rules

PostgreSQL:
- Allow: Azure services
- Allow: Container Apps subnet (10.0.1.0/24)

Redis:
- Allow: Container Apps subnet (10.0.1.0/24)

Storage Account:
- Allow: Azure services
- Allow: Container Apps subnet (10.0.1.0/24)
- Allow: Office IPs (for management)

Monitoring & Observability

Application Insights

Resource: appi-slidefactory-{preview|prod}

Telemetry:
- HTTP requests and responses
- Dependencies (database, Redis, storage)
- Exceptions and errors
- Custom events and metrics

Retention: 90 days

Log Analytics

Resource: log-slidefactory-{preview|prod}

Data Sources:
- Container Apps logs (stdout/stderr)
- Application Insights telemetry
- Azure resource logs (database, Redis, storage)

Retention: 30 days (preview), 90 days (production)

Alerts

Configured Alerts:
- Container App unhealthy (health probe failures)
- High CPU usage (> 80% for 5 minutes)
- High memory usage (> 85% for 5 minutes)
- Database connection failures
- Redis connection failures
- Storage account throttling
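
A threshold such as "> 80% for 5 minutes" can be read as "the average over the last five one-minute samples exceeds 80". The sketch below implements that reading. The aggregation type (average) and the one-minute sample granularity are assumptions, since the alert rule definitions are not shown here, and the function name is hypothetical.

```python
from typing import Sequence


def average_breach(samples: Sequence[float], threshold: float, window: int = 5) -> bool:
    """True when the mean of the last `window` samples exceeds `threshold`."""
    if len(samples) < window:
        return False  # not enough data to cover the evaluation window
    recent = samples[-window:]
    return sum(recent) / window > threshold
```

With this reading, a single spike does not fire the alert unless it lifts the whole window's average over the threshold.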

Notification: Email to ops team

Security

Identity & Access

Authentication:
- Azure AD integration for user login
- API keys for programmatic access

Service Principal:
- GitHub Actions deployment: Contributor on resource groups
- Container Apps: Identity assigned for accessing other Azure resources

Secrets Management:
- Environment variables in Container Apps configuration (encrypted at rest)
- Azure Key Vault not used (secrets in GitHub Secrets and Container Apps)

Network Security

TLS:
- HTTPS enforced for web traffic (Container Apps automatic TLS)
- TLS required for Redis
- TLS required for PostgreSQL

Firewall:
- Database and Redis only accessible from Container Apps subnet
- Storage account accessible from Container Apps + management IPs

Data Protection

Encryption:
- At rest: All Azure services use Microsoft-managed keys
- In transit: TLS 1.2+ required for all connections

Backups:
- PostgreSQL: Automated backups (retention: 7 days preview, 30 days production)
- Blob Storage: Soft delete enabled (14 days)
- Container images: Retained in registry

Cost Optimization

Current Costs (Approximate Monthly)

Preview Environment: ~$200/month
- Container Apps: ~$50
- PostgreSQL: ~$30
- Redis: ~$20
- Blob Storage: ~$10
- Other: ~$90

Production Environment: ~$500/month
- Container Apps: ~$150
- PostgreSQL: ~$100
- Redis: ~$50
- Blob Storage: ~$30
- Other: ~$170
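
As a quick arithmetic check, the line items can be summed and compared with the stated totals. The figures below are copied from the breakdown above; the dictionary layout is just one convenient way to hold them.

```python
MONTHLY_COSTS_USD = {
    "preview": {"Container Apps": 50, "PostgreSQL": 30, "Redis": 20,
                "Blob Storage": 10, "Other": 90},
    "production": {"Container Apps": 150, "PostgreSQL": 100, "Redis": 50,
                   "Blob Storage": 30, "Other": 170},
}

for environment, items in MONTHLY_COSTS_USD.items():
    total = sum(items.values())
    largest = max(items, key=items.get)  # line item with the highest cost
    print(f"{environment}: ~${total}/month, largest item: {largest} (~${items[largest]})")
```

The items add up to the stated 200 and 500, and "Other" is the largest line in both environments, so it is the first candidate for a more detailed breakdown.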

Cost Saving Measures

  1. Auto-shutdown: Preview environment can be scaled to 0 instances outside business hours
  2. Storage Lifecycle: Automated lifecycle policies move old data to cool tier
  3. Right-sizing: Monitor resource usage and adjust instance sizes
  4. Reserved Instances: Consider reserved capacity for production database

Disaster Recovery

Backup Strategy

Database:
- Automated backups every hour
- Point-in-time restore within the retention window (7 days preview, 30 days production)
- Geo-redundant backups (production only)

Blob Storage:
- Soft delete enabled (14 days)
- Geo-redundant (production only)
- Manual export to offline storage quarterly

Container Images:
- Retained in registry (last 10 per tag)
- Backed up to secondary registry (manual)

Recovery Procedures

Database Restore:

az postgres flexible-server restore \
  --resource-group rg-slidefactory-prod \
  --name postgres-prod-restored \
  --source-server postgres-prod \
  --restore-time "2025-01-15T10:00:00Z"
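
Before running the restore, it can help to confirm that the requested point in time lies inside the backup retention window. The sketch below builds the same command as an argument list and refuses timestamps that are in the future or older than the retention period. The function name is hypothetical, and the 30-day default reflects the production retention stated earlier.

```python
from datetime import datetime, timedelta, timezone


def restore_command(restore_time: datetime, now: datetime,
                    retention_days: int = 30) -> list[str]:
    """Build the az restore command, refusing times outside the retention window."""
    if restore_time.tzinfo is None or now.tzinfo is None:
        raise ValueError("use timezone-aware datetimes")
    if restore_time > now:
        raise ValueError("restore time is in the future")
    if now - restore_time > timedelta(days=retention_days):
        raise ValueError("restore time is older than the backup retention")
    # Format as UTC in the same style as the command above.
    stamp = restore_time.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return [
        "az", "postgres", "flexible-server", "restore",
        "--resource-group", "rg-slidefactory-prod",
        "--name", "postgres-prod-restored",
        "--source-server", "postgres-prod",
        "--restore-time", stamp,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a machine where the Azure CLI is installed and logged in.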

Container App Rollback: See Deployment - Rollback

Blob Storage Recovery: Undelete soft-deleted blobs via Azure Portal

RTO/RPO

  • Recovery Time Objective (RTO): 1 hour
  • Recovery Point Objective (RPO): 1 hour (database), 24 hours (blobs)

Maintenance Windows

Preferred Maintenance Schedule:
- Database: Sunday 02:00-06:00 UTC (low usage period)
- Deployments: Any day, after testing on preview
- Redis updates: Automatically managed by Azure (minimal downtime)

Communication:
- Preview: No advance notice required
- Production: Notify users 24 hours before planned maintenance