
Monitoring S5 on Azure

Guide to monitoring S5 Slidefactory deployed on Azure Container Apps.

Quick Health Check

Via Web Interface

  1. Open https://slidefactory.sportfive.com (or preview URL)
  2. Check if login page loads
  3. Log in with Azure AD
  4. Verify dashboard loads

Expected: < 2 seconds load time

Via Health Endpoint

# Check health endpoint
curl https://slidefactory.sportfive.com/health

# Expected response
{"status": "healthy", "database": "connected", "redis": "connected"}
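
The health check above can be scripted for a scheduled job or pipeline step. The following is a minimal sketch, assuming the endpoint returns exactly the JSON fields shown in the sample response; the names `find_health_problems` and `fetch_health` are hypothetical and not part of S5.

```python
import json
import urllib.request

# Expected values, taken from the sample response above.
EXPECTED = {"status": "healthy", "database": "connected", "redis": "connected"}


def find_health_problems(payload: dict) -> list[str]:
    """Return one message per field that differs from the expected value."""
    problems = []
    for key, expected in EXPECTED.items():
        actual = payload.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems


def fetch_health(url: str, timeout: float = 5.0) -> dict:
    """Fetch and decode the health endpoint (performs a network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


# Hypothetical degraded response, used here instead of a live request.
sample = {"status": "healthy", "database": "connected", "redis": "disconnected"}
print(find_health_problems(sample))
```

Against the live service, `find_health_problems(fetch_health("https://slidefactory.sportfive.com/health"))` should return an empty list when everything is healthy.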

Azure Portal Monitoring

Container Apps Overview

  1. Go to Azure Portal → Resource Groups → rg-slidefactory-prod
  2. Select slidefactory-web-prod Container App
  3. View Overview page

Key Metrics:

  • HTTP requests/sec
  • Average response time
  • Replica count
  • CPU/Memory usage

Log Stream (Real-Time Logs)

  1. Go to Container App → Log stream
  2. View real-time logs from application

What to Look For:

  • ✅ INFO logs: Normal operations
  • ⚠️ WARNING logs: Potential issues
  • ❌ ERROR logs: Problems requiring attention

Example Healthy Logs:

INFO:     Application startup complete
INFO:     GET /health 200 OK
INFO:     GET /presentations/ 200 OK

Example Problem Logs:

ERROR:    Database connection failed
ERROR:    Redis connection timeout
ERROR:    500 Internal Server Error
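
When logs are exported or copied out of the log stream, a per-level count gives a quick summary. This is a rough sketch, assuming every line follows the `LEVEL:    message` shape of the samples above (including a `WARNING:` prefix, which the samples do not show); `count_levels` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches the "LEVEL:    message" shape of the sample log lines above.
LOG_LINE = re.compile(r"^(INFO|WARNING|ERROR):\s+(.*)$")


def count_levels(lines: list[str]) -> Counter:
    """Count log lines per level; lines in any other shape are ignored."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


logs = [
    "INFO:     Application startup complete",
    "INFO:     GET /health 200 OK",
    "ERROR:    Redis connection timeout",
]
counts = count_levels(logs)
print(dict(counts))
```

A non-zero `counts["ERROR"]` is the signal to look at the individual lines.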

Metrics Dashboard

  1. Go to Container App → Metrics
  2. Add metrics to chart:
       • CPU Usage (target: < 70%)
       • Memory Usage (target: < 80%)
       • HTTP Requests (track traffic)
       • HTTP Response Time (target: < 2s)

Application Insights

Access Insights

  1. Go to Azure Portal → Application Insights → appi-slidefactory-prod
  2. View Application Map for dependencies
  3. View Performance for request times
  4. View Failures for errors

Key Queries (Log Analytics)

Failed Requests:

requests
| where success == false
| where timestamp > ago(1h)
| project timestamp, name, resultCode, duration
| order by timestamp desc

Slow Requests:

requests
| where duration > 2000  // > 2 seconds
| where timestamp > ago(1h)
| project timestamp, name, duration, url
| order by duration desc

Exception Summary:

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc

Database Connection Errors:

dependencies
| where type == "SQL"
| where success == false
| where timestamp > ago(1h)
| project timestamp, target, duration, resultCode

Alerts

Configured Alerts

| Alert | Condition | Action |
|---|---|---|
| Container App Unhealthy | Health probe fails for 3 minutes | Email ops team |
| High CPU | CPU > 80% for 5 minutes | Email ops team, auto-scale |
| High Memory | Memory > 85% for 5 minutes | Email ops team |
| Many Failed Requests | > 10% requests fail in 5 minutes | Email ops team |
| Database Connection Errors | > 5 errors in 5 minutes | Email ops team, page on-call |
| Redis Connection Errors | > 5 errors in 5 minutes | Email ops team |
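
To make the "Many Failed Requests" condition concrete, the sketch below evaluates it for one 5-minute window. It only illustrates the arithmetic and assumes the alert uses a strict "greater than 10%" comparison; the actual rule is configured in Azure Monitor, and the function names are hypothetical.

```python
def failure_rate(total_requests: int, failed_requests: int) -> float:
    """Fraction of failed requests in the window; 0.0 when there is no traffic."""
    if total_requests == 0:
        return 0.0
    return failed_requests / total_requests


def should_alert(total_requests: int, failed_requests: int, threshold: float = 0.10) -> bool:
    """True when the failure rate is strictly above the threshold."""
    return failure_rate(total_requests, failed_requests) > threshold


# 25 failures out of 200 requests is 12.5%, above the 10% threshold.
print(should_alert(200, 25))
```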

Create Custom Alert

  1. Go to Container App → Alerts → Create alert rule
  2. Select metric (e.g., CPU Usage)
  3. Set condition (e.g., > 80% for 5 minutes)
  4. Add action group (email notification)
  5. Name alert and create

User Activity Monitoring

Active Users

Query (Application Insights):

customEvents
| where name == "UserLogin"
| where timestamp > ago(1h)
| summarize count() by user_id

Presentation Generation

Query (Application Insights):

customEvents
| where name == "PresentationGenerated"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h)
| render timechart

N8N Workflow Executions

Query (Application Insights):

dependencies
| where type == "HTTP"
| where target contains "n8n"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h), success
| render timechart

Database Monitoring

Via Azure Portal

  1. Go to PostgreSQL Flexible Server → Monitoring → Metrics
  2. Add metrics:
       • Connections (track connection pool usage)
       • CPU Percent (target: < 70%)
       • Memory Percent (target: < 80%)
       • IOPS (disk I/O)

Query Performance

Slow Queries (connect to database):

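-- Enable the pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Find slow queries (> 1 second average).
-- The column names below apply to PostgreSQL 13 and later; on PostgreSQL 12
-- and earlier, use total_time, mean_time, and max_time instead.
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000  -- milliseconds
ORDER BY mean_exec_time DESC
LIMIT 10;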

Database Size

-- Check database size
SELECT
  pg_database.datname,
  pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
WHERE datname = 'slidefactory';

-- Check table sizes
SELECT
  schemaname || '.' || tablename AS table,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

Redis Monitoring

Via Azure Portal

  1. Go to Azure Cache for Redis → Metrics
  2. Add metrics:
       • Connected Clients (should be steady)
       • Server Load (target: < 70%)
       • Cache Hits / Cache Misses (ratio)
       • Used Memory (target: < 80%)
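
The hit/miss ratio is not charted directly, so it has to be derived from the two counters. A minimal sketch, assuming the hit and miss counts are read off the Cache Hits and Cache Misses metrics for the same time window; `cache_hit_ratio` is a hypothetical helper.

```python
from typing import Optional


def cache_hit_ratio(hits: int, misses: int) -> Optional[float]:
    """Return hits / (hits + misses), or None when there were no lookups."""
    total = hits + misses
    if total == 0:
        return None
    return hits / total


# 900 hits and 100 misses in the same window give a ratio of 0.9.
print(cache_hit_ratio(900, 100))
```

A falling ratio over time usually means more requests are going through to the database.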

Redis CLI Monitoring

# Connect to Redis (with TLS)
redis-cli -h redis-prod.redis.cache.windows.net \
  -p 6380 \
  -a <password> \
  --tls

# Check memory usage
INFO memory

# Check connected clients
CLIENT LIST

# Monitor commands in real time (adds noticeable server load; run briefly, then exit with Ctrl-C)
MONITOR
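
The `INFO memory` output can be compared against the 80% Used Memory target with a few lines of parsing. This sketch assumes the output was captured as text and that the server reports both `used_memory` and a non-zero `maxmemory`; the function names and the sample values are hypothetical.

```python
from typing import Optional


def parse_redis_info(text: str) -> dict[str, str]:
    """Parse INFO output ("key:value" lines, "#" section headers) into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def memory_usage_percent(info: dict[str, str]) -> Optional[float]:
    """Return used_memory as a percentage of maxmemory, or None if no limit is set."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    if limit == 0:
        return None
    return 100.0 * used / limit


# Hypothetical excerpt of INFO memory output (500 MiB used of a 1 GiB limit).
sample_info = "# Memory\r\nused_memory:524288000\r\nmaxmemory:1073741824\r\n"
usage = memory_usage_percent(parse_redis_info(sample_info))
print(f"{usage:.1f}% of maxmemory in use")
```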

Storage Monitoring

Via Azure Portal

  1. Go to Storage Account → Metrics
  2. Add metrics:
       • Transactions (request count)
       • Success E2E Latency (response time)
       • Used Capacity (storage usage)
       • Egress (data transfer out)

Storage Analytics

Large Objects:

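# List the 20 largest files in the presentations container.
# tsv output with the size first keeps the sort correct for blob names that
# contain spaces and avoids the header rows of table output.
az storage blob list \
  --account-name slidefactoryprod \
  --container-name presentations \
  --query "[].[properties.contentLength, name]" \
  --output tsv \
  | sort -rn | head -20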

Storage Usage by Container:

# Get size of each container
for container in presentations templates documents; do
  echo -n "$container: "
  az storage blob list \
    --account-name slidefactoryprod \
    --container-name "$container" \
    --query "sum([].properties.contentLength)" \
    --output tsv | awk '{printf "%.2f GB\n", $1/1024/1024/1024}'
done
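
The same totals can be computed offline from a saved listing. The sketch below assumes the listing was produced with `az storage blob list --output json`, where each entry carries `name` and `properties.contentLength` as used in the queries above; the function names and the sample data are hypothetical.

```python
import json


def total_size_gb(blob_list_json: str) -> float:
    """Sum contentLength over all blobs and convert bytes to GB (1024^3)."""
    blobs = json.loads(blob_list_json)
    total_bytes = sum(blob["properties"]["contentLength"] for blob in blobs)
    return total_bytes / 1024**3


def largest_blobs(blob_list_json: str, limit: int = 20) -> list[tuple[str, int]]:
    """Return (name, size in bytes) pairs, largest first."""
    blobs = json.loads(blob_list_json)
    sizes = [(blob["name"], blob["properties"]["contentLength"]) for blob in blobs]
    return sorted(sizes, key=lambda pair: pair[1], reverse=True)[:limit]


# Hypothetical sample in the shape of the CLI's JSON output.
sample_json = json.dumps([
    {"name": "deck-a.pptx", "properties": {"contentLength": 536870912}},
    {"name": "deck-b.pptx", "properties": {"contentLength": 1073741824}},
])
print(f"{total_size_gb(sample_json):.2f} GB")
print(largest_blobs(sample_json, limit=1))
```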

Cost Monitoring

View Current Costs

  1. Go to Azure Portal → Cost Management → Cost analysis
  2. Filter by resource group: rg-slidefactory-prod
  3. Group by: Service name or Resource

Expected Monthly Costs:

  • Preview: ~$200/month
  • Production: ~$500/month

Cost Alerts

Set up budget alerts:

  1. Go to Cost Management → Budgets
  2. Create budget with threshold (e.g., $600/month for production)
  3. Set alerts at 80%, 90%, 100% of budget
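
To illustrate how the 80%, 90%, and 100% alert levels relate to actual spend, the sketch below lists which levels a given month-to-date spend has reached. It is only the arithmetic, not how Azure Cost Management evaluates budgets; `triggered_thresholds` is a hypothetical helper and the $600 figure is the example budget from above.

```python
def triggered_thresholds(spend: float, budget: float,
                         thresholds: tuple[int, ...] = (80, 90, 100)) -> list[int]:
    """Return the alert levels (in percent of budget) that the spend has reached."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    percent_used = 100.0 * spend / budget
    return [level for level in thresholds if percent_used >= level]


# $550 of a $600 budget is about 91.7%, so the 80% and 90% alerts have fired.
print(triggered_thresholds(550, 600))
```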

Performance Baselines

Expected Performance

| Metric | Target | Warning | Critical |
|---|---|---|---|
| Page Load Time | < 1s | > 2s | > 5s |
| API Response Time | < 500ms | > 1s | > 2s |
| Presentation Generation | 2-5 min | > 10 min | > 30 min |
| Database Query Time | < 100ms | > 500ms | > 1s |
| CPU Usage | < 50% | > 70% | > 85% |
| Memory Usage | < 60% | > 80% | > 90% |
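
The warning and critical columns above can be applied mechanically to measured values. A minimal sketch, assuming each metric is compared with a strict "greater than" against those two thresholds; the metric keys, the `Baseline` class, and the `classify` function are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    warning: float   # values above this are "warning"
    critical: float  # values above this are "critical"


# Thresholds copied from the Warning and Critical columns of the table above.
BASELINES = {
    "page_load_s": Baseline(2, 5),
    "api_response_ms": Baseline(1000, 2000),
    "presentation_generation_min": Baseline(10, 30),
    "db_query_ms": Baseline(500, 1000),
    "cpu_percent": Baseline(70, 85),
    "memory_percent": Baseline(80, 90),
}


def classify(metric: str, value: float) -> str:
    """Return "critical", "warning", or "ok" for a measured value."""
    baseline = BASELINES[metric]
    if value > baseline.critical:
        return "critical"
    if value > baseline.warning:
        return "warning"
    return "ok"


print(classify("cpu_percent", 72))
```

Values between the target and the warning threshold are reported as "ok" here; a stricter check could add a fourth state for that range.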

Daily Checks (Production)

  • Check Container App status (green in Azure Portal)
  • Review errors in Application Insights (< 5 errors/hour)
  • Check database connections (no connection errors)
  • Verify Redis is responsive (< 10ms latency)
  • Review overnight deployments (if any)

Weekly Checks (Production)

  • Review performance trends (response times stable)
  • Check storage usage growth (plan cleanup if needed)
  • Review database size (plan optimization if > 50GB)
  • Audit security alerts (review any security warnings)
  • Review costs (within budget)

Troubleshooting Dashboard

Quick Links:

  • Azure Portal
  • Application Insights
  • Container Apps
  • S5 Troubleshooting Guide