Monitoring S5 on Azure¶

Guide to monitoring S5 Slidefactory deployed on Azure Container Apps.

Quick Health Check¶

Via Web Interface¶

Open https://slidefactory.sportfive.com (or preview URL)
Check if login page loads
Log in with Azure AD
Verify dashboard loads

Expected: < 2 seconds load time

Via Health Endpoint¶

# Check health endpoint
curl https://slidefactory.sportfive.com/health

# Expected response
{"status": "healthy", "database": "connected", "redis": "connected"}
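For scripted checks, the comparison against the expected response can be automated. The following is a minimal sketch, assuming the endpoint returns exactly the three keys shown above; the function name `failing_components` is hypothetical and not part of Slidefactory.

```python
import json

# Keys and values copied from the expected response shown above.
EXPECTED = {"status": "healthy", "database": "connected", "redis": "connected"}


def failing_components(body: str) -> list[str]:
    """Return the keys whose value differs from the expected healthy payload."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return sorted(EXPECTED)  # unparseable body: treat every component as failing
    if not isinstance(payload, dict):
        return sorted(EXPECTED)
    return [key for key, want in EXPECTED.items() if payload.get(key) != want]


# Offline sample; in practice the body would come from the curl call above.
sample = '{"status": "healthy", "database": "connected", "redis": "timeout"}'
print(failing_components(sample))  # ['redis']
```

An empty list means every component reported the expected value.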

Azure Portal Monitoring¶

Container Apps Overview¶

Go to Azure Portal → Resource Groups → rg-slidefactory-prod
Select slidefactory-web-prod Container App
View Overview page

Key Metrics:
- HTTP requests/sec
- Average response time
- Replica count
- CPU/Memory usage

Log Stream (Real-Time Logs)¶

Go to Container App → Log stream
View real-time logs from application

What to Look For:
- ✅ INFO logs: Normal operations
- ⚠️ WARNING logs: Potential issues
- ❌ ERROR logs: Problems requiring attention

Example Healthy Logs:

INFO:     Application startup complete
INFO:     GET /health 200 OK
INFO:     GET /presentations/ 200 OK

Example Problem Logs:

ERROR:    Database connection failed
ERROR:    Redis connection timeout
ERROR:    500 Internal Server Error
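When logs are saved to a file rather than watched live, a per-level tally gives a quick read on severity. This is a rough sketch, assuming every line follows the `LEVEL:    message` shape of the examples above; `count_levels` is a hypothetical helper name.

```python
import re
from collections import Counter

# Matches lines such as "ERROR:    Database connection failed".
LEVEL_RE = re.compile(r"^(DEBUG|INFO|WARNING|ERROR|CRITICAL):\s+(.*)$")


def count_levels(lines):
    """Count log lines per level; lines that do not match the pattern are ignored."""
    counts = Counter()
    for line in lines:
        match = LEVEL_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


logs = [
    "INFO:     Application startup complete",
    "INFO:     GET /health 200 OK",
    "ERROR:    Redis connection timeout",
]
print(count_levels(logs))  # Counter({'INFO': 2, 'ERROR': 1})
```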

Metrics Dashboard¶

Go to Container App → Metrics
Add metrics to chart:
CPU Usage (target: < 70%)
Memory Usage (target: < 80%)
HTTP Requests (track traffic)
HTTP Response Time (target: < 2s)

Application Insights¶

Access Insights¶

Go to Azure Portal → Application Insights → appi-slidefactory-prod
View Application Map for dependencies
View Performance for request times
View Failures for errors

Key Queries (Log Analytics)¶

Failed Requests:

requests
| where success == false
| where timestamp > ago(1h)
| project timestamp, name, resultCode, duration
| order by timestamp desc

Slow Requests:

requests
| where duration > 2000  // > 2 seconds
| where timestamp > ago(1h)
| project timestamp, name, duration, url
| order by duration desc

Exception Summary:

exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc

Database Connection Errors:

dependencies
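| where type in ("SQL", "postgresql")  // value depends on the instrumentation; OpenTelemetry-based SDKs typically report PostgreSQL as "postgresql"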
| where success == false
| where timestamp > ago(1h)
| project timestamp, target, duration, resultCode

Alerts¶

Configured Alerts¶

Alert	Condition	Action
Container App Unhealthy	Health probe fails for 3 minutes	Email ops team
High CPU	CPU > 80% for 5 minutes	Email ops team, auto-scale
High Memory	Memory > 85% for 5 minutes	Email ops team
Many Failed Requests	> 10% requests fail in 5 minutes	Email ops team
Database Connection Errors	> 5 errors in 5 minutes	Email ops team, page on-call
Redis Connection Errors	> 5 errors in 5 minutes	Email ops team

Create Custom Alert¶

Go to Container App → Alerts → Create alert rule
Select metric (e.g., CPU Usage)
Set condition (e.g., > 80% for 5 minutes)
Add action group (email notification)
Name alert and create

User Activity Monitoring¶

Active Users¶

Query (Application Insights):

customEvents
| where name == "UserLogin"
| where timestamp > ago(1h)
| summarize count() by user_Id  // built-in column is user_Id (KQL is case-sensitive)

Presentation Generation¶

Query (Application Insights):

customEvents
| where name == "PresentationGenerated"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h)
| render timechart

N8N Workflow Executions¶

Query (Application Insights):

dependencies
| where type =~ "HTTP"  // case-insensitive match: some SDKs report "Http"
| where target contains "n8n"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h), success
| render timechart

Database Monitoring¶

Via Azure Portal¶

Go to PostgreSQL Flexible Server → Monitoring → Metrics
Add metrics:
Connections (track connection pool usage)
CPU Percent (target: < 70%)
Memory Percent (target: < 80%)
IOPS (disk I/O)

Query Performance¶

Slow Queries (connect to database):

-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

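-- Find slow queries (> 1 second average)
-- Column names below are for PostgreSQL 13+;
-- on PostgreSQL 12 and earlier use total_time, mean_time, max_time
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000  -- milliseconds
ORDER BY mean_exec_time DESC
LIMIT 10;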

Database Size¶

-- Check database size
SELECT
  pg_database.datname,
  pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
WHERE datname = 'slidefactory';

-- Check table sizes
SELECT
  schemaname || '.' || tablename AS table,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;

Redis Monitoring¶

Via Azure Portal¶

Go to Azure Cache for Redis → Metrics
Add metrics:
Connected Clients (should be steady)
Server Load (target: < 70%)
Cache Hits / Cache Misses (ratio)
Used Memory (target: < 80%)
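The portal reports cache hits and misses as separate counters, so the ratio has to be derived. A minimal sketch of that arithmetic follows; `hit_ratio` is a hypothetical helper and the sample numbers are illustrative, not measurements from this deployment.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of lookups served from cache; 0.0 when there was no traffic."""
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total


# Illustrative values only.
print(f"{hit_ratio(950, 50):.1%}")  # 95.0%
```

A falling ratio over time usually points at evictions or keys expiring too early, so it is worth reading alongside Used Memory.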

Redis CLI Monitoring¶

# Connect to Redis (with TLS)
redis-cli -h redis-prod.redis.cache.windows.net \
  -p 6380 \
  -a <password> \
  --tls

# Check memory usage
INFO memory

# Check connected clients
CLIENT LIST

# Monitor commands in real-time (adds noticeable load on a busy server; sample briefly, then stop with Ctrl-C)
MONITOR

Storage Monitoring¶

Via Azure Portal¶

Go to Storage Account → Metrics
Add metrics:
Transactions (request count)
Success E2E Latency (response time)
Used Capacity (storage usage)
Egress (data transfer out)

Storage Analytics¶

Large Objects:

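# List the 20 largest files in the presentations container
# (tsv output keeps table header rows out of the numeric sort;
#  --num-results "*" lifts the default limit of 5000 blobs)
az storage blob list \
  --account-name slidefactoryprod \
  --container-name presentations \
  --num-results "*" \
  --query "[].[name, properties.contentLength]" \
  --output tsv \
  | sort -t$'\t' -k2 -rn | head -20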

Storage Usage by Container:

# Get size of each container
for container in presentations templates documents; do
  echo -n "$container: "
  az storage blob list \
    --account-name slidefactoryprod \
    --container-name "$container" \
    --num-results "*" \
    --query "sum([].properties.contentLength)" \
    --output tsv | awk '{printf "%.2f GB\n", $1/1024/1024/1024}'
done

Cost Monitoring¶

View Current Costs¶

Go to Azure Portal → Cost Management → Cost analysis
Filter by resource group: rg-slidefactory-prod
Group by: Service name or Resource

Expected Monthly Costs:
- Preview: ~$200/month
- Production: ~$500/month

Cost Alerts¶

Set up budget alerts:
1. Go to Cost Management → Budgets
2. Create budget with threshold (e.g., $600/month for production)
3. Set alerts at 80%, 90%, 100% of budget
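To see what those percentages mean in dollars before configuring the budget, the arithmetic can be sketched as below. The function name `alert_amounts` is hypothetical; the $600 figure and the 80/90/100 levels are the example values from the steps above.

```python
def alert_amounts(budget: float, percents=(80, 90, 100)) -> dict:
    """Map each alert percentage to the spend at which it would fire."""
    return {p: budget * p / 100 for p in percents}


print(alert_amounts(600))  # {80: 480.0, 90: 540.0, 100: 600.0}
```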

Performance Baselines¶

Expected Performance¶

Metric	Target	Warning	Critical
Page Load Time	< 1s	> 2s	> 5s
API Response Time	< 500ms	> 1s	> 2s
Presentation Generation	2-5 min	> 10 min	> 30 min
Database Query Time	< 100ms	> 500ms	> 1s
CPU Usage	< 50%	> 70%	> 85%
Memory Usage	< 60%	> 80%	> 90%
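The table leaves a gap between the target and warning columns (for example, CPU between 50% and 70%). The sketch below shows one way to classify a reading, treating that gap as "acceptable"; that label and the names `BASELINES` and `classify` are assumptions for illustration, not part of the S5 tooling.

```python
# (target, warning, critical) copied from the table above.
# CPU and memory are in percent, response and query times in milliseconds.
BASELINES = {
    "cpu_percent": (50, 70, 85),
    "memory_percent": (60, 80, 90),
    "api_response_ms": (500, 1000, 2000),
    "db_query_ms": (100, 500, 1000),
}


def classify(metric: str, value: float) -> str:
    """Return 'on target', 'acceptable', 'warning' or 'critical' for a reading."""
    target, warning, critical = BASELINES[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    if value < target:
        return "on target"
    return "acceptable"  # between target and warning: the table does not name this band


print(classify("cpu_percent", 75))      # warning
print(classify("api_response_ms", 300))  # on target
```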

Daily Checks (Production)¶

Check Container App status (green in Azure Portal)
Review errors in Application Insights (< 5 errors/hour)
Check database connections (no connection errors)
Verify Redis is responsive (< 10ms latency)
Review overnight deployments (if any)

Weekly Checks (Production)¶

Review performance trends (response times stable)
Check storage usage growth (plan cleanup if needed)
Review database size (plan optimization if > 50GB)
Audit security alerts (review any security warnings)
Review costs (within budget)

Troubleshooting Dashboard¶

Quick Links:
- Azure Portal
- Application Insights
- Container Apps
- S5 Troubleshooting Guide
