Monitoring S5 on Azure¶
Guide to monitoring S5 Slidefactory deployed on Azure Container Apps.
Quick Health Check¶
Via Web Interface¶
- Open https://slidefactory.sportfive.com (or preview URL)
- Check if login page loads
- Log in with Azure AD
- Verify dashboard loads
Expected: < 2 seconds load time
Via Health Endpoint¶
# Check health endpoint
curl https://slidefactory.sportfive.com/health
# Expected response
{"status": "healthy", "database": "connected", "redis": "connected"}
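For scripted checks, the response body can be compared field by field instead of by eye. The following is a minimal sketch, assuming the endpoint always returns the three fields shown above; the function name and the `_parse_error` key are illustrative and not part of Slidefactory.

```python
import json

# Fields and values taken from the expected /health response shown above.
EXPECTED = {"status": "healthy", "database": "connected", "redis": "connected"}


def unhealthy_fields(body: str) -> dict:
    """Return {field: actual_value} for every field that differs from EXPECTED.

    An empty dict means the response matches. A body that is not a JSON
    object is reported under the hypothetical key "_parse_error".
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return {"_parse_error": str(exc)}
    if not isinstance(payload, dict):
        return {"_parse_error": "response is not a JSON object"}
    return {
        field: payload.get(field)
        for field, expected in EXPECTED.items()
        if payload.get(field) != expected
    }
```

The body can come from the `curl` call above or from `urllib.request.urlopen(url, timeout=5).read().decode()`. Reporting the mismatching fields, rather than a single boolean, shows at a glance whether the database or Redis is the failing dependency.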
Azure Portal Monitoring¶
Container Apps Overview¶
- Go to Azure Portal → Resource Groups → rg-slidefactory-prod
- Select slidefactory-web-prod Container App
- View Overview page
Key Metrics:
- HTTP requests/sec
- Average response time
- Replica count
- CPU/Memory usage
Log Stream (Real-Time Logs)¶
- Go to Container App → Log stream
- View real-time logs from application
What to Look For:
- ✅ INFO logs: Normal operations
- ⚠️ WARNING logs: Potential issues
- ❌ ERROR logs: Problems requiring attention
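When a saved chunk of the log stream is too long to read line by line, a tally per level gives a quick first impression. This is a rough sketch that assumes each line carries its level as an upper-case word (`INFO`, `WARNING` or `ERROR`); the actual Slidefactory log format may differ, so the pattern may need adjusting.

```python
import re
from collections import Counter

# Assumption: each log line contains its level as an upper-case word.
LEVEL_RE = re.compile(r"\b(INFO|WARNING|ERROR)\b")


def count_levels(lines):
    """Count log lines per level; lines with no recognised level are skipped."""
    counts = Counter({"INFO": 0, "WARNING": 0, "ERROR": 0})
    for line in lines:
        match = LEVEL_RE.search(line)
        if match:
            # Only the first level word on a line is counted.
            counts[match.group(1)] += 1
    return dict(counts)
```

A non-zero `ERROR` count is the cue to open the matching lines in the log stream.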
Example Healthy Logs:
Example Problem Logs:
Metrics Dashboard¶
- Go to Container App → Metrics
- Add metrics to chart:
- CPU Usage (target: < 70%)
- Memory Usage (target: < 80%)
- HTTP Requests (track traffic)
- HTTP Response Time (target: < 2s)
Application Insights¶
Access Insights¶
- Go to Azure Portal → Application Insights → appi-slidefactory-prod
- View Application Map for dependencies
- View Performance for request times
- View Failures for errors
Key Queries (Log Analytics)¶
Failed Requests:
requests
| where success == false
| where timestamp > ago(1h)
| project timestamp, name, resultCode, duration
| order by timestamp desc
Slow Requests:
requests
| where duration > 2000 // > 2 seconds
| where timestamp > ago(1h)
| project timestamp, name, duration, url
| order by duration desc
Exception Summary:
exceptions
| where timestamp > ago(24h)
| summarize count() by type, outerMessage
| order by count_ desc
Database Connection Errors:
dependencies
| where type == "SQL"
| where success == false
| where timestamp > ago(1h)
| project timestamp, target, duration, resultCode
Alerts¶
Configured Alerts¶
| Alert | Condition | Action |
|---|---|---|
| Container App Unhealthy | Health probe fails for 3 minutes | Email ops team |
| High CPU | CPU > 80% for 5 minutes | Email ops team, auto-scale |
| High Memory | Memory > 85% for 5 minutes | Email ops team |
| Many Failed Requests | > 10% requests fail in 5 minutes | Email ops team |
| Database Connection Errors | > 5 errors in 5 minutes | Email ops team, page on-call |
| Redis Connection Errors | > 5 errors in 5 minutes | Email ops team |
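The "Many Failed Requests" condition is a ratio over a time window. The sketch below shows that arithmetic only, assuming the total and failed request counts for the last 5 minutes are already known (for example from the Failed Requests query). The function names are illustrative, and Azure Monitor evaluates its own rules independently of this code.

```python
def failure_rate(total_requests: int, failed_requests: int) -> float:
    """Fraction of failed requests in the window; 0.0 when there was no traffic."""
    if total_requests <= 0:
        return 0.0
    return failed_requests / total_requests


def should_alert(total_requests: int, failed_requests: int, threshold: float = 0.10) -> bool:
    """True when the failure rate is strictly above the threshold (10% by default)."""
    return failure_rate(total_requests, failed_requests) > threshold
```

With 200 requests in the window, 20 failures (exactly 10%) do not trigger the alert, while 21 failures (10.5%) do.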
Create Custom Alert¶
- Go to Container App → Alerts → Create alert rule
- Select metric (e.g., CPU Usage)
- Set condition (e.g., > 80% for 5 minutes)
- Add action group (email notification)
- Name alert and create
User Activity Monitoring¶
Active Users¶
Query (Application Insights):
Presentation Generation¶
Query (Application Insights):
customEvents
| where name == "PresentationGenerated"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h)
| render timechart
N8N Workflow Executions¶
Query (Application Insights):
dependencies
| where type == "HTTP"
| where target contains "n8n"
| where timestamp > ago(24h)
| summarize count() by bin(timestamp, 1h), success
| render timechart
Database Monitoring¶
Via Azure Portal¶
- Go to PostgreSQL Flexible Server → Monitoring → Metrics
- Add metrics:
- Connections (track connection pool usage)
- CPU Percent (target: < 70%)
- Memory Percent (target: < 80%)
- IOPS (disk I/O)
Query Performance¶
Slow Queries (connect to database):
-- Enable pg_stat_statements extension
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
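-- Find slow queries (> 1 second average)
-- PostgreSQL 13+ names the timing columns *_exec_time;
-- on PostgreSQL 12 and older use total_time, mean_time, max_time instead.
SELECT
query,
calls,
total_exec_time,
mean_exec_time,
max_exec_time
FROM pg_stat_statements
WHERE mean_exec_time > 1000 -- milliseconds
ORDER BY mean_exec_time DESC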
LIMIT 10;
Database Size¶
-- Check database size
SELECT
pg_database.datname,
pg_size_pretty(pg_database_size(pg_database.datname)) AS size
FROM pg_database
WHERE datname = 'slidefactory';
-- Check table sizes
SELECT
schemaname || '.' || tablename AS table,
pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS size
FROM pg_tables
WHERE schemaname = 'public'
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
Redis Monitoring¶
Via Azure Portal¶
- Go to Azure Cache for Redis → Metrics
- Add metrics:
- Connected Clients (should be steady)
- Server Load (target: < 70%)
- Cache Hits / Cache Misses (ratio)
- Used Memory (target: < 80%)
Redis CLI Monitoring¶
# Connect to Redis (with TLS)
redis-cli -h redis-prod.redis.cache.windows.net \
-p 6380 \
-a <password> \
--tls
# Check memory usage
INFO memory
# Check connected clients
CLIENT LIST
# Monitor commands in real-time (adds load on the server; stop with Ctrl+C after a short sample)
MONITOR
Storage Monitoring¶
Via Azure Portal¶
- Go to Storage Account → Metrics
- Add metrics:
- Transactions (request count)
- Success E2E Latency (response time)
- Used Capacity (storage usage)
- Egress (data transfer out)
Storage Analytics¶
Large Objects:
# List largest files in presentations container
az storage blob list \
--account-name slidefactoryprod \
--container-name presentations \
--query "[].{name:name, size:properties.contentLength}" \
--output table \
| sort -k2 -rn | head -20
Storage Usage by Container:
# Get size of each container
for container in presentations templates documents; do
echo -n "$container: "
az storage blob list \
--account-name slidefactoryprod \
--container-name $container \
--query "sum([].properties.contentLength)" \
--output tsv | awk '{printf "%.2f GB\n", $1/1024/1024/1024}'
done
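The same numbers can be derived from the JSON form of the listing, which avoids parsing table output. This is a sketch under the assumption that the input is the text printed by `az storage blob list ... --output json`, where each entry has `name` and `properties.contentLength` as used in the queries above. The function name is illustrative.

```python
import json


def summarize_blobs(listing_json: str, top_n: int = 20):
    """Return (total_gb, largest) from the JSON printed by `az storage blob list --output json`.

    largest is a list of (name, size_in_bytes) tuples, biggest first.
    """
    blobs = json.loads(listing_json)
    sizes = [(b["name"], b["properties"]["contentLength"]) for b in blobs]
    sizes.sort(key=lambda item: item[1], reverse=True)
    # Same divisor as the awk expression above: 1024 * 1024 * 1024 bytes per GB.
    total_gb = sum(size for _, size in sizes) / 1024 ** 3
    return total_gb, sizes[:top_n]
```

Sorting on the numeric size rather than on a text column keeps the ranking correct even when blob names contain spaces.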
Cost Monitoring¶
View Current Costs¶
- Go to Azure Portal → Cost Management → Cost analysis
- Filter by resource group: rg-slidefactory-prod
- Group by: Service name or Resource
Expected Monthly Costs:
- Preview: ~$200/month
- Production: ~$500/month
Cost Alerts¶
Set up budget alerts:
1. Go to Cost Management → Budgets
2. Create budget with threshold (e.g., $600/month for production)
3. Set alerts at 80%, 90%, 100% of budget
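To make the percentages concrete, the sketch below converts them into spend amounts for the example $600 production budget. It assumes an alert counts as triggered once spend reaches the amount; the function names are illustrative and this is not how Azure Cost Management is configured, only the arithmetic behind it.

```python
def budget_alert_amounts(budget, percents=(80, 90, 100)):
    """Map each alert percentage to the spend (same currency as budget) that triggers it."""
    return {pct: budget * pct / 100 for pct in percents}


def triggered_alerts(spend, budget, percents=(80, 90, 100)):
    """Return the percentages whose trigger amount the current spend has reached."""
    amounts = budget_alert_amounts(budget, percents)
    return [pct for pct, amount in amounts.items() if spend >= amount]
```

For a $600 budget the alerts correspond to $480, $540 and $600, so a month-to-date spend of $500 has passed only the 80% alert.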
Performance Baselines¶
Expected Performance¶
| Metric | Target | Warning | Critical |
|---|---|---|---|
| Page Load Time | < 1s | > 2s | > 5s |
| API Response Time | < 500ms | > 1s | > 2s |
| Presentation Generation | 2-5 min | > 10 min | > 30 min |
| Database Query Time | < 100ms | > 500ms | > 1s |
| CPU Usage | < 50% | > 70% | > 85% |
| Memory Usage | < 60% | > 80% | > 90% |
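The table can be applied mechanically to a measured value. The following is a minimal sketch that copies the Warning and Critical columns for the rows with simple numeric bounds. The metric keys are made-up names, and treating everything at or below the warning bound as "ok" is an assumption, since the table does not label the band between Target and Warning.

```python
# Warning and critical bounds copied from the table above
# (percent for CPU and memory, milliseconds for the timings).
THRESHOLDS = {
    "page_load_ms": (2000, 5000),
    "api_response_ms": (1000, 2000),
    "db_query_ms": (500, 1000),
    "cpu_percent": (70, 85),
    "memory_percent": (80, 90),
}


def classify(metric: str, value: float) -> str:
    """Return "critical", "warning" or "ok" for a measured value."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

For example, `classify("cpu_percent", 75)` falls in the warning band, while an API response of exactly 1000 ms is still "ok" because the bounds are strict.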
Daily Checks (Production)¶
- Check Container App status (green in Azure Portal)
- Review errors in Application Insights (< 5 errors/hour)
- Check database connections (no connection errors)
- Verify Redis is responsive (< 10ms latency)
- Review overnight deployments (if any)
Weekly Checks (Production)¶
- Review performance trends (response times stable)
- Check storage usage growth (plan cleanup if needed)
- Review database size (plan optimization if > 50GB)
- Audit security alerts (review any security warnings)
- Review costs (within budget)
Troubleshooting Dashboard¶
Quick Links:
- Azure Portal
- Application Insights
- Container Apps
- S5 Troubleshooting Guide