
N8N Scaling Assessment and Improvement Plan

Date: 2025-11-10
Branch: preview
Author: Claude Code Assessment

Executive Summary

The current n8n setup runs in regular (non-scalable) mode with a single instance handling all workloads. This creates a bottleneck for workflow execution and limits horizontal scaling. This assessment provides a comprehensive plan to migrate both Docker (local) and Azure (production/preview) environments to n8n Queue Mode with dedicated worker instances for improved scalability and performance.


Current State Analysis

Docker Environment (Local Development)

Location: docker-compose.override.yml (lines 44-80)

Current Configuration:

n8n:
  image: n8nio/n8n:latest
  environment:
    - EXECUTIONS_MODE=regular  # ⚠️ Problem: Single-process mode
    - DB_TYPE=postgresdb
    - DB_POSTGRESDB_HOST=postgres
    # No Redis queue configuration

Issues:

  • A single n8n instance handles both the UI/API and workflow execution
  • EXECUTIONS_MODE=regular means no horizontal scaling capability
  • No worker processes for parallel workflow execution
  • Redis is available in the stack but not used by n8n
  • Cannot scale to handle multiple simultaneous workflows efficiently

Azure Environment (Production/Preview)

Location: .github/workflows/preview.yml, .github/workflows/production.yml

Current Configuration:

  • n8n deployed separately (not in the GitHub Actions workflows)
  • S5 Slidefactory connects via the N8N_API_URL and N8N_API_KEY environment variables
  • Deployment details unknown, but likely a single instance based on the architecture

Issues:

  • No visibility into the current n8n deployment configuration
  • Likely the same single-instance limitation as the Docker setup
  • Cannot scale workers independently from the main instance
  • No documented infrastructure-as-code for the n8n deployment


N8N Queue Mode Architecture

Overview

N8N supports horizontal scaling through Queue Mode, which separates concerns:

┌─────────────────────────────────────────────────────────────┐
│                     N8N QUEUE MODE ARCHITECTURE              │
├─────────────────────────────────────────────────────────────┤
│                                                              │
│  ┌──────────────────┐         ┌──────────────────┐         │
│  │   Main Instance   │         │      Redis       │         │
│  │                  │         │   (Message Queue) │         │
│  │  - UI/Editor     │────────▶│                  │         │
│  │  - API           │         │  - Job Queue     │         │
│  │  - Webhooks      │         │  - Job Results   │         │
│  │  - Scheduling    │         │                  │         │
│  └──────────────────┘         └──────────────────┘         │
│                                         │                    │
│                                         │ Jobs               │
│                                         ▼                    │
│               ┌─────────────────────────────────────────┐   │
│               │                                          │   │
│  ┌────────────┼──────────────┬──────────────┬──────────┼─┐ │
│  │ Worker 1   │  Worker 2    │  Worker 3    │  Worker N│ │ │
│  │            │              │              │          │ │ │
│  │ Executes   │  Executes    │  Executes    │  Executes│ │ │
│  │ Workflows  │  Workflows   │  Workflows   │  Workflows│ │ │
│  └────────────┴──────────────┴──────────────┴──────────┴─┘ │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │            PostgreSQL Database                        │  │
│  │        (Shared by all instances)                      │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘

Key Components

  1. Main Instance (1 replica)
     • Handles UI, API, webhooks, and cron triggers
     • Writes workflow execution jobs to the Redis queue
     • Does NOT execute workflows
     • Environment: EXECUTIONS_MODE=queue

  2. Worker Instances (N replicas, scalable)
     • Pull jobs from the Redis queue
     • Execute workflows in parallel
     • Write results back to Redis and PostgreSQL
     • Environment: EXECUTIONS_MODE=queue plus worker-specific config

  3. Redis (Message Broker)
     • Stores pending workflow jobs
     • Manages job distribution to workers
     • Stores execution results temporarily
     • Uses the Bull queue library under the hood

  4. PostgreSQL (Shared Database)
     • Workflow definitions
     • Execution history and logs
     • Credentials and settings
     • Must be accessible by all instances

Benefits

  • Horizontal Scaling: Add/remove workers based on workload
  • High Availability: Workers can fail without affecting the UI
  • Performance: Parallel execution of multiple workflows
  • Resource Isolation: Separate resources for UI and execution
  • Cost Optimization: Scale workers independently

Improvement Plan

Phase 1: Docker Environment (Local Development)

1.1 Update docker-compose.override.yml

Changes Required:

services:
  # N8N Main Instance - UI, API, Webhooks
  n8n:
    image: n8nio/n8n:latest
    ports:
      - "5678:5678"
    environment:
      # Queue Mode Configuration
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_DB=1  # Separate from Celery (uses DB 0)

      # Authentication
      - N8N_BASIC_AUTH_ACTIVE=true
      - N8N_BASIC_AUTH_USER=admin
      - N8N_BASIC_AUTH_PASSWORD=admin

      # Server Configuration
      - N8N_HOST=localhost
      - N8N_PORT=5678
      - N8N_PROTOCOL=http
      - WEBHOOK_URL=http://localhost:5678/
      - GENERIC_TIMEZONE=Europe/Berlin

      # Database
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=postgres
      - DB_POSTGRESDB_PASSWORD=postgres

      # Encryption (must be same across all instances)
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY:-change-me-in-production}

      # Metrics
      - N8N_METRICS=true

      # User Management
      - N8N_USER_MANAGEMENT_DISABLED=false

    volumes:
      - n8n_data:/home/node/.n8n

    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy

    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "http://localhost:5678/healthz"]
      interval: 30s
      timeout: 10s
      retries: 3

  # N8N Worker Instances - Workflow Execution
  n8n-worker:
    image: n8nio/n8n:latest
    command: worker
    environment:
      # Queue Mode Configuration
      - EXECUTIONS_MODE=queue
      - QUEUE_BULL_REDIS_HOST=redis
      - QUEUE_BULL_REDIS_PORT=6379
      - QUEUE_BULL_REDIS_DB=1

      # Database (shared with main)
      - DB_TYPE=postgresdb
      - DB_POSTGRESDB_HOST=postgres
      - DB_POSTGRESDB_PORT=5432
      - DB_POSTGRESDB_DATABASE=n8n
      - DB_POSTGRESDB_USER=postgres
      - DB_POSTGRESDB_PASSWORD=postgres

      # Encryption (MUST match main instance)
      - N8N_ENCRYPTION_KEY=${N8N_ENCRYPTION_KEY:-change-me-in-production}

      # Timezone
      - GENERIC_TIMEZONE=Europe/Berlin

    volumes:
      - n8n_data:/home/node/.n8n

    depends_on:
      postgres:
        condition: service_healthy
      redis:
        condition: service_healthy
      n8n:
        condition: service_healthy

    healthcheck:
      test: ["CMD", "pgrep", "-f", "n8n worker"]
      interval: 30s
      timeout: 10s
      retries: 3

    # Resource limits (adjust based on workflow complexity)
    deploy:
      resources:
        limits:
          memory: 2G
        reservations:
          memory: 1G
      # Scale workers: docker compose up -d --scale n8n-worker=3
      replicas: 2

volumes:
  n8n_data:

Key Changes:

  1. Set EXECUTIONS_MODE=queue on the main instance
  2. Add QUEUE_BULL_REDIS_* configuration pointing to the existing Redis
  3. Create the n8n-worker service with command: worker
  4. Use Redis DB 1 for n8n (Celery uses DB 0)
  5. Set replicas: 2 for workers (easily scalable with --scale)
  6. Ensure N8N_ENCRYPTION_KEY is identical across all instances (see the sketch below)
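A simple way to satisfy the shared-key requirement is to generate the key once and store it in .env.local, so the main instance and every worker read the same value. A minimal sketch, assuming the variable name from the compose file above; the openssl call is just one common way to produce a random key:

# Generate a 64-character hex key once and persist it for docker compose to pick up
echo "N8N_ENCRYPTION_KEY=$(openssl rand -hex 32)" >> .env.local

# All replicas started afterwards share the same key
docker compose up -d --scale n8n-worker=3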

1.2 Update .env.local Template

Add to .env.local:

# N8N Configuration
N8N_ENCRYPTION_KEY=your-secure-encryption-key-here-32-chars-min
N8N_API_KEY=your-n8n-api-key-from-ui
N8N_API_URL=http://n8n:5678

1.3 Testing Procedure

# 1. Stop existing services
docker compose down

# 2. Start with queue mode
docker compose up -d

# 3. Verify main instance is running
docker compose logs n8n | grep "queue mode"

# 4. Verify workers are running
docker compose logs n8n-worker | grep "worker started"

# 5. Check Redis connection
docker compose exec redis redis-cli ping

# 6. Scale workers up
docker compose up -d --scale n8n-worker=3

# 7. Check worker count
docker compose ps | grep n8n-worker
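Beyond the log checks, the main instance can be smoke-tested over HTTP. A hedged sketch, assuming the default 5678 port mapping from the compose file and an API key created in the n8n UI (n8n's public REST API lives under /api/v1):

# Health endpoint (the same one the compose healthcheck uses)
curl -fsS http://localhost:5678/healthz

# List workflows through the public API to confirm the main instance serves requests
curl -fsS -H "X-N8N-API-KEY: ${N8N_API_KEY}" http://localhost:5678/api/v1/workflows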

Phase 2: Azure Environment (Production/Preview)

2.1 Architecture Decision

Option A: Azure Container Apps (Recommended)

  • Separate container apps for n8n main and workers
  • Built-in autoscaling for workers based on CPU/memory
  • Easier to manage within existing infrastructure
  • Cost-effective with consumption-based pricing

Option B: Azure Kubernetes Service (AKS)

  • More complex but more control
  • Better for very large scale (100+ workers)
  • Higher operational overhead
  • Requires Helm chart management

Recommendation: Use Option A (Azure Container Apps) for consistency with existing S5 Slidefactory deployment.

2.2 Infrastructure as Code

Create new deployment workflow: .github/workflows/deploy-n8n.yml

name: Deploy N8N to Azure

on:
  workflow_dispatch:
    inputs:
      environment:
        description: 'Environment to deploy to'
        required: true
        type: choice
        options:
          - preview
          - production

jobs:
  deploy-n8n:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Set environment variables
        id: env
        run: |
          if [ "${{ github.event.inputs.environment }}" == "production" ]; then
            echo "ENV_SUFFIX=" >> $GITHUB_OUTPUT
            echo "DATABASE_URL=${{ secrets.PROD_N8N_DATABASE_URL }}" >> $GITHUB_OUTPUT
            echo "REDIS_HOST=${{ secrets.PROD_REDIS_HOST }}" >> $GITHUB_OUTPUT
            echo "REDIS_PASSWORD=${{ secrets.PROD_REDIS_PASSWORD }}" >> $GITHUB_OUTPUT
          else
            echo "ENV_SUFFIX=-preview" >> $GITHUB_OUTPUT
            echo "DATABASE_URL=${{ secrets.PREVIEW_N8N_DATABASE_URL }}" >> $GITHUB_OUTPUT
            echo "REDIS_HOST=${{ secrets.PREVIEW_REDIS_HOST }}" >> $GITHUB_OUTPUT
            echo "REDIS_PASSWORD=${{ secrets.PREVIEW_REDIS_PASSWORD }}" >> $GITHUB_OUTPUT
          fi

      - name: Login to Azure
        uses: azure/login@v2
        with:
          auth-type: SERVICE_PRINCIPAL
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      # Deploy N8N Main Instance
      - name: Deploy N8N Main Instance
        run: |
          az containerapp create \
            --name n8n-main${{ steps.env.outputs.ENV_SUFFIX }} \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP }} \
            --image n8nio/n8n:latest \
            --environment ${{ secrets.AZURE_CONTAINER_APP_ENVIRONMENT }} \
            --ingress external \
            --target-port 5678 \
            --min-replicas 1 \
            --max-replicas 1 \
            --cpu 1 \
            --memory 2.0Gi \
            --env-vars \
              EXECUTIONS_MODE=queue \
              QUEUE_BULL_REDIS_HOST=${{ steps.env.outputs.REDIS_HOST }} \
              QUEUE_BULL_REDIS_PORT=6380 \
              QUEUE_BULL_REDIS_PASSWORD=secretRef:redis-password \
              QUEUE_BULL_REDIS_DB=1 \
              QUEUE_BULL_REDIS_TLS=true \
              DB_TYPE=postgresdb \
              DB_POSTGRESDB_HOST=secretRef:db-host \
              DB_POSTGRESDB_DATABASE=n8n \
              DB_POSTGRESDB_USER=secretRef:db-user \
              DB_POSTGRESDB_PASSWORD=secretRef:db-password \
              N8N_ENCRYPTION_KEY=secretRef:encryption-key \
              N8N_BASIC_AUTH_ACTIVE=true \
              N8N_BASIC_AUTH_USER=admin \
              N8N_BASIC_AUTH_PASSWORD=secretRef:n8n-password \
              N8N_METRICS=true \
              N8N_PROTOCOL=https \
              GENERIC_TIMEZONE=Europe/Berlin

      # Deploy N8N Workers
      - name: Deploy N8N Workers
        run: |
          az containerapp create \
            --name n8n-worker${{ steps.env.outputs.ENV_SUFFIX }} \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP }} \
            --image n8nio/n8n:latest \
            --environment ${{ secrets.AZURE_CONTAINER_APP_ENVIRONMENT }} \
            --ingress internal \
            --target-port 5678 \
            --min-replicas 2 \
            --max-replicas 10 \
            --cpu 1 \
            --memory 2.0Gi \
            --command "n8n" "worker" \
            --scale-rule-name cpu-scaling \
            --scale-rule-type cpu \
            --scale-rule-metadata "type=Utilization" "value=70" \
            --env-vars \
              EXECUTIONS_MODE=queue \
              QUEUE_BULL_REDIS_HOST=${{ steps.env.outputs.REDIS_HOST }} \
              QUEUE_BULL_REDIS_PORT=6380 \
              QUEUE_BULL_REDIS_PASSWORD=secretRef:redis-password \
              QUEUE_BULL_REDIS_DB=1 \
              QUEUE_BULL_REDIS_TLS=true \
              DB_TYPE=postgresdb \
              DB_POSTGRESDB_HOST=secretRef:db-host \
              DB_POSTGRESDB_DATABASE=n8n \
              DB_POSTGRESDB_USER=secretRef:db-user \
              DB_POSTGRESDB_PASSWORD=secretRef:db-password \
              N8N_ENCRYPTION_KEY=secretRef:encryption-key \
              GENERIC_TIMEZONE=Europe/Berlin
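The env-vars above use secretRef: references, which resolve against secrets defined on each container app. A sketch of how those secrets might be provisioned with the Azure CLI; the app name and secret names mirror the workflow above, and the values are placeholders (alternatively, --secrets can be passed directly to az containerapp create):

# Create the secrets referenced by the container app (repeat for n8n-worker-preview, etc.)
az containerapp secret set \
  --name n8n-main-preview \
  --resource-group <rg> \
  --secrets \
    redis-password=<azure-redis-access-key> \
    db-host=<postgres-host> \
    db-user=<db-user> \
    db-password=<db-password> \
    encryption-key=<same-32-char-key-for-all-instances> \
    n8n-password=<admin-ui-password>

Note that az containerapp create takes one scale rule per invocation, so an additional memory-based rule would need to be added afterwards (for example with az containerapp update or a YAML revision).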

2.3 Required Secrets

Add to GitHub Actions Secrets:

Preview Environment:

  • PREVIEW_N8N_DATABASE_URL: PostgreSQL connection string for the n8n database
  • PREVIEW_REDIS_HOST: Azure Redis hostname
  • PREVIEW_REDIS_PASSWORD: Azure Redis password
  • N8N_ENCRYPTION_KEY: 32+ character encryption key (CRITICAL: must be the same across all instances)
  • N8N_ADMIN_PASSWORD: Admin password for the n8n UI

Production Environment:

  • PROD_N8N_DATABASE_URL: PostgreSQL connection string for the n8n database
  • PROD_REDIS_HOST: Azure Redis hostname
  • PROD_REDIS_PASSWORD: Azure Redis password
  • N8N_ENCRYPTION_KEY: same value as preview
  • N8N_ADMIN_PASSWORD: Admin password for the n8n UI
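These can be added through the repository settings or, as a sketch, with the GitHub CLI (secret names match the list above; values are placeholders):

# Repository-level secrets via the GitHub CLI
gh secret set PREVIEW_N8N_DATABASE_URL --body "<postgres-connection-string>"
gh secret set PREVIEW_REDIS_HOST --body "<redis-hostname>"
gh secret set PREVIEW_REDIS_PASSWORD --body "<redis-password>"
gh secret set N8N_ENCRYPTION_KEY --body "$(openssl rand -hex 32)"
gh secret set N8N_ADMIN_PASSWORD --body "<admin-password>"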

2.4 Database Setup

N8N requires its own database schema. Options:

Option 1: Separate database on same PostgreSQL server

-- Run on existing Azure PostgreSQL
CREATE DATABASE n8n_preview;
CREATE DATABASE n8n_production;

Option 2: Use same database with different schema

-- Run on existing Azure PostgreSQL
CREATE SCHEMA n8n_preview;
CREATE SCHEMA n8n_production;

Update DB_POSTGRESDB_SCHEMA environment variable accordingly.
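For example, with the schema-per-environment option, each deployment would add one extra variable alongside the database settings shown earlier (values are illustrative):

# Preview deployment
DB_POSTGRESDB_SCHEMA=n8n_preview

# Production deployment
DB_POSTGRESDB_SCHEMA=n8n_production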

2.5 Azure Redis Configuration

Current Setup: S5 Slidefactory already uses Azure Redis for Celery.

N8N Redis Requirements:

  • Use a separate Redis DB number (DB 1 for n8n, DB 0 for Celery)
  • The same Azure Redis instance can be shared
  • Or create a dedicated Azure Redis instance for n8n (recommended for production)

Redis Configuration Check:

# List Redis instances and SKUs (Azure caches expose 16 databases, 0-15, by default)
az redis list --resource-group <resource-group> --query "[].{name:name,sku:sku}"

# Update Redis config if needed
az redis update \
  --resource-group <resource-group> \
  --name <redis-name> \
  --set redisConfiguration.maxmemory-policy=allkeys-lru

2.6 Monitoring and Observability

Azure Monitor Integration:

# Enable diagnostic settings for n8n container apps
az monitor diagnostic-settings create \
  --name n8n-diagnostics \
  --resource /subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.App/containerApps/n8n-main \
  --logs '[{"category": "ContainerAppConsoleLogs", "enabled": true}]' \
  --metrics '[{"category": "AllMetrics", "enabled": true}]' \
  --workspace <log-analytics-workspace-id>

Key Metrics to Monitor:

  • Worker CPU/memory utilization
  • Redis queue length (bull:n8n:* keys)
  • Workflow execution time
  • Failed workflow count
  • Worker autoscaling events

Alerts:

  • Queue length > 100 jobs
  • Worker CPU > 80% for 5 minutes
  • Failed workflow rate > 10%
  • Redis connection failures
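The queue-length alert can be prototyped without custom Azure Monitor metrics by checking the Bull waiting list directly. A minimal sketch, assuming the TLS port and DB number from the Azure configuration above; the threshold and connection variables are placeholders (the script could run from cron or a scheduled job):

#!/bin/sh
# Exit non-zero if the n8n waiting queue exceeds the alert threshold
THRESHOLD=100
DEPTH=$(redis-cli -h "$REDIS_HOST" -p 6380 -a "$REDIS_PASSWORD" --tls -n 1 LLEN "bull:n8n:waiting")
if [ "$DEPTH" -gt "$THRESHOLD" ]; then
  echo "ALERT: n8n queue depth is $DEPTH (threshold $THRESHOLD)"
  exit 1
fi
echo "OK: n8n queue depth is $DEPTH"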


Implementation Roadmap

Week 1: Local Development Environment

Day 1-2: Docker Compose Changes
  - [ ] Update docker-compose.override.yml with queue mode configuration
  - [ ] Test with a single worker locally
  - [ ] Verify workflows execute correctly

Day 3-4: Multi-Worker Testing
  - [ ] Scale to 2-3 workers
  - [ ] Run load tests with multiple concurrent workflows
  - [ ] Monitor Redis queue behavior
  - [ ] Document any issues

Day 5: Documentation
  - [ ] Update CLAUDE.md with the new n8n architecture
  - [ ] Update the local development setup guide
  - [ ] Create a troubleshooting guide

Week 2: Azure Preview Environment

Day 1-2: Infrastructure Preparation
  - [ ] Create the n8n database schema in Azure PostgreSQL
  - [ ] Configure Redis DB 1 for n8n
  - [ ] Test Redis connectivity
  - [ ] Generate and store N8N_ENCRYPTION_KEY

Day 3-4: Deployment
  - [ ] Create .github/workflows/deploy-n8n.yml
  - [ ] Add required GitHub secrets
  - [ ] Deploy the n8n main instance to preview
  - [ ] Deploy n8n workers to preview (start with 2)
  - [ ] Update the S5 Slidefactory N8N_API_URL to the new endpoint

Day 5: Testing and Validation
  - [ ] Migrate existing workflows to the new n8n instance
  - [ ] Test workflow execution via S5 Slidefactory
  - [ ] Verify autoscaling triggers
  - [ ] Load testing with multiple presentations

Week 3: Production Deployment

Day 1-2: Production Preparation
  - [ ] Create the production n8n database
  - [ ] Configure production Redis
  - [ ] Deploy n8n to the production environment
  - [ ] Migrate production workflows

Day 3-4: Monitoring and Optimization
  - [ ] Set up Azure Monitor alerts
  - [ ] Configure autoscaling rules
  - [ ] Tune worker min/max replicas
  - [ ] Optimize Redis memory settings

Day 5: Documentation and Handoff
  - [ ] Complete deployment documentation
  - [ ] Create a runbook for scaling operations
  - [ ] Train the team on the new architecture
  - [ ] Post-deployment review


Cost Implications

Azure Container Apps Pricing (Estimated)

Current Setup (Single n8n instance):

  • 1 instance × 1 vCPU × 2 GB RAM × 730 hours/month
  • Estimated: $50-70/month

Queue Mode Setup (Main + Workers):

  • Main: 1 instance × 1 vCPU × 2 GB RAM × 730 hours = $50-70/month
  • Workers: 2-10 instances (autoscaling)
      • Minimum (2 workers): $100-140/month
      • Average (4 workers): $200-280/month
      • Peak (10 workers): $500-700/month

Redis:

  • Already provisioned and paid for
  • Additional cost: negligible (using a separate DB number)

Total Additional Cost:

  • Minimum: +$100-140/month (2 workers)
  • Typical: +$150-210/month (2-4 workers average)
  • Peak: +$450-630/month (10 workers during high load)

Cost Optimization Strategies:

  1. Aggressive autoscaling down (scale to 1 worker during off-hours)
  2. Use Azure Reserved Instances for predictable workloads (-30% cost)
  3. Monitor and tune worker resource limits
  4. Set budget alerts at a $300/month threshold


Risk Assessment

Technical Risks

Risk | Likelihood | Impact | Mitigation
Data loss during migration | Low | High | Test migration on preview first, backup workflows
Encryption key mismatch | Medium | High | Document key management, validate before deployment
Redis capacity exceeded | Low | Medium | Monitor queue size, alert on high depth
Worker autoscaling too slow | Medium | Medium | Tune scaling rules, set appropriate thresholds
Azure quota limits hit | Low | Medium | Request quota increase proactively
Database connection pool exhaustion | Medium | Medium | Configure max connections per instance

Operational Risks

Risk | Likelihood | Impact | Mitigation
Increased complexity | High | Low | Document architecture, create runbooks
Higher costs than expected | Medium | Medium | Set budget alerts, monitor usage
Debugging difficulty | Medium | Low | Centralized logging, distributed tracing
Team training required | High | Low | Documentation, knowledge sharing sessions

Success Metrics

Performance Metrics

  • Workflow Throughput: 3-5x improvement in concurrent workflow capacity
  • Execution Time: No degradation in individual workflow execution time
  • Queue Latency: Jobs picked up by workers within 5 seconds
  • Autoscaling Time: Workers scale up within 2 minutes of high load

Reliability Metrics

  • Uptime: 99.9% uptime for main instance
  • Worker Availability: 99.5% uptime for at least 2 workers
  • Failed Workflow Rate: <1% due to infrastructure issues
  • Recovery Time: <5 minutes to recover from worker failure

Business Metrics

  • Cost Efficiency: <30% increase in n8n infrastructure costs
  • User Satisfaction: No complaints about workflow execution delays
  • Development Velocity: Reduced time to add new workflows
  • Scalability Headroom: Ability to handle 10x current workflow volume

Rollback Plan

If Issues Occur in Docker (Local)

  1. Stop all services: docker compose down
  2. Revert docker-compose.override.yml to previous version
  3. Remove n8n-worker service
  4. Set EXECUTIONS_MODE=regular on main n8n instance
  5. Restart: docker compose up -d

If Issues Occur in Azure

Immediate Rollback (< 1 hour):

  1. Update the S5 Slidefactory N8N_API_URL to the old n8n instance
  2. Redeploy S5 Slidefactory with the old configuration
  3. Keep the new n8n instances running for investigation

Full Rollback (if needed):

  1. Export all workflows from the new n8n instance (see the sketch below)
  2. Import workflows back to the old n8n instance
  3. Update S5 Slidefactory environment variables
  4. Delete the new n8n container apps
  5. Document issues for a future retry
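Workflow export and import can be done with n8n's built-in CLI. A hedged sketch of the relevant commands, run inside the respective containers (for example via az containerapp exec or docker compose exec); file paths are illustrative:

# Run inside the new n8n container
n8n export:workflow --all --output=/home/node/.n8n/workflows-backup.json
n8n export:credentials --all --output=/home/node/.n8n/credentials-backup.json

# Run inside the old n8n container after copying the files across
n8n import:workflow --input=/home/node/.n8n/workflows-backup.json
n8n import:credentials --input=/home/node/.n8n/credentials-backup.json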

Data Recovery:

  • All workflow definitions are in PostgreSQL (no data loss risk)
  • Execution history is preserved in the database
  • The Redis queue is ephemeral (jobs can be re-queued)


Appendix

A. Environment Variable Reference

Required for All Instances:

EXECUTIONS_MODE=queue
N8N_ENCRYPTION_KEY=<same-32-char-key-for-all>
DB_TYPE=postgresdb
DB_POSTGRESDB_HOST=<postgres-host>
DB_POSTGRESDB_DATABASE=n8n
DB_POSTGRESDB_USER=<db-user>
DB_POSTGRESDB_PASSWORD=<db-password>
QUEUE_BULL_REDIS_HOST=<redis-host>
QUEUE_BULL_REDIS_PORT=6379
QUEUE_BULL_REDIS_DB=1
GENERIC_TIMEZONE=Europe/Berlin

Main Instance Only:

N8N_HOST=<domain-or-localhost>
N8N_PORT=5678
N8N_PROTOCOL=http  # or https for production
WEBHOOK_URL=<public-webhook-url>
N8N_BASIC_AUTH_ACTIVE=true
N8N_BASIC_AUTH_USER=admin
N8N_BASIC_AUTH_PASSWORD=<secure-password>
N8N_METRICS=true
N8N_USER_MANAGEMENT_DISABLED=false

Workers Only:

# Workers use same base config but no UI/API settings
# Command: n8n worker

B. Useful Commands

Docker:

# Scale workers dynamically
docker compose up -d --scale n8n-worker=5

# View worker logs
docker compose logs -f n8n-worker

# Check Redis queue
docker compose exec redis redis-cli -n 1 KEYS "bull:*"
docker compose exec redis redis-cli -n 1 LLEN "bull:n8n:waiting"

# Monitor worker processes
docker compose exec n8n-worker ps aux | grep n8n

Azure:

# Scale workers manually
az containerapp update \
  --name n8n-worker-preview \
  --resource-group <rg> \
  --min-replicas 5 \
  --max-replicas 15

# View logs
az containerapp logs show \
  --name n8n-worker-preview \
  --resource-group <rg> \
  --follow

# Check worker count
az containerapp revision list \
  --name n8n-worker-preview \
  --resource-group <rg> \
  --query "[].properties.replicas"

C. Troubleshooting Guide

Problem: Workers not picking up jobs

Diagnosis:

# Check Redis connection
redis-cli -h <host> -p <port> -a <password> -n 1 PING

# Check queue length
redis-cli -h <host> -p <port> -a <password> -n 1 LLEN "bull:n8n:waiting"

# Check worker logs for errors
docker compose logs n8n-worker | grep -i error

Solution:

  • Verify that the QUEUE_BULL_REDIS_* environment variables match
  • Ensure N8N_ENCRYPTION_KEY is identical across all instances
  • Check Redis network connectivity
  • Restart workers: docker compose restart n8n-worker


Problem: High Redis memory usage

Diagnosis:

# Check Redis memory
redis-cli -h <host> -p <port> -a <password> INFO memory

# Check queue sizes
redis-cli -h <host> -p <port> -a <password> -n 1 KEYS "bull:*" | wc -l

Solution:

  • Set the Redis maxmemory policy to allkeys-lru
  • Increase the worker count to process jobs faster
  • Adjust job retention settings in n8n
  • Consider a dedicated Redis instance for n8n


Problem: Database connection pool exhausted

Diagnosis:

# Check PostgreSQL connections
psql -h <host> -U <user> -d n8n -c "SELECT count(*) FROM pg_stat_activity;"

# View n8n logs
docker compose logs n8n | grep -i "connection pool"

Solution:

  • Increase PostgreSQL max_connections
  • Reduce the per-instance connection pool size in n8n
  • Use a connection pooler (PgBouncer) in front of PostgreSQL (see the sketch below)
  • Scale the database instance if CPU is saturated
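To illustrate the PgBouncer option, a minimal pgbouncer.ini sketch; the host, pool sizes, and auth file are assumptions to adapt to the actual environment:

[databases]
; n8n instances connect to PgBouncer on port 6432 instead of PostgreSQL directly
n8n = host=<postgres-host> port=5432 dbname=n8n

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; session pooling is the conservative default; transaction pooling lowers
; server-side connections further but should be validated against n8n first
pool_mode = session
max_client_conn = 200
default_pool_size = 20

n8n would then point DB_POSTGRESDB_HOST and DB_POSTGRESDB_PORT at the pooler rather than at PostgreSQL.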


Conclusion

This plan provides a comprehensive roadmap to migrate S5 Slidefactory's n8n deployment from single-instance regular mode to scalable queue mode with dedicated workers. The phased approach (Docker → Preview → Production) minimizes risk while delivering immediate scalability benefits.

Key Takeaways:

  1. Architecture: Main instance + N workers + Redis queue
  2. Benefits: 3-5x throughput, horizontal scaling, better resource isolation
  3. Timeline: 3 weeks from local dev to production deployment
  4. Cost: +$150-300/month for a typical workload
  5. Risk: Low-medium with proper testing and a rollback plan

Next Steps:

  1. Review and approve this plan
  2. Schedule implementation windows
  3. Provision Azure resources (database, Redis configuration)
  4. Begin Week 1 implementation (Docker environment)
  5. Continuous monitoring and optimization post-deployment


Document Version: 1.0
Last Updated: 2025-11-10
Review Date: 2025-11-24 (2 weeks post-implementation)