Skip to content

Alembic Production Deployment Workflow

Date: 2025-10-16 Author: Claude Code Status: Preview Database Stamped, Production Workflow Documented


Executive Summary

This document describes the complete workflow for syncing database schema changes from the preview environment to production using Alembic migrations. It covers the development-to-production pipeline, including PR creation, deployment automation, and safety checks.

Key Achievement: Preview database successfully stamped with Alembic. Configuration issues resolved. Production workflow documented and ready for implementation.


Current State

Preview Environment

  • Database: slidefactory-preview on Azure PostgreSQL
  • Alembic Status: Stamped with baseline version ae98aa32e0d2 (head)
  • Configuration: .env file fixed to use direct DATABASE_URL (no shell variable expansion)
  • Alembic Config: alembic/env.py loads .env via dotenv

Production Environment

  • Database: slidefactory on Azure PostgreSQL (needs stamping)
  • Alembic Status: Not yet stamped (one-time operation required)
  • Deployment: Manual migration execution (automation pending)

Configuration Fixes Applied

Issue 1: Shell Variable Expansion in .env

Problem: The .env file used shell variable syntax:

DATABASE_URL=postgresql://${PGUSER}:${PGPASSWORD}@${PGHOST}:${PGPORT}/${PGDATABASE}${PGCONFIG}

Python's os.environ.get() doesn't expand ${...} variables, causing fallback to default postgres:5432.

Solution: Replaced with direct URL:

DATABASE_URL="postgresql+psycopg2://dbadmin:esLVoH6UVweWRVU9yC*u@slidefactory-db.postgres.database.azure.com:5432/slidefactory-preview?sslmode=require&connect_timeout=30"

Issue 2: Alembic Not Loading .env

Problem: alembic/env.py imported app.config.settings but app/config.py doesn't load .env automatically—it only reads os.environ.

Solution: Added to alembic/env.py:

# Load .env file for local development (Docker Compose loads it automatically)
from dotenv import load_dotenv
load_dotenv()


Complete Workflow: Preview → Production

Step 1: Develop Schema Changes (Preview Branch)

On preview branch:

# 1. Modify SQLAlchemy models
# Example: app/context/models.py
class Document(Base):
    # ... existing fields ...
    new_field = Column(String, nullable=True)  # NEW

# 2. Generate migration (auto-detects changes)
.venv/bin/alembic revision --autogenerate -m "add new_field to documents"

# 3. Review generated migration
cat alembic/versions/<revision>_add_new_field_to_documents.py

# 4. Test migration on preview database
.venv/bin/alembic upgrade head

# 5. Verify preview database
.venv/bin/alembic current
# Expected: <new_revision> (head)

# 6. Test rollback (optional but recommended)
.venv/bin/alembic downgrade -1
.venv/bin/alembic upgrade head

# 7. Commit migration file to git
git add alembic/versions/<revision>_*.py
git add app/context/models.py
git commit -m "feat: add new_field to documents table"
git push origin preview

Result: - Preview database has new schema - Migration tested and verified - Migration file committed to git


Step 2: Create Pull Request (preview → main)

Create PR via GitHub:

gh pr create \
  --base main \
  --head preview \
  --title "feat: add new_field to documents" \
  --body "## Changes
- Added new_field to Document model
- Alembic migration included: <revision>_add_new_field

## Testing
- [x] Migration tested on preview database
- [x] Rollback tested successfully
- [x] Application verified working with new schema

## Database Changes
- Table: context_documents
- Action: Add column new_field (String, nullable)"

PR Review Checklist: - [ ] Migration files included in PR - [ ] Migration tested on preview database - [ ] Rollback tested (alembic downgrade -1) - [ ] No data loss in migration - [ ] Application code compatible with both old and new schema (for zero-downtime) - [ ] Migration is idempotent (safe to run multiple times)


Step 3: One-Time Production Database Setup

⚠️ IMPORTANT: Before merging any PRs with migrations, production must be stamped once.

Check if production is stamped:

# Connect to production database
az containerapp exec \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --command "alembic current"

# If output is "ae98aa32e0d2 (head)" → Already stamped ✅
# If output is empty or error → Needs stamping ❌

Stamp production (ONE TIME):

# This only creates alembic_version table and sets baseline
# NO schema changes, NO data changes
az containerapp exec \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --command "alembic stamp head"

# Verify
az containerapp exec \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --command "alembic current"
# Expected: ae98aa32e0d2 (head)

What this does:

-- Only creates tracking table
CREATE TABLE alembic_version (
    version_num VARCHAR(32) NOT NULL,
    PRIMARY KEY (version_num)
);

-- Sets baseline version
INSERT INTO alembic_version (version_num) VALUES ('ae98aa32e0d2');

Safety: - ✅ No schema changes - ✅ No data changes - ✅ Takes <1 second - ✅ Reversible (just drop alembic_version table) - ✅ Non-blocking (app keeps running)


Step 4: Merge PR and Deploy

Merge PR:

# Via GitHub UI or CLI
gh pr merge <pr-number> --squash --delete-branch

# Or manually
git checkout main
git merge preview
git push origin main

GitHub Actions Workflow:

This triggers .github/workflows/production.yml which: 1. Builds Docker image 2. Pushes to Azure Container Registry 3. Deploys to Azure Container Apps


Step 5: Run Database Migrations on Production

Current State: Manual execution required (automation pending)

Option A: Manual Migration Execution (Current, Safest)

# After code deploys to production
az containerapp exec \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --command "alembic upgrade head"

# Verify migration applied
az containerapp exec \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --command "alembic current"
# Expected: <new_revision> (head)

When to run: - After GitHub Actions completes deployment - Before announcing new features to users - During maintenance window if needed

Add to .github/workflows/production.yml:

name: Deploy to Production

on:
  push:
    branches:
      - main

jobs:
  deploy:
    # ... existing build and push steps ...

  migrate-database:
    name: Run Database Migrations
    needs: deploy
    runs-on: ubuntu-latest
    steps:
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}

      - name: Run Alembic Migrations
        run: |
          az containerapp exec \
            --name slidefactory-prod \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP }} \
            --command "alembic upgrade head"
        timeout-minutes: 5

      - name: Verify Migration
        run: |
          VERSION=$(az containerapp exec \
            --name slidefactory-prod \
            --resource-group ${{ secrets.AZURE_RESOURCE_GROUP }} \
            --command "alembic current" | grep -oE '^[a-f0-9]{12}')

          echo "Production database is at version: $VERSION"

          if [ -z "$VERSION" ]; then
            echo "❌ Migration verification failed"
            exit 1
          fi

          echo "✅ Migration successful"

Pros: - ✅ Automated (no manual steps) - ✅ Runs after deployment - ✅ Includes verification step - ✅ Fails pipeline if migration fails

Cons: - ⚠️ Requires Azure CLI access in GitHub Actions - ⚠️ Less control over timing

Option C: Init Container (Best for Azure Container Apps)

Modify Azure Container Apps deployment config:

# In your containerapp definition (IaC or Azure Portal)
properties:
  template:
    initContainers:
      - name: migration-init
        image: slidefactoryacr.azurecr.io/slidefactory:latest
        command: ["alembic", "upgrade", "head"]
        env:
          - name: DATABASE_URL
            secretRef: database-url
          - name: REDIS_HOST
            secretRef: redis-host
          # ... other required env vars ...

    containers:
      - name: web
        image: slidefactoryacr.azurecr.io/slidefactory:latest
        # ... main container config ...

How it works: 1. Container Apps starts init container first 2. Init container runs alembic upgrade head 3. If migration succeeds → main container starts 4. If migration fails → deployment fails (rollback)

Pros: - ✅ Runs before app starts (safest) - ✅ Fully automated - ✅ Fails deployment if migration fails (prevents broken state) - ✅ No manual intervention needed - ✅ Built into Azure Container Apps lifecycle

Cons: - ⚠️ Requires IaC or manual Azure Portal config update - ⚠️ Migration failures block entire deployment

Recommendation: Use Option C (init container) for production-grade automation.


Workflow Diagram

┌─────────────────────────────────────────────────────────────┐
│ Development: Preview Branch                                  │
├─────────────────────────────────────────────────────────────┤
│ 1. Modify model: app/context/models.py                      │
│ 2. Generate migration: alembic revision --autogenerate      │
│ 3. Test locally: alembic upgrade head                       │
│ 4. Commit & push to preview                                 │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ GitHub Actions: Deploy to Preview                           │
├─────────────────────────────────────────────────────────────┤
│ 1. Build Docker image                                        │
│ 2. Push to Azure Container Registry                         │
│ 3. Deploy to preview Container App                          │
│ 4. Init container runs: alembic upgrade head                │
│ 5. Main container starts with new schema                    │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Verification: Test on Preview                               │
├─────────────────────────────────────────────────────────────┤
│ 1. Manual testing of new feature                            │
│ 2. Check alembic current (verify version)                   │
│ 3. Monitor logs for errors                                  │
│ 4. Verify rollback works (optional)                         │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Pull Request: preview → main                                │
├─────────────────────────────────────────────────────────────┤
│ 1. Create PR via GitHub                                     │
│ 2. Code review                                              │
│ 3. Migration review                                         │
│ 4. Approve & merge                                          │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ GitHub Actions: Deploy to Production                        │
├─────────────────────────────────────────────────────────────┤
│ 1. Build Docker image (from main branch)                    │
│ 2. Push to Azure Container Registry                         │
│ 3. Deploy to production Container App                       │
│ 4. Init container runs: alembic upgrade head                │
│ 5. Main container starts with new schema                    │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ Production: Schema Synced                                   │
├─────────────────────────────────────────────────────────────┤
│ ✅ Schema matches preview                                    │
│ ✅ Alembic version table updated                            │
│ ✅ Application runs with new schema                         │
│ ✅ Zero downtime (if migration is additive)                 │
└─────────────────────────────────────────────────────────────┘

Safety Checks and Best Practices

Before Creating PR

Development checks:

# Verify migration on preview
alembic current
# Should show: <new_revision> (head)

# Check migration history
alembic history --verbose

# Test rollback
alembic downgrade -1
alembic upgrade head

# Verify data integrity
psql $DATABASE_URL -c "SELECT COUNT(*) FROM context_documents;"

During PR Review

Code review checklist: - [ ] Migration file reviewed line by line - [ ] No DROP COLUMN without data migration - [ ] No DROP TABLE without confirmation - [ ] Default values provided for NOT NULL columns - [ ] Indexes added for new columns (if needed) - [ ] Foreign keys have proper ON DELETE behavior - [ ] Migration is reversible (downgrade() implemented) - [ ] No hardcoded values (use dynamic queries)

After Deployment

Production verification:

# Check Alembic version
az containerapp exec ... --command "alembic current"

# Check database for new columns
az containerapp exec ... --command \
  "psql \$DATABASE_URL -c '\d+ context_documents'"

# Monitor application logs
az containerapp logs tail \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --follow

# Test application endpoints
curl https://slidefactory-prod.azurecontainerapps.io/health


Rollback Procedures

If Migration Fails During Deployment

Automatic rollback (with init container): - Init container failure prevents main container from starting - Previous container revision keeps running - No broken state

Manual rollback:

# Rollback database
az containerapp exec \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --command "alembic downgrade -1"

# Rollback container app
az containerapp revision list \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001

az containerapp ingress traffic set \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --revision-weight <previous-revision>=100

If Migration Causes Data Issues

Emergency rollback process:

# 1. Stop traffic to application
az containerapp ingress disable \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001

# 2. Rollback database migration
az containerapp exec ... --command "alembic downgrade -1"

# 3. Restore from backup (if needed)
# Follow Azure PostgreSQL backup restoration process

# 4. Rollback application code
az containerapp revision copy \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001 \
  --from-revision <previous-stable-revision>

# 5. Re-enable traffic
az containerapp ingress enable \
  --name slidefactory-prod \
  --resource-group rg-slidefactory-prod-001

Zero-Downtime Migration Strategies

Additive Changes (Safe)

Examples: - Adding nullable columns - Adding new tables - Adding indexes

Strategy:

# Migration: Add nullable column
def upgrade():
    op.add_column('documents', sa.Column('new_field', sa.String(), nullable=True))

def downgrade():
    op.drop_column('documents', 'new_field')

Deployment: 1. Deploy migration (adds nullable column) 2. Deploy new code (uses new column) 3. ✅ Zero downtime

Non-Nullable Changes (Requires Strategy)

Example: Adding NOT NULL column

Strategy: 3-Phase Deployment

Phase 1: Add nullable column

def upgrade():
    op.add_column('documents', sa.Column('new_field', sa.String(), nullable=True))
Deploy → All new records will have NULL (old code doesn't set it)

Phase 2: Backfill data

def upgrade():
    # Backfill existing records
    op.execute("UPDATE documents SET new_field = 'default' WHERE new_field IS NULL")
Deploy → All records now have values

Phase 3: Make NOT NULL

def upgrade():
    op.alter_column('documents', 'new_field', nullable=False)
Deploy → Column is now NOT NULL

Breaking Changes (Requires Maintenance)

Examples: - Dropping columns used by old code - Renaming columns - Changing column types

Strategy: Blue-Green Deployment

  1. Schedule maintenance window
  2. Deploy new schema
  3. Deploy new code
  4. Verify functionality
  5. End maintenance window

Or: Expand-Contract Pattern

  1. Expand: Add new column, keep old column
  2. Migrate: Dual-write to both columns
  3. Contract: Remove old column after all code updated

Monitoring and Alerting

Key Metrics to Monitor

During migration: - Migration execution time - Database connection count - Lock wait times - Transaction log size

After migration: - Application error rates - Response time degradation - Database query performance - Failed requests

alerts:
  - name: "Migration Timeout"
    condition: migration_duration > 5m
    action: Notify ops team

  - name: "Migration Failed"
    condition: init_container_exit_code != 0
    action: Block deployment, notify ops

  - name: "Schema Mismatch"
    condition: alembic_version != expected_version
    action: Notify ops team

  - name: "High Error Rate Post-Deployment"
    condition: error_rate > 5% for 5m
    action: Trigger rollback procedure

Common Issues and Solutions

Issue 1: Migration Timeout

Symptom: Init container times out during migration

Causes: - Large data migration - Missing indexes - Lock contention

Solution:

# Break large migrations into batches
def upgrade():
    connection = op.get_bind()

    # Process in batches
    batch_size = 1000
    offset = 0

    while True:
        result = connection.execute(
            f"UPDATE documents SET new_field = 'value' "
            f"WHERE new_field IS NULL "
            f"LIMIT {batch_size} OFFSET {offset}"
        )

        if result.rowcount == 0:
            break

        offset += batch_size

Issue 2: Duplicate Migration Version

Symptom: alembic current shows wrong version after manual SQL changes

Solution:

# Check actual database state
psql $DATABASE_URL -c "SELECT * FROM alembic_version;"

# Fix version mismatch
alembic stamp <correct_revision>

Issue 3: Merge Conflicts in Migrations

Symptom: Two PRs create migrations with same parent revision

Solution:

# After merging first PR, rebase second PR
git checkout feature-branch-2
git rebase main

# Regenerate migration with correct parent
alembic revision --autogenerate -m "feature 2"

# Manually merge any conflicts in migration files


Checklist for Production Migration

Pre-Deployment

  • Production database is stamped with Alembic
  • Migration tested on preview database
  • Rollback tested successfully
  • Code review completed
  • PR merged to main
  • Database backup created
  • Maintenance window scheduled (if needed)
  • Team notified of deployment

During Deployment

  • GitHub Actions workflow starts
  • Docker image builds successfully
  • Image pushed to container registry
  • Container App deployment starts
  • Init container runs migration
  • Migration completes successfully
  • Main container starts
  • Health checks pass

Post-Deployment

  • Verify Alembic version: alembic current
  • Check application logs for errors
  • Test critical user flows
  • Monitor database performance
  • Monitor application metrics
  • Verify rollback plan is ready
  • Document any issues encountered

Next Steps

Immediate Actions

  1. Preview database: Already stamped (ae98aa32e0d2)
  2. Production database: Needs stamping (one-time operation)
  3. CI/CD automation: Implement init container or GitHub Actions job
  4. Documentation: Update deployment runbook

Future Improvements

  1. Add monitoring: Azure Monitor alerts for migration failures
  2. Automated testing: Integration tests for migrations
  3. Blue-green deployments: Zero-downtime for breaking changes
  4. Migration review process: Template for PR reviews
  5. Disaster recovery: Automated backup before migrations


Conclusion

This workflow provides a safe, automated path for syncing database schema changes from preview to production. The key principles are:

  1. Test first: Always test migrations on preview before production
  2. Automate safely: Use init containers to prevent broken deployments
  3. Monitor actively: Watch for issues during and after migrations
  4. Rollback ready: Have a plan to revert if something goes wrong
  5. Document everything: Keep a record of migrations and decisions

Status: Ready for production stamping and automation implementation.


Last Updated: 2025-10-16 Next Review: After first production migration