Problem Statement
Explain Blue-Green deployment implementation including infrastructure setup, traffic switching, database considerations, rollback procedures, and CI/CD integration.
Explanation
Blue-Green deployment maintains two identical production environments, enabling instant traffic switching. Infrastructure setup requires:
- two complete environments (compute, networking, databases)
- a load balancer or DNS layer for traffic routing
- monitoring for both environments
- automation for deployment and switching
Infrastructure patterns:
1. Load balancer switching (AWS ALB):
```hcl
# Terraform example: two target groups, with the listener forwarding
# to whichever environment is currently active.
resource "aws_lb_target_group" "blue" {
  name     = "app-blue"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
  }
}

resource "aws_lb_target_group" "green" {
  name     = "app-green"
  port     = 80
  protocol = "HTTP"
  vpc_id   = aws_vpc.main.id

  health_check {
    path                = "/health"
    healthy_threshold   = 2
    unhealthy_threshold = 2
    timeout             = 5
  }
}

resource "aws_lb_listener" "main" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  default_action {
    type             = "forward"
    target_group_arn = var.active_environment == "blue" ? aws_lb_target_group.blue.arn : aws_lb_target_group.green.arn
  }
}
```
2. Kubernetes Service switching:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app: myapp
    version: blue  # Switch to 'green' to cut traffic over
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-blue
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: blue
  template:
    metadata:
      labels:
        app: myapp
        version: blue
    spec:
      containers:
        - name: myapp
          image: myapp:v1
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-green
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
      version: green
  template:
    metadata:
      labels:
        app: myapp
        version: green
    spec:
      containers:
        - name: myapp
          image: myapp:v2
```
Deployment process:
1. Deploy to inactive environment (Green)
2. Run smoke tests on Green
3. Run full test suite on Green
4. Switch small percentage for canary testing (optional)
5. Monitor Green environment metrics
6. Switch all traffic to Green
7. Monitor for issues
8. Keep Blue running as backup
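The smoke-test gate in step 2 can be sketched in Python. This is a minimal sketch, not a definitive implementation: the `/health` path and the idea that a 200 response means "healthy" are assumptions about your service.

```python
# Hedged sketch of the smoke-test gate: only proceed with the switch if
# every checked endpoint on the Green environment returns HTTP 200.
import urllib.request


def smoke_test(base_url: str, paths: tuple = ("/health",)) -> bool:
    """Return True only if every endpoint responds with HTTP 200."""
    for path in paths:
        try:
            with urllib.request.urlopen(base_url + path, timeout=5) as resp:
                if resp.status != 200:
                    return False
        except OSError:  # connection refused, timeout, DNS failure, ...
            return False
    return True
```

In a pipeline you would call `smoke_test("https://green.internal.example.com")` and abort the switch on `False`.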
Database considerations (biggest challenge):
Backward compatible migrations:
```sql
-- Phase 1: Add new column (nullable)
ALTER TABLE users ADD COLUMN email_verified BOOLEAN DEFAULT NULL;
-- Deploy Green with code using new column
-- Both Blue and Green work
-- Phase 2: Backfill data
UPDATE users SET email_verified = FALSE WHERE email_verified IS NULL;
-- Phase 3: Make non-nullable (after Blue retired)
ALTER TABLE users ALTER COLUMN email_verified SET NOT NULL;
```
Database patterns:
1. Shared database (both Blue and Green use same DB):
- Simple but requires backward compatible migrations
- Blue and Green must handle old and new schema
2. Database per environment:
- Complete isolation
- Complex data synchronization
- Expensive (duplicate data)
3. Read replicas:
- Blue uses the primary; Green uses a read replica for testing
- Promote the replica during the switch (the replica is read-only until promoted)
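With the shared-database pattern, application code must tolerate both schemas during phase 1 of the migration above, when `email_verified` is still nullable. A minimal Python sketch, where the dict stands in for a fetched database row:

```python
# Hedged sketch: Green's code treats a missing or NULL email_verified
# column as "not verified", so it works against rows Blue wrote before
# the backfill ran.

def is_email_verified(row: dict) -> bool:
    value = row.get("email_verified")
    return bool(value) if value is not None else False
```

Blue, which predates the column, simply never reads it; once Blue is retired and the column is made `NOT NULL`, the fallback can be dropped.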
Traffic switching automation:
```bash
#!/bin/bash
# blue-green-switch.sh -- deploy to the new environment, verify, then switch
set -u

CURRENT_ENV=$1
NEW_ENV=$2
echo "Current: $CURRENT_ENV, switching to: $NEW_ENV"

# Deploy to the new environment
kubectl apply -f "k8s/$NEW_ENV/"

# Wait for pods to become ready
kubectl wait --for=condition=ready pod -l "version=$NEW_ENV" --timeout=300s

# Run smoke tests
if ! ./smoke-tests.sh "https://$NEW_ENV.internal.example.com"; then
  echo "Smoke tests failed, aborting"
  exit 1
fi

# Switch traffic by repointing the Service selector
kubectl patch service myapp -p "{\"spec\":{\"selector\":{\"version\":\"$NEW_ENV\"}}}"
echo "Switched to $NEW_ENV"

# Monitor for 5 minutes
sleep 300

# Check error rate (expects an integer percentage from the metrics endpoint)
ERROR_RATE=$(curl -s "https://metrics.example.com/error-rate?env=$NEW_ENV")
if [ "$ERROR_RATE" -gt 5 ]; then
  echo "High error rate: $ERROR_RATE%, rolling back"
  kubectl patch service myapp -p "{\"spec\":{\"selector\":{\"version\":\"$CURRENT_ENV\"}}}"
  exit 1
fi
echo "Deployment successful"
```
CI/CD integration (GitLab CI example):
```yaml
stages:
  - build
  - deploy-green
  - test-green
  - switch-traffic
  - cleanup

build:
  stage: build
  script:
    - docker build -t myapp:$CI_COMMIT_SHA .
    - docker push myapp:$CI_COMMIT_SHA

deploy-green:
  stage: deploy-green
  script:
    - export NEW_ENV="green"
    - export OLD_ENV="blue"
    - ./deploy.sh $NEW_ENV $CI_COMMIT_SHA
  environment:
    name: green
    url: https://green.internal.example.com

test-green:
  stage: test-green
  script:
    - ./smoke-tests.sh https://green.internal.example.com
    - ./integration-tests.sh https://green.internal.example.com

switch-traffic:
  stage: switch-traffic
  script:
    - ./blue-green-switch.sh blue green
  when: manual
  environment:
    name: production
  only:
    - main

cleanup-blue:
  stage: cleanup
  script:
    - kubectl scale deployment myapp-blue --replicas=1
  when: manual
```
Rollback procedure:
```bash
#!/bin/bash
# Quick rollback
kubectl patch service myapp -p '{"spec":{"selector":{"version":"blue"}}}'
echo "Rolled back to blue"
```
Monitoring during switch:
```python
import time

# get_metric(), is_anomaly(), and rollback() are placeholders for your
# metrics client, anomaly detection, and switch-back automation.

def monitor_switch(environment, duration=300):
    """Poll key metrics after the switch; roll back on any anomaly."""
    metrics = ['error_rate', 'latency_p95', 'throughput']
    start = time.time()
    while time.time() - start < duration:
        for metric in metrics:
            value = get_metric(environment, metric)
            if is_anomaly(metric, value):
                print(f"Anomaly detected: {metric}={value}")
                rollback()
                return False
        time.sleep(10)
    return True
```
Best practices:
- Test rollback procedures regularly
- Automate everything, including the switch itself
- Implement health checks in both environments
- Monitor business metrics, not just technical ones
- Use a gradual switch (weighted routing) for additional safety
- Keep database migrations backward compatible, and document the migration strategy
- Implement automated rollback on anomalies
- Keep Blue running for a configured time after the switch
- Use feature flags as an additional safety layer

Understanding Blue-Green deployment enables zero-downtime releases with instant rollback capability.
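The gradual-switch practice (weighted routing) can be sketched as a stepped ramp with a metric gate at each step. In this sketch, `set_weight` and `error_rate` are hypothetical stand-ins for your load balancer's weight API and your metrics store; the step sizes and the 5% threshold are assumptions to tune for your service.

```python
# Hedged sketch of a gradual (weighted) Blue-to-Green switch: shift
# traffic in steps and roll back instantly if the error rate spikes.

STEPS = [10, 25, 50, 100]   # percent of traffic sent to Green at each step
ERROR_THRESHOLD = 5.0       # roll back if error rate exceeds this (%)


def gradual_switch(set_weight, error_rate) -> bool:
    """Return True if Green reached 100%, False if we rolled back."""
    for pct in STEPS:
        set_weight(pct)                  # e.g. update ALB target group weights
        if error_rate() > ERROR_THRESHOLD:
            set_weight(0)                # instant rollback: all traffic to Blue
            return False
    return True
```

Because each step is reversible, a bad release is caught while it still serves only a small slice of traffic, which is the main safety gain over an all-at-once switch.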