How Spot Instances Work
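Spot Instances run on AWS's spare EC2 capacity. You no longer bid for that capacity: you pay the current Spot price for the instance type and Availability Zone, and can optionally set a maximum price you are willing to pay. If AWS needs the capacity back, it gives you a 2-minute interruption notice, then terminates or stops your instance. In exchange, you typically get a 60–90% discount vs on-demand prices.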
Spot interruption rates are lower than most engineers assume. For most instance types in major regions, the interruption frequency is around 5–10% per month or lower; the Spot Instance Advisor lists current figures per instance type and Region. Graviton (arm64) instance types often show low interruption rates, since they draw on capacity pools separate from the x86 families.
Workloads That Work Well on Spot
Good for Spot:
- CI/CD build pipelines
- Batch data processing (Spark, EMR)
- ML training jobs
- Stateless web tier (behind load balancer)
- Video transcoding
- Game servers (with checkpoint support)
- Selenium test grids
Not suitable:
- Primary databases
- Single-instance critical services
- Stateful workloads without checkpointing
- Long-running jobs without restart logic
- Anything requiring guaranteed uptime SLA
Handling Interruptions Gracefully
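AWS delivers the 2-minute interruption notice through the instance metadata service (the spot/instance-action item) and as an "EC2 Spot Instance Interruption Warning" event in EventBridge. Your application must handle this. The pattern depends on the workload:
For batch jobs: checkpoint progress every 5–10 minutes, so an interruption costs at most one interval of work. For stateless web: rely on your load balancer's connection draining, with the deregistration delay set below the 2-minute window. For Kubernetes: run the AWS Node Termination Handler (installable from its Helm chart) so the node is cordoned and drained when the notice arrives.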
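The batch-job pattern can be sketched in Python using only the standard library. This is a minimal sketch, not a definitive implementation: the helper names (parse_instance_action, fetch_instance_action, run_batch), the 5-second poll interval, and the 300-second checkpoint interval are assumptions chosen for illustration. The metadata paths are the IMDSv2 token endpoint and the spot/instance-action item, which returns 404 until an interruption is scheduled.

```python
import json
import time
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def parse_instance_action(status, body):
    """Return the interruption notice as a dict, or None if there is none."""
    if status != 200 or not body:
        return None  # IMDS answers 404 while no interruption is pending
    notice = json.loads(body)
    return notice if "action" in notice else None


def fetch_instance_action(timeout=1.0):
    """Ask IMDSv2 whether a Spot interruption is pending (only works on EC2)."""
    token_req = urllib.request.Request(
        f"{IMDS}/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token},
        )
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_instance_action(resp.status, resp.read().decode())
    except urllib.error.HTTPError as err:
        return parse_instance_action(err.code, "")
    except (urllib.error.URLError, OSError):
        return None  # not on EC2, or the metadata service is unreachable


def run_batch(items, process, save_checkpoint, check_notice=fetch_instance_action,
              start=0, checkpoint_every=300.0, poll_every=5.0, clock=time.monotonic):
    """Process items[start:], checkpointing periodically and on interruption.

    save_checkpoint receives the index of the next unprocessed item, which is
    also the return value, so a restarted job can pass it back in as `start`.
    """
    last_checkpoint = last_poll = clock()
    for i in range(start, len(items)):
        process(items[i])
        now = clock()
        if now - last_poll >= poll_every:
            last_poll = now
            if check_notice() is not None:
                save_checkpoint(i + 1)  # interruption pending: save and stop
                return i + 1
        if now - last_checkpoint >= checkpoint_every:
            save_checkpoint(i + 1)
            last_checkpoint = now
    save_checkpoint(len(items))
    return len(items)
```

The notice check and the clock are passed in as parameters so the loop can be exercised off EC2. In a real job, save_checkpoint would write to durable storage outside the instance, since local disks may not survive the interruption.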
Architecture Patterns
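Pattern 1: Diversified instance fleet. Never rely on a single instance type. Request 6–10 instance families simultaneously (m5, m5a, m5n, m4, m6i, r5, etc.), across multiple Availability Zones. Each instance type in each Availability Zone is its own Spot capacity pool, so if one pool is reclaimed, capacity remains in the others.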
Pattern 2: Spot + On-Demand baseline. Run 20–30% of your fleet as on-demand and 70–80% as Spot. The on-demand instances absorb traffic during Spot interruptions. Net savings scale with the Spot share: 70–80% of the fleet at a 60–90% discount works out to roughly 42–72% vs all on-demand.
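The net-savings figure follows from simple arithmetic: only the Spot share of the fleet is discounted. The sketch below makes that explicit. The function name blended_savings is hypothetical, and the model assumes every Spot instance gets the same discount and that on-demand and Spot instances are otherwise equivalent, which is a simplification.

```python
def blended_savings(spot_share, spot_discount):
    """Fraction saved versus an all-on-demand fleet of the same size.

    Assumes on-demand instances pay full price and every Spot instance
    receives the same discount (a simplification).
    """
    if not (0.0 <= spot_share <= 1.0 and 0.0 <= spot_discount <= 1.0):
        raise ValueError("spot_share and spot_discount must be between 0 and 1")
    return spot_share * spot_discount


# Print the savings for the fleet mixes and discounts discussed above.
for share in (0.70, 0.80):
    for discount in (0.60, 0.75, 0.90):
        savings = blended_savings(share, discount)
        print(f"Spot share {share:.0%}, discount {discount:.0%}: net savings {savings:.1%}")
```

Running the loop for your own Spot share and observed discount gives a quick estimate before committing to a fleet mix.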
Mixed Instance Fleet with Auto Scaling
Use the price-capacity-optimized allocation strategy. It selects the pools with the highest Spot capacity availability and, among those, the lowest price, which reduces interruption risk compared with pure lowest-price selection.
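A mixed fleet like this is described to EC2 Auto Scaling through a MixedInstancesPolicy. The sketch below only builds that structure as a plain Python dict, so it runs without AWS access. The helper name mixed_instances_policy, the launch template name, the instance sizes, and the 25% on-demand share are assumptions for illustration; the dictionary keys are the ones the Auto Scaling API uses.

```python
def mixed_instances_policy(launch_template_name, instance_types,
                           on_demand_base=0, on_demand_pct_above_base=25):
    """Build a MixedInstancesPolicy argument for an EC2 Auto Scaling group."""
    if len(set(instance_types)) < 2:
        raise ValueError("list several instance types so Spot can draw on multiple pools")
    return {
        "LaunchTemplate": {
            "LaunchTemplateSpecification": {
                "LaunchTemplateName": launch_template_name,
                "Version": "$Latest",
            },
            # One override per instance type; each adds Spot capacity pools.
            "Overrides": [{"InstanceType": t} for t in instance_types],
        },
        "InstancesDistribution": {
            # Instances that are always on-demand, before the percentage split.
            "OnDemandBaseCapacity": on_demand_base,
            # Share of the remaining capacity that runs on-demand; the rest is Spot.
            "OnDemandPercentageAboveBaseCapacity": on_demand_pct_above_base,
            "SpotAllocationStrategy": "price-capacity-optimized",
        },
    }


policy = mixed_instances_policy(
    "web-tier-template",  # hypothetical launch template name
    ["m5.large", "m5a.large", "m5n.large", "m4.large", "m6i.large", "r5.large"],
)

# With boto3 installed and credentials configured, the policy could be passed as:
# boto3.client("autoscaling").create_auto_scaling_group(
#     AutoScalingGroupName="web-tier",        # hypothetical group name
#     MinSize=2, MaxSize=20,
#     VPCZoneIdentifier="<comma-separated subnet IDs>",
#     MixedInstancesPolicy=policy)
```

All instance types in this example are x86, so one launch template and AMI can serve them. Mixing in Graviton types would require an arm64 AMI as well.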