FinOpsForge — Independent cloud cost reviews. No vendor sponsorships. No paid rankings.

AWS Spot Instances: Complete Guide to 90% Cost Cuts (2026)

// Jan 2026 // 10 min read // independently tested

AWS Spot Instances offer discounts of up to 90% vs on-demand pricing, making them the largest cost lever in cloud computing. The catch: AWS can reclaim them with 2 minutes' notice. Here's how to architect for Spot safely and which workloads are genuinely suitable.

// Affiliate disclosure: FinOpsForge may earn a commission if you sign up via links on this page. This never affects our ratings or editorial independence. We test tools on real cloud workloads.

How Spot Instances Work

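Spot Instances run on AWS's spare EC2 capacity. There is no bidding anymore: you pay the current Spot price, which AWS adjusts gradually based on long-term supply and demand, and you can optionally set a maximum price. If AWS needs the capacity back, it sends a 2-minute interruption notice and then terminates, stops, or hibernates the instance, depending on its configured interruption behavior. In exchange, you get a 60–90% discount vs on-demand prices.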
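To turn the discount range into dollars, a small helper can compare an on-demand and a Spot hourly price. This is a sketch: the function name `spot_savings`, the default 730-hour month, and the example prices are illustrative assumptions, not current AWS quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage discount and monthly savings per instance.
# Usage: spot_savings ON_DEMAND_HOURLY SPOT_HOURLY [HOURS_PER_MONTH]
# HOURS_PER_MONTH defaults to 730, an assumed average month.
spot_savings() {
  awk -v od="$1" -v spot="$2" -v hours="${3:-730}" 'BEGIN {
    printf "discount=%.0f%% monthly_savings=$%.2f\n", (1 - spot / od) * 100, (od - spot) * hours
  }'
}
```

With illustrative prices, `spot_savings 0.192 0.0576` prints `discount=70% monthly_savings=$98.11`. Multiply by fleet size to estimate the total.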

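Spot interruption rates are lower than most engineers assume. The average interruption frequency for most instance types in major regions is 5–10% per month, and you can check the current range for any instance type and region in the AWS Spot Instance Advisor. Graviton (arm64) instance types tend to have the lowest interruption rates, likely because their capacity pools see less competition than those of popular x86 types.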
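To see what a 5–10% monthly interruption frequency means for an actual fleet, multiply it by the number of Spot instances. The sketch below assumes the rate applies to each instance that runs for the whole month, which is a simplification; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: expected Spot interruptions per month for a fleet,
# assuming every instance runs the full month at the given rate.
# Usage: expected_interruptions FLEET_SIZE LOW_RATE_PCT HIGH_RATE_PCT
expected_interruptions() {
  awk -v n="$1" -v lo="$2" -v hi="$3" 'BEGIN {
    printf "%.1f-%.1f interruptions/month\n", n * lo / 100, n * hi / 100
  }'
}
```

For example, `expected_interruptions 40 5 10` prints `2.0-4.0 interruptions/month`: a 40-instance fleet should plan for a handful of replacements per month, not constant churn.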

Workloads That Work Well on Spot

// good for spot

  • CI/CD build pipelines
  • Batch data processing (Spark, EMR)
  • ML training jobs
  • Stateless web tier (behind a load balancer)
  • Video transcoding
  • Game servers (with checkpoint support)
  • Selenium test grids

// not suitable

  • Primary databases
  • Single-instance critical services
  • Stateful workloads without checkpointing
  • Long-running jobs without restart logic
  • Anything requiring a guaranteed uptime SLA

Handling Interruptions Gracefully

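AWS publishes the 2-minute interruption notice in two places: the instance metadata service (IMDS) and an EventBridge event of type "EC2 Spot Instance Interruption Warning". Your application must handle it. The basic pattern using IMDSv2: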

# Get an IMDSv2 session token (valid for up to 6 hours)
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Check for a Spot interruption notice. Prints the HTTP status: 404 until a
# notice is scheduled, then 200 (the body is JSON such as {"action": "terminate", "time": "..."})
curl -s -o /dev/null -w "%{http_code}\n" -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/spot/instance-action

# If 200 is returned: drain connections, checkpoint state, shut down gracefully
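In practice the check has to run continuously; AWS's Spot documentation recommends checking for the notice every 5 seconds. Below is a minimal sketch of such a polling loop. It queries the `latest/meta-data/spot/instance-action` path, requests a fresh IMDSv2 token if IMDS answers 401, and calls a drain hook on the first 200. The names `watch_for_interruption`, `on_interruption`, `IMDS_BASE`, and `POLL_SECONDS` are hypothetical, chosen for this example.

```shell
#!/usr/bin/env bash
# Minimal sketch: watch IMDSv2 for a Spot interruption notice and run a drain
# hook once it appears. All function and variable names are hypothetical.
IMDS_BASE="${IMDS_BASE:-http://169.254.169.254}"
POLL_SECONDS="${POLL_SECONDS:-5}"

# Request an IMDSv2 session token (valid for up to 6 hours)
get_token() {
  curl -s -X PUT "$IMDS_BASE/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600"
}

# Print the HTTP status of the Spot notice path:
# 404 = no notice yet, 200 = interruption scheduled, 401 = token expired/invalid
interruption_status() {
  curl -s -o /dev/null -w "%{http_code}" \
    -H "X-aws-ec2-metadata-token: $1" \
    "$IMDS_BASE/latest/meta-data/spot/instance-action"
}

# Placeholder hook: deregister from the load balancer, flush a checkpoint,
# stop services cleanly
on_interruption() {
  echo "interruption notice received: draining"
}

# Poll until a notice appears, then run the hook once and return
watch_for_interruption() {
  local token status
  token=$(get_token)
  while true; do
    status=$(interruption_status "$token")
    case "$status" in
      200) on_interruption; return 0 ;;
      401) token=$(get_token) ;;   # refresh an expired token
    esac
    sleep "$POLL_SECONDS"
  done
}
```

The script only defines functions; source it from the instance's startup script and run `watch_for_interruption &` in the background so the loop runs alongside your workload.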

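For batch jobs: checkpoint progress every 5–10 minutes. For stateless web: rely on your load balancer's connection draining, and keep the target group's deregistration delay under 120 seconds (the default is 300), or in-flight requests can be cut off when the instance terminates. For Kubernetes: deploy the AWS Node Termination Handler (available as a Helm chart), which cordons and drains the node when a notice arrives.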
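For the batch-job case, one way to checkpoint every few minutes is to record the index of the next unprocessed item and resume from it on restart. This is a sketch under assumptions: the work is split into numbered items, `process_item` is a hypothetical placeholder that is safe to re-run, and the names `run_batch`, `save_checkpoint`, `CHECKPOINT_FILE`, and `CHECKPOINT_SECONDS` are illustrative.

```shell
#!/usr/bin/env bash
# Minimal sketch of time-based checkpointing for a resumable batch job.
# The checkpoint stores the index of the next item to process.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-/tmp/batch-job.checkpoint}"
CHECKPOINT_SECONDS="${CHECKPOINT_SECONDS:-300}"   # 5 minutes

# Placeholder for the real unit of work on item number $1
process_item() {
  :
}

# Save progress atomically: write a temp file, then rename it over the old one
save_checkpoint() {
  echo "$1" > "$CHECKPOINT_FILE.tmp" && mv "$CHECKPOINT_FILE.tmp" "$CHECKPOINT_FILE"
}

run_batch() {
  local total=$1 start=0 i
  local last_save=$SECONDS
  # Resume after the last checkpoint if a previous run was interrupted
  if [ -f "$CHECKPOINT_FILE" ]; then
    start=$(cat "$CHECKPOINT_FILE")
  fi
  for (( i = start; i < total; i++ )); do
    process_item "$i"
    # Checkpoint once enough wall-clock time has passed since the last save
    if (( SECONDS - last_save >= CHECKPOINT_SECONDS )); then
      save_checkpoint $(( i + 1 ))
      last_save=$SECONDS
    fi
  done
  save_checkpoint "$total"   # mark the job complete
}
```

On a real Spot instance, write the checkpoint to storage that survives termination (for example Amazon S3 or EFS), because local disks may be lost when the instance is reclaimed. Items processed after the last checkpoint are repeated on restart, which is why `process_item` must be idempotent.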

Architecture Patterns

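Pattern 1 (diversified instance fleet): Never rely on a single instance type. Request 6–10 instance families simultaneously (m5, m5a, m5n, m4, m6i, r5, etc.) and spread them across every Availability Zone your workload can use. Each instance type in each Availability Zone is a separate Spot capacity pool, so if one pool is reclaimed, capacity remains in the others.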
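The benefit of diversification is easy to quantify for the case where one pool is reclaimed completely: the loss is bounded by the largest pool's share of the fleet. The sketch below assumes instances are spread evenly across pools, which Auto Scaling does not guarantee; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: worst-case Spot capacity lost when one pool is
# reclaimed entirely, assuming instances are spread evenly across pools.
# Usage: worst_case_pool_loss FLEET_SIZE POOL_COUNT
worst_case_pool_loss() {
  local n=$1 k=$2 largest pct
  largest=$(( (n + k - 1) / k ))   # ceil(n / k): size of the largest pool
  pct=$(( largest * 100 / n ))     # share of the fleet, rounded down
  echo "largest_pool=$largest share=${pct}%"
}
```

`worst_case_pool_loss 40 1` prints `largest_pool=40 share=100%`, while spreading the same 40 instances across 6 instance types in 3 Availability Zones (18 pools) gives `largest_pool=3 share=7%`.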

Pattern 2 (Spot + On-Demand baseline): Run 20–30% of your fleet as on-demand and 70–80% as Spot. The on-demand instances keep absorbing traffic while interrupted Spot capacity is replaced. Net savings: 55–70% vs all on-demand.
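The net-savings figure follows from simple arithmetic: blended savings equal the Spot share of the fleet times the Spot discount, assuming the on-demand portion is billed at full price (no Savings Plans or Reserved Instances). A hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: blended savings vs an all-on-demand fleet,
# assuming on-demand capacity gets no other discount.
# Usage: blended_savings SPOT_SHARE_PCT SPOT_DISCOUNT_PCT
blended_savings() {
  awk -v share="$1" -v disc="$2" 'BEGIN { printf "%.0f%%\n", share * disc / 100 }'
}
```

`blended_savings 70 80` prints `56%` and `blended_savings 80 85` prints `68%`, consistent with the 55–70% range above.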

Mixed Instance Fleet with Auto Scaling

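# Example: EC2 Auto Scaling mixed fleet configuration (CloudFormation)
MixedInstancesPolicy:
  InstancesDistribution:
    OnDemandBaseCapacity: 2                    # Always keep 2 on-demand instances
    OnDemandPercentageAboveBaseCapacity: 20    # 20% on-demand, 80% Spot above the base
    SpotAllocationStrategy: "price-capacity-optimized"
  LaunchTemplate:
    LaunchTemplateSpecification:               # Required; MyLaunchTemplate is a placeholder name
      LaunchTemplateId: !Ref MyLaunchTemplate
      Version: !GetAtt MyLaunchTemplate.LatestVersionNumber
    Overrides:
      - InstanceType: m5.xlarge
      - InstanceType: m5a.xlarge
      - InstanceType: m6i.xlarge
      - InstanceType: m6a.xlarge
      - InstanceType: m4.xlarge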

Use the price-capacity-optimized allocation strategy. It first picks the pools with the most available Spot capacity, then the lowest-priced of those, which reduces interruption risk more than pure lowest-price selection.
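To see how the `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity` settings in the configuration above split a fleet, the sketch below reproduces the calculation. It assumes the behavior described in the EC2 Auto Scaling documentation as we understand it: the base is filled with on-demand first, the percentage applies only to capacity above the base, and fractional results round up in favor of on-demand. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split a desired capacity into on-demand and Spot counts.
# Usage: fleet_split DESIRED_CAPACITY ON_DEMAND_BASE PCT_ON_DEMAND_ABOVE_BASE
fleet_split() {
  local desired=$1 base=$2 pct=$3 base_used above od
  # The on-demand base is filled first, up to the desired capacity
  base_used=$(( desired < base ? desired : base ))
  above=$(( desired - base_used ))
  # The percentage applies only above the base; round up in favor of on-demand
  od=$(( base_used + (above * pct + 99) / 100 ))
  echo "on-demand=$od spot=$(( desired - od ))"
}
```

With the example settings, `fleet_split 10 2 20` prints `on-demand=4 spot=6`: at small fleet sizes the fixed base pushes the on-demand share well above 20%, so check the split at your typical desired capacity.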

// FAQ

Is Spot safe for production?
Yes — with proper architecture. Stateless services behind a load balancer with diversified instance types and a 20% on-demand baseline run on Spot reliably. Companies like Netflix, Lyft, and Airbnb run significant production workloads on Spot.
Should I use a tool like Spot.io instead of managing Spot directly?
For teams running $20K+/month of compute, yes. Spot.io automates fleet diversification, interruption handling, and fallback logic, typically delivering 60–80% savings. It reduces interruption risk rather than eliminating it, since the underlying capacity is still Spot. The ROI is usually 10–30x the tool cost.

