// Definition
Spot Instances (AWS), Spot VMs (Azure), and Preemptible/Spot VMs (GCP) are spare cloud capacity sold at steep discounts, typically 60–90% below on-demand pricing. The catch is that the provider can reclaim the capacity at short notice (two minutes on AWS, roughly 30 seconds on Azure and GCP) whenever it needs that capacity back.
// Why It Matters
The economics are compelling: a workload that costs $1,000/month on-demand may cost $100–$400 on Spot. For interruptible workloads such as batch data processing, ML training, CI/CD pipeline runners, and stateless web tiers with multiple replicas, Spot is often the single highest-impact cost optimization lever available.
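The $100–$400 range follows directly from the discount band: Spot cost is the on-demand cost multiplied by (1 − discount), so a 90% discount gives the low end and a 60% discount the high end. A minimal Python sketch of that arithmetic (the function names are illustrative, not part of any billing API):

```python
def spot_monthly_cost(on_demand_monthly: float, discount: float) -> float:
    """Spot cost for a given on-demand cost and a fractional discount in [0, 1)."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return on_demand_monthly * (1.0 - discount)


def spot_cost_range(on_demand_monthly: float,
                    min_discount: float = 0.60,
                    max_discount: float = 0.90) -> tuple[float, float]:
    """(lowest, highest) Spot cost across a discount band."""
    # The largest discount yields the lowest cost, and vice versa.
    return (spot_monthly_cost(on_demand_monthly, max_discount),
            spot_monthly_cost(on_demand_monthly, min_discount))


if __name__ == "__main__":
    low, high = spot_cost_range(1000.0)
    print(f"${low:,.0f}-${high:,.0f}/month")
```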
The architecture requirement: workloads must be designed to handle interruption gracefully. For batch jobs, this means checkpointing progress so a restarted job doesn't start from zero. For web tiers, this means running enough replicas that the loss of one instance during a Spot interruption doesn't degrade availability. Spot is not suitable for single-instance stateful workloads (primary databases, session stores) without specialized management tooling.
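A minimal sketch of interruption-aware checkpointing for a batch job on AWS, using only the Python standard library. It assumes the job is a sequence of independent items, that the checkpoint is a small JSON file holding the index of the next unprocessed item, and that the interruption notice is read from the EC2 instance metadata path `spot/instance-action` via IMDSv2 (a 404 there means no interruption is scheduled). All function names are hypothetical. The checkpoint path is a local temp file only so the example runs anywhere; on a real Spot instance it must live on storage that outlives the instance, such as S3 or a shared filesystem.

```python
import json
import os
import tempfile
import urllib.error
import urllib.request
from typing import Any, Callable, Sequence

IMDS = "http://169.254.169.254/latest"  # EC2 instance metadata service


def aws_spot_interruption_pending(timeout: float = 1.0) -> bool:
    """Return True if EC2 has posted a Spot interruption notice (IMDSv2)."""
    try:
        # IMDSv2: fetch a short-lived session token first.
        token_req = urllib.request.Request(
            f"{IMDS}/api/token", method="PUT",
            headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        notice_req = urllib.request.Request(
            f"{IMDS}/meta-data/spot/instance-action",
            headers={"X-aws-ec2-metadata-token": token})
        with urllib.request.urlopen(notice_req, timeout=timeout):
            return True  # 200: an interruption has been scheduled
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # 404: no interruption scheduled
        raise
    except OSError:
        return False  # IMDS unreachable (e.g. not on EC2); treated as "no notice"


def load_checkpoint(path: str) -> int:
    """Index of the next unprocessed item, or 0 when no checkpoint exists yet."""
    try:
        with open(path) as f:
            return int(json.load(f)["next_index"])
    except FileNotFoundError:
        return 0


def save_checkpoint(path: str, next_index: int) -> None:
    """Write to a temp file, then rename, so a crash mid-write can't corrupt the checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"next_index": next_index}, f)
    os.replace(tmp, path)


def run_batch(items: Sequence[Any], process: Callable[[Any], None], checkpoint_path: str,
              interrupted: Callable[[], bool] = aws_spot_interruption_pending,
              checkpoint_every: int = 100) -> bool:
    """Resume from the last checkpoint; return True when done, False if stopped by a notice."""
    i = load_checkpoint(checkpoint_path)
    while i < len(items):
        process(items[i])
        i += 1
        if interrupted():  # on a notice, persist progress and exit cleanly
            save_checkpoint(checkpoint_path, i)
            return False
        if i % checkpoint_every == 0:  # periodic checkpoint in case no notice arrives
            save_checkpoint(checkpoint_path, i)
    save_checkpoint(checkpoint_path, i)
    return True


if __name__ == "__main__":
    # Simulated run: the "notice" arrives after the 5th item, then the job resumes.
    path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
    flags = iter([False] * 4 + [True])
    print(run_batch(range(10), print, path, interrupted=lambda: next(flags, False)))
    print(run_batch(range(10), print, path, interrupted=lambda: False))
```

Checking for the notice after every item means the job reacts as soon as the current item finishes, which only works if a single item takes well under the two-minute warning; for very short items, polling on a timer would cut the number of metadata calls.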
In Kubernetes environments, Spot nodes are typically managed as a separate node pool. A taint on the Spot nodes, matching pod tolerations, and optionally node affinity control which workloads land there, while PodDisruptionBudgets limit how many replicas can be evicted at once when a reclaimed node is drained. Tools like Spot.io Elastigroup automate the Spot instance lifecycle, handling interruptions, rebalancing, and fallback to on-demand. See our Kubernetes cost reduction guide for the detailed implementation pattern.
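A sketch of the Kubernetes side, written as Python that emits JSON manifests (kubectl accepts JSON as well as YAML). It assumes the Spot node pool carries a hypothetical taint and label `capacity-type=spot`; managed node pools on EKS, AKS, and GKE apply their own provider-specific labels (and in some cases taints), so substitute whatever your pool actually uses. The function name and the `nginx` image are illustrative.

```python
import json

# Hypothetical taint/label key and value applied to the Spot node pool.
SPOT_KEY, SPOT_VALUE = "capacity-type", "spot"


def web_tier_manifests(name: str, image: str, replicas: int = 4,
                       max_unavailable: int = 1) -> list[dict]:
    """A Deployment that may run on Spot nodes, plus a PodDisruptionBudget for it."""
    labels = {"app": name}
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    # Tolerate the Spot taint so these pods are allowed onto Spot nodes.
                    "tolerations": [{
                        "key": SPOT_KEY, "operator": "Equal",
                        "value": SPOT_VALUE, "effect": "NoSchedule",
                    }],
                    # Prefer (not require) Spot nodes, so pods can fall back to on-demand.
                    "affinity": {"nodeAffinity": {
                        "preferredDuringSchedulingIgnoredDuringExecution": [{
                            "weight": 100,
                            "preference": {"matchExpressions": [{
                                "key": SPOT_KEY, "operator": "In", "values": [SPOT_VALUE],
                            }]},
                        }],
                    }},
                    "containers": [{"name": name, "image": image}],
                },
            },
        },
    }
    pdb = {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{name}-pdb"},
        # Voluntary evictions (e.g. draining a node after a Spot notice) take
        # down at most `max_unavailable` replicas at a time.
        "spec": {"maxUnavailable": max_unavailable, "selector": {"matchLabels": labels}},
    }
    return [deployment, pdb]


if __name__ == "__main__":
    # Usage: python manifests.py | kubectl apply -f -
    items = web_tier_manifests("web", "nginx:1.27")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

Using preferred rather than required node affinity is a deliberate trade-off: the pods still schedule onto on-demand nodes when Spot capacity is gone, at the cost of paying on-demand prices for that period. Note that a PodDisruptionBudget only governs voluntary evictions; it cannot stop the provider from reclaiming a node.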
// In Practice
Scenario: A data engineering team runs nightly ETL jobs processing 2TB of event data. On EC2 on-demand (r5.4xlarge): $1.008/hr × 6 hours × 30 nights = $181/month. On Spot (same instance type, ~72% discount): $0.28/hr × 6 hours × 30 nights = $50/month. Savings: $131/month ($1,572/year) for one job. The team runs 40 similar nightly jobs, so total Spot savings come to roughly $63,000/year (40 × $1,572 = $62,880). Implementation: adding a Spot specification to their EMR cluster configuration. These figures cover the EC2 instance price only; EMR also bills a per-instance fee that Spot does not discount.
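For the EMR change, a minimal sketch of an instance fleet configuration, built as plain Python dicts in the shape that boto3's `run_job_flow` accepts under `Instances={"InstanceFleets": ...}`. The layout is a hypothetical example: an on-demand primary node (losing it typically takes down the cluster) and a core fleet filled from Spot, diversified across three similar instance types so EMR has more Spot capacity pools to draw from. The instance types, the 10-minute timeout, and the function name are illustrative assumptions.

```python
import json

# Hypothetical choices: the scenario's r5.4xlarge plus two similarly sized types.
SPOT_TYPES = ["r5.4xlarge", "r5d.4xlarge", "r5a.4xlarge"]


def emr_instance_fleets(core_units: int, spot_timeout_minutes: int = 10) -> list[dict]:
    """InstanceFleets for an EMR cluster: on-demand primary node, Spot core fleet."""
    primary = {
        "Name": "primary",
        "InstanceFleetType": "MASTER",
        "TargetOnDemandCapacity": 1,  # keep the primary node off Spot
        "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}],
    }
    core = {
        "Name": "core-spot",
        "InstanceFleetType": "CORE",
        # Spot core nodes are safest when job data lives in S3 rather than HDFS.
        "TargetOnDemandCapacity": 0,
        "TargetSpotCapacity": core_units,
        # Each listed type counts as one capacity unit.
        "InstanceTypeConfigs": [
            {"InstanceType": t, "WeightedCapacity": 1} for t in SPOT_TYPES
        ],
        "LaunchSpecifications": {
            "SpotSpecification": {
                "TimeoutDurationMinutes": spot_timeout_minutes,
                "TimeoutAction": "SWITCH_TO_ON_DEMAND",  # fall back if Spot isn't provisioned in time
                "AllocationStrategy": "capacity-optimized",  # favor pools with the most spare capacity
            }
        },
    }
    return [primary, core]


if __name__ == "__main__":
    fleets = emr_instance_fleets(core_units=1)
    # Pass to boto3 as: boto3.client("emr").run_job_flow(..., Instances={"InstanceFleets": fleets, ...})
    print(json.dumps(fleets, indent=2))
```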