Standard vs Autopilot: The Most Important Decision
| Attribute | GKE Standard | GKE Autopilot |
|---|---|---|
| Billing unit | Provisioned node capacity (Compute Engine VMs) | Pod resource requests (not actual usage) |
| Cluster mgmt fee | $0.10/hr ($73/mo) | $0.10/hr ($73/mo) |
| Node management | You manage nodes | Google manages nodes |
| Bin packing efficiency | Depends on your config | Optimized by Google |
| Best for | High workload density, GPU, custom nodes | Variable workloads, simplicity |
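Autopilot wins when: Your cluster has highly variable workloads (peaks and troughs), you want to stop paying for idle node capacity, and you don't need custom node configurations. Autopilot does run GPU workloads and Spot Pods, but it gives you less control over the underlying machines than Standard does.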
Standard wins when: You have consistently high workload density (nodes are well-utilized), you need custom machine types or GPUs, or you're running Spot nodes for batch workloads where Autopilot's constraints are limiting.
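The choice above mostly comes down to utilization: Standard bills every vCPU on provisioned nodes, while Autopilot bills only requested vCPU, typically at a higher unit rate. The sketch below makes that break-even explicit. The per-vCPU prices are hypothetical placeholders, not GCP list prices, and it ignores memory, storage, and the management fee (which is the same on both modes).

```python
# Hypothetical per-vCPU-hour prices: placeholders, NOT actual GCP list prices.
STANDARD_VCPU_HOUR = 0.030   # paid for every vCPU on provisioned nodes
AUTOPILOT_VCPU_HOUR = 0.045  # paid only for vCPU requested by running pods
HOURS_PER_MONTH = 730


def standard_monthly(node_vcpus: float) -> float:
    """Standard: you pay for node capacity whether or not pods use it."""
    return node_vcpus * STANDARD_VCPU_HOUR * HOURS_PER_MONTH


def autopilot_monthly(requested_vcpus: float) -> float:
    """Autopilot: you pay for the sum of pod CPU requests."""
    return requested_vcpus * AUTOPILOT_VCPU_HOUR * HOURS_PER_MONTH


def break_even_utilization() -> float:
    """Request-to-capacity ratio above which Standard becomes the cheaper option."""
    return STANDARD_VCPU_HOUR / AUTOPILOT_VCPU_HOUR


if __name__ == "__main__":
    node_vcpus = 64
    for requested in (20, 48):
        s = standard_monthly(node_vcpus)
        a = autopilot_monthly(requested)
        print(f"{requested}/{node_vcpus} vCPU requested: Standard ${s:,.0f}, Autopilot ${a:,.0f}")
    print(f"Break-even request/capacity ratio: {break_even_utilization():.0%}")
```

With these placeholder rates, Autopilot is cheaper while pods request less than about two-thirds of node capacity; substitute current prices for your region and machine family before drawing conclusions.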
Spot Node Pools on GKE
Spot VMs on GKE are discounted 60–91% compared with on-demand VMs. When Compute Engine reclaims a Spot VM, GKE gracefully terminates its pods within a window of about 25 seconds. That is far shorter than the 2-minute interruption notice on AWS, so make sure your preStop hooks and shutdown handlers finish quickly.
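One way to keep shutdown inside that window is to budget it explicitly when generating pod specs. The sketch below assumes the 25-second figure quoted above (check the GKE docs for your version), and the function and container names are hypothetical. It relies on standard Kubernetes behavior: terminationGracePeriodSeconds covers both the preStop hook and the drain after SIGTERM.

```python
# Shutdown window cited in this guide for Spot eviction on GKE. Treat it as an
# assumption to verify against the GKE docs for your version.
SPOT_SHUTDOWN_WINDOW_S = 25


def spot_safe_pod_spec(image: str, prestop_sleep_s: int, drain_s: int) -> dict:
    """Build a pod spec fragment whose whole shutdown fits in the Spot window.

    prestop_sleep_s: seconds the preStop hook sleeps (e.g. so a load balancer
        stops routing to the pod before it receives SIGTERM).
    drain_s: seconds the app needs after SIGTERM to finish in-flight work.
    """
    total = prestop_sleep_s + drain_s
    if total > SPOT_SHUTDOWN_WINDOW_S:
        raise ValueError(
            f"shutdown needs {total}s but Spot eviction allows about {SPOT_SHUTDOWN_WINDOW_S}s"
        )
    return {
        # The grace period counts the preStop hook AND the post-SIGTERM drain.
        "terminationGracePeriodSeconds": total,
        "containers": [
            {
                "name": "app",  # hypothetical container name
                "image": image,
                "lifecycle": {
                    "preStop": {"exec": {"command": ["sh", "-c", f"sleep {prestop_sleep_s}"]}}
                },
            }
        ],
    }
```

Failing fast at build time is the point of the design: a spec that cannot finish within the window is rejected before it ever runs on a Spot node.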
Use a mixed node pool strategy: on-demand (non-Spot) nodes for your system node pool and stateful workloads, Spot nodes for stateless services and batch jobs. The cluster autoscaler scales each pool independently.
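To steer only the stateless and batch workloads onto Spot capacity, you can select on the cloud.google.com/gke-spot=true label that GKE puts on Spot nodes. The sketch also adds a toleration for a NoSchedule taint with the same key. This assumes you tainted the Spot pool yourself, so that pods without the toleration (system and stateful ones) stay on the on-demand pools. The helper name is hypothetical.

```python
SPOT_KEY = "cloud.google.com/gke-spot"


def place_on_spot(pod_spec: dict) -> dict:
    """Return a copy of pod_spec that schedules only onto Spot nodes.

    Adds a nodeSelector on the Spot label and a toleration for a Spot taint
    (assumed to be configured on the pool). The input dict is not modified.
    """
    spec = dict(pod_spec)
    spec["nodeSelector"] = {**pod_spec.get("nodeSelector", {}), SPOT_KEY: "true"}
    spec["tolerations"] = list(pod_spec.get("tolerations", [])) + [
        {"key": SPOT_KEY, "operator": "Equal", "value": "true", "effect": "NoSchedule"}
    ]
    return spec
```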
Cluster & Pod Autoscaling
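Cluster Autoscaler: Enable it on all user node pools. For Spot pools, set the minimum node count to 0 so the pool can scale down completely during idle periods. GKE manages the autoscaler itself and does not expose upstream flags such as --scale-down-unneeded-time; for faster scale-down in cost-sensitive environments, switch the cluster to the optimize-utilization autoscaling profile instead.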
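The two settings above map to gcloud commands. This sketch only builds and prints the argument lists so they can be reviewed before running. Cluster and pool names are hypothetical, and location flags (--zone or --region) are assumed to come from your gcloud config.

```python
import shlex


def spot_pool_create_args(cluster: str, pool: str, max_nodes: int) -> list[str]:
    """gcloud arguments for an autoscaled Spot node pool that can scale to zero.

    For regional clusters, --min-nodes/--max-nodes apply per zone.
    """
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        "--spot",
        "--enable-autoscaling",
        "--min-nodes=0",
        f"--max-nodes={max_nodes}",
    ]


def optimize_utilization_args(cluster: str) -> list[str]:
    """gcloud arguments that switch the cluster to the more aggressive scale-down profile."""
    return [
        "gcloud", "container", "clusters", "update", cluster,
        "--autoscaling-profile=optimize-utilization",
    ]


if __name__ == "__main__":
    # Print for review; run them yourself (or via subprocess.run) once checked.
    print(shlex.join(spot_pool_create_args("my-cluster", "spot-pool", 20)))
    print(shlex.join(optimize_utilization_args("my-cluster")))
```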
Horizontal Pod Autoscaler (HPA): Scales pods based on CPU, memory, or custom metrics. Pair it with the cluster autoscaler: as HPA removes pods, nodes become underutilized, and the cluster autoscaler eventually drains and removes them.
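The core HPA rule is desired = ceil(current_replicas × current_metric / target_metric), with no action when the ratio is within a tolerance of 1.0 (0.1 by default). The simplified sketch below implements only that rule plus min/max clamping. It leaves out stabilization windows, pod readiness, and missing-metric handling, and the function name is hypothetical.

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA replica calculation for a single utilization metric."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas at 30% CPU against a 60% target scale down to 2. The requests those pods held are what the cluster autoscaler can then reclaim as whole nodes.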
Vertical Pod Autoscaler (VPA): Automatically adjusts resource requests based on observed usage. Start in recommendation-only mode (updateMode "Off"), then switch to Auto for non-critical workloads. Avoid running VPA and HPA on the same resource metric (CPU or memory) simultaneously.
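A minimal VerticalPodAutoscaler object for the recommendation-first approach is shown below. It assumes VPA is enabled on the cluster (on Standard, for example, with gcloud container clusters update CLUSTER --enable-vertical-pod-autoscaling). The object and Deployment names are hypothetical. With updateMode "Off", VPA only writes recommendations to the object's status. With "Auto", it applies them by evicting and recreating pods.

```python
import json


def vpa_manifest(name: str, deployment: str, update_mode: str = "Off") -> dict:
    """VerticalPodAutoscaler object targeting a Deployment."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "updatePolicy": {"updateMode": update_mode},
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON manifests, e.g. redirect to vpa.json and kubectl apply -f vpa.json
    print(json.dumps(vpa_manifest("web-vpa", "web"), indent=2))
```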
Node Auto Provisioning (NAP)
Node Auto Provisioning extends the cluster autoscaler to automatically create new node pools with a machine type suited to pending pods, similar to Karpenter on EKS. Enable it instead of manually managing multiple node pools, and set the cluster-wide CPU and memory limits it requires; those limits also cap how far it can scale your spend.
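Enabling NAP is a single cluster update with cluster-wide resource caps. The sketch below builds that gcloud command without running it. The cluster name and cap values are hypothetical. --max-memory is in gigabytes.

```python
import shlex


def enable_nap_args(cluster: str, max_cpu: int, max_memory_gb: int) -> list[str]:
    """gcloud arguments to turn on node auto-provisioning with cluster-wide caps.

    max_cpu: total vCPUs NAP may provision across the cluster.
    max_memory_gb: total memory (GB) NAP may provision across the cluster.
    """
    if max_cpu <= 0 or max_memory_gb <= 0:
        raise ValueError("resource caps must be positive")
    return [
        "gcloud", "container", "clusters", "update", cluster,
        "--enable-autoprovisioning",
        f"--max-cpu={max_cpu}",
        f"--max-memory={max_memory_gb}",
    ]


if __name__ == "__main__":
    print(shlex.join(enable_nap_args("my-cluster", max_cpu=200, max_memory_gb=800)))
```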
Right-Sizing Resource Requests
On GKE Standard, you pay for node capacity regardless of actual pod utilization, and both the scheduler and the cluster autoscaler size the cluster from pod resource requests rather than real usage. Over-provisioned resource requests therefore mean over-provisioned nodes and wasted money. Use GKE's built-in recommendations in the console (Workloads → select a deployment → Resource recommendations) or deploy VPA in recommendation-only mode for data-driven right-sizing.
Target: CPU requests at P95 of actual usage over 14 days. Memory requests at P99 (memory spikes are more dangerous to under-provision). Set limits at 1.5–2x requests.
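The targets above can be computed directly from usage samples. The sketch assumes you have already exported 14 days of per-pod CPU (millicores) and memory (MiB) samples at a fixed interval, for example from Cloud Monitoring. It uses the nearest-rank percentile, which is one of several common definitions. Function names are hypothetical.

```python
import math


def nearest_rank_percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile for p in 1..100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def recommend_resources(cpu_millicores: list[float], memory_mib: list[float],
                        limit_factor: float = 1.5) -> dict:
    """Requests at CPU P95 and memory P99; limits at limit_factor x requests."""
    cpu_req = math.ceil(nearest_rank_percentile(cpu_millicores, 95))
    mem_req = math.ceil(nearest_rank_percentile(memory_mib, 99))
    return {
        "requests": {"cpu": f"{cpu_req}m", "memory": f"{mem_req}Mi"},
        "limits": {
            "cpu": f"{math.ceil(cpu_req * limit_factor)}m",
            "memory": f"{math.ceil(mem_req * limit_factor)}Mi",
        },
    }
```

Pass limit_factor between 1.5 and 2 to match the guideline above; the output keys mirror a container's resources block so the result can be pasted into a manifest.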
Cost Visibility on GKE
Enable GKE cost allocation in the cluster settings (or with the --enable-cost-allocation flag on gcloud container clusters update). It requires Cloud Billing export to BigQuery with detailed usage cost data, and it splits cluster costs by namespace and Kubernetes label in that export. Query the export in BigQuery for per-team and per-workload cost breakdowns. For a richer UI, deploy Kubecost (free for single cluster) or use the GCP-native cost allocation dashboard.
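A per-namespace breakdown is a GROUP BY on the namespace label of each billing row. The sketch shows the BigQuery query as a string and the same aggregation in plain Python over rows shaped like the export (a labels list of key/value records). The k8s-namespace label key reflects our understanding of GKE cost allocation; verify it in your own export. The table ID is a placeholder, and SUM(cost) is gross cost before credits.

```python
from collections import defaultdict

# Label key GKE cost allocation is assumed to attach to billing rows.
NAMESPACE_LABEL = "k8s-namespace"

# Equivalent BigQuery query; replace the placeholder table ID with your
# detailed usage cost export table.
NAMESPACE_COST_SQL = """
SELECT
  (SELECT value FROM UNNEST(labels) WHERE key = 'k8s-namespace') AS namespace,
  SUM(cost) AS total_cost
FROM `PROJECT.DATASET.gcp_billing_export_resource_v1_BILLING_ACCOUNT_ID`
GROUP BY namespace
ORDER BY total_cost DESC
"""


def cost_by_namespace(rows: list[dict]) -> dict[str, float]:
    """Sum billing rows by namespace label; rows without one are grouped separately."""
    totals: dict[str, float] = defaultdict(float)
    for row in rows:
        namespace = next(
            (label["value"] for label in row.get("labels", []) if label["key"] == NAMESPACE_LABEL),
            None,
        )
        totals[namespace or "(unallocated)"] += row["cost"]
    return dict(totals)
```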