Kubernetes Cost Optimization for Mid-Market Engineering Organizations

Kubernetes cost discipline at the mid-market scale, roughly fifty to five hundred engineers, is a different problem than it is at hyperscale. The platform team is small enough that the bar for tooling has to be high and the operational complexity has to be low. The cluster footprint is large enough that twenty percent waste is real money but not large enough to justify a dedicated FinOps organization. The classic Kubernetes cost optimization advice, written for either tiny teams or huge ones, mostly does not apply.

This is the discipline that works at this scale. It is opinionated and it is achievable in a quarter, not a year.

Where the Money Actually Goes

The waste profile of a typical mid-market Kubernetes deployment in 2026 is consistent across organizations. Idle node capacity from poorly tuned autoscaling accounts for twenty to thirty-five percent of compute spend. Over-provisioned resource requests, where pods reserve two to four times the CPU and memory they actually use, account for another fifteen to twenty-five percent. Always-on non-production environments running outside business hours account for ten to fifteen percent. Inefficient storage class choices and orphaned persistent volumes account for five to ten percent. Cross-AZ data transfer, particularly for service mesh traffic, can hit ten percent on its own.

Add this up and the typical mid-market cluster is running at forty to fifty percent of theoretically achievable cost efficiency. Bringing it to seventy-five percent is a quarter of focused work. Going beyond eighty-five percent requires either dedicated FinOps headcount or accepting reliability tradeoffs most organizations should not accept.
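To turn those shares into dollars for your own bill, a rough calculation is enough. The sketch below is illustrative only: the category names and the 100,000 dollar monthly spend are assumptions, and the cross-AZ figure is treated as an upper bound because the text gives no lower one.

```python
# Waste shares quoted above, as (low, high) fractions of spend.
# Cross-AZ transfer is given only as an upper bound ("can hit ten percent").
WASTE_SHARES = {
    "idle_node_capacity": (0.20, 0.35),
    "over_provisioned_requests": (0.15, 0.25),
    "always_on_non_production": (0.10, 0.15),
    "storage_and_orphaned_volumes": (0.05, 0.10),
    "cross_az_transfer": (0.00, 0.10),
}


def waste_estimate(monthly_spend):
    """Return {category: (low_dollars, high_dollars)} for a monthly spend."""
    return {
        name: (round(monthly_spend * low, 2), round(monthly_spend * high, 2))
        for name, (low, high) in WASTE_SHARES.items()
    }


def total_low_share():
    """Sum of the low ends: the conservative estimate of total waste."""
    return round(sum(low for low, _ in WASTE_SHARES.values()), 2)


if __name__ == "__main__":
    spend = 100_000  # hypothetical monthly bill in dollars
    for name, (low, high) in waste_estimate(spend).items():
        print(f"{name}: ${low:,.0f} to ${high:,.0f}")
    print(f"conservative total waste share: {total_low_share():.0%}")
```

The low ends alone sum to half of spend. The high ends should not be added together, since no real cluster sits at the top of every range at once.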

Karpenter and Cluster Autoscaler in 2026

Karpenter has effectively won the autoscaling conversation for AWS, with credible support now for Azure and emerging support for GCP through community contributions. The version one release stabilized the API and made consolidation behavior predictable. For new clusters on EKS, Karpenter is the default choice. For existing clusters on Cluster Autoscaler with stable node group definitions, the migration has a real but bounded payoff, typically ten to twenty percent additional efficiency at the cost of a quarter of platform engineering work.

The Karpenter tuning that produces the largest gains is also the one most teams skip. Configure NodePools with diverse instance types across at least three families, and never restrict a NodePool to a single instance type. Allow consolidation aggressively in development clusters and conservatively in production, with the consolidation delay and node expiry settings (consolidateAfter and expireAfter in the v1 API) tuned to the actual restart tolerance of your workloads. Use NodePool disruption budgets to prevent cascading evictions during consolidation events. And critically, set requirements that exclude the latest-generation instances when their on-demand price is more than fifteen percent above the prior generation, because the marginal performance is rarely worth the marginal cost.
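The fifteen percent rule is easy to check mechanically before you write the NodePool requirements. This is a minimal sketch, not Karpenter configuration: the function name, the generation labels, and the prices are all hypothetical, and real prices would have to come from your own pricing data.

```python
def allowed_generations(prices, max_premium=0.15):
    """Pick which instance generations to allow in a NodePool.

    `prices` maps a generation label to its on-demand hourly price,
    ordered oldest to newest. The newest generation is kept only when
    its premium over the prior generation is at most `max_premium`.
    """
    generations = list(prices)
    if len(generations) < 2:
        return generations
    prior, latest = generations[-2], generations[-1]
    premium = prices[latest] / prices[prior] - 1
    if premium > max_premium:
        return generations[:-1]  # drop the newest generation
    return generations


if __name__ == "__main__":
    # Hypothetical hourly prices, not real quotes.
    print(allowed_generations({"gen6": 0.10, "gen7": 0.12}))   # 20% premium
    print(allowed_generations({"gen6": 0.192, "gen7": 0.2016}))  # 5% premium
```

The output of a check like this would then feed the generation requirement on the NodePool. Rerun it when prices change rather than hard-coding the exclusion.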

Spot Fleet Design That Survives Production

Spot instances continue to be the largest single cost lever, with sixty to seventy percent discounts off on-demand pricing in 2026. The reason most mid-market teams underuse spot is not technical. It is operational scar tissue from a bad incident in 2020 when a fleet was reclaimed during peak load. The patterns that make spot reliable enough for production at this scale are now well understood.

  • Instance diversification across at least six instance types from three families, in three availability zones. Reclamation events almost never affect more than one or two of these dimensions simultaneously.
  • Pod disruption budgets on every workload, with realistic minimum availability targets that allow voluntary disruption.
  • Stateful workloads on on-demand, stateless workloads on spot, with the boundary enforced by node selectors and affinity rules.
  • Graceful shutdown handlers that respond to the two-minute spot interruption notice by draining traffic and persisting state.
  • Spot interruption rate monitoring as a first-class SLI, alerting when reclamation rates exceed historical baselines.
  • Fallback to on-demand when spot capacity is unavailable, configured at the Karpenter NodePool level so the cluster never blocks waiting for spot.
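The diversification rule in the first bullet is simple enough to lint in CI. The sketch below assumes AWS-style instance type names such as "m6i.large", where the text before the dot is treated as the family. The function name and the sample fleet are hypothetical.

```python
def diversification_gaps(instance_types, zones,
                         min_types=6, min_families=3, min_zones=3):
    """Return a list of human-readable gaps; empty means the rule is met."""
    unique_types = set(instance_types)
    # Assumption: "m6i.large" belongs to family "m6i".
    families = {t.split(".")[0] for t in unique_types}
    unique_zones = set(zones)

    gaps = []
    if len(unique_types) < min_types:
        gaps.append(f"only {len(unique_types)} instance types, need {min_types}")
    if len(families) < min_families:
        gaps.append(f"only {len(families)} families, need {min_families}")
    if len(unique_zones) < min_zones:
        gaps.append(f"only {len(unique_zones)} zones, need {min_zones}")
    return gaps


if __name__ == "__main__":
    fleet = ["m6i.large", "m5.large", "c6i.large",
             "c5.large", "r6i.large", "r5.large"]
    print(diversification_gaps(fleet, ["zone-a", "zone-b", "zone-c"]))
    print(diversification_gaps(["m6i.large", "m6i.xlarge"], ["zone-a"]))
```

An empty list means the fleet definition satisfies the six types, three families, three zones rule. Anything else is a reason to fail the pipeline before the configuration reaches a cluster.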

Request and Limit Hygiene

Resource request right-sizing is the highest-value, lowest-risk optimization most teams have not yet executed. The default culture in most engineering organizations is to set requests at two to four times observed steady-state usage, on the theory that this provides headroom for spikes. The result is bin-packing efficiency in the thirty to forty percent range, where it should be sixty to seventy.

Vertical Pod Autoscaler in recommendation-only mode, fed into a quarterly request review process, produces sustainable rightsizing without the operational risk of automatic VPA. For organizations willing to invest in tooling, Goldilocks for VPA recommendations or the rightsizing modules of Kubecost and Cast.ai produce credible recommendations with less manual analysis. The tooling is less important than the discipline of actually applying the recommendations.
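For teams doing the quarterly review by hand, the underlying arithmetic is a high percentile of observed usage plus a modest headroom. The sketch below illustrates that idea only. It is not how VPA computes its recommendations, and the ninety-fifth percentile, the twenty percent headroom, and the sample data are assumptions to adjust.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def recommend_request(usage_samples, pct=95, headroom_pct=20):
    """Suggest a request: a high percentile of usage plus headroom."""
    base = percentile(usage_samples, pct)
    return math.ceil(base * (100 + headroom_pct) / 100)


def overprovision_ratio(current_request, usage_samples):
    """How many times larger the current request is than median usage."""
    return current_request / percentile(usage_samples, 50)


if __name__ == "__main__":
    # Hypothetical CPU usage samples in millicores, including one spike.
    cpu_millicores = [100, 120, 110, 130, 125, 140, 115, 105, 135, 300]
    print("suggested request:", recommend_request(cpu_millicores))
    print("current ratio:", round(overprovision_ratio(500, cpu_millicores), 2))
```

A ratio above two is the over-provisioning pattern described earlier. The suggested request still covers the observed spike, which is the headroom the higher setting was meant to buy.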

On limits, the strong opinion that has emerged is to set memory limits equal to memory requests and to omit CPU limits entirely for most workloads. CPU throttling caused by limits has caused more production incidents than CPU contention from missing limits. Memory limits matter because OOM is preferable to a node-level memory crisis. CPU limits in most cases just slow your application down for no reason.
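Expressed as a container resources stanza, that opinion looks like the output of the sketch below. The helper name and the example values are hypothetical. The `requests` and `limits` keys are the standard fields of a container's `resources` block.

```python
import json


def container_resources(cpu_request, memory_request):
    """Build a container `resources` mapping following the opinion above."""
    return {
        "requests": {"cpu": cpu_request, "memory": memory_request},
        # Memory limit equals the memory request; CPU limit is omitted on purpose.
        "limits": {"memory": memory_request},
    }


if __name__ == "__main__":
    # Hypothetical values for a small stateless service.
    print(json.dumps(container_resources("250m", "512Mi"), indent=2))
```

Generating the stanza from one helper, or enforcing the same shape with a policy check, keeps a CPU limit from creeping back in through copied manifests.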

FinOps Tooling at Mid-Market Scale

Three categories of tooling have proven useful at this scale. Kubecost, available as a commercial product and as the open source OpenCost project it originated, provides cost allocation by namespace, label, and workload that the cloud provider billing dashboards do not. Cast.ai is the most aggressive automated optimization platform, taking direct control of node provisioning and bin-packing in exchange for a typical thirty to fifty percent cost reduction. PerfectScale and StormForge focus on workload right-sizing automation as a complement to whatever node management you already run.

The honest tradeoff is that automated platforms like Cast.ai produce real savings but introduce a third party into your critical path. For organizations with mature platform engineering, OpenCost plus disciplined Karpenter configuration produces equivalent results without the dependency. For organizations where the platform team is one or two engineers and growing, the automation is worth the dependency.

Our Recommendation

Run a single quarter of focused cost work with three concurrent workstreams: Karpenter migration or tuning, request right-sizing through VPA recommendations, and spot fleet expansion for stateless workloads. Set a target of thirty percent cost reduction. Most teams hit twenty-five to thirty-five percent in this window without operational regression.

Install OpenCost or Kubecost on day one of the work, because you cannot optimize what you cannot measure. Set namespace-level cost allocation visible to engineering managers. Make cost a tracked metric in service ownership reviews, alongside reliability and latency. The cultural shift from cost-as-platform-problem to cost-as-shared-responsibility is the largest source of sustainable improvement.

Kubernetes cost is not solved by tools. It is solved by giving engineers the data to see the financial impact of their choices and the abstractions to act on it without breaking production.

When to Drop Kubernetes

Kubernetes is not the right answer for every workload, and the mid-market is exactly where this question becomes worth asking. If your production footprint is fewer than twenty pods across two or three services, the operational overhead of Kubernetes is rarely justified. AWS ECS on Fargate, Google Cloud Run, or Azure Container Apps deliver equivalent functionality with materially lower operational burden and, frequently, lower total cost.

Consider dropping Kubernetes if your platform team spends more than thirty percent of its time on cluster operations rather than developer enablement. That ratio indicates the platform is consuming more capacity than it produces. Consider dropping Kubernetes if your application is monolithic, stateful, and deployed from a single repository, because the abstractions Kubernetes provides are not solving any problem you have.

The strong case for staying on Kubernetes is when you have ten or more services with diverse runtime requirements, when you have multi-cloud or hybrid requirements that managed serverless platforms cannot satisfy, when you have a platform engineering team large enough to operate the substrate well, or when your developer experience depends on the ecosystem of tooling that has standardized on Kubernetes APIs. For most mid-market organizations between fifty and five hundred engineers with material backend complexity, the answer is to stay and to invest in operating it well. For the subset whose answer is to leave, the migration is a serious project but a finite one, and the operational simplification on the other side is real.

Quotas, Namespaces, and the Cultural Layer

The technical levers above are necessary but not sufficient. The sustainable cost outcomes in mid-market Kubernetes deployments are produced by the namespace-level governance and quota structure that aligns financial responsibility with engineering ownership. Without it, every cost optimization decays back to baseline within two quarters as new workloads accrete the same waste profile.

The pattern that works is per-team or per-product namespaces, each with a ResourceQuota that caps total CPU, memory, persistent volume claims, and pod count. LimitRange objects enforce per-pod request floors and ceilings, preventing both unbounded resource grabs and trivially small requests that defeat scheduler bin-packing. Cost allocation is computed at the namespace level by Kubecost or OpenCost and reported to the owning team weekly. Engineering managers see their team’s namespace cost in the same review where they see error budget consumption.
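A per-team quota can be generated rather than hand-written, which keeps the caps consistent across namespaces. The sketch below emits a ResourceQuota manifest as JSON, which kubectl accepts alongside YAML. The helper name, the namespace, and the cap values are hypothetical. The keys under `spec.hard` are the standard quota resource names.

```python
import json


def resource_quota(namespace, cpu, memory, pvcs, pods):
    """Build a ResourceQuota manifest capping a team namespace."""
    return {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": f"{namespace}-quota", "namespace": namespace},
        "spec": {
            "hard": {
                "requests.cpu": cpu,
                "requests.memory": memory,
                "persistentvolumeclaims": str(pvcs),
                "pods": str(pods),
            }
        },
    }


if __name__ == "__main__":
    # Hypothetical caps for a hypothetical "payments" team namespace.
    manifest = resource_quota("payments", cpu="40", memory="160Gi",
                              pvcs=20, pods=200)
    print(json.dumps(manifest, indent=2))
```

The printed manifest can be piped to `kubectl apply -f -` or committed to the repository that defines team namespaces. A LimitRange for the per-pod floors and ceilings would follow the same pattern.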

The political work is harder than the technical work. Engineering managers must agree that namespace cost is a metric they own, not a metric the platform team owns on their behalf. Finance must accept that allocation will never be perfect at the pod level and that namespace-level allocation is sufficient for chargeback or showback purposes. Platform engineering must commit to making cost data trustworthy enough that engineering managers can act on it without second-guessing the numbers. None of this is technical work, but all of it is the difference between a cost program that produces a one-time saving and one that produces sustained discipline.

Non-production environments deserve a specific call-out. Development and staging clusters are typically the largest source of waste at mid-market scale because they run around the clock without justification. Implement automatic scale-to-zero for development namespaces outside business hours via KEDA, kube-downscaler, or a custom CronJob that adjusts replica counts. Pair this with preview environment patterns that spin up ephemeral namespaces per pull request and tear them down on merge. The savings from non-production discipline alone often exceed twenty percent of total Kubernetes spend.
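If you go the custom CronJob route, the decision logic is small. The sketch below shows only that logic, under assumed business hours of 08:00 to 19:00 on weekdays in the cluster's local time. The function names are hypothetical, and applying the replica count to the cluster is left out.

```python
from datetime import datetime


def desired_replicas(now, normal_replicas, start_hour=8, end_hour=19):
    """Return the replica count a development workload should run at `now`."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_hours = start_hour <= now.hour < end_hour
    return normal_replicas if (is_weekday and in_hours) else 0


def off_hours_fraction(start_hour=8, end_hour=19):
    """Fraction of a 168-hour week spent scaled to zero."""
    on_hours = (end_hour - start_hour) * 5
    return round(1 - on_hours / 168, 2)


if __name__ == "__main__":
    print(desired_replicas(datetime(2026, 1, 5, 10, 0), 3))   # Monday morning
    print(desired_replicas(datetime(2026, 1, 10, 10, 0), 3))  # Saturday
    print(off_hours_fraction())
```

With these assumed hours, a development workload sits at zero replicas for roughly two thirds of the week, which is where the non-production savings come from.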

