Build vs Buy: A Decision Framework for Custom Software vs SaaS




Every quarter, an engineering org somewhere greenlights a custom build that should have been a SaaS subscription, or signs a SaaS contract for the one capability that defines its product. Both mistakes cost millions. The question is not whether to build or buy. The question is which decision rule survives contact with the next five years of your roadmap.

This framework is the one we walk CTOs through during architecture reviews. It assumes you already know how to read an invoice and how to estimate a sprint. What it gives you is a way to defend the decision to a board, to a CFO, and to your future self when the tradeoffs surface eighteen months in.

The Differentiator Rule

The first filter is brutal and binary. If a capability is part of how you win in the market, build it. If it is not, buy it. Auth flows, billing, helpdesk, error tracking, feature flags, internal analytics dashboards, document signing, video conferencing, status pages: these are not where you win. Customers do not pay you because your SSO is elegant. They pay you because of the thing your competitors cannot do.

The trap is that engineering teams genuinely enjoy building these things. They are well-scoped, satisfying problems with clear shapes. Auth0, Stripe, Zendesk, Sentry, LaunchDarkly, DocuSign, Daily, Statuspage, Mixpanel, Segment, Snowflake, Looker, Datadog all exist because thousands of teams concluded that those problems were solved well enough by people who solve them full-time. Your team should reach the same conclusion before they write the first migration.

Total Cost of Ownership Beyond License Fees

License fees are the most visible cost and almost never the largest one. A useful TCO model spans five years and counts every line item that engineering, finance, and security will eventually pay. The numbers below are illustrative bands we have seen across mid-market consulting engagements, not benchmarks.

  • Build path: initial engineering (loaded cost per FTE multiplied by team size and duration), opportunity cost of those engineers not shipping product features, ongoing maintenance at roughly 15 to 25 percent of initial build per year, on-call burden, security patching, dependency upgrades, infrastructure spend, observability, compliance audits when in scope, and the eventual rewrite that arrives every 4 to 7 years.
  • Buy path: contract value, integration engineering for connecting the SaaS to your stack, vendor management overhead, data egress costs, audit and procurement effort, and the cost of switching if the vendor underperforms or repackages pricing.
  • Hidden cost on both sides: the time leadership spends defending the decision when something breaks. Build it and an outage is your fault. Buy it and an outage is the vendor’s fault but still your problem.

The honest version of TCO almost always shows that buying is cheaper for the first 18 to 36 months and that build economics only start to compete once your usage scale outgrows the vendor’s pricing model. Below 10,000 active users, build is rarely cheaper. Above 1 million, the math sometimes flips, but not always.
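The crossover described above can be made concrete with a small model. What follows is a minimal sketch, not a complete TCO model: it covers only a handful of the line items listed earlier, and every class name, field, and figure in it is hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class BuildPath:
    initial_build: float     # loaded engineering cost of the first version
    maintenance_rate: float  # yearly maintenance as a fraction of initial build
    yearly_infra: float      # infrastructure and observability spend per year


@dataclass
class BuyPath:
    integration: float       # one-time integration engineering
    price_per_user: float    # yearly vendor price per active user
    yearly_overhead: float   # vendor management, audit, procurement per year


def build_cumulative(path: BuildPath, years: int) -> float:
    """Cumulative build cost after `years` full years."""
    yearly = path.initial_build * path.maintenance_rate + path.yearly_infra
    return path.initial_build + yearly * years


def buy_cumulative(path: BuyPath, users_by_year: Sequence[int], years: int) -> float:
    """Cumulative buy cost after `years` full years under usage-based pricing."""
    usage = sum(users * path.price_per_user for users in users_by_year[:years])
    return path.integration + usage + path.yearly_overhead * years


def breakeven_year(
    build: BuildPath, buy: BuyPath, users_by_year: Sequence[int]
) -> Optional[int]:
    """First year in which cumulative build cost is at or below cumulative buy cost."""
    for year in range(1, len(users_by_year) + 1):
        if build_cumulative(build, year) <= buy_cumulative(buy, users_by_year, year):
            return year
    return None  # build never catches up within the modeled horizon


# Illustrative inputs only; replace with your own documented assumptions.
build = BuildPath(initial_build=600_000, maintenance_rate=0.20, yearly_infra=40_000)
buy = BuyPath(integration=60_000, price_per_user=2.0, yearly_overhead=20_000)
growth = [5_000, 20_000, 80_000, 250_000, 600_000]

print(breakeven_year(build, buy, growth))
```

With these made-up inputs, buying stays cheaper through year four and build only catches up in year five, once usage has grown by two orders of magnitude. With flat usage of 5,000 users the function returns `None`, which is the "build is rarely cheaper" case.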

Switching Cost as the Real Lock-In


The standard concern about SaaS is vendor lock-in. The standard concern is misframed. The real question is switching cost, and switching cost applies equally to your custom build. A homegrown billing system is locked in too. The lock-in is just to your own team’s tribal knowledge instead of to a vendor’s roadmap.

What Increases SaaS Switching Cost

Proprietary data formats with no clean export, deep workflow integrations with custom logic, identity provider entanglement, vendor-specific UI embedded in your own product, and pricing models that compound with usage, so that paying the vendor through a migration window becomes exorbitant. The mitigation is to insist on data portability clauses, to keep an integration abstraction layer between your code and the vendor SDK, and to track the cost of staying versus leaving on a yearly basis.
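The yearly stay-versus-leave tracking can be as simple as a payback calculation. This is a rough sketch under stated assumptions: the function name, the idea of comparing two projected cost series, and all of the figures are hypothetical, and a real review would also account for migration risk and parallel-run costs.

```python
from typing import Optional, Sequence


def payback_year(
    switching_cost: float,
    stay_costs: Sequence[float],
    leave_costs: Sequence[float],
) -> Optional[int]:
    """First year in which cumulative savings from leaving cover the one-time switching cost.

    stay_costs[i] and leave_costs[i] are the projected yearly costs for year i + 1.
    Returns None if the switch does not pay back within the projected horizon.
    """
    saved = 0.0
    for year, (stay, leave) in enumerate(zip(stay_costs, leave_costs), start=1):
        saved += stay - leave
        if saved >= switching_cost:
            return year
    return None


# Illustrative projection: vendor pricing compounds with usage, the alternative grows slowly.
print(payback_year(300_000, [200_000, 260_000, 340_000], [150_000, 160_000, 170_000]))
```

Re-running this once a year with updated projections shows whether the cost of leaving is shrinking or growing relative to the cost of staying.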

What Increases Build Switching Cost

Tribal knowledge that left with the original team, undocumented business rules encoded in stored procedures, custom protocols nobody wants to support, and tightly coupled internal systems that all assume the build will exist forever. The mitigation is documentation, modularization, and an explicit owner. Most internal builds fail this test by year three.

The Hybrid Pattern That Usually Wins

Most mature engineering orgs end up with a hybrid posture rather than a pure build or pure buy stance. The pattern looks like this: buy the commodity layers, build a thin orchestration layer on top, and reserve custom engineering for the differentiated workflow that touches the customer. Use Auth0 or WorkOS for identity, but build the tenant-specific authorization model that encodes your domain. Use Stripe for payments, but build the pricing engine that calculates what to charge. Use Snowflake for storage, but build the semantic layer your analysts and product team consume.

This pattern works because it isolates vendor risk to interchangeable layers and concentrates engineering investment on the layer that compounds. Replacing Stripe with Adyen is painful but tractable. Replacing your pricing engine is a strategic project either way.
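In code, the hybrid pattern tends to look like a narrow interface around the bought layer and ordinary domain code for the built layer. The sketch below is illustrative only: `PaymentGateway`, `PricingEngine`, `FakeGateway`, and `bill_customer` are hypothetical names, the pricing rule is invented, and no real vendor SDK is called. A real adapter would wrap the vendor's client behind the same interface.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


class PaymentGateway(Protocol):
    """The only payment surface the rest of the codebase is allowed to see."""

    def charge(self, customer_id: str, amount_cents: int) -> str:
        ...


@dataclass
class PricingEngine:
    """Built in-house: the domain rules that decide what to charge."""

    seat_price_cents: int
    volume_discount_threshold: int
    volume_discount_pct: int

    def quote(self, seats: int) -> int:
        total = seats * self.seat_price_cents
        if seats >= self.volume_discount_threshold:
            total -= total * self.volume_discount_pct // 100
        return total


class FakeGateway:
    """Stand-in for a vendor adapter; records charges instead of calling a vendor."""

    def __init__(self) -> None:
        self.charges: List[Tuple[str, int]] = []

    def charge(self, customer_id: str, amount_cents: int) -> str:
        self.charges.append((customer_id, amount_cents))
        return f"fake-{len(self.charges)}"


def bill_customer(
    engine: PricingEngine, gateway: PaymentGateway, customer_id: str, seats: int
) -> str:
    """Orchestration layer: built pricing logic feeds the bought payment layer."""
    return gateway.charge(customer_id, engine.quote(seats))


engine = PricingEngine(seat_price_cents=1_000, volume_discount_threshold=50, volume_discount_pct=10)
gateway = FakeGateway()
print(bill_customer(engine, gateway, "acme", seats=100))
```

Swapping vendors then means writing one new class that satisfies `PaymentGateway`. The pricing engine and its callers do not change, which is what keeps the replacement tractable.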

The Anti-Pattern: Rebuilding the Commodity

The most expensive mistake in this space is rebuilding undifferentiated commodity software because it feels strategic. We see it most often in three forms. The first is the in-house feature flag platform that started as a Friday afternoon hack and now consumes an SRE quarter every year. The second is the bespoke ETL pipeline built to avoid Fivetran or Airbyte license fees that ends up costing four times the annual contract in headcount. The third is the internal admin tool framework that reinvents Retool, Forest, or Appsmith because someone read a blog post about low-code being a trap.

The pattern is recognizable. Engineers like the work, leadership likes the optionality, and finance does not see the line item because the cost is hidden inside payroll. Two years later, the system has one maintainer who cannot take vacation, no documentation, and a quiet plan to migrate to the SaaS that was rejected on day one.

The Decision Checklist

When the tradeoff is genuinely close, work through these questions in order. The first question that yields a verdict settles the decision.

  1. Is this capability part of how we win in the market? If yes, build. If no, continue.
  2. Does a credible vendor exist with a clean API, documented data portability, and a track record beyond 5 years? If no, build or wait. If yes, continue.
  3. Will our usage in 24 months exceed the vendor’s pricing model breakpoint by a factor of 3 or more? If yes, model the breakeven and continue. If no, buy.
  4. Do we have the operational maturity to own this system on-call for the next 5 years? If no, buy. If yes, continue.
  5. Does the build path require us to hire specialist talent we do not currently have? If yes, lean buy. If no, the decision is now financial, and TCO over 5 years decides.
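The checklist above can be written down as a short function, which makes the ordering explicit and easy to argue about. This is one reading of the five questions, not a definitive encoding: the `Capability` fields and the returned labels are hypothetical, and the sketch assumes that a "yes" on question 3 means "model the breakeven, then keep going" rather than an immediate verdict.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    is_differentiator: bool            # Q1: part of how we win in the market?
    credible_vendor_exists: bool       # Q2: clean API, data portability, 5+ year track record?
    usage_exceeds_breakpoint_3x: bool  # Q3: 24-month usage at 3x or more of the pricing breakpoint?
    can_own_on_call: bool              # Q4: operational maturity to own it for 5 years?
    needs_specialist_hires: bool       # Q5: build requires talent we do not have?


def decide(c: Capability) -> str:
    """Walk the five questions in order; the first verdict wins."""
    if c.is_differentiator:
        return "build"
    if not c.credible_vendor_exists:
        return "build or wait"
    if not c.usage_exceeds_breakpoint_3x:
        return "buy"
    # Usage outgrows the vendor's pricing: a breakeven model is needed, keep going.
    if not c.can_own_on_call:
        return "buy"
    if c.needs_specialist_hires:
        return "lean buy"
    return "decide on 5-year TCO"


# Example: a commodity capability with a credible vendor and modest usage.
print(decide(Capability(False, True, False, True, False)))
```

Because the questions short-circuit, most commodity capabilities exit at question 3 with "buy" and never reach the TCO model at all.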

How to Run the Decision in Practice

The framework is only useful if it produces a defensible decision in a finite amount of time. The version that survives contact with real organizations looks like a two-week exercise with three artifacts at the end: a one-page TCO model with documented assumptions, a one-page risk register that names the top three failure modes for each path, and a one-page recommendation that the sponsor signs. Anything more becomes a project. Anything less becomes a hallway conversation that gets re-litigated every quarter.

The most common process failure is letting the analysis sprawl until the decision has been made by inertia. The team that spends six weeks evaluating six vendors at the line-item level is the team that ships a Frankenstein POC because nobody wanted to call the question. Set a deadline, name a decision-maker, and accept that the choice will be made with imperfect information. The cost of a wrong decision is recoverable. The cost of no decision is the year you spent not making it.

When This Framework Applies

This framework works for capabilities that are well-defined, have a vendor market, and are not currently a competitive crisis. It works for billing, identity, observability, support, internal tooling, data infrastructure, and most platform layers.

When It Does Not Apply

It does not apply to your core product surface. It does not apply to capabilities that no vendor sells because the market does not yet exist. It does not apply when regulatory constraints prohibit data leaving your environment, although in those cases the relevant choice is between self-hosted commercial software and pure custom build, not between commodity SaaS and custom. And it does not apply when speed to market is the only thing that matters and the build path adds 6 months you do not have. In that case, buy now, accept the lock-in, and revisit in 18 months with a real TCO model in hand.

The discipline is to make the decision once, defend it with numbers, document the assumptions, and revisit those assumptions every two years. The teams that get this right are not smarter. They are more honest about which problems they are paid to solve.

