back to blog

The Missing Layer: An AI Gateway Build-vs-Buy Playbook for 2026

Read Time 17 mins | Written by: Vinayak Bhagat

Senior engineers sketching an AI gateway network diagram on a whiteboard, showing the gateway routing traffic between client applications and LLM providers
Cloud Solutions · FinOps · Generative AI

The AI gateway is the single layer that separates enterprises that survive AI cost shocks from the ones that don't — and it's the layer most still don't have. It sits between your applications and every LLM provider, enforcing token budgets, routing routine inference to cheaper models, halting compromised keys in under a minute, and emitting structured spend telemetry your provider dashboard can't.

The build-vs-buy decision comes down to four variables: per-team isolation, model-routing sophistication, audit-trail rigor, and time-to-production. Most enterprises should stand up an open-source gateway in 4–8 weeks and consider commercial replacement at the 12-month mark — not before.

4–8 wks
to a production-grade open-source gateway with budget enforcement
<60 sec
to halt traffic from a compromised key — independent of the cloud provider
±15%
spend-forecast accuracy enterprises hit by end of week 4

Continuing the playbook. Our previous post documented why hyperscaler controls fail for AI workloads and introduced the 5-layer FinOps defense. This post is the deep-dive on Layer 3 — the layer most enterprises are still missing.

The Problem

You Can't Fix AI Cost Runaway From Inside the Provider

Provider budgets reconcile on 28-day cycles. Anomaly detection has documented blind spots. Tier auto-upgrade overrides customer-set caps. A misconfigured loop or a leaked key can empty a budget in minutes — and the first signal arrives on the invoice.

You fix it with a control plane that lives between your applications and the model providers — an AI gateway. Every inference call from every app, agent, or pipeline routes through it. Token-level events flow out. Budgets enforce at request time. Compromised keys get killed in seconds, not days.

The Definition

What an AI Gateway Actually Does

Strip away the marketing and there are six things a gateway must do. If a vendor or in-house build skips any of these, it's a wrapper, not a gateway.

Capability 1

Authentication & Key Vaulting

Every upstream API key (OpenAI, Anthropic, Google, Bedrock, Azure OpenAI) lives in the gateway, never in application code. Apps authenticate to the gateway with rotatable internal credentials.

Capability 2

Request Routing

Per-request decisions about which model handles which workload — based on tags, team identity, prompt characteristics, or fallback chains. Routine inference goes to Haiku-class / Gemini Flash. Frontier models reserved for tagged requests only.

Capability 3

Token Budget Enforcement

Hard caps per team, app, and environment, sliced by hour / day / month. Burst limits. Soft alerts before hard cutoffs. Aggregate headroom across providers — because a $5K/day budget split across AWS, Anthropic direct, and OpenAI can't be tracked anywhere else.

Capability 4

Spend Telemetry

Structured events for every call: model, input tokens, output tokens, latency, cost in provider currency and normalized to USD, caller identity, request tags. Streamed to your warehouse in seconds, not waiting on 28-day provider reconciliation.

Capability 5

Kill Switch

A single API call (or single button) halts all traffic from a specific key, team, or globally. Sub-60-second propagation. Independent of provider response times.

Capability 6

Audit Trail

Every prompt and response, retained per your governance policy. Hash-and-index for sensitive payloads. Searchable for incident response, prompt-injection forensics, and compliance review.

The Decision

4 Variables That Settle Build vs. Buy

Skip vendor demos. Decide on these four first. If 3 of 4 variables point one direction, that's your answer. If the split is 2/2, build first and replace later — replacement is straightforward when the application interface is standardized.

Variable 1: Per-Team Isolation

Your need Decision signal
Single team, single use case Build
3+ teams sharing a budget, no chargeback yet Build with team tags
5+ teams with formal chargeback / showback Buy or build with strong tagging discipline
Regulated business-unit isolation Buy enterprise-grade with multi-tenancy primitives

Variable 2: Model-Routing Sophistication

Your need Decision signal
One model, occasional fallback Build — a 50-line FastAPI proxy works
2-tier routing (cheap default / premium tagged) Build with LiteLLM or similar
Semantic routing (by prompt content) Buy or invest 4–6 engineer-weeks
Adaptive routing (accuracy/cost feedback) Buy — 12+ months to in-house parity

Variable 3: Audit-Trail Rigor

Your need Decision signal
Engineering observability only Build — write events to S3 or BigQuery
Internal audit / SOX-adjacent Build with append-only storage + integrity hash
External regulator (FINRA, MAS, FCA) or SOC 2 II Buy with retention, RBAC, certified data residency
EU AI Act high-risk classification Buy and procure compliance attestations upfront

Variable 4: Time-to-Production

Your pressure Decision signal
6+ months runway, strong platform team Build
8 weeks to first-pass control Open-source gateway (LiteLLM, Helicone OS)
Production-grade in 4 weeks, no platform team Commercial gateway
Yesterday Commercial — and renegotiate the AI cost ceiling with finance in parallel
The Build Path

4 Weeks to a Working Open-Source Gateway

If you're building, the open-source ecosystem is mature enough that you're integrating, not inventing.

Week 1

Foundation

  • Stand up LiteLLM or Helicone (open-source) behind your load balancer.
  • Move all upstream API keys into the gateway's secret store.
  • Issue internal API keys per team / per environment.
  • Update one pilot application to call the gateway instead of the provider directly.
Week 2

Budgets & Routing

  • Define team budgets (start with monthly, layer in daily later).
  • Wire model-routing rules: default to a cheap model, escalate on premium tag.
  • Add a kill-switch endpoint protected by ops-only IAM.
Week 3

Telemetry

  • Stream every gateway event to your warehouse (Snowflake, BigQuery, Redshift).
  • Build dashboards: spend per team, per model, per environment, anomaly bands.
  • Pipe alerts to Slack / PagerDuty when a team exceeds 80% of monthly budget.
Week 4

Migration

  • Cut over remaining applications. OpenAI-API-compatible interface keeps the swap one-line per app.
  • Tabletop: simulate a compromised key, trigger the kill switch, measure end-to-end response time.
  • By end of week 4 you should hit ±15% spend-forecast accuracy. If not, the gap is tagging discipline — not the gateway.
The Buy Path

Procurement Criteria That Actually Matter

  • Provider coverage. OpenAI, Anthropic (direct + via Bedrock), Google (direct + Vertex), Azure OpenAI, AWS Bedrock, and at least one open-source path. OpenAI-only vendors will force you into a second gateway in 12 months.
  • Self-host option. Hosted-only is acceptable early; regulated industries need self-host. Confirm no feature gap between the two modes.
  • Latency overhead. Under 50ms p99 added for non-streaming, under 200ms for streaming first-token. Anything higher breaks real-time apps.
  • Pricing model. Per-request pricing scales nastily with AI volume. Prefer flat-rate or per-team licensing.
  • Standards compliance. OpenAI API on the application side. OpenTelemetry on the telemetry side. Proprietary interfaces trade hyperscaler lock-in for startup lock-in.
  • SOC 2 Type II + ISO 27001. Non-negotiable for enterprise.
  • Lock-in cost. If swapping the gateway is "weeks of re-instrumentation," walk.
Build vs. Buy

The Side-by-Side

Dimension Build (Open-Source) Buy (Commercial)
Time to first production traffic 4–8 weeks 1–3 weeks
Year-1 cost (50M req/mo) ~$80K engineering + $20K infra $120K–$400K license + infra
Latency overhead (p99) 10–30 ms 20–80 ms
Provider coverage All (write the adapter if missing) Vendor-dependent — verify before signing
Compliance (SOC 2, EU AI Act) You own the audit work Vendor-provided (verify scope)
Routing sophistication ceiling Whatever you can implement Higher (mature commercial products)
Switching cost Low — code is yours Medium — re-instrumentation required
Risk of vendor disappearing Zero Real — vet the cap table
Pitfalls to Avoid

4 Mistakes Enterprises Make at This Layer

Mistake #1: Letting each team run its own proxy

Per-team proxies kill the only value the gateway provides: cross-team observability and aggregate budget control. One gateway, one source of truth.

Mistake #2: Treating the gateway as a developer convenience

It's a financial control plane. Finance, security, and engineering co-own it. If finance can't pull the kill switch, you don't have a kill switch — you have a hopeful API endpoint.

Mistake #3: Skipping application interface standardization

If apps call the gateway with a proprietary contract, you've recreated lock-in one layer up. Force OpenAI-API-compatible on every application — it makes migration between gateways (or models) a one-line change.

Mistake #4: Believing the gateway is a security boundary

A gateway is a cost control plane. Prompt injection, jailbreaks, and data exfiltration still happen through it. Pair with a prompt-shielding layer (Lakera, Robust Intelligence, in-house) before treating it as security infrastructure.

How Ontrac Helps

Cloud, FinOps, and GenAI — Delivered as One Stack

Cloud Solutions

Gateway architecture, deployment (open-source or commercial), multi-cloud integration, and the IAM & key-rotation pipelines underneath.

FinOps & Financial Intelligence

Token-budget design, chargeback model, anomaly modeling on gateway telemetry. The spend warehouse your CFO actually trusts.

Generative AI Consulting

Model-routing strategy, prompt-shielding integration, governance framework, and the kill-switch tabletop with your finance, security, and engineering leads.

Data & Analytics

The Snowflake / BigQuery / Redshift pipelines that turn gateway events into the spend dashboard finance signs off on.

Free Architecture Review

Is an AI Gateway Right for Your Stack?

Two-hour engagement. We walk your AI traffic patterns, run the 4-variable worksheet against your constraints, and deliver a build-vs-buy recommendation with an 8-week implementation plan within five business days. No obligation.

Book a 30-Minute Discovery Call →
Further Reading
  • Ontrac's prior post: Stopping Runaway AI Cloud Bills — the 5-layer FinOps defense this gateway sits inside.
  • LiteLLM open-source proxy — the most common starting point for a build path.
  • Helicone open-source observability proxy — pairs well with LiteLLM or replaces it.
  • OpenTelemetry specification — the telemetry standard your gateway should emit.

Framework Will Help You Grow Your Business With Little Effort.

Vinayak Bhagat

HubSpot & Marketing Automation Specialist at Ontrac Solutions