
I've spent the past three years building AI-powered features at two different companies. In that time, I've used every major approach to AI infrastructure: direct API calls, aggregators like OpenRouter, open-source gateways, custom-built solutions, and enterprise platforms.

Each time I thought I'd found the answer, reality corrected me.

Here's the uncomfortable truth I keep arriving at: the time you spend on AI plumbing is time you don't spend on product differentiation. The best AI gateway is the one that disappears into the background while giving you the control you need.

Let me walk you through how I learned that the hard way, and what I'd tell any engineering lead who's currently staring at the same choices I faced.

The Direct API Approach: Maximum Control, Maximum Pain

When I built my first AI feature, calling OpenAI directly felt like the obvious move. Import the SDK, make a request, handle the response. Five minutes to "Hello World." I remember the satisfaction of watching that first completion stream into my terminal, thinking this was all there was to it.

Then product asked me to try Claude for our summarization use case. Suddenly I was maintaining two SDKs with different interfaces, different error handling, different response formats.

My clean AI service file turned into a sprawling if-else tree where every provider had its own code path, its own retry semantics, its own way of signaling rate limits. When Google's Gemini entered the conversation a month later, the branching logic tripled. What had been a single elegant function call became a provider-routing switch statement that nobody wanted to touch.
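To make that concrete, here's a stripped-down sketch of the branching I mean. The provider names and payload shapes are illustrative and the actual SDK calls are elided; the point is how quickly a "unified" call site forks per provider:

```typescript
// Illustrative only: each provider wants a slightly different payload,
// so even request construction branches three ways.
type Provider = "openai" | "anthropic" | "google";

interface UnifiedRequest {
  model: string;
  prompt: string;
}

function buildPayload(provider: Provider, req: UnifiedRequest): object {
  switch (provider) {
    case "openai":
      return {
        model: req.model,
        messages: [{ role: "user", content: req.prompt }],
      };
    case "anthropic":
      // Anthropic's Messages API requires max_tokens on every request.
      return {
        model: req.model,
        max_tokens: 1024,
        messages: [{ role: "user", content: req.prompt }],
      };
    case "google":
      // Gemini nests the prompt under contents/parts; the model is
      // carried in the URL path rather than the request body.
      return { contents: [{ parts: [{ text: req.prompt }] }] };
  }
}
```

And that's before error handling, streaming, and rate-limit signaling, each of which forks the same way.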

That was just the beginning. When OpenAI had an outage during a product demo, leadership wanted to know why we didn't have a fallback. Fair question. So I spent two weeks building retry logic and circuit breakers.
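The core of those two weeks boils down to a helper like this. It's a minimal sketch, not the production version, which also counted consecutive failures per provider and stopped calling a provider once a threshold tripped (the circuit breaker half):

```typescript
// Retry a flaky async call with exponential backoff.
// Sketch only: real code would distinguish retryable errors (429, 5xx)
// from permanent ones (400, auth failures) before retrying.
async function withRetry<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 250,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts: 250ms, 500ms, 1s, ...
      await new Promise((resolve) =>
        setTimeout(resolve, baseDelayMs * 2 ** attempt),
      );
    }
  }
  throw lastError;
}
```

Simple in isolation; the cost is that every provider and every call path needs its own tuning of attempts, delays, and which errors count as retryable.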

Then finance wanted per-feature cost attribution, which meant custom logging, dashboards, and token-counting middleware I hadn't budgeted for.

Our PM started asking for A/B testing across models. That meant building a routing layer on top of everything else. Each "small ask" from the business added another week of plumbing work.

By month six, I'd accidentally built a custom AI gateway. It wasn't planned. It wasn't in any roadmap. It was now my problem to maintain, and it was already accumulating tech debt faster than I could pay it down.

Every new feature request from product meant touching the gateway first, then building the actual feature. The infrastructure had become the bottleneck for the product it was supposed to serve. We had a backlog of customer-facing improvements that kept getting bumped for gateway reliability work.

I don't regret the experience. It taught me exactly what to look for in a managed solution. But I wouldn't do it again voluntarily, and I'd push back hard if someone on my team proposed it today.

The Aggregator Approach: Easy Start, Early Ceiling

OpenRouter and similar aggregators solved my multi-provider headache elegantly. One API, standardized format, dozens of models. Switching from GPT-4o to Claude was a one-line string change.

For the first few months, I was genuinely happy. The documentation was solid, the community was active, and I could stop worrying about provider-specific quirks. I told my manager we'd found the solution.

But aggregators optimize for access, not operations. The cracks showed up gradually, then all at once.

When Claude went down one afternoon, my Claude requests simply failed. The aggregator didn't fail over for us automatically; that was still our responsibility.

I found myself writing fallback chains, looping through a prioritized model list, catching errors, and retrying against the next provider in line. The pattern wasn't complicated for a single call site, but I needed it in every AI call across the codebase, and wrapping an aggregator in my own reliability layer defeated half the point of using one.
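The pattern looked roughly like this, with error handling pared down and the model names as placeholders rather than recommendations:

```typescript
// Try each model in priority order until one succeeds.
// Sketch only: a real version would log which model answered and
// skip models that are known to be down.
async function withFallback<T>(
  models: string[],
  call: (model: string) => Promise<T>,
): Promise<T> {
  let lastError: unknown;
  for (const model of models) {
    try {
      return await call(model);
    } catch (err) {
      lastError = err; // fall through to the next model in the chain
    }
  }
  throw lastError; // every model in the chain failed
}
```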

Cost optimization told a similar story. We wanted to route simple classification tasks to cheaper models and reserve the expensive ones for nuanced generation. The aggregator had no intelligent routing of its own, so I ended up building query-complexity estimation and model-selection logic on top of it.

I was writing functions to score prompt complexity, mapping score ranges to model tiers, and maintaining a configuration file that mapped use cases to preferred models. At that point, I was layering custom infrastructure over a service I'd chosen specifically to avoid building custom infrastructure. The irony wasn't lost on me.
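For flavor, here's a deliberately naive version of that scoring logic. The keywords, thresholds, and tier names are invented for illustration; the real heuristics were uglier and needed constant retuning:

```typescript
// Crude complexity proxy: longer prompts and reasoning-style keywords
// get routed to pricier model tiers. Thresholds are made up.
function pickModelTier(prompt: string): "cheap" | "mid" | "premium" {
  const score =
    prompt.length / 500 +
    (/\b(analyze|compare|explain why|step by step)\b/i.test(prompt) ? 2 : 0);
  if (score < 1) return "cheap";
  if (score < 3) return "mid";
  return "premium";
}
```

Every function like this becomes something you own: when a cheap model botches a "simple" task, the thresholds are your bug to fix.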

Then there was semantic caching. The same customer support questions were hitting our AI layer fifty times a day. Each time, we paid full price and waited for a fresh response.

Implementing proper semantic caching would have meant embedding generation for queries, vector storage and similarity search, cache invalidation strategies, and response freshness logic. That's a multi-week engineering project, not a feature flag. The aggregator had no answer for it.
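To show why it's a project and not a flag, here is just the lookup half, sketched with plain cosine similarity over precomputed embeddings. A real system would also need an embedding model for incoming queries, a vector index instead of a linear scan, and an invalidation strategy:

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface CacheEntry {
  embedding: number[];
  response: string;
}

// Return a cached response if any stored query is similar enough,
// otherwise null (meaning: pay for a fresh completion).
function lookup(
  queryEmbedding: number[],
  cache: CacheEntry[],
  threshold = 0.92,
): string | null {
  for (const entry of cache) {
    if (cosineSimilarity(queryEmbedding, entry.embedding) >= threshold) {
      return entry.response;
    }
  }
  return null;
}
```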

The observability gap was the final straw. We could see total usage numbers, but we couldn't break costs down by feature. We couldn't get latency percentiles by model. We couldn't correlate error rates with request types.

Getting real operational visibility meant bolting on yet another system. The aggregator had saved me from multi-provider glue code. It hadn't saved me from building operational infrastructure. That distinction matters more than most teams realize when they're evaluating options.

The Open-Source Approach: Flexibility, If You Have the Time

Tools like LiteLLM offer the flexibility aggregators lack. You define your model list in a config file, specify routing strategies, set retry counts, and in theory you're off to the races. I was drawn to the promise of customizing everything without vendor lock-in. The GitHub stars and active contributor community gave me confidence.

The reality was more demanding than the README suggested.

"Simple deployment" required container orchestration on Kubernetes or ECS, a PostgreSQL database for config and metrics, Redis for caching, load balancing and SSL termination, secrets management integration, and a monitoring stack with Prometheus and Grafana.

My "30-minute setup" turned into a multi-day infrastructure project before I handled a single production request. And that was with an experienced platform engineer helping me. A product-focused team without that expertise would have taken even longer.

Then came the maintenance. Every provider API change required config updates. Every new model needed onboarding. Every security advisory needed patching.

I tracked my time at a previous company: 8 to 12 hours a month just keeping the gateway healthy, and that was during quiet months. When Anthropic changed their API versioning or OpenAI deprecated a model, the updates could eat an entire day. I started dreading Fridays because that's when breaking changes seemed to land.

The features were broad but shallow. Basic caching existed, but not semantic caching. Basic routing existed, but not cost-aware routing with latency constraints. Compliance logging? You could add it. Cost attribution by team? You could implement it.

The answer to every advanced need was "you can build that," which is technically true and practically exhausting. I started keeping a running tally of "you can build that" features we'd been asked to implement. By month four, the list had eighteen items on it.

The flexibility is real. So is the engineering investment required to use it. For a team with dedicated platform engineers and truly unique requirements, open-source can work well. For a product team trying to ship features, it's a trap disguised as freedom.

The Platform Approach: When It Actually Delivers

I was skeptical of "enterprise AI platforms" until I used one that actually delivered on the promise. I'd been burned enough times by marketing pages that oversold and products that underdelivered. But the shift felt almost disorienting in its simplicity.

I pointed the SDK at my API keys, and everything just worked: routing, failover, caching, observability, all functional with defaults that made sense.

The integration was minimal enough to feel suspicious at first:

import { Swfte } from "@swfte/sdk";

const client = new Swfte({ apiKey: process.env.SWFTE_API_KEY });

const response = await client.chat.completions.create({
  messages: [{ role: "user", content: prompt }],
  metadata: { feature: "customer-support", team: "product" },
});

That's the entire integration. Routing, failover, and caching are handled automatically behind that single call.

Cost attribution happens through the metadata fields, which means finance gets their per-feature breakdown without me building custom logging infrastructure. Every request is traced and visible in the dashboard without any additional setup. The first time I opened the analytics page and saw per-feature cost breakdowns I'd never configured, I actually laughed. I'd spent weeks building worse versions of this at my previous company.

When I needed to override defaults, the control was there without any friction. I could enforce a cost-optimized routing strategy with a latency ceiling and a specific fallback chain on a per-request basis:

const response = await client.chat.completions.create({
  messages: [{ role: "user", content: prompt }],
  routing: {
    strategy: "cost-optimized",
    maxLatencyMs: 2000,
    fallbackChain: ["claude-3.5-sonnet", "gpt-4o", "gemini-pro"],
  },
  cache: { enabled: true, similarityThreshold: 0.92 },
});

The key insight was that when I didn't override anything, the platform's defaults were better than what I'd hand-tuned myself over months of iteration. Good defaults with escape hatches turned out to be a fundamentally better developer experience than "configure everything from scratch."

The difference in day-to-day life was stark. I went from spending 10-plus hours a month on gateway maintenance to maybe an hour or two reviewing dashboards and adjusting policies.

When something broke, I checked the platform's tracing UI instead of digging through container logs and cross-referencing timestamps. When a new model launched, I toggled it on in the dashboard instead of updating config files, rebuilding containers, and hoping the deployment didn't break something else.

The team-level configuration was another revelation. Different teams could have different policies without any code changes. Engineering used aggressive caching for development. Production used conservative failover with tight latency budgets. Compliance got enhanced logging with full request and response capture.

All configured through the platform, not scattered across code in different repositories maintained by different people. Our compliance team, who had previously needed engineering support for every audit request, could now pull their own reports.

What I've Seen in the Field

My own experience tracks with what I've seen advising other teams. Two stories stand out.

A fintech I work with, ClearPath Payments, migrated from OpenRouter to a complete platform in a single sprint. They'd been running OpenRouter for about a year, supplementing it with custom fallback logic and a homegrown cost tracker that was perpetually behind on accuracy.

Their two AI-infrastructure engineers were spending roughly half their time maintaining that glue code. The other half went to actual product work, but even that was fragmented because gateway issues would interrupt sprint work unpredictably.

After migration, their P95 latency dropped 40% thanks to intelligent routing they'd never had time to build themselves. More importantly, those two engineers went back to shipping product features full-time.

Their velocity on customer-facing features nearly doubled in the quarter after migration. The CTO told me the latency improvement alone would have justified the switch, but getting two engineers back was the real win. "We hired them to build fintech features," she said, "not to babysit API plumbing."

Another team I advised, a Series B healthtech called MedScope Analytics, had been running LiteLLM on Kubernetes for about eight months. It worked, mostly, but it required constant attention.

Then their platform engineer left, and suddenly nobody understood the routing configs or the Prometheus alerts. Every LiteLLM update was a gamble because nobody knew which customizations might break. The remaining engineers were afraid to touch anything, which meant they couldn't adopt new models or adjust routing even when the business needed it.

Migration to a managed platform took three days. They eliminated an entire category of on-call pages overnight.

The CTO told me the operational simplicity alone justified the cost, even before counting the engineering hours they reclaimed. Six months later, they'd shipped three major AI features that had been stuck in the backlog while the team was busy keeping the lights on.

These aren't unusual stories. Most product-focused teams I talk to have a version of the same arc: start simple, accumulate complexity, realize the infrastructure work is crowding out the product work.

The teams that thrive are the ones that recognize the pattern early and make the switch before the maintenance burden becomes a drag on hiring and morale.

How the Approaches Actually Compare

Having lived with each approach, here's how they stack up in practice.

Direct APIs get you to a first request in five minutes, but reaching production-readiness takes three to six months of building everything yourself. You're writing your own retry logic, your own fallback chains, your own cost tracking, your own observability layer. Ongoing maintenance runs 10 to 20 hours a month, mostly firefighting issues you didn't anticipate. And when things break at 2 AM, you're the on-call.

Aggregators like OpenRouter are nearly as fast to start with, maybe ten minutes to first request. But you'll still spend two to four months building around the gaps in failover, routing, and observability.

Maintenance settles around 5 to 10 hours monthly, and debugging means figuring out which layer, yours or theirs, actually failed. It's a better starting point than raw APIs, but the ceiling comes sooner than you'd expect.

Open-source tools like LiteLLM need a few hours just for initial setup, then one to three months to reach production quality. Maintenance is 8 to 15 hours a month between config updates, infrastructure care, and chasing GitHub issues when something behaves unexpectedly.

The flexibility is genuine, but so is the ongoing investment required to leverage it. I've seen teams underestimate this investment by a factor of three or four.

A complete platform takes about 15 minutes to first request and one to two weeks to production-readiness, mostly spent on your own integration testing rather than infrastructure wrangling.

Ongoing maintenance drops to an hour or two a month, primarily reviewing dashboards and tuning policies. When something breaks, you check the dashboard and open a support ticket. That's a fundamentally different operational posture, and the advantage compounds over time. The hours you save each month go back into product work.

The right choice depends on your situation, but for most product-focused engineering teams, the complete platform delivers the best developer experience by a significant margin.

What "Complete" Actually Means for Developers

A truly complete AI gateway platform isn't just an aggregator with extra features bolted on. It's a different category entirely.

It starts with an OpenAI-compatible API, so your existing code works with minimal changes and the learning curve is measured in minutes, not weeks.

Routing, failover, caching, and observability work out of the box without configuration, but every default is overridable when you need fine-grained control. Type-safe SDKs in TypeScript, Python, and Go mean you're not casting responses or guessing schemas.

Beyond the basics, it means consistent behavior between development and production, so you're not maintaining separate configs for different environments.

It means request tracing, prompt inspection, and response analysis available through a dashboard rather than buried in log files. It means documentation where every feature has examples, every error is explained, and every migration path is detailed.

It means local development that works the same way as production. No mock servers, no conditional logic checking NODE_ENV, no "it works on my machine" surprises during deployment.

And it means responsive human support when something doesn't work. Not just community forums where your question might get answered next week. When you're debugging a production issue at midnight, the difference between a support ticket and a GitHub issue is the difference between resolution in hours and resolution in days.

The bar is higher than most vendors admit. But when a platform actually clears it, the developer experience is transformational. You stop thinking about infrastructure and start thinking about product.

The Migration Reality

If you're currently running an aggregator or custom solution, migration is typically less painful than you'd expect.

From OpenRouter, the API format is largely compatible. Change your base URL and API key, adjust a few response-handling details, and most codebases migrate in a day.
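In code terms, the switch is mostly this. The URLs, key, and model name below are placeholders, not real endpoints; the point is that the OpenAI-shaped request body stays the same and only the endpoint and credentials move:

```typescript
// Build an OpenAI-compatible chat request against a given base URL.
// Migrating means changing the baseUrl and apiKey arguments, not the body.
function buildChatRequest(baseUrl: string, apiKey: string, prompt: string) {
  return {
    url: `${baseUrl}/chat/completions`,
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: {
      model: "gpt-4o",
      messages: [{ role: "user", content: prompt }],
    },
  };
}

// Before: buildChatRequest("https://openrouter.ai/api/v1", oldKey, prompt)
// After:  buildChatRequest("https://gateway.example.com/v1", newKey, prompt)
```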

From LiteLLM, you export your configuration and import routing rules and model preferences. The platform absorbs the infrastructure you were managing, and you can decommission the Kubernetes resources, the database, and the monitoring stack. There's something deeply satisfying about deleting infrastructure you no longer need.

From direct APIs, you abstract your AI calls behind a service layer, which you probably should have anyway for testability, and point that layer at the new platform.

In every case I've seen, the migration cost is lower than a single month of ongoing maintenance on the previous solution. The math isn't close, and the ROI compounds with every month you're not spending engineering hours on gateway upkeep.

My Recommendation

After years of building AI infrastructure, my advice is simple.

Use direct APIs when you're prototyping with a single provider and reliability doesn't matter yet.

Use aggregators when you need multi-model access for a small project with straightforward requirements and a limited timeline.

Use open-source when you have dedicated platform engineers and genuinely unique needs that no managed platform addresses.

For everyone else, and that's most product teams, use a complete platform. The developer experience is better, the maintenance burden is an order of magnitude lower, and your engineers can focus on what actually differentiates your product.

I've watched teams transform their shipping velocity simply by getting AI infrastructure off their plate. The engineers are happier, the product moves faster, and the business stops asking why AI features take so long to ship.

The goal was never to build the best AI gateway. It was always to build the best AI-powered product. Choose the infrastructure that lets you do that.


Ready to stop building AI plumbing? Explore Swfte Connect to see how a complete AI gateway platform lets you ship features instead of infrastructure. For the technical architecture behind complete solutions, read our guide on AI gateway flexibility and ease. For the business case, see our analysis of AI gateway ROI. And for background on intelligent routing, explore how smart routing saves enterprises millions.
