
One API call should not fail because one provider is having a bad day. Swfte Connect routes requests across OpenAI, Anthropic, Google, Mistral, xAI, and more -- with automatic failover, latency-based selection, and cost-aware routing.

This guide covers how to configure routing strategies for production reliability.


How Routing Works

When you send a request to Swfte Connect, the gateway evaluates your routing configuration and selects the best provider. The decision adds less than 5ms of overhead per request.

Your App → Swfte Gateway → [Route Decision] → Provider A (primary)
                                             → Provider B (fallback)
                                             → Provider C (fallback)

The gateway tracks real-time health metrics for every provider: response latency, error rate, and availability. It uses these signals to make routing decisions automatically.
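To make this concrete, here is a minimal sketch of health-based selection using those three signals. The metrics, the 5% error-rate cutoff, and the provider numbers are illustrative assumptions, not Swfte Connect's actual scoring algorithm:

```python
# Illustrative sketch of health-based provider selection.
# The stats shape and the 5% error threshold are assumptions for this example.

def pick_provider(stats):
    """Pick the available provider with the lowest p50 latency.

    stats maps provider name -> {"p50_ms": ..., "error_rate": ..., "up": ...}
    """
    healthy = {
        name: s for name, s in stats.items()
        if s["up"] and s["error_rate"] < 0.05  # assumed error-rate cutoff
    }
    if not healthy:
        raise RuntimeError("no healthy providers")
    return min(healthy, key=lambda name: healthy[name]["p50_ms"])

stats = {
    "openai":    {"p50_ms": 420, "error_rate": 0.01, "up": True},
    "anthropic": {"p50_ms": 380, "error_rate": 0.02, "up": True},
    "google":    {"p50_ms": 300, "error_rate": 0.09, "up": True},  # excluded: too many errors
}
print(pick_provider(stats))  # -> anthropic
```

Note that the fastest provider is not always the winner: google has the lowest latency here but is filtered out by its error rate.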


Routing Strategies

1. Explicit Provider Selection

The simplest approach. Specify the provider directly in the model identifier:

# Always use OpenAI
response = client.chat.completions.create(
    model="openai:gpt-5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Always use Anthropic
response = client.chat.completions.create(
    model="anthropic:claude-sonnet-4",
    messages=[{"role": "user", "content": "Hello"}]
)

Use this when you need deterministic behavior or provider-specific features.

2. Automatic Failover

Configure fallback providers that activate when the primary is unavailable or returns errors. Set this up in the Swfte Connect dashboard under Connections > Routing Rules, or via the API:

response = client.chat.completions.create(
    model="openai:gpt-5",
    messages=[{"role": "user", "content": "Analyze this contract."}],
    swfte_options={
        "fallback_models": [
            "anthropic:claude-sonnet-4",
            "google:gemini-2.5-pro"
        ],
        "max_retries": 2,
        "retry_delay_ms": 500
    }
)

The gateway tries providers in order. If OpenAI returns a 500/503 or times out, it immediately retries with Anthropic. If Anthropic also fails, it tries Google. The total latency overhead for a failover is typically under 200ms.
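The retry order described above can be sketched as a simple loop. In practice the gateway does this server-side; the client-side version below is only an illustration, and the fake_send stub stands in for a real API call:

```python
# Sketch of ordered failover with retryable-error detection.
# The status codes and retry budget mirror the options shown above.
import time

RETRYABLE = {500, 503, "timeout"}

def call_with_failover(send, models, max_retries=2, retry_delay_ms=500):
    """Try each model in order; on a retryable error, move to the next."""
    last_error = None
    for model in models[: max_retries + 1]:
        status, body = send(model)
        if status == 200:
            return model, body
        if status not in RETRYABLE:
            raise RuntimeError(f"{model} failed with non-retryable {status}")
        last_error = status
        time.sleep(retry_delay_ms / 1000)
    raise RuntimeError(f"all providers failed, last error: {last_error}")

def fake_send(model):
    """Hypothetical stub: the primary is down, fallbacks are healthy."""
    if model.startswith("openai:"):
        return 503, None
    return 200, f"response from {model}"

chain = ["openai:gpt-5", "anthropic:claude-sonnet-4", "google:gemini-2.5-pro"]
model, body = call_with_failover(fake_send, chain, retry_delay_ms=0)
print(model)  # -> anthropic:claude-sonnet-4
```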

3. Latency-Based Routing

Route to whichever provider is currently fastest. The gateway maintains rolling latency windows (last 100 requests per provider) and selects the provider with the lowest p50 latency.

response = client.chat.completions.create(
    model="gpt-5",  # No provider prefix = eligible for routing
    messages=[{"role": "user", "content": "Quick question"}],
    swfte_options={
        "routing": "lowest_latency",
        "eligible_providers": ["openai", "anthropic", "google"]
    }
)

This is ideal for latency-sensitive applications like chat interfaces or real-time assistants.
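The rolling-window mechanics are easy to picture in code. The sketch below keeps the last 100 latency samples per provider and picks the lowest median; the sample values are made up for illustration:

```python
# Sketch of rolling-window p50 selection, per the 100-request window above.
from collections import deque
from statistics import median

WINDOW = 100  # last 100 requests per provider

latencies = {p: deque(maxlen=WINDOW) for p in ("openai", "anthropic", "google")}

def record(provider, ms):
    latencies[provider].append(ms)

def lowest_latency_provider(eligible):
    """Return the eligible provider with the lowest p50 (median) latency."""
    return min(eligible, key=lambda p: median(latencies[p]))

for ms in (410, 430, 390):
    record("openai", ms)
for ms in (350, 500, 360):
    record("anthropic", ms)
for ms in (600, 580, 620):
    record("google", ms)

print(lowest_latency_provider(["openai", "anthropic", "google"]))  # -> anthropic
```

Because the deque has a fixed maxlen, old samples age out automatically, so a provider that recovers from a slow spell becomes eligible again once fresh requests pull its median back down.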

4. Cost-Optimized Routing

Route to the cheapest provider that meets your quality requirements:

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Summarize this article."}],
    swfte_options={
        "routing": "lowest_cost",
        "eligible_providers": ["openai", "anthropic", "google", "mistral"],
        "min_quality_score": 0.85  # Only consider providers above this threshold
    }
)

The gateway compares per-token pricing across eligible providers and routes accordingly.
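The selection logic amounts to a filter on quality followed by a minimum on price. The prices and quality scores below are invented for illustration and do not reflect real provider pricing:

```python
# Sketch of cost-aware routing with a quality floor.
# All numbers here are hypothetical, not actual provider pricing.

PRICING = {  # assumed $ per 1M input tokens
    "openai": 10.00,
    "anthropic": 9.00,
    "google": 7.00,
    "mistral": 4.00,
}
QUALITY = {"openai": 0.92, "anthropic": 0.93, "google": 0.88, "mistral": 0.81}

def cheapest_provider(eligible, min_quality_score):
    """Cheapest eligible provider whose quality score meets the floor."""
    qualified = [p for p in eligible if QUALITY[p] >= min_quality_score]
    if not qualified:
        raise RuntimeError("no provider meets the quality threshold")
    return min(qualified, key=PRICING.__getitem__)

print(cheapest_provider(["openai", "anthropic", "google", "mistral"], 0.85))
# -> google
```

With these numbers, mistral is the cheapest option but falls below the 0.85 quality floor, so google wins.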

5. Weighted Distribution

Split traffic across providers by percentage. Useful for gradual migrations or A/B testing model quality:

response = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Hello"}],
    swfte_options={
        "routing": "weighted",
        "weights": {
            "openai": 60,
            "anthropic": 30,
            "google": 10
        }
    }
)
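A 60/30/10 split like the one above is just weighted random selection per request, which can be sketched with the standard library:

```python
# Sketch of weighted traffic splitting using random.choices.
import random

weights = {"openai": 60, "anthropic": 30, "google": 10}

def pick_weighted(weights, rng=random):
    """Pick one provider with probability proportional to its weight."""
    providers = list(weights)
    return rng.choices(providers, weights=[weights[p] for p in providers], k=1)[0]

# Over many requests the split converges to roughly 60/30/10:
random.seed(0)
counts = {p: 0 for p in weights}
for _ in range(10_000):
    counts[pick_weighted(weights)] += 1
print(counts)
```

Each individual request is still routed to exactly one provider; only the aggregate traffic follows the configured percentages.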

Configuring Fallbacks in the Dashboard

For teams that prefer UI configuration over code:

  1. Navigate to Connections in the sidebar
  2. Select a provider (e.g., OpenAI)
  3. Under Routing Configuration, set:
    • Priority: Primary, Secondary, or Tertiary
    • Failover Trigger: Error rate threshold (e.g., >5% in 60s)
    • Health Check Interval: How often to probe the provider
  4. Repeat for each provider in your fallback chain

Dashboard configuration applies globally. Per-request swfte_options override dashboard settings for individual calls.


Health Monitoring

The Provider Health panel on your dashboard shows real-time status for every connected provider:

  • Operational (green) -- Provider responding normally, latency within expected range
  • Degraded (yellow) -- Elevated latency or intermittent errors
  • Partial Outage (yellow) -- Some endpoints affected
  • Major Outage (red) -- Provider is down; traffic automatically rerouted

Each provider also shows current latency and links to the provider's own status page.
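The four statuses map naturally onto error-rate and latency thresholds. The cutoffs in this sketch are assumptions for illustration, not Swfte's actual classification rules:

```python
# Illustrative mapping from health metrics to the dashboard statuses above.
# The 25%/5% error thresholds and the 2x latency rule are assumptions.

def health_status(error_rate, p50_ms, expected_p50_ms, reachable):
    if not reachable:
        return "major_outage"
    if error_rate > 0.25:
        return "partial_outage"
    if error_rate > 0.05 or p50_ms > 2 * expected_p50_ms:
        return "degraded"
    return "operational"

print(health_status(0.01, 400, 450, True))   # -> operational
print(health_status(0.08, 400, 450, True))   # -> degraded
print(health_status(0.01, 400, 450, False))  # -> major_outage
```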


Practical Patterns

Production Chat Application

For user-facing chat, optimize for reliability and latency:

response = client.chat.completions.create(
    model="openai:gpt-5",
    messages=conversation_history,
    stream=True,
    swfte_options={
        "fallback_models": ["anthropic:claude-sonnet-4", "google:gemini-2.5-pro"],
        "timeout_ms": 10000,
        "routing": "lowest_latency"
    }
)

Batch Processing Pipeline

For background tasks, optimize for cost:

response = client.chat.completions.create(
    model="mistral:mistral-large",
    messages=[{"role": "user", "content": document_text}],
    swfte_options={
        "routing": "lowest_cost",
        "eligible_providers": ["mistral", "google", "anthropic"],
        "timeout_ms": 60000
    }
)

Compliance-Sensitive Workloads

Restrict to specific providers for data residency or compliance:

response = client.chat.completions.create(
    model="anthropic:claude-sonnet-4",
    messages=messages,
    swfte_options={
        "fallback_models": ["google:gemini-2.5-pro"],
        "eligible_providers": ["anthropic", "google"],  # No OpenAI
        "region_preference": "eu"
    }
)

Do
  • Set at least two fallback providers for production workloads
  • Use latency-based routing for real-time applications
  • Monitor the Provider Health dashboard for early warning signs
  • Test your failover chain regularly by temporarily disabling the primary
Don't
  • Don't rely on a single provider without failover
  • Don't set timeout values too low -- allow at least 10s for complex requests
  • Don't mix incompatible models in a fallback chain (e.g., embedding model as fallback for chat)
