Cost Guide

Cost Optimization and Controls in Swfte Connect

Configure budgets, rate limits, model cost comparisons, and credits to minimize AI spend.

March 28, 2026


AI costs can spiral quickly when you are running thousands of requests per day across multiple providers. Swfte Connect gives you granular visibility and control over spend -- per API key, per model, per provider, and per time window.

Understanding Costs

Every request through Swfte Connect is metered and priced transparently. Cost depends on three factors:

Token count -- Prompt tokens (input) + completion tokens (output)
Model pricing -- Each provider/model combination has different per-token rates
Features used -- Function calling, image generation, and embeddings have separate pricing
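The first two factors can be combined into a rough per-request estimate. The sketch below is illustrative only: the function name and the example token counts are hypothetical, the rates are placeholders quoted per 1M tokens, and feature-specific pricing (function calling, image generation, embeddings) is not modeled.

```python
def estimate_cost(prompt_tokens: int, completion_tokens: int,
                  input_rate_per_m: float, output_rate_per_m: float) -> float:
    """Estimate the token cost of one request in USD.

    Rates are quoted per 1 million tokens, so each token count is
    divided by 1_000_000 before it is multiplied by its rate.
    """
    input_cost = prompt_tokens / 1_000_000 * input_rate_per_m
    output_cost = completion_tokens / 1_000_000 * output_rate_per_m
    return input_cost + output_cost


# Hypothetical request: 1,200 prompt tokens and 300 completion tokens
# at illustrative rates of $3.00 (input) and $15.00 (output) per 1M tokens.
cost = estimate_cost(1_200, 300, 3.00, 15.00)
print(f"Estimated cost: ${cost:.4f}")
```

Output tokens are often priced several times higher than input tokens, so long completions can dominate the bill even when prompts are short.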

View real-time cost data on the dashboard:

Total Costs metric card -- Aggregate spend in the selected time range
Cost per Provider chart -- Breakdown showing which providers consume the most budget
Insights > Analytics -- Per-model cost breakdown and cost optimization recommendations

Budget Alerts

Set spending limits to prevent unexpected charges. Configure budgets in Controls > Cost Controls:

Per-Key Budgets

Limit spend for individual API keys. Useful for isolating costs per application, team, or environment:

# API call to set a budget
import requests

requests.post("https://connect.swfte.com/api/controls/budgets", json={
    "api_key_id": "key_abc123",
    "budget": {
        "amount": 100.00,
        "currency": "USD",
        "period": "monthly",
        "action_on_exceed": "alert"  # or "throttle" or "block" (see the actions below)
    }
}, headers={"Authorization": "Bearer sk-swfte-..."})

Actions when a budget is exceeded:

Alert -- Send notification (email, webhook) but continue processing
Throttle -- Reduce request rate to 10% of normal
Block -- Reject all requests until the next period or manual override

Workspace Budgets

Set a ceiling for total workspace spend:

Setting	Description
Monthly budget	Maximum spend per calendar month
Daily budget	Maximum spend per day (catches runaway processes early)
Alert threshold	Percentage at which to send warnings (e.g., 80%)
Hard limit	Absolute maximum -- requests are blocked beyond this
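To make the interaction between these settings concrete, here is a minimal sketch of how spend might be classified against them. This is not Swfte Connect's implementation: the function, the status names, and the assumption that the hard limit sits at or above the budget are all hypothetical.

```python
def budget_status(spend: float, budget: float,
                  alert_threshold_pct: float, hard_limit: float) -> str:
    """Classify current spend against workspace budget settings.

    Assumes hard_limit >= budget; the status strings are illustrative.
    """
    if spend >= hard_limit:
        return "blocked"        # requests would be rejected
    if spend >= budget:
        return "over_budget"    # budget exceeded, hard limit not yet reached
    if spend >= budget * alert_threshold_pct / 100:
        return "warning"        # alert threshold crossed
    return "ok"


# Hypothetical settings: $1,000 monthly budget, 80% alert, $1,200 hard limit
for spend in (500.0, 850.0, 1100.0, 1200.0):
    print(spend, budget_status(spend, 1000.0, 80.0, 1200.0))
```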

Rate Limits

Control request volume to prevent abuse or runaway loops:

# Configure rate limits per API key
import requests

requests.post("https://connect.swfte.com/api/controls/rate-limits", json={
    "api_key_id": "key_abc123",
    "limits": {
        "requests_per_minute": 60,
        "requests_per_hour": 1000,
        "tokens_per_minute": 100000,
        "concurrent_requests": 10
    }
}, headers={"Authorization": "Bearer sk-swfte-..."})

When a rate limit is hit, the gateway returns a 429 Too Many Requests response with a Retry-After header. The SDK handles this automatically with exponential backoff.
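If you call the gateway without the SDK, you need to handle 429 responses yourself. The sketch below shows one common pattern, not the SDK's actual logic: honor Retry-After when it is a number of seconds, otherwise fall back to capped exponential backoff. The helper names, defaults, and jitter amount are assumptions.

```python
import random
import time
from typing import Callable, Optional


def retry_delay(attempt: int, retry_after: Optional[str],
                base: float = 1.0, cap: float = 60.0) -> float:
    """Return seconds to wait before retry number `attempt` (0-based).

    Prefers the server's Retry-After value when it is a number of
    seconds; otherwise uses capped exponential backoff.
    """
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After may also be an HTTP date; not handled here
    return min(cap, base * 2 ** attempt)


def send_with_retry(send: Callable[[], object], max_retries: int = 5,
                    sleep: Callable[[float], None] = time.sleep):
    """Call `send()` until it returns a non-429 response or retries run out.

    `send` is any zero-argument callable returning an object with
    `.status_code` and `.headers`, such as a lambda wrapping requests.post.
    """
    for attempt in range(max_retries + 1):
        response = send()
        if response.status_code != 429 or attempt == max_retries:
            return response
        delay = retry_delay(attempt, response.headers.get("Retry-After"))
        # Small random jitter so many clients do not retry in lockstep
        sleep(delay + random.uniform(0, 0.1 * delay))
```

A call would look like `send_with_retry(lambda: requests.post(url, json=payload, headers=headers))`, where `url`, `payload`, and `headers` are your own values.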

Model Cost Comparison

Not all models cost the same for the same task. Use the Insights > Analytics view to compare:

Model	Input (per 1M tokens)	Output (per 1M tokens)	Avg Latency
GPT-5	$5.00	$15.00	320ms
Claude Sonnet 4	$3.00	$15.00	450ms
Gemini 2.5 Pro	$2.50	$10.00	210ms
Gemini 2.5 Flash	$0.50	$1.50	180ms
Mistral Large	$2.00	$6.00	280ms
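To see what these rates mean for a concrete workload, you can rank the models by total token cost. The sketch below copies the prices from the table above; the workload (10M input and 2M output tokens per month) and the function names are hypothetical, and latency and output quality are not considered.

```python
# Prices in USD per 1M tokens (input, output), copied from the table above.
PRICES = {
    "GPT-5": (5.00, 15.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.5 Pro": (2.50, 10.00),
    "Gemini 2.5 Flash": (0.50, 1.50),
    "Mistral Large": (2.00, 6.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Total token cost in USD for a workload on one model."""
    input_rate, output_rate = PRICES[model]
    return (input_tokens / 1_000_000 * input_rate
            + output_tokens / 1_000_000 * output_rate)


def rank_models(input_tokens: int, output_tokens: int) -> list:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = [(m, workload_cost(m, input_tokens, output_tokens)) for m in PRICES]
    return sorted(costs, key=lambda pair: pair[1])


# Hypothetical monthly workload: 10M input tokens, 2M output tokens
for model, cost in rank_models(10_000_000, 2_000_000):
    print(f"{model:<18} ${cost:.2f}")
```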

Tiered Model Strategy

Use different models for different use cases:

# `client` is assumed to be an already-configured SDK client
# High-stakes tasks: premium model
response = client.chat.completions.create(
    model="anthropic:claude-sonnet-4",
    messages=[{"role": "user", "content": "Review this legal contract..."}],
    max_tokens=4096
)

# Routine tasks: cost-efficient model
response = client.chat.completions.create(
    model="google:gemini-2.5-flash",
    messages=[{"role": "user", "content": "Classify this support ticket..."}],
    max_tokens=128
)

# Bulk processing: lower-cost model than the premium tier
response = client.chat.completions.create(
    model="mistral:mistral-large",
    messages=[{"role": "user", "content": "Extract entities from this text..."}],
    max_tokens=256
)

This approach typically reduces costs by 40-60% compared to using a single premium model for everything.
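One way to keep the tier choice in one place is a small lookup table that builds the request arguments. This is a sketch: the tier names and the helper function are hypothetical, while the model identifiers and max_tokens values are the ones used in the examples above.

```python
# Hypothetical tier names mapped to (model identifier, max_tokens).
MODEL_BY_TIER = {
    "high_stakes": ("anthropic:claude-sonnet-4", 4096),
    "routine": ("google:gemini-2.5-flash", 128),
    "bulk": ("mistral:mistral-large", 256),
}


def request_params(tier: str, prompt: str) -> dict:
    """Build keyword arguments for a chat completion call for a given tier."""
    try:
        model, max_tokens = MODEL_BY_TIER[tier]
    except KeyError:
        raise ValueError(f"Unknown tier: {tier!r}") from None
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


params = request_params("routine", "Classify this support ticket...")
print(params["model"], params["max_tokens"])
```

With a configured client, the call becomes `client.chat.completions.create(**params)`, and changing a tier's model is a one-line edit.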

Credit Management

Swfte uses a prepaid credit system. Credits are consumed based on actual usage.

Checking Balance

The dashboard displays your current credit balance in the metric strip. You can also check programmatically:

import requests

response = requests.get(
    "https://connect.swfte.com/api/billing/credits",
    headers={"Authorization": "Bearer sk-swfte-..."}
)

response.raise_for_status()  # stop here on an HTTP error instead of parsing an error body
data = response.json()
print(f"Balance: ${data['balance']:.2f}")
print(f"Low balance warning: {data['low_balance_warning']}")

Low Balance Alerts

When your balance drops below the threshold (configurable in Controls), the dashboard shows a warning banner. You can also configure webhook notifications:

import requests

requests.post("https://connect.swfte.com/api/controls/alerts", json={
    "type": "low_balance",
    "threshold": 10.00,
    "channels": ["email", "webhook"],
    "webhook_url": "https://your-app.com/webhooks/billing"
}, headers={"Authorization": "Bearer sk-swfte-..."})

Auto-Reload

Enable automatic credit top-ups when your balance drops below a threshold:

Go to Controls > Billing
Enable Auto-reload
Set the reload amount and minimum balance trigger
Confirm your payment method

Cost Optimization Checklist

Audit model usage -- Check Insights to see if expensive models are being used for simple tasks
Set max_tokens appropriately -- Don't request 4096 tokens for a yes/no classification
Use streaming -- Streaming does not reduce cost, but it lowers perceived latency because users see output as it is generated
Cache repeated queries -- Swfte Connect can cache identical requests (configure in Controls)
Set temperature to 0 -- For deterministic tasks, temperature=0 makes responses more repeatable, so identical requests can be served from cache
Monitor daily -- Set up daily budget alerts at 80% to catch anomalies early
Use Gemini Flash for classification -- At $0.50 per 1M input tokens, it costs one-tenth as much as GPT-5 ($5.00 per 1M)
Batch embeddings -- Send multiple texts in a single embeddings request to reduce overhead
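The caching and temperature items work together: a response can only be reused when the request is identical and the output is expected to be repeatable. The sketch below illustrates the idea with a client-side, in-memory cache. It is not how Swfte Connect's gateway cache works; the function names and the `call_model` callable are hypothetical.

```python
import hashlib
import json
from typing import Callable, Dict

_cache: Dict[str, str] = {}


def cache_key(model: str, messages: list, **params) -> str:
    """Hash the full request so only identical requests share a key."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_completion(call_model: Callable[..., str], model: str,
                      messages: list, temperature: float = 0.0, **params) -> str:
    """Reuse a stored answer for identical requests made at temperature 0.

    `call_model` is a placeholder for whatever function sends the
    request and returns the response text.
    """
    if temperature != 0:
        # Sampled output varies between calls, so skip the cache entirely
        return call_model(model, messages, temperature=temperature, **params)
    key = cache_key(model, messages, **params)
    if key not in _cache:
        _cache[key] = call_model(model, messages, temperature=temperature, **params)
    return _cache[key]
```

A production cache would also need an expiry policy and a size bound, which this sketch leaves out.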