AI costs can spiral quickly when you are running thousands of requests per day across multiple providers. Swfte Connect gives you granular visibility and control over spend -- per API key, per model, per provider, and per time window.


Understanding Costs

Every request through Swfte Connect is metered and priced transparently. Cost depends on three factors:

  1. Token count -- Prompt tokens (input) + completion tokens (output)
  2. Model pricing -- Each provider/model combination has different per-token rates
  3. Features used -- Function calling, image generation, and embeddings have separate pricing
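The first two factors combine into a simple formula. The following is a minimal sketch, assuming rates are quoted per 1M tokens (as in the model comparison table later in this guide) and ignoring any feature-specific pricing. `estimate_cost` is a hypothetical helper, not part of the Swfte SDK:

```python
def estimate_cost(prompt_tokens, completion_tokens,
                  input_rate_per_1m, output_rate_per_1m):
    """Estimate the USD cost of one request from token counts and per-1M-token rates."""
    input_cost = prompt_tokens / 1_000_000 * input_rate_per_1m
    output_cost = completion_tokens / 1_000_000 * output_rate_per_1m
    return input_cost + output_cost


# Example: 1,200 prompt tokens and 300 completion tokens at $3.00 / $15.00 per 1M tokens
cost = estimate_cost(1_200, 300, 3.00, 15.00)
print(f"${cost:.4f}")  # prints $0.0081
```

Output tokens are usually priced higher than input tokens, so long completions tend to dominate the total.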

View real-time cost data on the dashboard:

  • Total Costs metric card -- Aggregate spend in the selected time range
  • Cost per Provider chart -- Breakdown showing which providers consume the most budget
  • Insights > Analytics -- Per-model cost breakdown and cost optimization recommendations

Budget Alerts

Set spending limits to prevent unexpected charges. Configure budgets in Controls > Cost Controls:

Per-Key Budgets

Limit spend for individual API keys. Useful for isolating costs per application, team, or environment:

# API call to set a budget
import requests

requests.post("https://connect.swfte.com/api/controls/budgets", json={
    "api_key_id": "key_abc123",
    "budget": {
        "amount": 100.00,
        "currency": "USD",
        "period": "monthly",
        "action_on_exceed": "alert"  # or "block"; see the actions listed below
    }
}, headers={"Authorization": "Bearer sk-swfte-..."})

Actions when budget is exceeded:

  • Alert -- Send notification (email, webhook) but continue processing
  • Throttle -- Reduce request rate to 10% of normal
  • Block -- Reject all requests until the next period or manual override
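To make the three actions concrete, here is a small local model of how each one affects traffic once spend passes the budget. It only illustrates the behavior described above and is not the gateway's implementation. `resolve_action` and its return fields are hypothetical names, and the lowercase `"throttle"` value is assumed by analogy with `"alert"` and `"block"` in the request example:

```python
def resolve_action(action, spend, budget, normal_rpm):
    """Model how requests are treated under a budget action (illustrative only)."""
    if spend <= budget:
        return {"allowed": True, "rpm": normal_rpm}
    if action == "alert":
        # A notification is sent, but traffic is unaffected
        return {"allowed": True, "rpm": normal_rpm}
    if action == "throttle":
        # 10% of the normal rate; the floor of 1 request/minute is an assumption
        return {"allowed": True, "rpm": max(1, normal_rpm // 10)}
    if action == "block":
        return {"allowed": False, "rpm": 0}
    raise ValueError(f"unknown action: {action!r}")


print(resolve_action("throttle", spend=120.0, budget=100.0, normal_rpm=60))
```

With a normal rate of 60 requests per minute, throttling leaves 6 requests per minute until the next period.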

Workspace Budgets

Set a ceiling for total workspace spend:

Setting          Description
Monthly budget   Maximum spend per calendar month
Daily budget     Maximum spend per day (catches runaway processes early)
Alert threshold  Percentage at which to send warnings (e.g., 80%)
Hard limit       Absolute maximum -- requests are blocked beyond this
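These settings act as a sequence of thresholds. The sketch below shows one way to classify spend against them, assuming the alert threshold is a percentage of the monthly budget and the hard limit is an absolute amount at or above that budget. `workspace_status` is a hypothetical helper:

```python
def workspace_status(spend, monthly_budget, alert_pct=80, hard_limit=None):
    """Classify current monthly spend as 'ok', 'warning', 'over_budget', or 'blocked'."""
    if hard_limit is not None and spend >= hard_limit:
        return "blocked"
    if spend >= monthly_budget:
        return "over_budget"
    if spend >= monthly_budget * alert_pct / 100:
        return "warning"
    return "ok"


for spend in (50, 85, 100, 130):
    print(spend, workspace_status(spend, monthly_budget=100, alert_pct=80, hard_limit=120))
```

With a $100 budget, an 80% threshold, and a $120 hard limit, the four sample values come out as ok, warning, over_budget, and blocked.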

Rate Limits

Control request volume to prevent abuse or runaway loops:

import requests

# Configure rate limits per API key
requests.post("https://connect.swfte.com/api/controls/rate-limits", json={
    "api_key_id": "key_abc123",
    "limits": {
        "requests_per_minute": 60,
        "requests_per_hour": 1000,
        "tokens_per_minute": 100000,
        "concurrent_requests": 10
    }
}, headers={"Authorization": "Bearer sk-swfte-..."})

When a rate limit is hit, the gateway returns a 429 Too Many Requests response with a Retry-After header. The SDK handles this automatically with exponential backoff.
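If you call the gateway without the SDK, you need to handle 429 responses yourself. The following is a minimal sketch of such a retry loop, not the SDK's actual implementation. It assumes `Retry-After` carries a number of seconds (the header may also be an HTTP date, which this sketch does not parse). `send` is any callable that performs the request and returns an object with `status_code` and `headers`, such as a `requests.Response`:

```python
import random
import time


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number of seconds; fall back to backoff
    # Exponential backoff with full jitter, capped at `cap` seconds
    return random.uniform(0, min(cap, base * 2 ** attempt))


def call_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 response or attempts run out."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
    return response
```

A call site might look like `call_with_retry(lambda: requests.post(url, json=payload, headers=headers))`. The jitter spreads retries out so that many clients hitting the limit at once do not all retry at the same moment.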


Model Cost Comparison

Not all models cost the same for the same task. Use the Insights > Analytics view to compare:

Model             Input (per 1M tokens)  Output (per 1M tokens)  Avg Latency
GPT-5             $5.00                  $15.00                  320ms
Claude Sonnet 4   $3.00                  $15.00                  450ms
Gemini 2.5 Pro    $2.50                  $10.00                  210ms
Gemini 2.5 Flash  $0.50                  $1.50                   180ms
Mistral Large     $2.00                  $6.00                   280ms
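To see what these rates mean for a given workload, you can project spend per model from expected token volume. This sketch uses the prices from the table above. The workload figures are made-up inputs for illustration:

```python
# (input, output) USD per 1M tokens, copied from the table above
PRICING = {
    "GPT-5": (5.00, 15.00),
    "Claude Sonnet 4": (3.00, 15.00),
    "Gemini 2.5 Pro": (2.50, 10.00),
    "Gemini 2.5 Flash": (0.50, 1.50),
    "Mistral Large": (2.00, 6.00),
}


def projected_cost(model, input_tokens, output_tokens):
    """Projected USD spend for a model at the given token volumes."""
    input_rate, output_rate = PRICING[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


# Hypothetical workload: 200M input tokens and 50M output tokens per month
workload = (200_000_000, 50_000_000)
for model in sorted(PRICING, key=lambda m: projected_cost(m, *workload)):
    print(f"{model:<18} ${projected_cost(model, *workload):,.2f}")
```

For this workload the projection ranges from $175.00 (Gemini 2.5 Flash) to $1,750.00 (GPT-5). Price is only one axis, so weigh it against quality and the latency column for your task.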

Tiered Model Strategy

Use different models for different use cases:

# Assumes `client` is an already-initialized SDK client (setup not shown here)

# High-stakes tasks: premium model
response = client.chat.completions.create(
    model="anthropic:claude-sonnet-4",
    messages=[{"role": "user", "content": "Review this legal contract..."}],
    max_tokens=4096
)

# Routine tasks: cost-efficient model
response = client.chat.completions.create(
    model="google:gemini-2.5-flash",
    messages=[{"role": "user", "content": "Classify this support ticket..."}],
    max_tokens=128
)

# Bulk processing: lower-cost model
response = client.chat.completions.create(
    model="mistral:mistral-large",
    messages=[{"role": "user", "content": "Extract entities from this text..."}],
    max_tokens=256
)

This approach typically reduces costs by 40-60% compared to using a single premium model for everything.
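One way to apply this consistently is to centralize the model choice in a small routing helper, so call sites pass a task tier rather than a model name. The tier names and `pick_model` are hypothetical. The model identifiers are the ones used in the examples above:

```python
MODEL_BY_TIER = {
    "high_stakes": "anthropic:claude-sonnet-4",
    "routine": "google:gemini-2.5-flash",
    "bulk": "mistral:mistral-large",
}


def pick_model(tier, default_tier="routine"):
    """Return the model ID for a task tier, falling back to the default tier."""
    return MODEL_BY_TIER.get(tier, MODEL_BY_TIER[default_tier])


print(pick_model("high_stakes"))  # anthropic:claude-sonnet-4
print(pick_model("unknown"))      # google:gemini-2.5-flash
```

Keeping the mapping in one place means a pricing change or model swap is a one-line edit instead of a search across the codebase.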


Credit Management

Swfte uses a prepaid credit system. Credits are consumed based on actual usage.

Checking Balance

The dashboard displays your current credit balance in the metric strip. You can also check programmatically:

import requests

response = requests.get(
    "https://connect.swfte.com/api/billing/credits",
    headers={"Authorization": "Bearer sk-swfte-..."}
)

data = response.json()
print(f"Balance: ${data['balance']:.2f}")
print(f"Low balance warning: {data['low_balance_warning']}")

Low Balance Alerts

When your balance drops below the threshold (configurable in Controls), the dashboard shows a warning banner. You can also configure webhook notifications:

import requests

requests.post("https://connect.swfte.com/api/controls/alerts", json={
    "type": "low_balance",
    "threshold": 10.00,
    "channels": ["email", "webhook"],
    "webhook_url": "https://your-app.com/webhooks/billing"
}, headers={"Authorization": "Bearer sk-swfte-..."})

Auto-Reload

Enable automatic credit top-ups when your balance drops below a threshold:

  1. Go to Controls > Billing
  2. Enable Auto-reload
  3. Set the reload amount and minimum balance trigger
  4. Confirm your payment method
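The trigger logic behind these settings is simple to reason about. The following small simulation assumes a single top-up of the reload amount whenever the balance falls below the minimum. `apply_auto_reload` is a hypothetical helper for choosing sensible values, not a Swfte API:

```python
def apply_auto_reload(balance, min_balance, reload_amount):
    """Return (new_balance, reloaded) after checking the auto-reload trigger."""
    if balance < min_balance:
        return balance + reload_amount, True
    return balance, False


balance = 25.00
for charge in (8.00, 9.50, 4.25):
    balance -= charge
    balance, reloaded = apply_auto_reload(balance, min_balance=10.00, reload_amount=50.00)
    note = " (reloaded)" if reloaded else ""
    print(f"charge ${charge:.2f} -> balance ${balance:.2f}{note}")
```

Pick a minimum balance that covers the spend you could accrue while a top-up is processing, so requests are not interrupted in the meantime.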

Cost Optimization Checklist

  1. Audit model usage -- Check Insights to see if expensive models are being used for simple tasks
  2. Set max_tokens appropriately -- Don't request 4096 tokens for a yes/no classification
  3. Use streaming -- Streaming doesn't reduce cost, but it improves perceived latency so users wait less
  4. Cache repeated queries -- Swfte Connect can cache identical requests (configure in Controls)
  5. Set temperature to 0 -- For deterministic tasks, temperature=0 makes outputs repeatable, so cached responses can be reused safely
  6. Monitor daily -- Set up daily budget alerts at 80% to catch anomalies early
  7. Use Gemini Flash for classification -- At $0.50/1M input tokens, it is 10x cheaper than GPT-5
  8. Batch embeddings -- Send multiple texts in a single embeddings request to reduce overhead
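Items 4 and 5 both rest on the idea of identical requests. The gateway's own cache is configured in Controls. The sketch below only illustrates the concept with a client-side, in-memory cache keyed on a hash of the request parameters. `cached_completion` and the `call_model` callable are hypothetical, and treating a missing temperature as non-zero is an assumption:

```python
import hashlib
import json

_cache = {}


def request_key(model, messages, **params):
    """Stable hash of everything that determines the response."""
    payload = {"model": model, "messages": messages, "params": params}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def cached_completion(call_model, model, messages, **params):
    """Return a cached result for identical requests; otherwise call the model."""
    # Only cache deterministic requests; sampled outputs vary between calls
    if params.get("temperature", 1) != 0:
        return call_model(model, messages, **params)
    key = request_key(model, messages, **params)
    if key not in _cache:
        _cache[key] = call_model(model, messages, **params)
    return _cache[key]
```

Serializing with sorted keys means two requests that differ only in parameter order map to the same cache entry.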

Viewing Cost Reports

Navigate to Insights > Analytics for detailed cost reports:

  • Cost over time -- Line chart showing daily/weekly/monthly spend
  • Cost by model -- Which models are consuming the most budget
  • Cost by provider -- Aggregated provider-level spend
  • Cost per request -- Average cost per API call
  • Cost optimization score -- AI-generated recommendations for reducing spend

Export reports as CSV for accounting or internal dashboards.
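Once exported, a report can be aggregated with the Python standard library. The column names below (`date`, `model`, `cost_usd`) are assumptions for illustration, so check the header row of your actual export and adjust. The inline sample rows are made up:

```python
import csv
import io
from collections import defaultdict


def cost_by_model(csv_file):
    """Sum the cost column per model from an open CSV file object."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["model"]] += float(row["cost_usd"])
    return dict(totals)


# Inline sample standing in for an exported file
sample = io.StringIO(
    "date,model,cost_usd\n"
    "2025-01-01,google:gemini-2.5-flash,1.25\n"
    "2025-01-01,anthropic:claude-sonnet-4,7.50\n"
    "2025-01-02,google:gemini-2.5-flash,0.75\n"
)
print(cost_by_model(sample))
```

With a real export, open the file with `open(path, newline="")` and pass the file object to `cost_by_model`.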

