AI costs can spiral quickly when you are running thousands of requests per day across multiple providers. Swfte Connect gives you granular visibility and control over spend -- per API key, per model, per provider, and per time window.
Understanding Costs
Every request through Swfte Connect is metered and priced transparently. Cost depends on three factors:
- Token count -- Prompt tokens (input) + completion tokens (output)
- Model pricing -- Each provider/model combination has different per-token rates
- Features used -- Function calling, image generation, and embeddings have separate pricing
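As a rough illustration of the token part of this calculation, the sketch below multiplies token counts by per-1M-token rates. The function name and the example rates are illustrative assumptions, not Swfte Connect's billing code, and feature-specific pricing (function calling, image generation, embeddings) is left out.

```python
def request_cost_usd(prompt_tokens, completion_tokens,
                     input_rate_per_1m, output_rate_per_1m):
    """Estimate the token cost of one request in USD.

    Rates are expressed in USD per 1M tokens. Feature surcharges are
    not modeled here (assumption: they are billed separately).
    """
    input_cost = prompt_tokens * input_rate_per_1m / 1_000_000
    output_cost = completion_tokens * output_rate_per_1m / 1_000_000
    return input_cost + output_cost


# Example: 1,200 prompt tokens and 300 completion tokens
# at $3.00 (input) and $15.00 (output) per 1M tokens.
cost = request_cost_usd(1200, 300, 3.00, 15.00)
print(f"${cost:.4f}")
```

Because output rates are usually several times higher than input rates, completion length often dominates the cost of a request.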
View real-time cost data on the dashboard:
- Total Costs metric card -- Aggregate spend in the selected time range
- Cost per Provider chart -- Breakdown showing which providers consume the most budget
- Insights > Analytics -- Per-model cost breakdown and cost optimization recommendations
Budget Alerts
Set spending limits to prevent unexpected charges. Configure budgets in Controls > Cost Controls:
Per-Key Budgets
Limit spend for individual API keys. Useful for isolating costs per application, team, or environment:
# API call to set a budget
import requests

requests.post("https://connect.swfte.com/api/controls/budgets", json={
    "api_key_id": "key_abc123",
    "budget": {
        "amount": 100.00,
        "currency": "USD",
        "period": "monthly",
        "action_on_exceed": "alert"  # or "block"
    }
}, headers={"Authorization": "Bearer sk-swfte-..."})
Actions when budget is exceeded:
- Alert -- Send notification (email, webhook) but continue processing
- Throttle -- Reduce request rate to 10% of normal
- Block -- Reject all requests until the next period or manual override
Workspace Budgets
Set a ceiling for total workspace spend:
| Setting | Description |
|---|---|
| Monthly budget | Maximum spend per calendar month |
| Daily budget | Maximum spend per day (catches runaway processes early) |
| Alert threshold | Percentage at which to send warnings (e.g., 80%) |
| Hard limit | Absolute maximum -- requests are blocked beyond this |
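To make the interaction between these settings concrete, here is a minimal sketch of how spend could be classified against them. The function, its return labels, and the use of `>=` at each boundary are assumptions for illustration; the gateway's actual evaluation logic is not documented here.

```python
def budget_status(spend, budget, alert_threshold_pct=80, hard_limit=None):
    """Classify current spend against a budget (hypothetical helper).

    spend, budget, and hard_limit are in the same currency;
    alert_threshold_pct is a percentage of the budget.
    """
    if hard_limit is not None and spend >= hard_limit:
        return "blocked"    # hard limit reached: requests would be rejected
    if spend >= budget:
        return "exceeded"   # budget used up, but below any hard limit
    if spend >= budget * alert_threshold_pct / 100:
        return "warning"    # alert threshold crossed
    return "ok"


print(budget_status(50, 100))                   # well below the threshold
print(budget_status(85, 100))                   # past the 80% alert threshold
print(budget_status(130, 100, hard_limit=120))  # past the hard limit
```

Setting the hard limit somewhat above the monthly budget leaves room for alerts to be acted on before traffic is blocked.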
Rate Limits
Control request volume to prevent abuse or runaway loops:
# Configure rate limits per API key
import requests

requests.post("https://connect.swfte.com/api/controls/rate-limits", json={
    "api_key_id": "key_abc123",
    "limits": {
        "requests_per_minute": 60,
        "requests_per_hour": 1000,
        "tokens_per_minute": 100000,
        "concurrent_requests": 10
    }
}, headers={"Authorization": "Bearer sk-swfte-..."})
When a rate limit is hit, the gateway returns a 429 Too Many Requests response with a Retry-After header. The SDK handles this automatically with exponential backoff.
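If you call the REST API directly rather than through the SDK, you need to handle 429 responses yourself. The following is a minimal sketch, not the SDK's implementation: `retry_delay` and `send_with_retry` are hypothetical helpers, and `send` is any zero-argument callable that returns a response object with `status_code` and `headers` (for example, a lambda wrapping `requests.post`).

```python
import random
import time


def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # Prefer the server's hint when it is a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # Retry-After can also be an HTTP date; fall back to backoff
    # Exponential backoff capped at `cap`, plus a little random jitter.
    return min(cap, base * 2 ** attempt) + random.uniform(0, 0.5)


def send_with_retry(send, max_attempts=5, sleep=time.sleep):
    """Call `send()` until it returns a non-429 response or attempts run out."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            sleep(retry_delay(attempt, response.headers.get("Retry-After")))
    return response  # still rate limited after the final attempt
```

A call might look like `send_with_retry(lambda: requests.post(url, json=payload, headers=headers))`, where `url`, `payload`, and `headers` are your own request values.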
Model Cost Comparison
Not all models cost the same for the same task. Use the Insights > Analytics view to compare:
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Avg Latency |
|---|---|---|---|
| GPT-5 | $5.00 | $15.00 | 320ms |
| Claude Sonnet 4 | $3.00 | $15.00 | 450ms |
| Gemini 2.5 Pro | $2.50 | $10.00 | 210ms |
| Gemini 2.5 Flash | $0.50 | $1.50 | 180ms |
| Mistral Large | $2.00 | $6.00 | 280ms |
Tiered Model Strategy
Use different models for different use cases:
# Assumes `client` is an already-initialized SDK client (see the SDK Guide)

# High-stakes tasks: premium model
response = client.chat.completions.create(
    model="anthropic:claude-sonnet-4",
    messages=[{"role": "user", "content": "Review this legal contract..."}],
    max_tokens=4096
)

# Routine tasks: cost-efficient model
response = client.chat.completions.create(
    model="google:gemini-2.5-flash",
    messages=[{"role": "user", "content": "Classify this support ticket..."}],
    max_tokens=128
)

# Bulk processing: mid-priced model (per the table above, Gemini 2.5 Flash is cheaper still)
response = client.chat.completions.create(
    model="mistral:mistral-large",
    messages=[{"role": "user", "content": "Extract entities from this text..."}],
    max_tokens=256
)
This approach typically reduces costs by 40-60% compared to using a single premium model for everything.
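To estimate what a tiered setup would save for your own traffic, you can compare the blended cost of a workload against routing everything to the premium model. The sketch below uses the rates from the comparison table; the workload numbers are purely hypothetical, so the resulting percentage is only an illustration and will vary with your mix.

```python
# USD per 1M tokens as (input, output), taken from the comparison table.
RATES = {
    "anthropic:claude-sonnet-4": (3.00, 15.00),
    "google:gemini-2.5-flash": (0.50, 1.50),
    "mistral:mistral-large": (2.00, 6.00),
}


def tier_cost(model, request_count, prompt_tokens, completion_tokens):
    """Cost in USD of `request_count` requests with the given average sizes."""
    input_rate, output_rate = RATES[model]
    per_request = prompt_tokens * input_rate + completion_tokens * output_rate
    return request_count * per_request / 1_000_000


# Hypothetical daily workload:
# (model, requests, avg prompt tokens, avg completion tokens)
workload = [
    ("anthropic:claude-sonnet-4", 500, 2_000, 1_000),
    ("google:gemini-2.5-flash", 6_000, 500, 50),
    ("mistral:mistral-large", 3_000, 1_000, 200),
]

premium = "anthropic:claude-sonnet-4"
tiered = sum(tier_cost(m, n, p, c) for m, n, p, c in workload)
single = sum(tier_cost(premium, n, p, c) for _, n, p, c in workload)
savings = 1 - tiered / single

print(f"Tiered: ${tiered:.2f}  Single premium: ${single:.2f}  Savings: {savings:.1%}")
```

Replace the workload rows with figures from your own Insights > Analytics data to get a realistic estimate.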
Credit Management
Swfte uses a prepaid credit system. Credits are consumed based on actual usage.
Checking Balance
The dashboard displays your current credit balance in the metric strip. You can also check programmatically:
import requests

response = requests.get(
    "https://connect.swfte.com/api/billing/credits",
    headers={"Authorization": "Bearer sk-swfte-..."}
)
data = response.json()
print(f"Balance: ${data['balance']:.2f}")
print(f"Low balance warning: {data['low_balance_warning']}")
Low Balance Alerts
When your balance drops below the threshold (configurable in Controls), the dashboard shows a warning banner. You can also configure webhook notifications:
import requests

requests.post("https://connect.swfte.com/api/controls/alerts", json={
    "type": "low_balance",
    "threshold": 10.00,
    "channels": ["email", "webhook"],
    "webhook_url": "https://your-app.com/webhooks/billing"
}, headers={"Authorization": "Bearer sk-swfte-..."})
Auto-Reload
Enable automatic credit top-ups when your balance drops below a threshold:
- Go to Controls > Billing
- Enable Auto-reload
- Set the reload amount and minimum balance trigger
- Confirm your payment method
Cost Optimization Checklist
- Audit model usage -- Check Insights to see if expensive models are being used for simple tasks
- Set max_tokens appropriately -- Don't request 4096 tokens for a yes/no classification
- Use streaming -- Streaming doesn't reduce cost, but it improves perceived latency so users wait less
- Cache repeated queries -- Swfte Connect can cache identical requests (configure in Controls)
- Set temperature to 0 -- For deterministic tasks, temperature=0 enables caching
- Monitor daily -- Set up daily budget alerts at 80% to catch anomalies early
- Use Gemini Flash for classification -- At $0.50/1M input tokens, it is 10x cheaper than GPT-5
- Batch embeddings -- Send multiple texts in a single embeddings request to reduce overhead
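For the batching item above, a small helper can split a long list of texts into fixed-size groups so that each group goes out as one embeddings request. This is a generic sketch: the `batched` helper is hypothetical, and the batch size of 100 is a placeholder, since the maximum number of inputs per request is not stated here.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"document {i}" for i in range(250)]
batches = list(batched(texts, 100))  # placeholder batch size

# Each batch would be sent as the input of a single embeddings request.
for batch in batches:
    print(len(batch))
```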
Viewing Cost Reports
Navigate to Insights > Analytics for detailed cost reports:
- Cost over time -- Line chart showing daily/weekly/monthly spend
- Cost by model -- Which models are consuming the most budget
- Cost by provider -- Aggregated provider-level spend
- Cost per request -- Average cost per API call
- Cost optimization score -- AI-generated recommendations for reducing spend
Export reports as CSV for accounting or internal dashboards.
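Once exported, a report can be aggregated with the standard library. The sketch below sums cost per model; the column names `model` and `cost_usd` and the sample rows are assumptions for illustration, so adjust them to match the headers in your actual export.

```python
import csv
import io
from collections import defaultdict


def cost_by_model(csv_text, model_col="model", cost_col="cost_usd"):
    """Sum the cost column per model from CSV text (column names are assumed)."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[model_col]] += float(row[cost_col])
    return dict(totals)


# Made-up sample data standing in for an exported report.
sample = """date,model,cost_usd
2025-01-01,google:gemini-2.5-flash,1.25
2025-01-01,anthropic:claude-sonnet-4,8.50
2025-01-02,google:gemini-2.5-flash,0.75
"""

totals = cost_by_model(sample)
for model, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: ${total:.2f}")
```

To read a file instead of a string, pass the result of `open(path, newline="").read()` as `csv_text`.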
Next Steps
- Getting Started -- Initial setup
- SDK Guide -- Full SDK reference
- Multi-Provider Routing -- Failover and routing
- API Reference -- REST API documentation