Governance
Every major LLM, scored against NIST AI RMF and the EU AI Act
Compliance scorecards for procurement and risk teams. Each model is audited against the four NIST AI RMF functions and EU AI Act Articles 10-15, with cited evidence from published documentation and live behavioural probes.
OpenAI
GPT-4.5
Based on published documentation. Full audit in progress (0%).
OpenAI
o3-mini
Based on published documentation. Full audit in progress (0%).
Alibaba
Qwen 3
Based on published documentation. Full audit in progress (0%).
Frequently asked questions
What is an LLM governance audit?
A governance audit scores a large language model against a formal framework (NIST AI RMF, the EU AI Act, ISO/IEC 42001, or sector-specific rules) and produces a procurement-grade scorecard. The output covers the four NIST functions (Govern, Map, Measure, Manage) and EU AI Act Articles 10-15: data governance (Art. 10), technical documentation (Art. 11), record-keeping (Art. 12), transparency (Art. 13), human oversight (Art. 14), and accuracy, robustness, and cybersecurity (Art. 15).
Which models are scored?
Every major frontier LLM available via API or open weights: Claude Opus 4.7 and Sonnet 4 (Anthropic), GPT-5.5 Pro and GPT-5.5 (OpenAI), Gemini 3.1 Pro and 3.0 (Google), DeepSeek V4 Pro and V4 (DeepSeek), Llama 4 (Meta), Grok 4 (xAI), Qwen 3 (Alibaba), Mistral Large (Mistral AI), Kimi K2.5 (Moonshot AI), and Command R+ (Cohere). New releases are added within 30 days of general availability (GA).
How are scores calculated?
Each model is scored on cited evidence from published documentation (model cards, system cards, transparency reports) plus live behavioural probes run through the audit harness. Completeness is the share of the combined NIST AI RMF and EU AI Act control set that has verifiable evidence; the score is a weighted aggregate across the controls.
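As a rough illustration of that calculation, the Python sketch below computes completeness as the fraction of controls with evidence and the score as a weighted mean over the evidenced controls. The control identifiers, the weights, the 0-100 scale, and the choice to leave unevidenced controls out of the aggregate are all assumptions made for the example, not the audit harness's actual scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Control:
    control_id: str          # hypothetical identifier, e.g. "EU-AIA-Art10"
    weight: float            # assumed relative importance of the control
    score: Optional[float]   # assumed 0-100 scale; None means no verifiable evidence


def completeness(controls: list[Control]) -> float:
    """Fraction of controls that have verifiable evidence (0.0 to 1.0)."""
    if not controls:
        return 0.0
    evidenced = sum(1 for c in controls if c.score is not None)
    return evidenced / len(controls)


def weighted_score(controls: list[Control]) -> Optional[float]:
    """Weighted mean over evidenced controls; None when nothing is evidenced."""
    evidenced = [c for c in controls if c.score is not None]
    total_weight = sum(c.weight for c in evidenced)
    if total_weight == 0:
        return None
    return sum(c.weight * c.score for c in evidenced) / total_weight


# Illustrative data only; these are not real audit results.
controls = [
    Control("NIST-GOVERN", 1.0, 80.0),
    Control("NIST-MAP", 1.0, None),
    Control("EU-AIA-Art10", 2.0, 50.0),
    Control("EU-AIA-Art15", 1.0, None),
]
print(completeness(controls))    # 0.5
print(weighted_score(controls))  # 60.0
```

Under these assumptions, a model with no evidenced controls has a completeness of 0.0 and no score at all, which is one way to represent a card that reads "Full audit in progress (0%)".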
How often are the audits refreshed?
Audits refresh on every model version bump and quarterly otherwise. Each scorecard shows updatedAt, the timestamp of the last evidence pass.
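A consumer of the scorecards could apply the same rule to decide whether a card is stale. The sketch below is a minimal Python example under stated assumptions: "quarterly" is taken to be 91 days, versions are compared as plain strings, and the function name and parameters are hypothetical rather than part of any published API. Only updatedAt comes from the scorecard itself.

```python
from datetime import datetime, timedelta, timezone

QUARTER = timedelta(days=91)  # assumption: "quarterly" approximated as 91 days


def refresh_due(updated_at: datetime, audited_version: str,
                current_version: str, now: datetime) -> bool:
    """True if the model version changed or the last evidence pass is at least a quarter old."""
    if audited_version != current_version:
        return True  # a version bump always triggers a refresh
    return now - updated_at >= QUARTER


# Illustrative timestamps and version strings only.
now = datetime(2025, 7, 1, tzinfo=timezone.utc)
recent = datetime(2025, 6, 1, tzinfo=timezone.utc)
stale = datetime(2025, 3, 1, tzinfo=timezone.utc)

print(refresh_due(recent, "v1", "v1", now))  # False
print(refresh_due(recent, "v1", "v2", now))  # True
print(refresh_due(stale, "v1", "v1", now))   # True
```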
Can I use these scorecards in a procurement RFP?
Yes, that is the primary use case. Each scorecard exports as a procurement-grade PDF that risk and compliance teams can attach to vendor risk assessments and EU AI Act conformity checks.