
Governance

Every major LLM, scored against NIST AI RMF and the EU AI Act

Compliance scorecards for procurement and risk teams. Each model is audited against the four NIST AI RMF functions and EU AI Act Articles 10-15, with cited evidence from published documentation and live behavioural probes.


Anthropic

Claude Opus 4.6

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Anthropic

Claude Sonnet 4.6

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Anthropic

Claude Haiku 4.5

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

OpenAI

GPT-5

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

OpenAI

GPT-4.5

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

OpenAI

o3-mini

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Google

Gemini 2.5 Pro

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Google

Gemini 2.5 Flash

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Meta

Llama 4 405B

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Meta

Llama 4 70B

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Mistral

Mistral Large 2

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Mistral

Mistral Small 3

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

DeepSeek

DeepSeek V3

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

DeepSeek

DeepSeek R1

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Alibaba

Qwen 3

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Cohere

Command R+

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Moonshot

Kimi K2

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

xAI

Grok 3

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

AI21

Jamba 1.5

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Microsoft

Phi-4

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Google

Gemma 3

Based on published documentation. Full audit in progress (0%).

Updated 2026-06-20

Frequently asked questions

What is an LLM governance audit?

A governance audit scores a large language model against a formal framework (NIST AI RMF, the EU AI Act, ISO/IEC 42001, or sector-specific rules) and produces a procurement-grade scorecard. The output covers the four NIST functions (Govern, Map, Measure, Manage) and EU AI Act Articles 10-15: data governance, technical documentation, record-keeping, transparency, human oversight, and accuracy, robustness, and cybersecurity.

Which models are scored?

Every major frontier LLM available via API or open weights — Claude Opus 4.7 and Sonnet 4 (Anthropic), GPT-5.5 Pro and GPT-5.5 (OpenAI), Gemini 3.1 Pro and 3.0 (Google), DeepSeek V4 Pro and V4 (DeepSeek), Llama 4 (Meta), Grok 4 (xAI), Qwen 3 (Alibaba), Mistral Large, Kimi K2.5, and Command R+. New releases are added within 30 days of GA.

How are scores calculated?

Each model is scored on cited evidence from published documentation (model cards, system cards, transparency reports) plus live behavioural probes against the audit harness. Completeness reflects how much of the NIST + EU AI Act control set has verifiable evidence; the score is a weighted aggregate across the controls.
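As a rough illustration of the two numbers described above, the sketch below computes completeness as the share of controls with verifiable evidence and the score as a weighted average. The `Control` type, the control IDs, the weights, and the choice to leave unevidenced controls out of the score (rather than count them as zero) are all assumptions made for the example, not the published methodology.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Control:
    """One control in the audit set (hypothetical structure)."""
    control_id: str          # illustrative id, e.g. "EU-AIA-Art10"
    weight: float            # relative importance of the control (assumed)
    score: Optional[float]   # 0.0 to 1.0, or None when no verifiable evidence exists


def completeness(controls: List[Control]) -> float:
    """Fraction of controls that have verifiable evidence."""
    if not controls:
        return 0.0
    evidenced = sum(1 for c in controls if c.score is not None)
    return evidenced / len(controls)


def weighted_score(controls: List[Control]) -> Optional[float]:
    """Weighted average over evidenced controls only (an assumption).

    Returns None when nothing has been evidenced yet, so that a model
    with 0% completeness is shown as unscored rather than as zero.
    """
    evidenced = [c for c in controls if c.score is not None]
    total_weight = sum(c.weight for c in evidenced)
    if total_weight == 0:
        return None
    return sum(c.weight * c.score for c in evidenced) / total_weight


# Illustrative data only; these ids, weights, and scores are made up.
example_controls = [
    Control("NIST-GOVERN", weight=2.0, score=0.5),
    Control("NIST-MAP", weight=1.0, score=1.0),
    Control("EU-AIA-Art10", weight=1.0, score=None),
    Control("EU-AIA-Art15", weight=1.0, score=None),
]

example_completeness = completeness(example_controls)
example_score = weighted_score(example_controls)
```

Keeping completeness separate from the score means a sparse evidence base shows up as low completeness instead of silently dragging the score down; a harness that penalises missing evidence would count `None` as zero instead.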

How often are the audits refreshed?

Audits refresh on every model version bump and quarterly otherwise. Each scorecard shows updatedAt — the timestamp of the last evidence pass.
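The refresh rule above can be sketched as a small check. The function name, its parameters, and the approximation of a quarter as 91 days are assumptions for illustration; only the rule itself (refresh on a version bump, quarterly otherwise) and the `updatedAt` field come from the text.

```python
from datetime import date, timedelta

# Assumption: "quarterly" is approximated as 91 days.
QUARTER = timedelta(days=91)


def refresh_due(audited_version: str, current_version: str,
                updated_at: date, today: date) -> bool:
    """Return True when a scorecard should be re-audited.

    A refresh is due if the model version has changed since the last
    evidence pass, or if at least a quarter has elapsed since updated_at.
    """
    if audited_version != current_version:
        return True
    return today - updated_at >= QUARTER


# Illustrative values; the version strings are placeholders.
last_pass = date(2026, 6, 20)
due_soon_after = refresh_due("v1", "v1", last_pass, date(2026, 7, 1))
due_after_bump = refresh_due("v1", "v2", last_pass, date(2026, 7, 1))
due_after_quarter = refresh_due("v1", "v1", last_pass, last_pass + QUARTER)
```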

Can I use these scorecards in a procurement RFP?

Yes — that is the primary use case. Each scorecard exports as a procurement-grade PDF that risk and compliance teams can attach to vendor risk assessments and AI Act conformity checks.
