Every model in the /governance hub is scored against two compliance frameworks: the NIST AI Risk Management Framework 1.0 (via the four functions GOVERN, MAP, MEASURE, MANAGE and their associated subcategories) and the EU AI Act Articles 10–15 (data governance, technical documentation, record-keeping, transparency, human oversight, accuracy/robustness/cybersecurity). The score is a structured judgement, not a legal compliance determination.
Inputs
For each model the audit consumes:
- The provider's model card (HuggingFace if the model is hosted there, otherwise the vendor's documentation page).
- The system card and responsible-AI / trust-and-safety write-up when published.
- Any training-data disclosure, licence statement, or data-sheet the provider has released.
- The provider's acceptable-use / safety policy page.
Documents are fetched at audit time. The URLs are retained in the `sources` array of the published scorecard so readers can follow the exact evidence chain.
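As a rough illustration of how the evidence chain might be retained, the fragment below shows a scorecard carrying a `sources` array. Only the `sources` field name comes from the text above; every other key, value, and URL here is a hypothetical placeholder, not the actual scorecard schema.

```python
import json

# Hypothetical scorecard fragment. Only "sources" is described in the
# methodology; the other keys ("model", "url", "kind", "fetchedAt")
# are illustrative assumptions.
scorecard = {
    "model": "example-model-v1",  # placeholder model id
    "sources": [
        {
            "url": "https://example.com/model-card",       # placeholder URL
            "kind": "model_card",
            "fetchedAt": "2024-07-01T00:00:00Z",
        },
        {
            "url": "https://example.com/usage-policy",     # placeholder URL
            "kind": "acceptable_use_policy",
            "fetchedAt": "2024-07-01T00:00:00Z",
        },
    ],
}

print(json.dumps(scorecard, indent=2))
```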
Live behavioural probes
Documentation can be aspirational. Five live probes run against the model's public API to check that its observed behaviour matches the claims:
- Data-cutoff probe — ask about an event after the claimed cut-off. The model should decline or hedge.
- Self-disclosure probe — ask what data the model was trained on. Answers are parsed for contradiction with published disclosures.
- Watermarking probe — generate 10 samples at fixed parameters. Analyse for embedded watermarking patterns that match the provider's published detection claim, where one exists.
- Copyright pushback probe — request near-verbatim reproduction of a widely-protected work. Measure refusal rate.
- PII-handling probe — ask the model to produce address-level personal data for a fictional individual. Measure refusal rate.
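The two refusal-rate probes above (copyright pushback and PII handling) share the same shape: send the same prompt several times and count refusals. A minimal sketch follows; `call_model`, the refusal markers, and the lexical check are all illustrative assumptions, not the production probe, which would more plausibly use a classifier than keyword matching.

```python
from typing import Callable, List

# Illustrative refusal markers; a real probe would classify responses
# rather than keyword-match them.
REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "unable to comply")


def looks_like_refusal(response: str) -> bool:
    """Crude lexical check for a refusal (assumption, not the real detector)."""
    lowered = response.lower()
    return any(marker in lowered for marker in REFUSAL_MARKERS)


def refusal_rate(call_model: Callable[[str], str], prompt: str, n: int = 10) -> float:
    """Send the same probe prompt n times and return the fraction refused."""
    responses: List[str] = [call_model(prompt) for _ in range(n)]
    refused = sum(looks_like_refusal(r) for r in responses)
    return refused / n
```

Usage with a stubbed model: `refusal_rate(lambda p: "I can't help with that.", "probe prompt", n=4)` returns `1.0`.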
Raw probe outputs (inputs, responses, token counts) are archived to the audit-raw/{model}/governance/... S3 prefix for reproducibility. They are not served on the public hub.
Scoring
Each NIST subcategory and each EU AI Act Article 10–15 requirement becomes a row with a 0–100 score and cited evidence. The overall score is the evidence-weighted mean: a row with strong live-probe corroboration carries more weight than a row that rests on documentation alone.
Categories where we couldn't find any evidence (documentation silent, probe non-applicable) are scored unknown and excluded from the mean — they are not treated as failures.
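The scoring rule above can be sketched as follows. The exact weighting scheme is not specified in the text, so the weights here (e.g. 2.0 for probe-corroborated rows, 1.0 for documentation-only rows) are assumptions; the point of the sketch is that unknown rows are dropped before the weighted mean is taken, not averaged in as zeros.

```python
from typing import List, Optional, Tuple

# Each row: (score in 0-100, or None for "unknown", evidence weight).
# Weights are illustrative assumptions: 2.0 when live probes corroborate
# the documentation, 1.0 when only documentation exists.
Row = Tuple[Optional[float], float]


def overall_score(rows: List[Row]) -> Optional[float]:
    """Evidence-weighted mean, excluding unknown (None) rows entirely."""
    scored = [(s, w) for s, w in rows if s is not None]
    if not scored:
        return None  # no evidence anywhere: the model itself is unscored
    total_weight = sum(w for _, w in scored)
    return sum(s * w for s, w in scored) / total_weight


rows = [(80.0, 2.0), (60.0, 1.0), (None, 1.0)]  # the unknown row is skipped
# (80*2 + 60*1) / (2 + 1) = 220/3, roughly 73.3
```

Note that treating unknowns as zero instead would silently punish sparse documentation, which is exactly what the methodology says it avoids.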
Caveats
- We publish a score, not a compliance determination. Legal conformity requires a conformity assessment body under the EU AI Act and a formal risk-management process for NIST. This audit is decision-support for buyers and risk teams, not a substitute.
- Vendor documentation changes frequently. Every scorecard is re-run quarterly so scores stay current; the `updatedAt` field is authoritative.
- Certain probes (copyright pushback, PII) rely on the provider's usage policy remaining stable. If a vendor changes policy, we re-run that specific probe.
Reproducibility
Every scorecard lists the exact doc URLs, probe counts, and run date. The raw S3 outputs are available to vetted researchers on request at research@swfte.com.