The hallucination risk hiding in your compliance stack


In 2023, New York attorney Steven Schwartz submitted a legal brief citing six cases that simply did not exist. He had used ChatGPT to conduct the research and, trusting the chatbot’s apparent confidence, never verified a single citation against an authoritative legal database.

According to Sherlocq, the judge called the submission “legal gibberish” and sanctioned Schwartz and his colleague $5,000 each. Chief Justice John Roberts would later cite the episode in his year-end report as an early warning about AI in regulated legal work.

Sherlocq recently discussed AI for compliance and what practitioners need that generic tools cannot deliver.

The case became a landmark in discussions about AI hallucination, but compliance practitioners drew a more pointed conclusion: Schwartz’s error was not a legal-sector anomaly. Across every regulated industry — finance, insurance, law — professionals are being handed AI tools designed for breadth and speed, then asked to deploy them in environments that demand precision, verifiability and clear accountability. The mismatch is systemic, and the consequences are only now beginning to surface at scale.

The fundamental gap between plausibility and verifiability

General-purpose large language models are remarkable in their range. They can draft with fluency, synthesise information rapidly and switch between tasks with ease. But they were built to serve the widest possible audience. Compliance work requires something far more specific: a tool that understands the difference between Regulation D and Regulation DD, that knows when a FINRA notice supersedes an earlier interpretation, and that can trace a specific obligation back to its originating statute without conflating jurisdictions.

The distinction at the heart of this is deceptively simple. Generic AI is optimised for plausibility. Compliance work demands verifiability. In a regulated environment, the gap between those two things is not a minor inconvenience — it can cost firms millions in fines, trigger enforcement action, or expose board members to personal liability.

Hallucination is a material risk, not a nuisance

AI hallucination — where a model generates false information with apparent confidence — is a documented limitation of large language models. In consumer contexts, a fabricated fact is easily corrected. In compliance, it can be something far worse.

A compliance officer does not verify every sentence of AI output against source documents; that would defeat the purpose of automation. They act on it. They update policies, brief boards, report to regulators and train staff. An AI that confidently cites a rule that no longer exists, or misquotes an exemption threshold by even a decimal point, embeds error deep into institutional decision-making before anyone identifies the problem.

Generic tools have made progress on hallucination through techniques such as retrieval-augmented generation, but they apply these broadly across all domains. Vertical AI tools built for compliance take a different approach: they index authoritative regulatory sources — SEC releases, CFPB bulletins, PRA guidance, ESMA technical standards — and constrain the model to reason within that corpus. When the answer is not in the source material, a well-built compliance AI says so, rather than filling the gap with a plausible-sounding invention.
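The "says so" behaviour described above can be illustrated with a minimal sketch. It is not how any particular vendor's product works: the `Passage` type, the `answer_from_corpus` function, the word-overlap scoring and the `min_overlap` threshold are all assumptions made for illustration, and a real system would use a proper retrieval index rather than word matching. The point is only the control flow: answer from the indexed corpus, or abstain.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Passage:
    source_id: str  # hypothetical identifier for an indexed regulatory document
    text: str


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return set(cleaned.split())


def answer_from_corpus(
    question: str, corpus: List[Passage], min_overlap: int = 2
) -> Optional[Passage]:
    """Return the passage sharing the most words with the question.

    Returns None when no passage shares at least `min_overlap` words,
    so the caller can report "not in the source material".
    """
    q_tokens = tokenize(question)
    best, best_score = None, 0
    for passage in corpus:
        score = len(q_tokens & tokenize(passage.text))
        if score > best_score:
            best, best_score = passage, score
    if best_score < min_overlap:
        return None  # abstain rather than fill the gap with an invention
    return best
```

A caller would treat a `None` result as an explicit "no supporting source found" and route the question to a human reviewer instead of generating free text.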

Five things generic AI cannot do in regulated environments

There are specific capabilities that general-purpose models consistently fail to deliver for compliance teams: citing regulatory provisions with version-accurate, jurisdiction-correct sourcing; flagging when guidance has been superseded, withdrawn or is subject to active rulemaking; producing audit-ready outputs with traceable reasoning and source attribution; applying firm-specific policy logic on top of external regulatory requirements; and alerting practitioners to enforcement trends from live regulatory data.
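Two of these capabilities, jurisdiction-correct sourcing and flagging superseded or withdrawn guidance, come down to carrying metadata alongside each citation. The sketch below shows one possible shape for that metadata. The `Citation` fields, the `Status` values and the `citation_warnings` function are hypothetical names chosen for illustration and are not drawn from any regulator's data model or any vendor's product.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import List, Optional


class Status(Enum):
    IN_FORCE = "in force"
    SUPERSEDED = "superseded"
    WITHDRAWN = "withdrawn"
    PROPOSED = "subject to active rulemaking"


@dataclass(frozen=True)
class Citation:
    provision: str                     # placeholder label for the cited provision
    jurisdiction: str                  # e.g. "US" or "UK"
    effective: date                    # date this version took effect
    status: Status
    replaced_by: Optional[str] = None  # successor provision, if known


def citation_warnings(c: Citation, expected_jurisdiction: str) -> List[str]:
    """Return human-readable warnings for a citation; an empty list means none."""
    warnings = []
    if c.jurisdiction != expected_jurisdiction:
        warnings.append(
            f"{c.provision}: jurisdiction is {c.jurisdiction}, "
            f"expected {expected_jurisdiction}"
        )
    if c.status is not Status.IN_FORCE:
        note = f"{c.provision}: status is {c.status.value}"
        if c.replaced_by:
            note += f"; see {c.replaced_by}"
        warnings.append(note)
    return warnings
```

The checks themselves are trivial. The hard part in practice is keeping the status and version fields current against the regulators' own publications, which is the investment the article argues generic tools have not made.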

Auditability is not optional

In a regulatory examination, a board review or litigation discovery, the ability to explain and defend an AI-assisted decision is a core governance requirement. Generic AI tools are built for end-user experience, not institutional accountability. They compress reasoning, present outputs as finished products and do not surface the sources underpinning each conclusion in a way a regulator could review.

Vertical compliance platforms are architected differently. Every output is tied to a specific primary source — not a paraphrase of a paraphrase, but the original document with version and date.

Reasoning chains are exposed rather than hidden, outputs are formatted for documentation, and audit trails are built into the product by design rather than added retrospectively. This reflects a fundamentally different understanding of who the customer is. Generic AI is built for individual users. Compliance AI is built for the institution — and for the regulators that examine it.
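As a rough illustration of what an output "tied to a specific primary source" with exposed reasoning might look like as data, the sketch below defines a record that stores the conclusion together with its reasoning steps and versioned sources, and derives a SHA-256 fingerprint so later tampering is detectable. The `SourceRef` and `AuditRecord` types, their fields and the `is_audit_ready` check are assumptions for illustration only, not a description of any specific platform.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)
class SourceRef:
    document: str   # title or identifier of the primary source
    version: str    # version label of the document relied on
    published: str  # publication date as an ISO 8601 string


@dataclass(frozen=True)
class AuditRecord:
    question: str
    conclusion: str
    reasoning: Tuple[str, ...]      # ordered reasoning steps, kept visible
    sources: Tuple[SourceRef, ...]  # primary sources behind the conclusion

    def fingerprint(self) -> str:
        """Return a SHA-256 hex digest over the full record contents."""
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def is_audit_ready(record: AuditRecord) -> bool:
    """A record qualifies only if it has both reasoning steps and sources."""
    return bool(record.reasoning) and bool(record.sources)
```

Storing the fingerprint alongside the record at creation time lets a reviewer confirm later that the conclusion, reasoning and cited versions are the ones originally produced.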

Domain specificity as competitive advantage

The financial services compliance landscape is technically demanding and vast. BSA/AML requirements, suitability and best interest standards, capital adequacy under Basel III, consumer protection obligations across federal and state layers, cross-border reporting under FATCA, CRS and EMIR — each domain has its own vocabulary, enforcement culture and interpretive history. A model not trained deeply on this material will produce outputs that sound credible to a generalist and immediately raise flags with a practitioner.

The strongest compliance-focused AI platforms have spent years fine-tuning on curated regulatory corpora, building taxonomy structures that reflect how professionals navigate the landscape, and training on firm-generated data — policies, escalations and examination findings — under appropriate data governance controls.

A compliance AI that can answer whether a product feature requires a new regulatory filing is not a search engine with a chat interface; it is a structured reasoning system trained on the logic of regulatory interpretation. That is not something a general-purpose model can replicate without the same domain investment.

What good looks like in practice

For compliance leaders, the evaluative question has shifted from “should we use AI?” to “which AI is appropriate for which use case?” A clear taxonomy is emerging. Generic tools are useful for drafting internal communications, summarising public documents and accelerating research on non-sensitive matters. They are not appropriate for regulatory interpretation, policy gap analysis or examination preparation — anywhere the output will drive institutional decisions without extensive human review.

To read the full story, find the Sherlocq post here. 


Copyright © 2026 RegTech Analyst
