The truth about AI false positive reduction

AI-driven supervision has rapidly become embedded across financial services. Recent industry research shows that 94% of firms are either already using or actively planning to deploy AI-based detection tools.

According to Theta Lake, artificial intelligence is increasingly seen as the answer to the scale and complexity of modern compliance, from trade surveillance to communications monitoring.

According to a report by the Financial Industry Regulatory Authority (FINRA), AI technologies allow firms to ingest and analyse vast volumes of structured and unstructured data, including text, voice, video and images, drawn from both internal and external sources. This broader scope enables organisations to monitor behaviour across business lines in a more holistic and risk-based way. For compliance teams, the promise is clear: smarter systems, deeper visibility and, crucially, fewer false positives.

The claim of significant false positive reduction has become one of the most prominent selling points in the RegTech market. Vendors frequently promote headline-grabbing percentages, arguing that their models dramatically cut unnecessary alerts and free up valuable staff time. Yet as Rohit Jain, distinguished engineer at Theta Lake, explains in the first part of a two-part series, firms should approach such claims with caution.

In machine learning-driven surveillance, the objective is often to detect rare instances of misconduct such as insider trading, collusion or inappropriate workplace behaviour. The analogy commonly used is that of finding a needle in a haystack. A false positive occurs when the system flags something as suspicious that is, in reality, entirely benign. In text-based compliance monitoring, this can arise from sarcasm, ambiguous phrasing or sector-specific jargon that a model misinterprets.

While it may seem prudent to err on the side of over-reporting, excessive false positives introduce their own risks. Every flagged communication demands human review. If analysts are spending significant portions of their day investigating harmless emails, operational efficiency quickly erodes. The resource burden compounds as volumes grow, turning what should be a streamlined risk function into a bottleneck.

There is also a behavioural dimension. When reviewers are confronted with hundreds of inaccurate alerts, fatigue sets in. Over time, they may begin to dismiss notifications more rapidly and less critically. This so-called “boy who cried wolf” effect increases the probability that a genuine instance of misconduct is overlooked. In a high-stakes regulatory environment, that risk can carry severe consequences.

One of the underlying technical reasons false positives persist is the “base rate” problem. In large corporate communications datasets, actual misconduct may account for a minuscule fraction of total messages. Even a model boasting 99% accuracy can generate thousands of false positives when scanning millions of emails if the real incidence of wrongdoing is extremely low. The mathematics of rare-event detection makes some level of noise inevitable.
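The arithmetic can be illustrated with a back-of-the-envelope sketch in Python. All figures below are illustrative assumptions, not data from any vendor or regulator, but they show how a nominally "99% accurate" model still drowns reviewers in false alerts when wrongdoing is rare:

```python
# Illustrative arithmetic only: every number here is an assumption.
total_messages = 1_000_000        # messages scanned in a monitoring period
base_rate = 0.0001                # assume 0.01% contain genuine misconduct
true_positive_rate = 0.99         # assume the model catches 99% of real cases
false_positive_rate = 0.01        # assume it mis-flags 1% of benign messages

genuine = total_messages * base_rate           # ~100 real cases
benign = total_messages - genuine              # ~999,900 benign messages

true_alerts = genuine * true_positive_rate     # ~99 genuine hits
false_alerts = benign * false_positive_rate    # ~10,000 false positives

# Precision: what fraction of alerts are actually worth an analyst's time?
precision = true_alerts / (true_alerts + false_alerts)
print(f"True alerts:  {true_alerts:.0f}")
print(f"False alerts: {false_alerts:.0f}")
print(f"Precision:    {precision:.2%}")  # roughly 1 in 100 alerts is genuine
```

Even with a model that is right 99% of the time on each message, fewer than one alert in a hundred turns out to be real misconduct, which is the "needle in a haystack" problem expressed in numbers.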

This leads to another common misunderstanding: the overreliance on accuracy as a headline metric. In low base-rate environments, a system can appear highly accurate simply by classifying almost everything as normal. Such a model may achieve impressive percentages while failing to identify the very cases it was designed to detect. In this context, accuracy becomes a vanity metric, obscuring weaknesses in recall and precision.
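A minimal sketch makes the vanity-metric point concrete. The counts below are hypothetical: a "classifier" that flags nothing at all still posts near-perfect accuracy while delivering zero recall:

```python
# Hypothetical dataset: 100 misconduct messages hidden among 999,900 benign ones.
labels = [1] * 100 + [0] * 999_900   # 1 = misconduct, 0 = benign
predictions = [0] * len(labels)      # a "model" that never raises an alert

# Accuracy rewards the do-nothing strategy in a low base-rate environment.
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)

# Recall exposes it: how many real cases did the model actually catch?
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"Accuracy: {accuracy:.2%}")   # 99.99% -- looks impressive
print(f"Recall:   {recall:.0%}")     # 0% -- catches no misconduct at all
```

This is why precision and recall, not headline accuracy, are the metrics worth interrogating in rare-event detection.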

There is also an unavoidable trade-off between sensitivity and specificity, sometimes framed as the precision-recall trade-off. Tightening a model to reduce false positives may increase the risk of missing genuine misconduct. Loosening it to capture every potential issue inevitably raises alert volumes. There is no perfect configuration, only a balance aligned with a firm’s risk appetite and supervisory obligations.
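The trade-off can be sketched by sweeping an alert threshold over simulated risk scores. The score distributions below are pure assumptions chosen for illustration; the point is that because benign and suspicious scores overlap, every threshold choice trades missed cases against alert volume:

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Assumed score distributions (illustrative only): benign messages tend to
# score low, misconduct tends to score higher, but the two overlap.
benign_scores = [random.gauss(0.2, 0.1) for _ in range(10_000)]
misconduct_scores = [random.gauss(0.6, 0.15) for _ in range(10)]

# A higher threshold cuts false alerts but starts missing real cases;
# a lower threshold catches everything at the cost of alert volume.
for threshold in (0.3, 0.5, 0.7):
    false_alerts = sum(s >= threshold for s in benign_scores)
    caught = sum(s >= threshold for s in misconduct_scores)
    print(f"threshold={threshold}: {caught}/10 real cases caught, "
          f"{false_alerts} false alerts")
```

No single threshold wins on both axes, which is why calibration has to be driven by the firm's risk appetite rather than a vendor's headline number.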

When vendors advertise “99% accuracy” or dramatic reductions in false positives, firms must probe deeper into the validation methodology, the composition of the test data and the real-world base rates. Without that scrutiny, headline numbers can mislead decision-makers. In the second part of this series, the focus will shift to practical strategies for reducing false positives and the key questions firms should put to technology providers.

Copyright © 2026 RegTech Analyst
