Why single AI models miss communications risks

The modern workplace no longer communicates in neat, text-only channels. Teams now exchange information across video calls, voice notes, chat, screen sharing and collaborative whiteboards, peppered with emojis, GIFs and a rising volume of AI-generated content.

According to Theta Lake, tools such as Microsoft Teams, Zoom and Webex sit at the centre of day-to-day operations, increasingly supported by embedded assistants like Microsoft Copilot and Zoom AI Companion.

That shift is pushing regulated firms to rethink how they supervise communications, especially as volumes rise and context becomes harder to capture. In financial services, 94% of firms are now using, or planning to use, AI-based detections to monitor employee communications. But the more channels and formats organisations adopt, the clearer it becomes that relying on a single machine learning approach can leave blind spots in regulatory, privacy and security oversight.

The challenge is that every model comes with built-in assumptions about how data behaves. Classical techniques range from nearest-neighbour methods, which expect similar items to cluster tightly, to maximum-margin classifiers, which look for a crisp boundary between classes. Those assumptions create inductive bias, and the same principle applies to modern architectures. Real-world communications rarely follow clean statistical patterns: language is ambiguous, intent can be indirect, and risky behaviour is often disguised inside apparently normal conversations.

This is where ensemble modelling comes in. Rather than betting everything on one technique—often a large language model—ensembles combine multiple models into a “super model”. Because each approach has different strengths and weaknesses, the overall system can offset individual errors, reduce brittleness and produce more robust predictions. In practice, ensembles also allow weighting, so the system can lean more heavily on whichever method performs best for a particular slice of data.
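As a rough illustration of that weighting idea, here is a minimal Python sketch of weighted soft voting; the model names, scores and weights are invented for the example, not drawn from any vendor's system.

```python
def weighted_ensemble_score(scores: dict[str, float],
                            weights: dict[str, float]) -> float:
    """Combine per-model risk scores (each 0-1) into one weighted score.

    `weights` captures how much to trust each model, for example derived
    from validation performance on a particular slice of data.
    """
    total = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total

# Hypothetical detectors scoring the same message; the ensemble leans
# most heavily on the model that performs best for this kind of data.
scores = {"llm": 0.72, "svm": 0.55, "knn": 0.40}
weights = {"llm": 0.5, "svm": 0.3, "knn": 0.2}
print(round(weighted_ensemble_score(scores, weights), 2))  # roughly 0.6
```

In a production setting the weights would typically be re-estimated per data slice, so no single model's errors dominate the final score.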

The idea extends beyond classical machine learning into large language and large vision models, which are typically fine-tuned rather than trained from scratch. Fine-tuning can sharpen a model for a specific task, but it does not eliminate inherited biases from original training data. Combining multiple fine-tuned models can help reduce over-reliance on any single model’s quirks, improving resilience as communications styles and risks evolve.

For compliance detections, Theta Lake argues the strongest ensembles go further than “models” alone. Lexicons can precisely capture known risky terms and phrases, while intelligent fuzzy matching can flag near-misses and subtle variations that would slip past exact matching. Machine learning models, meanwhile, are better at detecting semantic similarity and implied meaning. Used together, these approaches provide layered coverage: when one technique breaks down, another can still surface the signal, improving detection while reducing false positives compared with lexicon-only or single-model systems.
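A toy sketch of that layering, using only the Python standard library, might look like the following; the lexicon entries and threshold are invented, and the semantic layer is a stub standing in for a trained model.

```python
import difflib

# Hypothetical lexicon of known risky phrases (illustrative only).
RISK_LEXICON = {"keep this off the record", "delete this chat", "between us only"}

def lexicon_hit(text: str) -> bool:
    """Layer 1: exact matching against known risky terms and phrases."""
    lowered = text.lower()
    return any(phrase in lowered for phrase in RISK_LEXICON)

def fuzzy_hit(text: str, threshold: float = 0.8) -> bool:
    """Layer 2: fuzzy matching flags near-misses (typos, paraphrases)
    that would slip past exact matching."""
    lowered = text.lower()
    return any(
        difflib.SequenceMatcher(None, lowered, phrase).ratio() >= threshold
        for phrase in RISK_LEXICON
    )

def semantic_score(text: str) -> float:
    """Layer 3: stub standing in for an ML model that scores semantic
    similarity and implied meaning; a real system would call a trained
    classifier here."""
    return 0.0

def flag_message(text: str) -> bool:
    """Layered coverage: if one technique breaks down, another can
    still surface the signal."""
    return lexicon_hit(text) or fuzzy_hit(text) or semantic_score(text) > 0.7

print(flag_message("pls keep this off teh record"))  # True, via the fuzzy layer
```

Ordering the layers from cheapest (exact match) to most expensive (model inference) also keeps this kind of pipeline efficient as message volumes grow.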

Data quality sits at the centre of this, too. The industry has repeatedly assumed larger models can compensate for noisier data, but highly specific risks still depend on high-quality labels and representative datasets across behaviours, languages and communication styles.

Recent OpenAI research underscored that point, noting: “classifiers trained on tens of thousands of high-quality labeled samples can still perform better at classifying content than gpt-oss-safeguard does when reasoning directly from the policy. Taking the time to train a dedicated classifier may be preferred for higher performance on more complex risks.”

In practice, collusion detection shows why this matters. Collusive behaviour is rarely explicit; it is hinted through secrecy, attempts to avoid detection, and indirect references to manipulation.

More reliable detection blends NLP models trained on collusion patterns with targeted lexicons, fuzzy matching for paraphrases, and contextual analysis across surrounding messages. For compliance teams, the operational win is not just higher detection rates, but fewer noisy alerts—reducing fatigue and improving confidence in what gets escalated for review.
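As a simplified sketch of that contextual step, assuming a per-message scorer like the layered detector above, neighbouring messages can be blended into each message's risk score; the hint phrases and weighting here are invented for illustration.

```python
def message_score(text: str) -> float:
    """Stand-in for the layered detector above: a crude 0-1 risk score
    based on invented hint phrases (illustrative only)."""
    hints = ("off the record", "don't tell", "our little arrangement")
    return min(1.0, 0.5 * sum(h in text.lower() for h in hints))

def contextual_scores(messages: list[str], window: int = 3) -> list[float]:
    """Blend each message's own score with its neighbours', so indirect
    hints scattered across a conversation accumulate into one signal."""
    raw = [message_score(m) for m in messages]
    blended = []
    for i in range(len(raw)):
        lo, hi = max(0, i - window), min(len(raw), i + window + 1)
        neighbourhood = sum(raw[lo:hi]) / (hi - lo)
        blended.append(0.5 * raw[i] + 0.5 * neighbourhood)
    return blended

thread = [
    "let's keep this off the record",
    "sure, our little arrangement stays between us",
    "lunch tomorrow?",
]
print([round(s, 2) for s in contextual_scores(thread)])  # [0.42, 0.42, 0.17]
```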

Ultimately, ensemble approaches are less about novelty and more about repeating lessons the industry keeps relearning: data diversity matters, label accuracy matters, model diversity matters—and ensembling improves robustness. In a world where communications formats keep multiplying and behaviour keeps shifting, expecting one model to “understand everything” is a risky strategy for RegTech programmes that need dependable coverage.
