AI is rapidly reshaping financial crime operations, but its impact is increasingly architectural rather than incremental. Rethinking the financial crime stack requires more than layering models onto existing workflows; it demands scrutiny of how data is structured, how decisions are formed and how policy logic is embedded within the system itself. The distinction between AI as enhancement and AI as operating foundation sits at the centre of this shift.
What does having an AI-native financial crime stack mean? In the view of Madhu Nadig, CTO at Flagright, it means that the entire compliance system is built in a way that AI can operate end-to-end. Not as a feature layer, or an add-on, but as a core operating component.
He offers a simple litmus test. “If you remove the AI, would the product still work the same with a few missing bells and whistles? If it’s a yes, it’s not AI-native. But if removing the AI breaks the core operating workflow – the triaging, evidence gathering, decisioning and reinforcement learning – then it is truly AI-native because it’s built around and built for AI.”
In practice, this architecture has identifiable characteristics. First, a standardised data layer designed for machine reasoning – structured, consistent and policy-aware. Second, a decision engine where AI can act within configurable controls, not merely recommend. Third, governance primitives – evaluation, auditability and explainability – treated as first-class system components, not compliance afterthoughts.
Additionally, Nadig details that an AI-native system should measure model performance by segment, produce reproducible outputs and generate explainable decisions tied to evidence and policy. It should include clearly defined human approvals and an audit trail that stands up to regulatory scrutiny.
What doesn’t qualify as AI-native? Bolt-on AI summaries, for one. “If you remove that, the product still works as it is, but it’s just sending an order to an LLM and asking you to summarise it.” Likewise, black-box models that output opaque risk scores, with weak linkage to evidence or policy logic, do not meet the threshold.
Is AI-powered misleading?
The term AI-powered is used everywhere today across financial crime and beyond. However, where is it most misleading?
For Nadig, the phrase “AI-powered” is most misleading when attached to claims of detection or decisioning, as it often implies automated learning-based decisions. In reality, the Flagright CTO explains, most of these products are still fundamentally legacy workflows, with some heuristic rules and bolt-on AI layered on top.
What is hiding behind this label? Typically, Nadig says, you’ll see things like LLM summaries that draft narratives, auto-generated alert explanations and chat interfaces layered over case notes. Others include static machine learning scoring, non-explainable risk scores and fuzzy matching technologies. Fuzzy matching in particular is increasingly rebranded under the AI banner, but none of these constitute an AI-native decisioning architecture, however they are marketed.
A more fundamental aspect for Nadig is actionability and lack of compounding. “AI needs more than a summary, it needs to safely auto-clear or escalate within defined governance boundaries, complete with audit trails and policy controls.” Without such an execution layer, AI-powered remains cosmetic.
Equally important is whether the system compounds. “A true AI system compounds over time, the right outcomes and dispositions feed into evaluation and tuning, so that models should improve in production as they go along,” said Nadig.
Changing the risk decision
How does an AI-native stack change the risk decision itself, not just how alerts are generated or processed?
For Madhu Nadig, the real impact of an AI-native stack is not faster alert handling. It is a different kind of risk decision altogether.
“I think fundamentally, an AI-native stack changes the risk decision from a one-time rule fires, an alert is created, and a human decides, into a more policy-controlled, evidence-based decision engine – a system that can act, explain itself, be auditable and improve,” said Nadig.
That shift moves firms from alert-first thinking to evidence-first decisioning. What this means for day-to-day workflows, Nadig says, is that decisions become evidence-based, not alert-first.
“Now, you’re treating the alert as a unit of work, and the system assembles an evidence graph,” said Nadig. This graph consolidates behavioural signals, historical patterns and policy constraints, and produces a structured disposition against that body of evidence. The output is not a free-text note; it is a decision object.
The system produces the disposition, the confidence score, the top drivers and any required follow-ups in an auditable way. The emphasis moves from documenting why an alert was closed to demonstrating how a risk conclusion was reached.
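Illustratively, such a decision object could be sketched as a structured record. The field names below are hypothetical, for illustration only, and are not Flagright's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class DecisionObject:
    """Structured disposition output (illustrative fields, not a real schema)."""
    alert_id: str
    disposition: str        # e.g. "auto_clear", "escalate", "file_sar"
    confidence: float       # system confidence in the disposition
    top_drivers: list[str]  # evidence items that drove the decision
    policy_refs: list[str]  # policy clauses the decision is anchored to
    follow_ups: list[str] = field(default_factory=list)  # required human actions

decision = DecisionObject(
    alert_id="ALT-1042",
    disposition="auto_clear",
    confidence=0.97,
    top_drivers=["payroll pattern match", "counterparty previously cleared"],
    policy_refs=["AML-SOP 4.2"],
)
```

Unlike a free-text note, every field in a record like this can be queried, audited and replayed later.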
The final shift for Nadig is that policy controls move into a decision layer, so any policies or SOPs a company has become part of the system itself. “Human work shifts to more exception-handling and policy management, where instead of going in and dispositioning an alert, you just go in and change the policy on the system, and the AI agents take that and then execute on your behalf,” he said.
What is safe automation?
A pressing question that surrounds the recent wave of increased automation is what exactly does safe automation look like? There is a growing belief that safe automation in financial crime operations is not defined by how much work is removed from human hands, but by the controls that sit around every automated decision.
Madhu Nadig points to five core elements as foundational: “workflow, guardrails, auditability, explainability, and automated escalation to humans.” In other words, automation only becomes safe when it is embedded within a governed operational framework — one where every AI-driven outcome can be traced back to institutional policy and reviewed if necessary.
Crucially, all automated decisions must remain both auditable and explainable. As Nadig notes, “The financial institution or the FinTech deploying these AI systems are liable and are the ones who are licensed, so humans should always be in control.” That liability cannot be delegated to a model, which is why human control must remain intact at key decision points across the workflow.
In practice, this can take several forms. For example, AI systems acting strictly within pre-defined SOPs and internal policies; high-risk or ambiguous cases being immediately escalated to investigators; or uncertainty thresholds triggering mandatory human review. Safe automation also depends on continuous evaluation, ensuring that model outputs are regularly tested against investigator dispositions, with feedback loops built in to improve performance over time.
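A minimal sketch of such routing logic, with hypothetical thresholds and tier names rather than any production design, might look like this:

```python
def route_alert(confidence: float, risk_tier: str,
                auto_clear_threshold: float = 0.95) -> str:
    """Route an alert per pre-approved policy: auto-clear only low-risk,
    high-confidence cases; anything ambiguous defaults to a human."""
    if risk_tier == "high":
        return "escalate_to_investigator"  # high-risk always gets human review
    if confidence >= auto_clear_threshold:
        return "auto_clear"                # within documented operating parameters
    return "human_review"                  # uncertainty triggers mandatory review

# Example dispositions under the hypothetical thresholds above
route_alert(0.98, "low")   # auto-cleared
route_alert(0.98, "high")  # escalated regardless of confidence
route_alert(0.70, "low")   # below threshold, so routed to a human
```

The point of the sketch is that the automation boundary is an explicit, reviewable parameter, not behaviour buried inside a model.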
“These for me are the core fundamentals of an AI system that can safely automate in such a sensitive environment,” stressed Nadig.
AI and auditable web research
For Nadig, the foundation of auditable AI-driven web research is deceptively simple: “the core of auditability is to just log exactly what’s happening.” However, such logging cannot be an afterthought; it must be built in at the system level.
At Flagright, Nadig explains, the platform has a core system where every action is automatically audit-logged, whether it is performed by a human investigator or an AI agent.
“The system doesn’t differentiate who’s performing the action. As long as an action is performed, we log it.” That architectural choice can create parity, and in some cases superiority, in how AI activity can be reviewed.
Nadig points out that when a human investigates an alert, the platform is able to log what they do internally. However, if the investigator opens a browser and runs external searches, the company has no visibility into that, so it does not end up in the audit log. “The only way to know how exactly an investigation was done is to actually work backwards from the narrative that the human has written,” he said.
In contrast, AI agents can be instrumented far more precisely, with web searches able to be logged in full – exactly what data was looked for, and the exact sources consulted. “I think the core foundation to auditability is to have a system that logs everything, and a system that treats any action done by an AI agent with the same level of scrutiny – if not more – as what a human would do,” said Nadig.
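The “log everything, regardless of actor” idea can be sketched as a single logging path shared by humans and AI agents. This is a simplified illustration, not Flagright's implementation:

```python
from datetime import datetime, timezone

audit_log: list[dict] = []

def log_action(actor: str, actor_type: str, action: str, detail: dict) -> None:
    """Append an audit entry; human and AI actors share the same path and schema."""
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "actor": actor,
        "actor_type": actor_type,  # "human" or "ai_agent", same schema either way
        "action": action,
        "detail": detail,
    })

# An AI agent's external web search is logged in full, sources included...
log_action("agent-7", "ai_agent", "web_search",
           {"query": "Acme Ltd adverse media", "sources": ["example-news.com"]})
# ...alongside a human investigator's in-platform action.
log_action("j.doe", "human", "close_alert", {"alert_id": "ALT-1042"})
```

Because both actor types write to the same structured log, AI activity can be reviewed with at least the same fidelity as human activity, which is the parity Nadig describes.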
Building in explainability
How should explainability be built into an AI-native system so decisions are defensible to regulators, not just interpretable to data scientists?
Nadig’s answer suggests that explainability in financial crime AI cannot stop at model interpretability. It must translate into regulatory defensibility.
Nadig identified three structural requirements. First of all, he believes there needs to be a big shift away from score-based systems. Traditional machine learning architectures often produce a risk score – a number that may be statistically valid but operationally opaque. “There’s always a score, and that score is not explainable, typically – so that explainability needs to move on to a more holistic sense.” Decisions should be expressed in structured, human-readable terms.
The second requirement centres on anchoring explainability in policy execution. In Nadig’s view, the core architecture of an AI-native system should revolve around the financial institution’s own SOPs and internal risk policies. The agent does not invent logic; it executes predefined governance. If an AI performs four investigative steps, it does so because those steps are encoded in the institution’s procedures.
“You always work backwards from policy,” he explains. When a regulator asks why a decision was made, the answer should be traceable directly to an approved policy clause or operating standard — with a system that executes “exactly to spec.” That alignment transforms explainability from a technical feature into a governance mechanism.
Third, uncertainty must default to human judgement. If the AI cannot confidently explain its own reasoning within policy parameters – or if it encounters ambiguity or operational error – the case should automatically escalate. An AI-native system should not force post-hoc rationalisation. If the explanation fails, the system routes the decision back to a named human owner. In those moments, accountability and interpretability converge.
The larger point is that defensibility is not achieved by adding dashboards or model visualisations for data scientists. It is achieved by embedding policy logic, structured evidence trails, and escalation thresholds directly into the decision architecture. Explainability, in this sense, is not about decoding the model. It is about proving that every outcome is traceable to policy, reviewable by humans, and defensible under supervisory scrutiny.
AI in FinCrime: who is accountable?
A debate is growing around a particular question linking AI and financial crime: when AI influences such a decision, who is ultimately accountable, and how should that accountability be made explicit?
When AI influences a financial crime decision, accountability does not become abstract. It becomes sharper. “Ultimately, the regulated financial institution is accountable,” Nadig says. “They’re the ones with the licence. They’re the ones answerable to regulators.” No deployment model, no vendor architecture, and no machine-learning layer displaces that fact. Legal and regulatory responsibility remains with the firm.
In practice, that accountability begins with the business owner of the AML or financial crime programme — typically the compliance lead, MLRO, or BSA Officer, depending on jurisdiction. The question, then, is not who is accountable. It is how that accountability is made explicit inside an AI-native operating model. For Nadig, this must be hard-coded into the system itself.
If an alert is auto-cleared, compliance owns both the policy and the outcome. For Nadig, working backwards from policy is critical, because the policy is set by the institution’s compliance leadership and the AI should follow it to specification.
The second part of accountability, Nadig added, is that where AI informs or recommends, a human decision-maker must remain clearly identified; and where AI executes an action, it must do so strictly within pre-approved policy boundaries and guardrails.
Every material outcome – whether an alert closure, an exception override or a SAR filing – should generate a record naming the accountable role, the governing policy, the SOP followed and the evidence relied upon. For Nadig, this is another layer of the accountability push.
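One way to picture such a record – the role names, policy references and fields below are hypothetical, chosen only to illustrate the shape of the idea:

```python
def outcome_record(outcome: str, accountable_role: str, policy: str,
                   sop: str, evidence: list[str]) -> dict:
    """Build an accountability record for a material outcome, naming the
    accountable role, governing policy, SOP followed and evidence relied upon."""
    return {
        "outcome": outcome,
        "accountable_role": accountable_role,
        "governing_policy": policy,
        "sop_followed": sop,
        "evidence": evidence,
    }

record = outcome_record(
    outcome="alert_closure",
    accountable_role="MLRO",
    policy="AML-Policy 4.2",
    sop="SOP-Triage-v3",
    evidence=["transaction history", "KYC profile"],
)
```

A record in this shape makes accountability explicit by construction: no outcome exists in the system without a named role and a governing policy attached to it.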
He explained, “Financial institutions need to decide where they draw the line in regard to the scope of automation. Sometimes it’s by customer tier or risk rating, but most often it is by confidence threshold – but there must be documented and approved operating parameters, and it must be clear that the system did something because that was the standard operating procedure set out in its policies.”
The larger point for Nadig is simply this: AI can influence decisions and deliver a massive efficiency gain, but accountability must always resolve to a named functional role, backed by a traceable control trail.
Human-AI trade-off misunderstandings
What does AI-native not mean for human investigators, and where do firms most commonly misunderstand the human–AI trade-off?
On this question, Nadig makes clear that AI-native is often misunderstood from the outset. It does not mean replacing investigators, nor does it mean turning compliance into a ‘lights-out black box’. Instead, it means moving human expertise up the value chain.
“AI doesn’t replace jobs – it replaces tasks,” he says. “We want investigators to go from repetitive triage to higher-judgment work with tighter controls and better evidence at the end of the day.”
In an AI-native environment, humans are not reduced to prompt-writing or model babysitting. They remain accountable for exceptions, overrides, and risk appetite. They define the policies and SOPs; the AI operates within the fences compliance leadership sets.
“Humans absolutely control the policies, the SOPs, so that the AI acts under the fence created by compliance leadership and compliance teams,” Nadig said.
A deeper misunderstanding for Nadig is how firms frame the human–AI trade-off. Too often, institutions optimise for alert processing efficiency rather than for risk outcomes.
Another recurring blind spot is underinvestment in taxonomy and quality assurance feedback loops. Without consistent disposition codes and clearly defined reason frameworks, AI systems cannot learn effectively — and investigators will never fully trust them. Clean, structured decision data is not a reporting nicety; it is the foundation of machine learning integrity. Consistency in day-to-day AML operations is what enables consistency at scale.
Finally, Nadig pushes back on the assumption that greater automation means reduced oversight. “I think it’s actually the contrary,” he said. “In reality, automation needs more explicit governance, because now you maintain your policies more frequently, you have a better grip of your SOPs, because you have a system that does exactly what is written in your policies or your standard operating procedures, which may not be the case with human analysts.”
In that sense, AI-native compliance is not about loosening control. It is about tightening it — with humans firmly in charge of the framework, and machines executing within it with measurable precision.
Measurable changes
If a firm is genuinely operating an AI-native financial crime stack, the most important shift within the first 12 months should be measurable at the decision boundary itself, Nadig stressed.
As Nadig puts it, “a meaningful share of low-risk work should be auto-cleared or auto-routed under policy.” In practice, that means the system is no longer surfacing large volumes of operationally expensive but ultimately benign alerts for manual review. Instead, it is consistently making higher-quality risk decisions earlier in the workflow, allowing investigative capacity to be focused where it matters most.
The second signal is a reduction in false positives without any corresponding loss in coverage. Nadig frames this as a core performance test – can firms materially reduce the percentage of false positives without a single false negative? This, he makes clear, means lowering the operational workload without suspicious activity being missed, a balance that traditional rule-based monitoring has struggled to achieve at scale.
A third measurable outcome should be the speed at which regulatory artefacts are produced. Nadig said, “This is typically SAR or STR narratives and the reports themselves. The time and effort these take can drop meaningfully within a truly AI-native system.”
On a final point, Nadig said that within a 12-month period, businesses should expect to see clear evidence of a working feedback loop. As Nadig remarks, this timeframe typically provides enough data to begin quantitatively assessing whether the system is improving through training.
Copyright © 2026 RegTech Analyst