Why data quality makes or breaks AI


The rapid adoption of AI across financial services has brought an old warning sharply back into focus: “garbage in/garbage out.”

As organisations pour more money and talent into AI initiatives, they are discovering that the technology’s output is only as strong as the data used to train and operate it. Without clean, relevant, and well-structured data, even the most sophisticated AI systems risk producing unreliable or misleading results, according to AscentAI.

AscentAI lead regulatory advisor Jilaine Bauer has seen the consequences of poor data quality play out repeatedly in her work with regulated firms. “It’s important for AI solutions to be trained on data sets that include industry-specific data to achieve greater accuracy, relevance and insights,” Bauer said. “For example, when working with an insurance company, it is important for an AI solution to be trained on data sets that include terminology and concepts relevant to the insurance sector and related subsectors. For FinTech firms delivering traditional financial services in new, engaging ways, such as digital banking, data sets may need to include new or different terms and concepts.

“And, for firms operating in more than one country, taxonomies and ontologies can help structure and categorize data to help ensure it is applied in a consistent manner. Finally, perhaps the most important step we take at AscentAI to ensure the data is fit for the AI application is to develop use cases specific to client use cases and then scope the data we think applies for client review and approval.”
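To make the idea of a taxonomy concrete, the sketch below maps jurisdiction-specific regulatory terms onto shared canonical categories, so documents that use different local terminology are labelled consistently. All terms, categories, and function names here are hypothetical illustrations, not AscentAI’s actual schema.

```python
# Illustrative sketch only: a minimal taxonomy mapping local regulatory
# terminology to canonical categories. The terms and categories are
# invented examples, not any firm's real taxonomy.

TAXONOMY = {
    "capital adequacy": {"capital adequacy", "own funds requirement", "tier 1 capital"},
    "client assets": {"client assets", "client money", "custody assets"},
    "market conduct": {"market conduct", "fair dealing", "treating customers fairly"},
}

def categorize(text: str) -> set[str]:
    """Return the canonical categories whose terms appear in the text."""
    lowered = text.lower()
    return {
        category
        for category, terms in TAXONOMY.items()
        if any(term in lowered for term in terms)
    }

# Two filings that use different local terminology land in the same category.
uk_doc = "Firms must segregate client money under CASS rules."
us_doc = "Custody assets must be held at a qualified custodian."
print(categorize(uk_doc))  # {'client assets'}
print(categorize(us_doc))  # {'client assets'}
```

In practice a production taxonomy would use richer matching than substring lookup, but the design point is the same: the mapping, not the raw local vocabulary, is what downstream AI components consume.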

This rising scrutiny reflects a shift in how businesses now view AI. Early enthusiasm has given way to a more grounded assessment of how large language models perform when trained on poor, inaccurate, or overly generic datasets. The Institute of Electrical and Electronics Engineers (IEEE) recently issued its own assessment, citing a study that found newer, larger models are sometimes less reliable, despite their scale.

The IEEE noted, “A common assumption is that scaling up the models driving these applications will improve their reliability—for instance, by increasing the amount of data they are trained on, or the number of parameters they use to process information. However, more recent and larger versions of these language models have actually become more unreliable, not less, according to a new study.”

A central issue is that developers often cannot fully understand sprawling datasets containing millions or billions of unstructured data points. Bauer is clear about the consequence: “It’s a really hard problem to solve, but the success of AI applications depends on it. I think it’s a key determinant on whether you succeed or fail in leveraging the power of AI.”

To build dependable AI, organisations must prioritise timely, accurate, and usable datasets—especially as models become more dependent on internal data sources rather than broad, public ones.

This requires modernising data governance frameworks so they can cope with the needs of AI, including handling both structured and unstructured data. Dataversity’s Michelle Knight warns that many firms underestimate the scale of this challenge. According to Knight, today’s governance programmes often focus on only a fraction of enterprise data, leaving the rest unexamined. She likens AI to an iceberg: executives see only the promise above the surface, while the neglected mass of data below risks sinking the entire initiative.

Knight’s advice is to place data quality and governance at the top of the AI readiness agenda. Without understanding existing data and ensuring its lineage and quality, firms risk costly mistakes. She argues that evaluating current governance capabilities and applying data quality best practice is the best foundation for AI adoption.

AscentAI reinforces this connection between data management and trustworthy AI, maintaining strict oversight of all modelling processes. The firm applies 10 layers of redundancy for data quality, combining automated checks with human validation to ensure the accuracy of its data. Its AI draws exclusively from authoritative regulatory materials issued by national, state, and local bodies, ensuring clients are not relying on unpredictable external inputs. For compliance teams and wider AI users alike, clean, reliable, and accessible data remains the foundation of any successful AI deployment.
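The article describes layering automated checks with human validation but does not detail how such layers work. The sketch below shows one plausible shape, assuming hypothetical checks for completeness, freshness, and source authority; records failing any check are routed to human review. None of this reflects AscentAI’s actual implementation.

```python
# Illustrative sketch only: layered automated data-quality checks with a
# human-review fallback. Field names, check logic, and thresholds are
# assumptions for illustration, not a real firm's pipeline.

from datetime import date

def check_completeness(record: dict) -> bool:
    # Every required field must be present and non-empty.
    return all(record.get(field) for field in ("source", "text", "issued"))

def check_freshness(record: dict, max_age_days: int = 365) -> bool:
    # Reject stale regulatory material.
    issued = record.get("issued")
    return issued is not None and (date.today() - issued).days <= max_age_days

def check_authority(record: dict, approved_sources: set[str]) -> bool:
    # Mirrors the idea of drawing only from authoritative regulatory bodies.
    return record.get("source") in approved_sources

def triage(records: list[dict], approved_sources: set[str]):
    """Split records into (accepted, needs_human_review)."""
    accepted, review = [], []
    for rec in records:
        passed = (
            check_completeness(rec)
            and check_freshness(rec)
            and check_authority(rec, approved_sources)
        )
        (accepted if passed else review).append(rec)
    return accepted, review
```

The design choice worth noting is that automation never silently discards data: anything that fails a check is queued for a person to inspect, which is the "human validation" half of the layered approach.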


Copyright © 2025 RegTech Analyst
