
From Chaos to Clarity: Why Data Quality Is the Real Foundation of AI-Driven Compliance

  • Writer: FinScan

Insights from the Dow Jones Risk Journal Summit in Dubai 




At the recent Dow Jones Risk Journal Summit in Dubai, Ibrahim Bennani Doubli of FinScan sat down to discuss a topic that's often overlooked in the rush toward artificial intelligence: data itself. While the compliance industry buzzes about automation, machine learning models, and generative AI, his message was refreshingly direct—without trusted data, AI doesn’t create clarity. It scales complexity.



The Foundation: Completeness and Accuracy 


According to a 2025 Nasdaq survey, 72% of compliance leaders now rank data quality as their highest monitoring priority over the next 12 months, with data completeness (61%) as a leading concern.  


Before diving into sophisticated watchlist and sanctions screening technology or orchestration layers, compliance teams need to get the basics right. Missing names, partial identifiers, inconsistent formats, and fragmented records don’t just create operational inefficiency; they create genuine risk exposure. Incomplete data drives false positives that overwhelm investigators while simultaneously hiding false negatives that slip past controls.  


The consequences can be severe: screening systems may inadvertently fail to flag sanctioned entities or individuals. And the challenge intensifies dramatically in regions like the Middle East and South Asia. 


Consider the Arabic letter qāf (‘ق’), which can transliterate into English as either ‘Q’ or ‘K’. The name Mohammed alone has over 50 documented variations in English, and that number can climb into the hundreds depending on regional dialects and transliteration practices. In India, the surname ‘Devi’ is shared by an estimated 70 million people. Layer on top of this the Arab naming convention (which uses Ism, Nasab, Laqab, Nisba, and Kunya rather than the Western first-middle-last structure) and the data quality challenge becomes abundantly clear. 
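To make the matching challenge concrete, here is a minimal sketch of how transliteration variants might be collapsed into a common comparison key before screening. The collapsing rules and sample names are illustrative assumptions, not FinScan’s actual matching logic.

```python
import re

# Illustrative collapsing rules for common transliteration variants;
# a production engine would use far richer, language-aware rules.
RULES = [
    (r"q", "k"),         # qāf rendered as either Q or K
    (r"(.)\1+", r"\1"),  # collapse doubled letters: Mohammed vs Mohamed
    (r"ou", "u"),        # vowel variants: Nour vs Nur
    (r"e", "a"),         # Mohammed vs Muhammad style vowel shifts
    (r"o", "u"),
]

def match_key(name: str) -> str:
    """Reduce a transliterated name to a crude comparison key."""
    key = name.lower()
    for pattern, repl in RULES:
        key = re.sub(pattern, repl, key)
    return key

variants = ["Mohammed", "Muhammad", "Mohamed", "Muhammed"]
print({v: match_key(v) for v in variants})
# All four spellings collapse to the same key: 'muhamad'
```

Even this toy version shows why naive exact matching fails: four common spellings of the same name look entirely different as strings, yet all reduce to one key.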


Without structured, accurate data from the start, even the most advanced screening technology struggles to deliver reliable outcomes. As the saying goes: garbage in, garbage out—in the form of amplified risk. 


Why Data Quality Frameworks Matter


A strong data quality framework transforms screening from a fragile, reactive process into a robust, defensible program. Instead of patchwork fixes and manual workarounds, organizations gain standardized fields, validated and complete records, consistent country and entity codes, traceable transformations, and full auditability across the data lifecycle. 


This foundation makes compliance programs genuinely resilient, regulator-ready, and more efficient. As Ibrahim noted during the interview, many high-profile enforcement failures aren’t necessarily technology failures at all. They’re data control failures. TD Bank was fined $3 billion and Starling Bank in the UK faced penalties of nearly £29 million, two examples among many of failures in screening and data quality controls rather than in the technology solutions themselves. 


Clean Once is Not Enough: Why Data Quality Must Be Continuous


One of the most important themes emerging across compliance programs today is this: data quality is not a one-time project. Data quality and enrichment play an important role in any ongoing risk and compliance control framework. 


Cleaning data just once and then allowing it to degrade over time creates a silent risk gap. Customer records evolve. Corporate structures change. Sanctions lists update. Trade routes shift. If your data isn’t continuously monitored and maintained, then yesterday’s “clean” data becomes today’s exposure. 


Ongoing data quality discipline requires:

 

  • Continuous monitoring of inbound customer, watchlist, sanctions, and other feeds 

  • Automated validation checks for missing or malformed data (a simple sketch follows this list) 

  • Ongoing standardization and normalization as formats change 

  • Detection of hidden or newly introduced data anomalies 

  • Periodic re-cleansing of legacy records to prevent drift 


Strong compliance programs treat data quality as a living process, not a historical milestone. 
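
As a rough illustration of the automated validation bullet above, the sketch below flags missing or malformed fields in a single customer record. The field names, date format, and country rule are assumptions for the example, not a prescribed schema.

```python
import re
from datetime import datetime

# Illustrative required fields and format rules; real programs would
# derive these from their own data quality framework.
REQUIRED = ["full_name", "date_of_birth", "country_code", "customer_id"]
ISO_COUNTRY = re.compile(r"^[A-Z]{2}$")  # ISO 3166-1 alpha-2 shape

def validate(record: dict) -> list[str]:
    """Return a list of data quality issues found in one record."""
    issues = []
    for field in REQUIRED:
        if not record.get(field):
            issues.append(f"missing field: {field}")
    dob = record.get("date_of_birth")
    if dob:
        try:
            datetime.strptime(dob, "%Y-%m-%d")
        except ValueError:
            issues.append(f"malformed date_of_birth: {dob!r}")
    cc = record.get("country_code", "")
    if cc and not ISO_COUNTRY.match(cc):
        issues.append(f"non-standard country_code: {cc!r}")
    return issues

print(validate({"full_name": "Devi, S.", "date_of_birth": "13/04/1987",
                "country_code": "India", "customer_id": "C-1001"}))
# ["malformed date_of_birth: '13/04/1987'", "non-standard country_code: 'India'"]
```

Run continuously against inbound feeds, checks like these turn data quality from a periodic clean-up into a standing control.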


What “AI-Ready Data” Really Means



The term “AI-ready data” has become something of a buzzword. But in practice, it represents genuine operational discipline. AI-ready data is accurate, with minimal errors or gaps. It’s also: 


  • complete, with key identifiers consistently present 

  • standardized, using consistent formats and codes across systems 

  • traceable, with clear lineage showing where each data element came from and how it was transformed  


And critically, it’s reusable at scale without constant manual intervention. 

If teams can’t explain where a data element originated or how it changed hands, they can’t confidently rely on the AI decisions built on top of it. Explainability starts with lineage. As data volumes continue to explode across the compliance landscape, understanding the provenance of your data becomes essential. 
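
As an illustration of lineage at the level of a single data element, the sketch below appends a provenance entry for every transformation. The structure is hypothetical; in practice lineage is usually captured by the data platform rather than per value, but the principle is the same: every change is recorded with its source and timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class TrackedValue:
    """A data element that carries its own provenance trail."""
    value: str
    lineage: list[dict] = field(default_factory=list)

    def transform(self, new_value: str, step: str, source: str) -> None:
        # Record what changed, where, and when, before applying it.
        self.lineage.append({
            "step": step,
            "source": source,
            "before": self.value,
            "after": new_value,
            "at": datetime.now(timezone.utc).isoformat(),
        })
        self.value = new_value

name = TrackedValue("MOHAMMED  AL-RASHID")
name.transform("Mohammed Al-Rashid", "case_and_whitespace_normalization", "cleansing_job_42")
name.transform("Mohammed Al Rashid", "punctuation_standardization", "cleansing_job_42")

for entry in name.lineage:
    print(entry["step"], ":", entry["before"], "->", entry["after"])
```

With a trail like this, an investigator can answer “why does the record say this?” without reverse-engineering the pipeline.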


The Data Explosion: Both Opportunity and Risk


The reality of modern compliance is that more data isn’t automatically better. Good data enables stronger analytics, better customer insights, reduced friction in processes, and faster, more effective investigations. Bad data, on the other hand, creates alert fatigue, increases regulatory exposure, drives operational inefficiency, and frustrates compliance teams. 


The 2025 Nasdaq survey revealed that 24.5% of respondents rated handling increased data volumes as “extremely challenging,” up from 17.3% in 2024. 


As data volumes continue to grow exponentially across the compliance landscape, the gap between well-governed and poorly governed programs widens rapidly. Organizations that have invested in strong data foundations find themselves able to scale effectively and extract genuine value from their growing data assets. Those that haven’t find themselves drowning in noise, struggling to separate signal from static, and increasingly exposed to both operational and regulatory risk. 


The Complexity of Trade-Heavy Environments 


Industries like shipping, trade finance, and cross-border payments introduce unique data challenges that can completely undermine screening effectiveness. Bills of lading, for example, often contain bundled company names—sometimes up to 18 or 20 entities in a single field—along with notations like “trading as” or “care of”, abbreviations, and free-text entries. Cargo and vessel identifiers are frequently embedded inconsistently across different fields. 
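
To illustrate the parsing problem, here is a crude sketch that splits one bundled free-text party field into individually screenable names. The delimiters and notations handled here are illustrative assumptions; production parsers cope with far more variation.

```python
import re

# Notations that signal an alias or related party hiding inside one
# field; illustrative list only.
NOTATIONS = re.compile(r"\b(?:t/a|trading as|c/o|care of)\b", re.IGNORECASE)

def split_entities(raw_field: str) -> list[str]:
    """Split a bundled bill-of-lading party field into candidate entities."""
    # Treat semicolons as entity boundaries; real fields use many more
    # separators (line breaks, slashes, numbering, free text).
    parts = re.split(r"\s*;\s*", raw_field)
    entities = []
    for part in parts:
        # Alias notations introduce an additional name to screen.
        for name in NOTATIONS.split(part):
            name = name.strip(" ,.")
            if name:
                entities.append(name)
    return entities

raw = "Acme Shipping LLC t/a Gulf Star Lines; Bluewater Trading c/o Horizon Freight"
print(split_entities(raw))
# ['Acme Shipping LLC', 'Gulf Star Lines', 'Bluewater Trading', 'Horizon Freight']
```

Screening the raw field as one string would likely miss every one of these names; screening the parsed entities gives each a fair match.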


Ibrahim shared a revealing example of FinScan’s work with one of the world’s largest shipping companies. Before conducting any screening, the team performed comprehensive data quality work, properly parsing and structuring the bundled entity names. The results were startling: they uncovered more than 50 sanctioned entities the company had been unknowingly dealing with—hidden because of data structure challenges.  


The lesson is clear: AML and sanctions screening technology alone isn’t enough. Organizations must first parse, normalize, and contextualize their data. A layered approach works best, starting with data quality checks, moving to parsing and normalization, then applying context-aware screening, and finally conducting risk-based reviews. This is fundamentally different from relying on one-size-fits-all matching percentages. 
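
In code form, that layered approach might look like the pipeline skeleton below. The stage ordering follows the description above; the function bodies are placeholders for illustration, not FinScan’s implementation.

```python
def quality_check(record: dict) -> dict:
    """Layer 1: validate completeness and flag malformed fields."""
    record["quality_issues"] = [f for f in ("name", "country") if not record.get(f)]
    return record

def parse_and_normalize(record: dict) -> dict:
    """Layer 2: split bundled names, standardize formats and codes."""
    record["name"] = " ".join(record.get("name", "").split()).title()
    return record

def context_aware_screen(record: dict) -> dict:
    """Layer 3: match against watchlists using context such as country
    and entity type, not a one-size-fits-all match percentage."""
    record["screening_hits"] = []  # placeholder for real matching logic
    return record

def risk_based_review(record: dict) -> dict:
    """Layer 4: route only genuinely risky hits to human investigators."""
    record["needs_review"] = bool(record["screening_hits"])
    return record

PIPELINE = [quality_check, parse_and_normalize, context_aware_screen, risk_based_review]

def run(record: dict) -> dict:
    for stage in PIPELINE:
        record = stage(record)
    return record

print(run({"name": "  gulf   star lines ", "country": "AE"}))
```

The point of the ordering is that each layer only sees data the previous layer has already made trustworthy.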


Technology Matters—But Only After the Data Foundation Is Set


Modern screening technology can dramatically improve compliance outcomes. The right platforms can reduce false positives, surface genuine risk faster, improve investigator productivity, and scale globally. But as Ibrahim emphasized throughout the conversation, technology cannot compensate for broken inputs. 


Your data determines whether your screening platform functions as a safeguard or becomes a liability. The most sophisticated AI models and machine learning algorithms in the world will struggle—or worse, produce misleadingly confident results—if they’re trained on incomplete, inconsistent, or inaccurate data. This is why FinScan treats data preparation, normalization, and quality controls as primary components of the compliance workflow, not afterthoughts. 


The Path Forward 


The compliance landscape is evolving rapidly, with AI and automation technologies offering genuine promise for more effective, efficient risk management. But realizing that promise requires discipline and foundational work that can’t be shortcut. Organizations that invest in strong data quality frameworks today position themselves to leverage AI effectively tomorrow. 


For compliance executives navigating the unique challenges of the Middle East region—with its linguistic complexity, diverse naming conventions, and trade-intensive economy—the message is particularly relevant. Strong screening outcomes start with strong data foundations. Because AI and automation only work when they're built on something solid.  


Watch the full interview with Ibrahim Bennani Doubli for additional practical examples and real-world lessons from trade-heavy, high-risk environments.  



If you’d like to explore how FinScan approaches data quality and screening resilience, get in touch—we're always happy to continue the conversation. 
