Middle East Enterprises Must Pair AI Ambition with Data Discipline

Scaling AI is less about model selection and more about systemic data redesign. Without that redesign, AI initiatives risk becoming expensive demonstrations rather than enduring capabilities.


    Artificial intelligence may be the most ambitious wave of enterprise technology in decades. Yet for many organizations, the real obstacle isn’t ambition or a lack of willingness to adapt. It’s something far more fundamental: data.

    Across industries, executives continue to announce bold AI roadmaps built around customer-facing copilots, agentic workflows, and automated decision engines. Yet the majority of these initiatives remain stuck in pilot mode. According to widely cited 2025 MIT research, as many as 95% of AI pilots fail to reach production, often due to unfit or untrusted data.

    Conversations with technology leaders across infrastructure, storage, governance, and streaming platforms reveal a common pattern: enterprises are trying to wrap sophisticated AI systems around crumbling data foundations. Until those foundations are rebuilt through architectural, operational, and cultural change, AI at scale will remain elusive.

    The Front-End Illusion

    Confluent positions itself as a streaming backbone for enterprises seeking to operationalize AI. Peter Pugh Jones, the company’s EMEA Field CTO, describes a recurring scenario. Organizations invest heavily in AI at the “endpoint”—often customer-facing systems designed to impress. They build intelligent chat interfaces, personalized experiences, or automated service layers.

    But beneath that polished surface lie fragmented, aging systems.

    “It’s like wrapping something in beautiful paper,” Jones says, “without fixing what’s inside.”

    The core issue is architectural. Legacy enterprises have accumulated systems through mergers, acquisitions, and years of incremental modernization. Data resides in disconnected silos. Governance is inconsistent. Metadata is incomplete. When leadership mandates “introduce AI,” technology teams often begin at the visible layer rather than the data layer.

    The result is a pilot that works in isolation, but collapses under production demands.

    Streaming architectures aim to solve this by intervening earlier in the data lifecycle. Once connected, data flows continuously rather than in periodic batches. AI models can observe patterns in real time, surface anomalies, and trigger operational responses. In this model, AI is not an add-on; it is embedded in the data stream itself.
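The pattern described above can be sketched in a few lines: a rolling window over a continuous stream, with any sharp deviation flagged the moment it arrives rather than in a later batch. This is an illustrative, self-contained Python sketch, not Confluent's API; the window size and threshold are arbitrary assumptions.

```python
from collections import deque
from statistics import mean, stdev

def detect_anomalies(stream, window=20, threshold=3.0):
    """Flag values that deviate sharply from a rolling window of recent events."""
    recent = deque(maxlen=window)
    flagged = []
    for value in stream:
        # Score against history *before* admitting the new event to the window,
        # so a spike cannot mask itself.
        if len(recent) >= 5:
            mu, sigma = mean(recent), stdev(recent)
            if sigma > 0 and abs(value - mu) > threshold * sigma:
                flagged.append(value)  # in production: trigger the operational response here
        recent.append(value)
    return flagged

# A steady stream with one injected spike
events = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 10.3, 9.7, 10.1, 500.0, 10.0]
print(detect_anomalies(events))  # the spike is flagged; steady values are not
```

The same logic, fed by a streaming platform instead of a list, is what embeds AI in the data stream itself: the model observes every event as it flows, not a nightly extract.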

    Yet streaming infrastructure alone cannot compensate for poorly understood data sources.

    Technical Debt and the Metadata Gap

    Levent Ergin, Chief Strategist for Agentic AI, Regulatory Compliance & Sustainability at Informatica from Salesforce, argues that technical debt remains the most persistent structural barrier to scaling AI.

    “Enterprises don’t fully understand what data they have, where it lives, or how it’s being used,” he says. Legacy systems frequently lack metadata and governance. That absence becomes critical under new regulatory regimes such as the EU AI Act and other regional frameworks, which require demonstrable controls over training data and model operations.

    Without metadata, organizations cannot classify data risks or determine appropriate levels of oversight. In such an environment, scaling from pilot to production is not merely risky—it may be noncompliant.

    Ergin draws a distinction between high-quality data and AI-ready data. For years, enterprises have invested in cleaning and standardizing datasets. But high-quality data that sits in silos or lacks governance remains unusable for AI systems that demand context and connectivity.

    AI-ready data, he argues, must be contextual, connected, and governed from the outset. He advocates the formation of a “Safe AI Deployment Committee” and the creation of a data hub for AI agent controls, structures designed to unify governance, metadata, and quality assurance before AI initiatives scale. In this framing, governance is not an afterthought. It is the prerequisite for trust.
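Ergin's three criteria translate naturally into a machine-checkable gate. The sketch below is a hypothetical Python illustration, not Informatica's product; the field names and the `ai_ready` check are assumptions. A dataset qualifies for AI use only if it carries an owner, a classification, and recorded lineage.

```python
from dataclasses import dataclass, field

@dataclass
class Dataset:
    name: str
    owner: str = ""           # governed: an accountable domain team
    classification: str = ""  # contextual: e.g. "public", "internal", "personal"
    lineage: list = field(default_factory=list)  # connected: upstream sources

def ai_ready(ds: Dataset) -> bool:
    """Gate AI access on metadata: contextual, connected, and governed."""
    return bool(ds.owner and ds.classification and ds.lineage)

sales = Dataset("sales_2024", owner="revenue-ops",
                classification="internal", lineage=["crm.orders"])
orphan = Dataset("legacy_dump")  # no metadata: a typical legacy silo
```

Under a regime like the EU AI Act, a gate of this kind is also what makes oversight demonstrable: the `classification` field is exactly what is missing when an organization cannot determine a dataset's risk level.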

    Fragmentation, Sovereignty, and Cost Pressures

    Markus Grau, Enterprise Architect at Everpure, sees fragmentation as the defining challenge for today’s enterprise data landscape.

    Data exists across silos with unclear ownership. Identifying where critical information resides—and who is legally permitted to access it—can itself be a multi-month effort. Regulatory complexity compounds the issue, particularly across Europe, the Middle East, and Asia-Pacific, where stringent cross-border data sovereignty requirements apply. For example, personal data may be required to remain within a specific jurisdiction.

    Transforming or relocating that data for AI workloads can trigger compliance concerns. Meanwhile, moving data to public clouds incurs egress costs that can quickly escalate. Enterprises face not only technical constraints but also financial and geopolitical ones. 

    Redundancy introduces further inefficiency. Copies of the same dataset may exist in multiple systems, often without clear lineage. Eliminating duplication is not trivial; it requires architectural redesign and cultural change.

    Mature AI adopters, Grau notes, have already confronted these issues. They deploy mechanisms such as zero-copy snapshots to avoid physically duplicating large datasets. Emerging adopters, by contrast, often start in public cloud environments with generic infrastructure, only later confronting its limitations. Scaling AI, in this context, becomes a learning curve.
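Finding those redundant copies is, at its simplest, a content-addressing problem. The sketch below is illustrative only (real deduplication operates at the storage layer, as with the zero-copy snapshots Grau describes); it hashes dataset contents to surface silent duplicates across stores:

```python
import hashlib

def find_duplicates(datasets):
    """Group dataset names by content hash; any group larger than one is a redundant copy."""
    by_hash = {}
    for name, content in datasets.items():
        digest = hashlib.sha256(content).hexdigest()
        by_hash.setdefault(digest, []).append(name)
    return {h: names for h, names in by_hash.items() if len(names) > 1}

# Hypothetical stores: the same customer table lives in two places
stores = {
    "warehouse/customers.parquet": b"id,name\n1,Aisha\n",
    "lake/raw/customers_copy.parquet": b"id,name\n1,Aisha\n",  # silent duplicate
    "lake/raw/orders.parquet": b"id,total\n1,99\n",
}
dups = find_duplicates(stores)
```

Detection is the easy half; deciding which copy is authoritative, and rerouting every consumer to it, is the architectural and cultural work the article describes.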

    Where Operations and Analytics Converge

    A recurring theme across interviews is the historical separation between operational and analytical systems. Operational systems keep the business running—processing transactions, handling workflows, responding to events. Analytical systems analyze historical data, build models, and generate insights. AI collapses this divide.

    Real-time inference requires operational systems to consume analytical outputs in real time. Fraud detection, for example, cannot remain a retrospective analysis; it must trigger intervention at the moment of transaction.

    Streaming architectures enable this convergence by feeding analytical models with continuous data while returning predictions directly into operational flows.

    Yet misuse of AI remains common. Jones observes excessive reliance on agentic AI in scenarios where traditional machine learning suffices. Fraud detection has been a machine learning discipline for decades. Deploying heavyweight agentic systems to replicate established capabilities may add complexity without value.

    A more effective model may involve layered intelligence: lightweight models detect anomalies; agentic systems orchestrate responses; streaming platforms manage real-time flow. Composite architectures, rather than singular AI trends, appear more sustainable.
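The layered model can be made concrete with a small sketch. The rules, scores, and thresholds below are invented for illustration; a real deployment would put a trained model in layer one and an agentic system in layer two.

```python
def fraud_score(txn):
    """Layer 1: a lightweight model scores each transaction (a stand-in rule here)."""
    score = 0.0
    if txn["amount"] > 10_000:
        score += 0.6
    if txn["country"] != txn["home_country"]:
        score += 0.3
    return score

def orchestrate(txn):
    """Layer 2: an orchestration layer maps the score to an operational response."""
    score = fraud_score(txn)
    if score >= 0.8:
        return "block"
    if score >= 0.5:
        return "step-up-auth"
    return "approve"

def process_stream(txns):
    """Layer 3: the streaming layer applies the decision per event, in arrival order."""
    return [orchestrate(t) for t in txns]

txns = [
    {"amount": 50,     "country": "AE", "home_country": "AE"},
    {"amount": 20_000, "country": "AE", "home_country": "AE"},
    {"amount": 20_000, "country": "GB", "home_country": "AE"},
]
```

The point of the composition is that each layer stays cheap and replaceable: the scorer can be swapped for a trained model, and the orchestrator for an agentic system, without rebuilding the stream.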

    The Infrastructure Bottleneck

    Data readiness is only one dimension. Infrastructure readiness is another.

    Data lakes were originally designed as inexpensive storage repositories. Built on hard disk technology, they prioritized capacity over performance. They were not architected for constant inferencing workloads, nor for GPUs that require high-throughput data pipelines.

    Grau argues that workload characteristics have fundamentally shifted. AI systems access data continuously, often in parallel, feeding GPUs that must remain busy to justify their cost. Idle GPUs represent not only financial waste but energy inefficiency—a growing concern as AI workloads strain regional power grids.

    Energy availability has become a limiting factor in certain geographies. Power plants cannot be constructed quickly enough to satisfy AI-scale demands. Infrastructure modernization is therefore not merely a cost-optimization exercise but a matter of capacity constraints.

    Hardware evolution compounds uncertainty. GPUs are refreshed annually, often delivering incremental performance gains. Meanwhile, software stack improvements can yield tenfold efficiency improvements on existing hardware. Organizations must determine whether to chase hardware cycles or extract value from software optimization.

    Increasingly, enterprises are shifting from capital expenditure models to flexible service models, adapting bandwidth and storage dynamically rather than committing to fixed five-year depreciation cycles. In an AI landscape characterized by rapid change, flexibility mitigates strategic risk.

    Digital Natives vs Legacy Enterprises

    Digital-native companies enjoy an advantage: minimal legacy systems. Integration is simpler. Architecture is more coherent.

    However, they lack historical data. Without rich internal datasets, they may rely more heavily on generative AI to supplement knowledge. This introduces risk. Models trained on broad internet data may produce plausible yet inaccurate outputs. Legacy enterprises, conversely, possess extensive datasets but struggle with integration and governance.

    Success for established firms often emerges not from sweeping transformation but from targeted interventions. Rather than attempting to revolutionize the customer interface immediately, organizations can identify smaller pain points, trace them back to source systems, and modernize data pipelines incrementally.

    Such incremental wins build organizational trust and demonstrate ROI—both critical for sustaining long-term AI investment.

    Technical modernization alone is not enough. Grau points out that historically, operational teams have guarded their data. In the manufacturing and automotive sectors, engineers viewed data sharing as a threat, fearing that automation could replace jobs or that errors in AI systems might lead to accountability risks.

    Over time, cultural resistance has softened as data-driven strategies become normalized. But trust remains central. Enterprises must demonstrate that AI initiatives use data responsibly and for legitimate business outcomes. The ability of employees to understand and interpret data remains uneven. Without literacy, even well-architected AI systems can be misused.

    Garbage In, Amplified Out

    The principle of “garbage in, garbage out” predates AI. But AI amplifies the stakes.

    Ergin warns of a “mirage of misinformation.” Since generative models can produce coherent outputs even when trained on flawed data, errors may appear authoritative. Insurers have begun offering policies against data poisoning risks—a sign of emerging financial exposure.

    Improving model sophistication does not compensate for weak data foundations. The solution lies in strengthening governance, context, and verification processes before AI systems are trained or deployed.

    Manual data curation remains costly and labor-intensive. Despite advances in automation, there is no fully reliable method for automatically cleansing and contextualizing enterprise data at scale. Modern architectures increasingly move beyond simple data lakes toward lakehouse models that integrate structured, unstructured, video, and audio data. AI demands multimodal inputs. The volume and diversity of data expand exponentially. But expanding scope magnifies existing problems: silos grow larger, governance gaps widen, and duplication proliferates.

    Streaming approaches propose an alternative: treat data as a product, curated and managed by domain teams, flowing continuously through shared infrastructure. Organizational change is required. Teams must redefine ownership and collaboration models. In this view, AI readiness becomes a byproduct of disciplined data operations.
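As a sketch of what "data as a product" implies in practice, the hypothetical `publish` gate below lets a domain team ship records only when they satisfy the product's declared schema and the team's own quality checks; the names and checks are assumptions for illustration.

```python
def publish(records, schema, checks):
    """A data product is published only if every record matches the schema
    and passes the owning team's quality checks."""
    for record in records:
        if set(record) != schema:
            raise ValueError(f"schema violation: {sorted(record)}")
        for check in checks:
            if not check(record):
                raise ValueError(f"quality check failed: {record}")
    return records

# The orders team publishes its product with one quality rule: no negative totals
orders = [{"id": 1, "total": 99.0}, {"id": 2, "total": 12.5}]
published = publish(orders, schema={"id", "total"},
                    checks=[lambda r: r["total"] >= 0])
```

Ownership lives with the domain team that writes the checks; consumers downstream inherit data that is already curated, which is what makes AI readiness a byproduct rather than a separate program.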

    The enthusiasm surrounding generative AI has created urgency (sometimes impatience). But scaling AI is less about model selection and more about systemic redesign. Without those foundations, AI initiatives risk becoming expensive demonstrations rather than enduring capabilities.

    The trajectory of the market suggests a convergence toward streaming architectures, composite AI systems, governed data hubs, and flexible infrastructure models. Whether organizations adopt these systematically or continue layering AI atop unstable foundations will determine who moves beyond pilot purgatory.
