Beyond the Pilot: Building Scalable Agentic AI That Delivers
Experts discuss how strategic alignment, modular architecture, real-time data, and strong governance are key to unlocking the scalable and responsible deployment of Agentic AI.

[Image source: Chetan Jha/MITSMR Middle East]
When a supply chain disruption unfolds across three continents, or a government agency faces a sudden surge in citizen service requests, even the most advanced AI systems often stall, waiting for human operators to reconfigure workflows, retrain models, or reroute data. Traditional automation was never built for volatility at this scale.
Agentic AI, though still in its early stages, promises to change that. These systems, often working in real time, can interpret shifting conditions, generate multi-step strategies, and coordinate across diverse data sources, tools, and human teams.
These adaptive capabilities are especially critical in a world where governments and enterprises are pursuing bold AI-native visions under tight timelines, and where the operational stakes in sectors like public safety, healthcare, and logistics can be measured in lives, livelihoods, and billions of dollars.
However, moving from pilot projects to enterprise-wide or government-scale deployment is no easy task. According to a Gartner report, more than 40% of agentic AI initiatives will be abandoned before full deployment by the end of 2027. The statistic highlights how often ambition collides with operational reality.
For many organizations, the challenge goes far beyond adding more computing power or deploying larger models. It requires re-engineering AI architectures from the ground up to manage complexity, enable autonomy, and embed trust at scale.
To understand how to avoid common pitfalls and design for scale from day one, Liji Varghese, Editor, MIT Sloan Management Review Middle East, spoke with leading AI executives in the region.
Foundational Design Principles for Building Scalable Agentic AI
When exploring agentic AI, the design choices made at the outset determine whether these deployments remain proofs of concept or mature into scalable, high-impact platforms.
Start with Strategic Fit, Not Technology
For Sumeet Agrawal, VP of Product Management at Informatica, the starting point is not architecture, but intent. “Much like you wouldn’t open a new office without understanding the market, regulations, and ROI, you shouldn’t deploy agentic AI without a strategic assessment of where it adds value.”
His approach:
- Ask the right questions. Where are the friction points? Where does human error creep in? And what really needs scaling? Only then can you identify if an AI agent is the right fit.
- Be ruthless about simplicity. Not every challenge needs a powerful LLM-powered solution. If a task can be handled by a simple rules engine or RPA bot, use that instead. Think of agentic AI like a sports car. Sure, it’s impressive, but not ideal for every journey. Use it where ambiguity, context, or scale make simpler automation fall short.
- Build with KPIs in mind from day one. AI projects that don’t tie back to measurable outcomes are the first to be cut. Whether it’s reduced handling time, improved decision accuracy, or lower operational risk, define metrics that align with business impact. For example, you might track automation rates and CSAT scores in tandem in customer service workflows.
Build on Data, Modularity, and Trust
Once you’ve asked the right questions and established the intent, the next step is building the right foundation to execute it. According to Haider Aziz, General Manager – Middle East, Turkey and Africa at VAST Data, there are three core considerations:
- You need a data architecture that supports real-time access and reasoning. Agents can’t make intelligent decisions if they’re working off stale, fragmented data.
- The system has to be modular and stateless wherever possible. You want to be able to update models, change logic, or introduce new agents without re-architecting everything.
- Trust matters. Especially in public sector ecosystems, you need built-in explainability, access controls, and traceability. These aren’t bolt-on features; they need to be baked into the design from day one.
“With these three foundations in place,” Aziz notes, “you have a solid platform to scale from.”
Adopt Structured, Reusable Architectures
Once the strategic fit is defined and the foundational building blocks—data, modularity, and trust—are in place, the next challenge is scaling without chaos. This is where architecture matters.
“As organizations scale AI-powered applications, they face a choice between building each one from scratch, creating a complex and hard-to-manage web of custom software, or adopting a more structured approach,” says Kurt Muehmel, Head of AI Strategy at Dataiku.
The LLM Mesh is one such paradigm.
Core Components of LLM Mesh Explained
- The Catalog is the system of record for versioning, documenting, and managing the dependencies of all agents and their components. It’s where agents are “born” and registered.
- The Gateway is the runtime engine that orchestrates the agents, enforces their permissions, and manages their interactions concurrently.
- The Federated Services for cost, performance, safety, and security are the built-in monitoring and control systems that oversee the agents throughout their operational life. They provide the dashboards and alerts needed to understand how the fleet of agents is performing and to trigger manual or automated interventions when issues arise.
This mesh is grounded in three principles:
- Modular Construction via a Central Catalog: Instead of building monolithic applications, the system is composed of standardized, reusable “objects” — such as agents, tools, and prompts — that are registered in a central catalog. This Lego-like approach prevents teams from reinventing the wheel and allows complex applications to be assembled from proven components, essential for scaling development across an organization.
- A Unified Abstraction Layer (Gateway): All interactions between objects (e.g., an agent calling a tool or an LLM service) are routed through a single, intelligent gateway. This layer abstracts away the specific implementation details of the underlying services, so developers can work with a consistent interface. This decoupling is critical for maintainability and allows the organization to swap out models or tools without breaking applications.
- Federated Governance Services: Cross-cutting concerns like security, cost management, performance monitoring, and content safety are implemented as centralized, “federated” services automatically applied at the gateway. This ensures that every agentic application across the enterprise adheres to the same governance standards, making it possible to scale operations without sacrificing control or compliance.
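The catalog-plus-gateway pattern described above can be sketched in a few dozen lines. This is a minimal illustration of the idea, not Dataiku's actual API; all class and method names here are invented for the example.

```python
class Catalog:
    """Central registry of reusable objects: agents, tools, prompts."""
    def __init__(self):
        self._objects = {}

    def register(self, name, obj, kind):
        self._objects[name] = {"obj": obj, "kind": kind, "version": 1}

    def get(self, name):
        return self._objects[name]["obj"]


class Gateway:
    """Single entry point: routes every call and applies federated services."""
    def __init__(self, catalog, policies=()):
        self.catalog = catalog
        self.policies = list(policies)   # cost, safety, security checks
        self.audit_log = []

    def call(self, caller, tool_name, payload):
        for policy in self.policies:     # governance applied to every call
            policy(caller, tool_name, payload)
        self.audit_log.append((caller, tool_name))
        return self.catalog.get(tool_name)(payload)


catalog = Catalog()
catalog.register("summarize", lambda text: text[:20] + "...", kind="tool")
gateway = Gateway(catalog, policies=[lambda c, t, p: None])  # stand-in policy
out = gateway.call("report-agent", "summarize", "A very long quarterly report body")
print(out)
```

Because every interaction flows through the gateway, swapping the `summarize` implementation in the catalog changes nothing for the calling agent, which is exactly the decoupling the three principles describe.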
Architect for Cross-Domain Intelligence
With structured, reusable architectures in place, the next frontier is enabling agents to reason and act across domains, not just within isolated silos.
For Sharif Berdi, Senior Director – Data & AI Engineering at Inception (a G42 company), agentic AI’s power lies in orchestrating decisions across business domains. “The true promise is high-impact, multi-step problem-solving that coordinates across tools, objectives, and contexts — with reasoning, foresight, and adaptation.”
Inception’s enterprise AI platform embeds autonomous agents into procurement, process automation, productivity, and customer experience, enabling:
- Real-time, empathetic responses through sentiment learning.
- Transformation from reactive automation to proactive, context-aware solutions.
Berdi cautions against unrestricted autonomy. In high-stakes sectors such as healthcare or finance, a tiered autonomy model — agents handling low-risk operational work while humans oversee strategic decisions — ensures efficiency and safety.
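Berdi's tiered autonomy model can be sketched as a simple routing function. The risk tiers, action names, and the 0.8 confidence threshold below are illustrative assumptions, not Inception's actual policy.

```python
def route_action(action: dict, confidence: float) -> str:
    """Tiered autonomy: low-risk work runs autonomously, the rest escalates.

    Risk tiers and the 0.8 confidence threshold are illustrative choices.
    """
    LOW_RISK = {"draft_email", "update_ticket", "fetch_report"}
    HIGH_RISK = {"approve_payment", "change_treatment_plan"}

    if action["type"] in HIGH_RISK:
        return "escalate_to_human"        # strategic/high-stakes: always human
    if action["type"] in LOW_RISK and confidence >= 0.8:
        return "execute_autonomously"     # operational and confident
    return "request_human_review"         # ambiguous: defer to a person

print(route_action({"type": "update_ticket"}, confidence=0.95))    # execute_autonomously
print(route_action({"type": "approve_payment"}, confidence=0.99))  # escalate_to_human
```

Note that high-risk actions escalate regardless of confidence: in healthcare or finance, a very confident agent is not the same thing as a safe one.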
Ensuring Enterprise-Grade Interoperability
Even the most intelligent and well-architected agentic AI systems will fail if they can’t work with the technology stack that already exists.
“Most organizations don’t have the luxury of starting from scratch; they’ve got older systems, tools running across environments that might be on-prem, and using different clouds, and for sure, lots of complexity. So the goal isn’t to replace everything, it’s to make sure new, intelligent systems can work with what’s already there,” says Aziz.
A good starting point, in his view, is to:
- Build AI systems with open APIs, so they can plug into older applications, data sources, or monitoring tools without extensive custom work.
- Treat the data platform as a common foundation, what some call a “substrate,” that gives agents a consistent view of information across environments. Even if data lives in different places, it can still be accessed and understood in one place.
- Integrate with existing identity and access systems, the tools the organization already uses to control who sees what. That avoids unnecessary risk and rework, and keeps things secure from the beginning.
At Inception, Berdi says, the products are designed to integrate seamlessly with complex legacy systems and protocols through open APIs, modular orchestration, and flexible deployment models. Whether it’s SharePoint, SAP, Salesforce, or domain-specific legacy systems and databases, the agents are built to speak the language of the enterprise across formats, endpoints, and protocols.
The architecture supports public and private LLMs and runs on Kubernetes for hybrid and multi-cloud environments. This allows agents to execute across cloud-native, on-premise, and edge deployments without rewriting core logic. A standout feature of Inception’s approach is that the tool repository and backend services enable easy wrapping of third-party applications and internal APIs into callable services for agents, allowing autonomous agents to interact with outdated systems as easily as with modern SaaS.
Muehmel offers a more abstracted approach through the LLM Mesh architecture. In this model, any external system — a legacy mainframe database, a modern SaaS API, or a service running in a different cloud — is treated as a Tool Object. For an agent to use it, the tool only needs a simple schema registered in the central catalog that describes what it does and how to interact with it.
The LLM Mesh gateway then acts as a universal adapter. It handles the specific technical handshakes, protocol translations, and authentication required to connect to that system. This decouples the agent’s logic from the underlying infrastructure. For example, an agent’s request to get customer data is the same whether the data resides in an on-premises Oracle database or a cloud-based Salesforce instance. The LLM Mesh handles the translation, making the system inherently interoperable and cloud-agnostic.
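The Tool Object idea can be sketched as a schema plus a set of backend adapters behind a single gateway call. This is a toy illustration of the pattern; the adapter functions, tool names, and registry layout are invented, and a real gateway would also handle authentication and protocol translation.

```python
# Each external system is wrapped as a "tool" with a schema plus a
# backend-specific adapter; agents only ever see the schema.
TOOLS = {
    "get_customer": {
        "schema": {"input": {"customer_id": "str"}, "output": "dict"},
        "adapters": {
            # Hypothetical adapters; real ones would hold connection logic.
            "oracle_onprem": lambda args: {"id": args["customer_id"], "src": "oracle"},
            "salesforce":    lambda args: {"id": args["customer_id"], "src": "salesforce"},
        },
    }
}

def gateway_call(tool_name: str, args: dict, backend: str) -> dict:
    """The agent's request is identical regardless of where the data lives."""
    tool = TOOLS[tool_name]
    return tool["adapters"][backend](args)

# Same agent-side call, two different underlying systems:
print(gateway_call("get_customer", {"customer_id": "C-42"}, backend="oracle_onprem"))
print(gateway_call("get_customer", {"customer_id": "C-42"}, backend="salesforce"))
```

The agent's logic never mentions Oracle or Salesforce, which is what makes the system cloud-agnostic: the backend becomes a deployment detail rather than application code.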
Agrawal warns against getting locked into rigid systems that can’t evolve. “To have a hope that your AI systems won’t buckle under the weight of legacy infrastructure or multi-cloud complexity, you need flexibility from the start,” he says.
Because this is such a rapidly evolving space, design your architecture with modularity in mind. Each agent, tool, or data connector should be swappable and independently upgradable. That way, you’re never locked into a monolithic system that can’t evolve with your business. Closely tied to this independence is the use of open standards and APIs. This will ensure your AI agents can plug into whatever systems you’ve already got. The emerging Model Context Protocol (MCP) is gaining traction as a universal translator for AI agents, allowing them to interact consistently across tools and environments.
Finally, stay cloud-agnostic. With most hyperscalers now operating local data centres in the Middle East, businesses here have more flexibility than ever. But only if they avoid vendor lock-in from the outset. So, whether you’re running in AWS, Azure, or on-premise, design with portability in mind. Containers (like Docker) and orchestration tools (like Kubernetes) make it easier to move workloads around based on cost, performance, or regulatory needs.
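One common way to keep components swappable, as Agrawal recommends, is a thin provider interface so agent logic never hard-codes a vendor. The provider classes below are placeholders invented for this sketch; real implementations would wrap actual model APIs.

```python
from abc import ABC, abstractmethod

class ModelProvider(ABC):
    """Common interface so model providers stay swappable."""
    @abstractmethod
    def complete(self, prompt: str) -> str: ...

class AzureProvider(ModelProvider):
    def complete(self, prompt):
        return f"[azure] {prompt}"   # placeholder for a real API call

class OnPremProvider(ModelProvider):
    def complete(self, prompt):
        return f"[onprem] {prompt}"  # e.g., a locally hosted model

def run_agent(provider: ModelProvider, task: str) -> str:
    # Agent logic never names a vendor, so swapping providers is config, not code.
    return provider.complete(f"Plan steps for: {task}")

print(run_agent(AzureProvider(), "invoice reconciliation"))
print(run_agent(OnPremProvider(), "invoice reconciliation"))
```

Moving a workload from one cloud to another, or on-premise for regulatory reasons, then means changing which provider is injected, not rewriting the agent.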
Tackling the Biggest Bottlenecks in Scaling Agentic AI
Building interoperable architectures is just the start. Deploying agentic AI at scale also brings unique operational challenges around real-time data ingestion, agent orchestration, and system performance.
Top of the list? Data readiness.
A CDO report found that 43% of businesses cite data quality and completeness as their biggest obstacle in AI deployment. And it’s no wonder, as 97% of data leaders report issues like missing inputs, licensing headaches, or even accidental use of sensitive data.
“These aren’t edge cases; they’re day-to-day blockers,” adds Agrawal. Imagine trying to train a supply chain optimization agent when half your data lives in spreadsheets and the rest is siloed across five platforms.
Now, let’s say one has the data. “One of the biggest challenges is the assumption that if you’ve got data, you’re ready for AI,” says Aziz.
In practice, the real problem is getting data into a form that agents can use, and fast enough. A lot of traditional systems are designed to store data, not to serve it up in real time. So when agents need to make split-second decisions, they end up waiting, or acting on out-of-date information.
Another common challenge Aziz points to is managing lots of agents at once. “It’s one thing to have one agent doing a task. It’s something else entirely to have hundreds, each working on a slightly different problem, coordinating with one another, and learning as they go. This might sound like a distant issue for some, but the pace of change is such that soon, it will be a reality. To handle that well, you need infrastructure that supports fast, live access to data, and a way to manage all those agents like a conductor managing an orchestra; otherwise, things can fall out of sync quickly.”
Berdi echoes this, pointing to data fragmentation and the lack of coordinated intelligence across domains. The goal should be to build coordination systems that manage agent behavior and ensure they work together to serve the organization’s objectives.
For example, Inception’s AI-powered procurement solutions do more than automate contract workflows. They identify high-performing, sustainable suppliers, accelerate sourcing-to-award cycles, ensure compliance, and drive measurable savings. Meanwhile, Inception’s productivity and process automation products let teams deploy no-code AI agents that coordinate workflows, surface knowledge, and make intelligent decisions, often faster, more accurately, and at greater scale than human-led systems.
When it comes to orchestration complexity, the challenge doesn’t just lie at the operational level. The design element is also equally important.
Agentic workflows are rarely single-shot requests. They are complex chains of thought involving multiple, often dependent, calls to LLMs, tools, and retrieval systems. According to Muehmel, “In a monolithic architecture, managing these intricate dependencies, error handling, and state becomes exponentially more difficult as the number of applications grows. This creates a ‘complexity threshold’ where development grinds to a halt because the cognitive overhead of maintaining the system is too high.”
The LLM mesh addresses this by making orchestration more explicit and manageable. By breaking down workflows into modular agents and tools, the logic becomes easier to trace, debug, and reuse. However, the primary performance bottleneck remains the latency inherent in these multi-step chains. Mitigating this requires architectural solutions like parallelizing tool calls, where possible, and using the smallest, fastest models suitable for each task in the chain.
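The "parallelize tool calls where possible" mitigation can be sketched with `asyncio`. The tool names and the 0.2-second delay below are stand-ins for real network or model latency.

```python
import asyncio
import time

async def call_tool(name: str, delay: float) -> str:
    await asyncio.sleep(delay)          # stands in for a network/LLM call
    return f"{name}:done"

async def sequential():
    # Dependent-looking chain: each call waits for the previous one.
    return [await call_tool("search", 0.2), await call_tool("crm", 0.2)]

async def parallel():
    # Independent tool calls fan out concurrently instead of chaining.
    return await asyncio.gather(call_tool("search", 0.2), call_tool("crm", 0.2))

start = time.perf_counter()
results = asyncio.run(parallel())
print(results, f"in ~{time.perf_counter() - start:.1f}s")  # roughly 0.2s, not 0.4s
```

With two independent 0.2-second calls, the parallel version finishes in about the time of the slowest call rather than the sum of both, and the gap widens as chains get longer.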
Another issue Agrawal points to is performance monitoring. Because AI decisions aren’t deterministic, traditional metrics often fall short. You need new KPIs like hallucination rate, escalation frequency, or task alignment with business goals.
Finally, cost is a silent killer. AI agents making constant calls to LLMs can rack up usage fees fast. Without FinOps practices in place, that scaling becomes financially unsustainable.
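A minimal FinOps guardrail is to meter spend per agent before it becomes a surprise invoice. The per-token prices, model names, and budget below are made up for illustration; a real setup would pull prices from the provider's billing data.

```python
# Hypothetical per-1K-token prices; real FinOps would use actual billing rates.
PRICE_PER_1K_TOKENS = {"small-model": 0.0005, "large-model": 0.0150}

class CostMeter:
    """Accumulates spend per agent so runaway LLM usage is visible early."""
    def __init__(self, budget_usd: float):
        self.budget = budget_usd
        self.spent = 0.0

    def record_call(self, model: str, tokens: int) -> None:
        self.spent += PRICE_PER_1K_TOKENS[model] * tokens / 1000

    @property
    def over_budget(self) -> bool:
        return self.spent > self.budget

meter = CostMeter(budget_usd=1.00)
for _ in range(100):                     # a chatty agent making 100 calls
    meter.record_call("large-model", tokens=2000)
print(f"spent=${meter.spent:.2f} over_budget={meter.over_budget}")
```

The same meter also makes the case for routing routine tasks to the small model: at these illustrative prices, it is thirty times cheaper per token.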
Resilience, Continuous Learning, and Retraining
Nothing stays still. Your data changes, regulations change, and sometimes even your goals change. So, the system has to be flexible by design.
A good analogy here is building an AI system like you would a city: it needs strong infrastructure, smart governance, and the ability to grow with its people. Resilience, in this case, isn’t just about surviving outages. It’s about adapting to change, learning from experience, and doing it all without falling apart.
“So, start with solid engineering foundations,” says Agrawal. Just like no one wants a city with power cuts every week, your AI systems need guardrails: failover mechanisms, alerts when something breaks, and detailed logs so you’re not flying blind. It’s the basic hygiene that keeps things running.
Aziz emphasizes keeping an agent’s logic, memory, and decision rules separate. If you need to retrain the model or change how an agent behaves, you don’t have to rebuild the whole thing. In practice, this flexibility extends to “shadow modes” or sandboxes, where new versions of agents can run quietly in the background, learning and being tested before full deployment.
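The shadow-mode idea can be sketched as a wrapper that serves the live agent's answer while silently running and logging the candidate. This is an illustrative pattern, not any vendor's implementation; the toy agents here are just string functions.

```python
def shadow_run(live_agent, candidate_agent, request, log):
    """Serve the live agent's answer; run the candidate silently for comparison."""
    live_out = live_agent(request)
    try:
        shadow_out = candidate_agent(request)   # never shown to the user
        log.append({"request": request, "live": live_out,
                    "shadow": shadow_out, "match": live_out == shadow_out})
    except Exception as exc:                    # a broken candidate can't hurt users
        log.append({"request": request, "error": str(exc)})
    return live_out

log = []
live = lambda r: r.upper()
candidate = lambda r: r.upper().strip()         # new version under evaluation
result = shadow_run(live, candidate, "refund order 17 ", log)
print(result, "| agreement:", log[-1]["match"])
```

Over thousands of requests, the agreement log tells you whether the candidate is ready to be promoted, without it ever having touched a user.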
Muehmel goes on to explain how the LLM Mesh architecture is designed for adaptation and resilience through the following mechanisms:
- Resilience through Modularity: The decoupled nature of the Mesh ensures that the failure of one component does not cascade and bring down the entire system. If a specific tool or LLM service goes offline, the gateway can reroute requests to a backup, or the agent can be programmed to try an alternative tool. This is far more resilient than a monolithic application where a single broken API call could cause the whole system to fail.
- Continuous Learning via Human-in-the-Loop Feedback: The architecture is built to capture human feedback as a primary source of learning. For example, when a human expert corrects a report drafted by an agent, those corrections are not discarded. They are logged in a structured format and become a high-quality “golden dataset.” This data is invaluable for evaluating agent performance and identifying areas for improvement.
- Agent Retraining and Fine-Tuning: The collected feedback enables a continuous improvement cycle. This data can be used to systematically fine-tune the agents’ underlying LLMs (e.g., retraining an agent to better match an expert’s writing style) or to refine the prompts and logic that guide their behavior. This process ensures agents adapt and improve based on real-world expert interaction.
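The "reroute to a backup" behavior in the first point can be sketched as an ordered-failover helper. The service names and the simulated outage are invented for the example.

```python
def call_with_failover(payload, services):
    """Try each registered service in order; reroute when one is down."""
    errors = []
    for name, service in services:
        try:
            return name, service(payload)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")     # log and fall through to backup
    raise RuntimeError("all services failed: " + "; ".join(errors))

def primary(_):
    raise ConnectionError("timeout")            # simulate an outage

def backup(payload):
    return f"handled({payload})"

used, result = call_with_failover(
    "summarize Q3 report",
    [("primary-llm", primary), ("backup-llm", backup)],
)
print(used, result)   # backup-llm handled(summarize Q3 report)
```

The caller never learns the primary was down, which is the contrast with a monolith, where that one failed call would have surfaced as a broken application.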
Here’s where it gets interesting. These agents learn on the job. Every success, every failure, even the moments where they shrug and escalate to a human — that’s all data. And with the right feedback loops, it becomes fuel for improvement. Maybe you tweak a prompt, adjust a policy, or retrain a model.
“The key is to keep that loop short and actionable,” says Agrawal.
Finally, adaptability means avoiding lock-in. Technology moves fast, and an architecture welded to a single model or framework will age poorly. Modular, interchangeable components let you plug in better tools as they emerge, ensuring your AI gets smarter without starting from scratch.
Securing and Governing Autonomous Agents
As AI agents become more autonomous, security, compliance, and governance move from “best practices” to “non-negotiables.”
“It’s important to state up front that autonomous doesn’t mean unchecked,” says Aziz. “Trust starts with access control. Agents should only see the data they truly need — nothing more.” That can mean role-based access, row-level database restrictions, or strict segregation between agents operating on public data and those working with sensitive internal datasets. Just as important is transparent logging, so technical teams, auditors, regulators, and business leaders can understand the ‘why’ behind a decision.
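The combination Aziz describes, least-privilege access plus transparent logging, can be sketched in a few lines. The agent roles, dataset names, and policy table are illustrative assumptions.

```python
from datetime import datetime, timezone

# Illustrative role-to-dataset policy: agents see only what they truly need.
POLICY = {
    "support-agent": {"tickets", "public_docs"},
    "finance-agent": {"invoices"},
}

AUDIT_LOG = []

def read_dataset(agent: str, dataset: str) -> str:
    allowed = dataset in POLICY.get(agent, set())
    AUDIT_LOG.append({                   # every attempt is logged, allowed or not
        "ts": datetime.now(timezone.utc).isoformat(),
        "agent": agent, "dataset": dataset, "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{agent} may not read {dataset}")
    return f"<rows of {dataset}>"

print(read_dataset("support-agent", "tickets"))
try:
    read_dataset("support-agent", "invoices")
except PermissionError as e:
    print("denied:", e)
```

Logging denied attempts as well as allowed ones is what gives auditors and regulators the ‘why’ behind a decision, and flags an agent probing beyond its remit.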
Berdi stresses that enterprises must be able to trust agents to act reliably, transparently, and in alignment with business values. This means embedding dynamic guardrails into agentic systems, ensuring that when confidence is low, ambiguity is high, or legal and ethical boundaries are at risk, the system knows when to pause, escalate, or defer to human judgment.
Muehmel describes how the LLM Mesh embeds governance directly into the architecture, moving from a model of trust to one of provable enforcement. Every action and data flow passes through a centralized, LLM-aware secure gateway that inspects network traffic and model prompts/responses. Attribute-Based Access Control (ABAC) enforces least privilege at the object level, federated identity ties every action to a specific human or agent, and immutable audit trails preserve an end-to-end record. Automated safety layers, like PII filters, prevent sensitive data from leaking to unauthorized tools.
For Agrawal, governance also means context and supervision. “Think of each agent as a digital employee,” he says. “They need a job description, boundaries, and escalation rules. For high-stakes work, human-in-the-loop isn’t optional.” He also advises proactive stress-testing — red-team exercises to uncover leaks, misuse, or policy violations before external actors do.
Managing the Lifecycle of Many Agents at Scale
Running one or two agents is simple; orchestrating dozens or hundreds is another challenge. “If you’re going to run lots of agents at once, you need a smart system behind the scenes — like air traffic control,” says Aziz. That includes tracking what each agent is working on, managing shared memory for collaboration, and pausing or redirecting work as needed.
This is where new classes of tools are emerging. For Muehmel, the LLM Mesh is the operational backbone, bringing MLOps-style rigor to agentic AI. The catalog serves as the system of record for agent components, while the gateway orchestrates and enforces permissions as outlined in the article earlier. Federated services monitor cost, performance, safety, and security across the fleet. In essence, instead of relying on a patchwork of external tools, the LLM Mesh provides an integrated, holistic framework designed specifically for developing, deploying, and governing a large and complex ecosystem of enterprise agents.
Agrawal’s team uses their Intelligent Data Management Cloud (IDMC) to ground agents in trusted metadata and orchestrate diverse components — from different model providers to on-prem legacy systems — through an integrated iPaaS layer. Low-code/no-code interfaces allow business teams to build and refine agents themselves, removing bottlenecks and accelerating innovation.
Like a city under development, agentic AI requires foundations that can bear weight, systems that coordinate at scale, and governance that keeps everything running safely and fairly.
On paper, we have much of what’s needed: architectural principles, design frameworks, and growing investment. However, whether these ideas hold up in the messy, real world — across legacy systems, shifting data, and high-stakes decisions — remains to be seen.