Agents of Change: The Rise of Autonomous AI

Runtime 29:35

In the inaugural episode of More than Meets the AI (Season 2), BCG X's Dr. Leonid Zhukov and MIT Professor Tim Kraska unpack what being “agentic” truly means and why LLMs serve as the “brains” of these systems.

As artificial intelligence evolves from predictive systems to technologies capable of autonomous action, businesses are beginning to experiment with AI agents that can plan, execute tasks, and interact with complex workflows. But how ready are organizations to deploy these systems at scale, and what safeguards are needed to ensure they are reliable, secure, and effective?

“AI agents are programs that execute user requests. They have some autonomy, but cannot set their own goals,” says Dr. Leonid Zhukov, Partner and Vice President at BCG X, highlighting the practical limits of today’s agentic systems.

“These agents cannot replace humans at the current stage, but they can make people far more productive,” adds Tim Kraska, Associate Professor at MIT and Co-Director of the Data and AI Lab.

In the inaugural episode of More than Meets the AI (Season 2)—a web series tailored for top executives and decision-makers in the Middle East, in collaboration with Boston Consulting Group (BCG)—Dr. Zhukov and Prof. Kraska explore agentic AI, examining its evolution, the challenges of enterprise deployment, and the future of human–AI collaboration.

TRANSCRIPT

Suraya Turk: Welcome to Season 2 of More than Meets the AI. This web series is a collaboration between MIT Sloan Management Review Middle East and BCG. Last season, we explored AI and how it is reshaping business, technology, and society.

In this season, we look at the next frontier—systems that move from prediction to autonomous action.

Today, we are joined by Dr. Leonid Zhukov, Director of the BCG X AI Science Institute, and Professor Tim Kraska, Associate Professor at MIT and Co-Director of the Data and AI Lab.

Let’s start with you, Dr. Zhukov. AI is generating so much buzz and hype these days, particularly around agentic AI. What makes agentic AI different from predictive AI, and what does that really mean?

Dr. Leonid Zhukov: Well, happy to be here. You know, “agentic” is a very loaded word. In psychology, being agentic usually means having the ability to set up your own goals and execute on them.

Now, the good news is that within software engineering and the AI world, we’re talking about a much narrower definition. Usually, we’re talking about software programs that can take user requests and execute upon them.

In order to do that, first of all, the software or the program needs to understand the request—what the user actually wants. Then it needs to make a plan, a sequence of actions it needs to take. And then it actually needs to execute on that, which means either getting the right data, finding the right answer, or calling external tools to actually perform that action.

So those are the steps. In general, there is the “brain” part, if you want, and the “hands” part of agents. The brain part today is large language models that understand user requests and set up the plan of execution. The hands can be external software, access to databases, and so on.

Ultimately, AI agents are programs that execute the requests that users give them. They do have some autonomy, but they cannot set up their own goals.
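In code, the loop Dr. Zhukov describes might look roughly like the sketch below. It is only an illustration of the "brain and hands" pattern: the llm object, its plan() and summarize() methods, and both tools are hypothetical placeholders, not any specific product's API.

```python
# Minimal sketch of an agent's "brain and hands" loop.
# The llm object and both tools are hypothetical placeholders.

def search_flights(destination: str) -> list[dict]:
    """'Hands': an external tool the agent can call."""
    ...

def book_ticket(flight_id: str) -> str:
    """'Hands': another external action."""
    ...

TOOLS = {"search_flights": search_flights, "book_ticket": book_ticket}

def run_agent(user_request: str, llm) -> str:
    # 1. Understand the request and draft a plan (the LLM is the "brain").
    #    The plan is a sequence of (tool_name, arguments) steps.
    plan = llm.plan(request=user_request, available_tools=list(TOOLS))

    # 2. Execute each planned step by calling the matching external tool.
    results = [TOOLS[name](**args) for name, args in plan]

    # 3. Report back. The agent acts only on the user's request;
    #    it never sets goals of its own.
    return llm.summarize(request=user_request, results=results)
```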

Suraya Turk: How much can we rely on agentic AI today? And what needs to happen at the infrastructure and architecture level to make it safe, reliable, and enterprise-ready?

Tim Kraska: Right now, the areas where AI agents are most successful are those where mistakes are acceptable. This includes internal deployments, where agents help employees find information, draft documents, or handle HR-related questions. There’s usually some skepticism, and people don’t fully trust every output, but the tools are still very useful.

The more challenging cases are those where mistakes are not acceptable. Some customer-facing use cases still tolerate errors—like tech support chats, where even human agents rarely get everything right on the first try. But other customer-facing scenarios are much harder. For example, booking a trip: if someone asks for “London,” you definitely don’t want the system booking a ticket to the wrong London somewhere else in the world.

These high-stakes scenarios are where the research community is heavily focused. There’s work on automatic reasoning and verification, guardrails that prevent agents from taking certain actions, and provable constraints on what an agent is allowed to do. All of this is aimed at making systems safer and more reliable. Guardrails and safety are just one area—there’s still a lot more work to be done.
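One way to picture the kind of guardrail Prof. Kraska mentions is a simple allow-list check that sits between the agent's plan and any real-world action. The action names and the human-confirmation step below are illustrative assumptions, not a specific framework.

```python
# Illustrative guardrail: low-risk actions run freely, high-stakes
# actions require human approval, and everything else is blocked.
ALLOWED_WITHOUT_REVIEW = {"search_flights", "draft_email"}
REQUIRES_HUMAN_APPROVAL = {"book_ticket", "issue_refund"}

def execute_with_guardrails(action: str, args: dict, tools: dict, confirm) -> object:
    if action in ALLOWED_WITHOUT_REVIEW:
        return tools[action](**args)
    if action in REQUIRES_HUMAN_APPROVAL:
        if confirm(f"Agent wants to run {action} with {args}. Approve?"):
            return tools[action](**args)
        raise PermissionError(f"Human rejected {action}")
    # Anything not explicitly listed is blocked by default.
    raise PermissionError(f"Action {action!r} is not on the allow-list")
```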

Suraya Turk: That’s interesting. You mentioned that agentic AI is better suited for certain decisions and processes rather than everything. What about the idea of a “super agent”—one agent that can do everything, wear many hats, and make decisions across many domains? Is that something we’re moving toward? And is it even a good idea?

Dr. Leonid Zhukov: Let’s see if we agree on this. The idea of a “super agent,” almost like a James Bond of AI, sounds exciting, but I’m not sure it’s really needed.

If you think about how work is done today, it’s highly specialized. We have doctors, lawyers, camera operators, interviewers—each with deep expertise in a specific area. Generative AI is most useful when applied within these vertical domains.

For example, an agent designed for customer service doesn’t need to know anything about medicine. In fact, knowing too much outside its domain could hurt performance by introducing confusion.

From a systems perspective, scalable software usually relies on distributed systems: many smaller, specialized components that work together. Each can be tuned and optimized independently, and collectively they tend to be more stable than a single monolithic system.

So I expect the future to involve many specialized agents, with a strong focus on how they interact, share knowledge, and reach conclusions together, rather than one all-knowing super agent.

Tim Kraska: I think it helps to distinguish between agents and foundational models. Agents themselves will likely become more specialized because everything depends on context.

Imagine a future AI agent acting as your personal doctor. It would need access to your medical history, X-rays, test results, and other systems. It would also need to be personalized, learning from past interactions and tailored to you as an individual.

Foundational models are a different story. There’s a strong argument that very large models provide the best general understanding and reasoning. These models could then be customized or distilled into smaller, more efficient versions for specific agents, driven by latency, cost, or domain needs.

That said, this could change. If hardware costs drop significantly and latency becomes less of an issue, we might see a world where inference runs locally, even on phones. Ten years from now, things could look very different.

Dr. Leonid Zhukov: I agree. If large language models are the “brains” of agents, then today the smartest brains are the largest models.

These models have two components: reasoning capability and stored information. So far, the largest models demonstrate the strongest reasoning. But this raises other important questions: data security, privacy, and sovereignty. Do you want your systems managed locally? Do you want to rely on models developed elsewhere?

There are many open questions, but at the moment, the most capable reasoning still comes from the largest models.

Suraya Turk: I’d like to explore the role of AI in designing and organizing workflows. How important is that capability?

Dr. Leonid Zhukov: There are two sides to this. One approach is to use agents to replicate existing workflows, essentially training them to do what humans already do, either alongside people or eventually independently. That’s a natural first step.

But history tells us that this approach rarely delivers the biggest benefits. When electricity was first introduced into factories, manufacturers simply replaced steam engines with electric motors, without changing how factories were designed. The real gains only came when they rethought the entire production process—using many small motors and redesigning workflows.

The same will likely happen with AI agents. The real value will come when workflows are redesigned around what agents are good at, rather than forcing agents into existing human-centric processes.

Tim Kraska: I’ll share a personal example. I’ve been working on a side project to build my own personal AI assistant. My dream is to have an agent that talks to me during my commute and handles all my boring tasks.

I started with expense reimbursements. The agent could fill out the forms, but I still had to collect receipts, convert them to PDFs, search my email, and so on. I began writing scripts to automate each step, converting Uber and Lyft receipts, prioritizing emails, and processing photos from my phone. I then added an MCP server so the agent could call these functions directly.
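A setup like the one Prof. Kraska describes could be wired up roughly as follows, using the open Model Context Protocol (MCP) Python SDK. The helper functions here are hypothetical stand-ins for his scripts, not code from his project.

```python
# Sketch of exposing personal automation scripts as MCP tools.
# The helper functions are hypothetical stand-ins for the scripts
# described above (receipt conversion, email search, etc.).
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("expense-assistant")

@mcp.tool()
def convert_receipt_to_pdf(image_path: str) -> str:
    """Convert a photographed receipt into a PDF and return its path."""
    ...  # call into the existing conversion script

@mcp.tool()
def find_receipt_emails(vendor: str) -> list[str]:
    """Search the mailbox for receipts from a given vendor (e.g., Uber or Lyft)."""
    ...  # call into the existing email-search script

if __name__ == "__main__":
    mcp.run()  # the agent can now call these functions directly as tools
```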

At first, progress was incredibly fast. But over time, the system became unmaintainable. Because it was all vibe coded, the software turned into a mess. Function names changed, the agent stopped calling the right tools, and eventually the whole thing broke.

This experience connects to an MIT project where we’re rethinking how to build complex systems with LLMs. The idea is to define workflows as design documents, which then become the source of executable code, potentially with an agent interface on top. But we’re not there yet; there’s still a lot of work to do.

Dr. Leonid Zhukov: By the way, it seems professors and consultants suffer from the same pain points.

Suraya Turk: That’s a very interesting point. It really shows how much is still evolving.

Let’s turn to organizations. As agentic AI is deployed, likely in customized ways, humans will still need to work alongside these systems. How much human input will be replaced? And how essential will human involvement remain?

Tim Kraska: At the current stage, AI agents cannot replace humans. They are fantastic productivity tools that make people far more efficient across many domains: report writing, coding, HR screening, customer support, and more.

Over time, roles will change. Entry-level jobs won’t disappear, but they will fundamentally evolve. A single person will be able to do much more. Historically, we’ve seen this with the Industrial Revolution and the Information Age, both of which were ultimately net positives.

I’m very optimistic. Humans won’t be replaced, and I don’t see any Terminator-style scenarios anytime soon.

Dr. Leonid Zhukov: I agree. Generative AI is a tool, not a colleague. Jobs aren’t just collections of tasks—they involve transitions, judgment, and deciding what to do next.

AI can handle individual tasks and assist people, but it’s still far from managing entire jobs or complex sequences of tasks. The idea of mixed teams of humans and agents is compelling, but for now, it remains more of a concept than a reality.

Tim Kraska: And it takes time. Think about self-driving cars. The technology existed for years before we saw real-world deployments like autonomous taxis. Replacing entire jobs is even more complex than that.

Dr. Leonid Zhukov: There’s also a tendency to treat AI as something magical. But it’s just another technology. You build the technology, create products on top of it, and then go through adoption phases.

We are still in the first phase—technology building. The vision of AI as a coworker is compelling, but it will take time to get there.

Suraya Turk: That brings us to risks. What are the major risks of this technology, and how is research addressing them?

Tim Kraska: There are many types of risk. One of the most concerning is content generation. It’s becoming increasingly difficult to tell whether images, and soon videos, are real or AI-generated. This creates serious risks around misinformation.

There’s active research on both sides: improving generation quality and building detectors to identify generated content. It’s an ongoing arms race.

In enterprise settings, the risks are different. The main concerns are correctness, verifiability, and hallucinations. Research is focused on reasoning, verification, and preventing self-reinforcing errors. The research effort is critical and highly active across many fronts.

Dr. Leonid Zhukov: I agree, but I think the biggest risk right now is misunderstanding AI’s capabilities. With generative AI, it’s incredibly easy to build impressive demos in a few hours.

The problem arises when senior management assumes those demos represent production-ready systems. When reality doesn’t match expectations, disappointment sets in, and we risk another AI winter driven by hype rather than substance.

AI capabilities are advancing quickly, but we need patience and a clear understanding of current limitations. Listening to researchers and scientists is crucial.

Suraya Turk: That’s especially relevant for CEOs. Many believe they’re deploying agentic AI, but deployment is only the beginning. What advice would you give to leaders who are experimenting—or have stalled?

Tim Kraska: First, give employees access to the latest models and tools, especially for searching internal information. Encourage experimentation. If someone wants to automate part of their job using scripts or vibe coding, let them try; it often increases both efficiency and job satisfaction.

Security concerns can be addressed with the right platforms and configurations.

Second, gather early feedback from users. Benchmarks are increasingly unreliable, since models are often trained on them. The only real way to learn is to deploy agents, see where they fail, and understand why.

Even failed projects are valuable because they generate real-world data about what works and what doesn’t. That experience is essential for building something that succeeds.

Dr. Leonid Zhukov: I’d add that it’s usually best to start with repetitive tasks. They’re predictable and easier to automate.

The danger lies in automating creative or complex tasks too early. These models are probabilistic and inherently unpredictable—they may produce different answers each time. You can’t always guarantee outcomes.

So be thoughtful about which tasks you automate, and keep experimenting.

Suraya Turk: That’s a great note to end on. Deployment is only the start—it requires continuous development and thoughtful integration across the organization.

Thank you both for your insights. This has been a fascinating discussion, and the collaboration between you has been truly exceptional.
