Why Businesses Need Better KPIs Before Reshaping the Workforce for AI

The real risk isn't measuring productivity incorrectly—it's making workforce decisions based on those flawed measures.

Tasmia Ansari 14 hours ago

[Image: Chetan Jha/MITSMR Middle East]

Key Takeaways

AI is exposing a measurement crisis: organizations are deploying AI faster than they can redefine the KPIs used to evaluate performance, creating a growing disconnect between managers and practitioners.

The most valuable work is becoming harder to quantify because the tasks that now matter most, such as judgment, validation, prioritization, and decision-making, leave a much smaller digital footprint than traditional outputs.

Many organizations are tying AI adoption to hiring and layoff decisions during restructuring before establishing reliable measures of AI productivity, risking role cuts based on outdated assumptions.

The developers were convinced AI had made them faster. The data showed it had done the opposite. In a randomized trial by researchers at the nonprofit AI safety research group Model Evaluation & Threat Research (METR), experienced software developers expected AI coding assistants to boost their productivity by 24%. After the experiment, they still believed AI had made them 20% more productive. In reality, they took 19% longer to complete their tasks.

The reason was not that AI had failed. Rather, the nature of the work had changed. Developers spent significant time reworking AI-generated outputs instead of producing everything themselves.

The finding should concern more than just software teams.

Across boardrooms, executives are under growing pressure to demonstrate returns on billions of dollars invested in AI systems. Boards want evidence that AI is improving productivity. Investors want proof that efficiency gains will eventually expand margins. Leaders are making decisions about hiring and restructuring based on assumptions that AI is fundamentally changing how work gets done.

Yet many organizations are struggling to answer a surprisingly basic question: how do you measure productivity when machines increasingly generate the output?

The uncertainty is becoming too visible to ignore. A survey of nearly 6,000 executives across the Americas, the UK, Germany, and Australia found that almost 90% reported no measurable impact of AI on productivity or employment over the previous three years, despite widespread deployment across their organizations.

The disconnect echoes one of technology’s most enduring economic puzzles. In 1987, Nobel Prize-winning economist Robert Solow observed: “You can see the computer age everywhere but in the productivity statistics.” Economists studying the digital revolution later confirmed the pattern: computers transformed offices, communication, and workflows long before their impact appeared in aggregate productivity data. Today, AI appears to be creating a modern version of the same paradox.

Perhaps AI isn’t failing to improve productivity at all. Perhaps organizations are still judging AI-enabled work by metrics that no longer reflect how work gets done.

For decades, productivity was relatively straightforward to evaluate. The metrics were imperfect, but they reflected a world in which human effort and output were closely linked. AI is breaking that link. Today, the scarce resource is judgment.

“For 30 years, productivity meant output per person. The only measurements were how much you produced, how fast you produced it, and how many tickets you closed. That definition is now worthless,” says Jessica Constantinidis, Innovation Officer – EMEA at ServiceNow.

“When a machine can generate the output in seconds, measuring how much output you made tells you nothing about whether you’re good at your job. Today, productivity needs to be measured by judgment and intuition. The new scarce skill isn’t doing the work. It’s knowing which work is worth doing, and whether what the machine produced is safe to ship,” she adds.

That shift is forcing organizations to confront an uncomfortable reality. Many of the KPIs they have relied on for decades never measured value directly. They measured activities that happened to correlate with value, and that correlation is now crumbling.

Measuring Activity, Not Outcomes

Part of the challenge is that organizations continue to measure AI adoption rather than AI impact. Executive dashboards frequently track indicators that only reveal whether employees are using AI tools. They reveal far less about whether business performance is improving.

“AI creates value only when it is deliberately connected to a specific process, decision, or business outcome,” says Dr. Moataz BinAli, CEO of Magna AI. “For example, if AI is deployed in procurement, the measure should not be usage alone, but whether it reduces cycle time, improves supplier decisions, or lowers procurement costs. If it is deployed in customer service, the measure should be faster resolution, improved accuracy, or higher satisfaction.”

The distinction is important. “Too often, organizations launch pilots without defining these metrics upfront,” BinAli says. “Tools are procured before outcomes are agreed, and success is assumed rather than measured. That is why many AI initiatives struggle to move beyond experimentation.”

History suggests this pattern is not unusual. Research on earlier technological revolutions shows that productivity gains rarely appear immediately after deployment. Instead, they emerge after organizations redesign workflows, management systems, and operating models around the new technology.

The lesson for AI may be similar. Deploying tools is only the beginning. Capturing value requires redesigning how work is performed, managed, and measured. Nowhere is this challenge more visible than in software engineering. For decades, engineering organizations relied on imperfect measures that still provided a practical signal in environments where most work was performed manually. AI coding assistants have exposed the limits of those measures.

A paper from BNY Mellon and other organizations has found that modern developer productivity depends on factors far beyond output volume, including collaboration, expertise, ownership, long-term maintainability, and system quality. Yet many organizations continue to rely on metrics designed for a different era.

The gap is becoming increasingly visible in industry surveys. According to Harness, 89% of engineering leaders report that AI has made their teams more productive. Yet 81% also say developers are spending more time reviewing AI-generated code. Roughly one-third of engineering effort is now disappearing into activities that remain largely invisible to traditional productivity systems. “That’s the tell,” says Constantinidis. “We didn’t eliminate the work. We moved it from creating to validating, and our definition of productivity never followed.”

The implication extends beyond engineering.

“The optimistic read is that this forces a long-overdue correction,” says Kurt Muehmel, Head of AI Strategy at Dataiku. “Engineering organizations lived with proxy metrics for decades because proxy metrics were convenient. AI didn’t break a working system. It broke a system that had been quietly failing to measure what mattered, and it broke it loudly enough that we now have to fix it.”

According to Muehmel, organizations are increasingly moving further down the value chain, away from measuring activity and toward measuring outcomes.

Who Is Doing What With AI

The deeper transformation occurring inside organizations is about the changing nature of human contribution. The value of human work is shifting from production to oversight, prioritization, context, and decision-making.

That creates a problem for traditional performance systems because judgment leaves a much smaller digital footprint than production. “The new scarce skill isn’t doing the work,” says Constantinidis. “It’s knowing which work is worth doing.”

The shift is also changing who benefits most from AI. Contrary to popular assumptions, the biggest beneficiaries are not always the most experienced employees. “Juniors are adapting fastest,” says Muehmel. “The ‘AI kills entry-level jobs’ narrative assumes AI competes with junior developers for the same work. We’re seeing closer to the opposite.”

Senior employees, by contrast, are reserving their attention for architecture and trade-offs, while those who remain skeptical risk being overtaken by colleagues who jumped in early, says Muehmel.

The same dynamic exists among leadership teams.

“Leaders who are removed from hands-on work and do not develop an appreciation for AI tools carry a risk,” says Stefan Leichenauer, VP of Engineering at SandboxAQ. “They may make decisions about AI use without really understanding it.” The most effective AI-native organizations, he argues, are led by executives who actively engage with the technology themselves.

Thinking Outside the Dashboard

The measurement problem becomes significantly more consequential when it begins influencing workforce decisions.

Across industries, executives increasingly discuss AI in the context of efficiency gains, flatter organizations, and leaner operating models. A growing number of organizations are making these decisions while acknowledging that they cannot reliably measure AI’s impact, a dangerous mismatch.

The risk is particularly acute because AI often removes visible work while increasing invisible work, and traditional dashboards rarely capture those activities. “AI certainly has ramifications for team structure,” says Leichenauer. “But we need to be cautious about simply cutting heads because coding speed goes up.”

“This comes back to the question of how we’re measuring productivity. While it’s true that generation is fast, we still need humans to invest the time and effort to verify and validate what is being generated,” he adds.

The challenge reflects a deepening uncertainty across the business landscape. While many executives anticipate workforce reductions from AI over the coming years, evidence of large-scale productivity gains remains limited. Organizations are trying to calculate how many employees they need (or don’t) before they know how to measure the work those employees now perform.

A recent example is American automobile manufacturer Ford, which last week admitted that, while it introduced AI to increase production, it underestimated the value of skilled talent. Some of the company’s most experienced engineers left before their accumulated knowledge could be built into the automated systems. The company has since changed course, making around 350 hires, including rehires, over the past three years.

The divide in understanding is sharpest between managers and practitioners. According to research, managers are nearly four times more likely than practitioners to report having no concerns about how AI productivity data will be used.

“The people holding the measuring stick are relaxed,” says Constantinidis. “The people being measured are anxious.” She sees the divergence as evidence of a deeper organizational challenge.

“Leadership sees a dashboard going up and to the right. The engineer sees their craft changing under their feet, the fun work going to the machine, the exhausting review work piling up on them, and a number being collected that they suspect will one day be used against them,” she points out.

When executive optimism and employee experience diverge to this extent, organizations can mistake deployment for transformation. “Real readiness isn’t measured by how many licenses you’ve bought or how confident the executive layer feels,” Constantinidis says. “It’s measured by whether the gap between the layers is closing.”

This may be one of the most overlooked aspects of AI adoption. Technology deployment can be measured through usage statistics and procurement budgets. Cultural adaptation is considerably harder to quantify. “AI is not a tool you bolt onto the organization you already have,” she says. “It’s a mindset and a culture shift.”

The most forward-looking organizations are beginning to recognize that the productivity conversation itself may be too narrow. “The real shift is that AI is moving from generic productivity support to industry-specific intelligence,” says BinAli.

Rather than simply helping employees work faster, AI is increasingly embedded into the workflows that shape business performance, creating space to rethink work itself. “Doing more work faster is the single-most expensive mistake of this entire era,” says Constantinidis. “Faster assumes the work you’re doing is the right work.”

The Next Productivity Framework

The first phase of AI transformation was acquiring the tools. The second was driving adoption. The third, now underway, is determining how value is created when humans and AI work together.

For the C-suite, productivity can no longer be treated as a technology metric alone. The organizations seeing the greatest returns from AI are not necessarily those with the highest usage rates or the largest deployments, but those that have connected AI to measurable business outcomes. CEOs and executive teams will need to move beyond tracking adoption and begin asking harder questions about decision quality, organizational effectiveness, customer outcomes, and innovation capacity.

For operational leaders, the task is even more immediate, as existing measures of activity are becoming increasingly unreliable. As AI takes on more routine tasks, managers will need new ways to evaluate their employees. Organizations that continue to rely on pre-AI KPI frameworks risk rewarding visible output over the less visible work of review, validation, and judgment.

Boards will also need to bring measurement into their oversight. If management cannot clearly articulate how productivity is being measured in AI-enabled workflows, directors should question whether the organization truly understands the return on its investments.

“The deeper redefinition isn’t really about engineering,” says Muehmel. “Whatever the industry figures out for productive engineering becomes the template for productive analysts, marketers, lawyers, and support agents. Engineering is forcing the methodology question early, and the rest of the organization benefits.”

The companies that thrive in the next decade may not be those with the largest AI budgets and models. They may be the ones that learn to accurately measure the human capabilities that grow more valuable as AI takes on more tasks.

What Must Leaders Do Differently?

CEO & C-Suite

Move beyond measuring adoption and start measuring outcomes. Instead of tracking licenses, prompts, and usage rates, executives should assess whether AI is improving decision quality, innovation, and organizational effectiveness.

Operational Leaders

Redesign performance management systems for AI-enabled work. Replace traditional activity-based KPIs with metrics that capture judgment, validation, collaboration, problem-solving, and outcome creation rather than volume alone.

Boards & Governance Committees

Expand AI governance beyond risk, compliance, and cybersecurity to include measurement and accountability. Boards should challenge management on how AI productivity is being defined and how the organization is tracking return on AI investments.

Research Context
This article is based on interviews with:

Dr. Moataz BinAli, CEO, Magna AI
Jessica Constantinidis, Innovation Officer – EMEA, ServiceNow
Kurt Muehmel, Head of AI Strategy, Dataiku
Stefan Leichenauer, VP of Engineering, SandboxAQ

Additional reporting draws on research and analysis from a National Bureau of Economic Research (NBER) survey, the Model Evaluation & Threat Research (METR) study on AI coding assistants and developer productivity, the Harness AI in Software Development report, and other academic papers.
