How AI Helps the Best and Hurts the Rest

Generative AI can boost performance for stronger business owners but harm those already struggling. The difference comes down to human judgment.

    Can generative AI serve as an effective adviser for business owners and entrepreneurs? Intuitive chat-based natural language interfaces mean that anyone who can read and write can use GenAI tools for a wide range of tasks, even if they lack technical skills. This has obvious appeal for entrepreneurs and small business owners, many of whom could benefit from an on-demand adviser able to help with marketing, pricing, operations, and strategy.

    Improving the performance of entrepreneurs at scale has proved to be challenging. The most effective interventions tend to be high touch, such as hands-on consulting, individualized mentorship, and in-person networking. However, they are expensive to deliver and difficult to scale. In emerging markets specifically, this constraint is often even tighter: High-quality business support can be scarce, and its cost can be prohibitive relative to organizational resources. A low-cost and always-available AI mentor could potentially deliver, at scale, the type of business guidance that has historically been limited by the availability and cost of human experts.

    To test whether accessing generative AI can actually help small businesses, we ran a field experiment with hundreds of small business owners in Kenya. We randomly gave half of them access to a WhatsApp contact that connected them to a version of OpenAI’s GPT-4 that we had prompted to act as a Kenyan business adviser, and then we tracked business performance over time. The key factor driving either an increase or decrease in profits and revenues? Whether an entrepreneur had the judgment to distinguish good AI advice from bad.

    Business owners in the experimental group could ask any business-related question of their choosing and use the assistant as much or as little as they wanted. We tracked sales and profits over time, comparing entrepreneurs who got the AI assistant against the control group, who did not. On average, the difference between the control group’s and the experimental group’s business performance was close to zero and not statistically significant. But the average for the experimental group masked a striking split: Having access to generative AI boosted revenues and profits by 15% among business owners who had already been doing well (that is, they were in the top 50% of performance before the experiment), but among those in the bottom 50%, AI use led to a nearly 10% decline in revenues and profits.
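The arithmetic behind that masking effect is easy to illustrate. The sketch below uses the effect sizes reported above and assumes an even split between the two halves of the sample (the split is illustrative); it shows how two offsetting subgroup effects blend into an average near zero:

```python
# Illustrative sketch: how opposite subgroup effects can average out.
# Effect sizes mirror those reported in the study; the equal weighting
# of the two subgroups is an assumption for illustration.

top_half_effect = 0.15      # roughly +15% revenues/profits for top-50% performers
bottom_half_effect = -0.10  # roughly -10% for bottom-50% performers

# With half the sample in each subgroup, the average treatment effect
# is the equal-weight blend of the two:
average_effect = 0.5 * top_half_effect + 0.5 * bottom_half_effect
print(f"Average effect: {average_effect:+.1%}")  # small enough to look like "no effect"
```

An evaluation that reported only this blended average would conclude the tool does little, when in fact it helps one group and harms the other.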

    Same Advice, Different Choices

    Why would a tool capable of producing high-quality business suggestions harm the entrepreneurs it was supposed to help? We found that both high- and low-performing entrepreneurs asked a similar number of questions, asked similar types of questions, and even received similar advice from the AI tool. The difference was in what they chose to act on.

    In our data, we saw that every entrepreneur, regardless of baseline performance, received generic suggestions like “lower your prices” or “invest in advertising” alongside more tailored, context-specific ideas. Low performers disproportionately acted on the generic advice, cutting prices and increasing spending on advertising. These one-size-fits-all moves often eroded margins and raised costs without generating enough new business to offset them.

    High performers, in contrast, used GenAI to discover and implement changes specific to their situation: A cybercafe owner started renting out gaming accessories to customers; a car-wash owner introduced a new in-demand detergent and started selling cold sodas to waiting customers; and another entrepreneur found alternative power sources to withstand electricity blackouts. Both groups had access to the same quality of AI advice. The difference was whether the entrepreneurs had the judgment to sift through AI-generated suggestions, pick the ideas that fit their business, and ignore the rest.

    Our takeaway from the study is that in contexts where problems are broad and fuzzy, generative AI amplifies the role of human judgment. The value created by an open-ended AI adviser is critically dependent on the human judgment that guides its use and application. In open-ended contexts, a positive effect of AI on performance relies on asking good questions, interpreting suggestions, and choosing which actions to implement. For users with strong judgment, the tool helps them surface new ideas and think through trade-offs. Users with weak judgment can end up following plausible-sounding but misleading advice that leads to worse outcomes.

    For managers and policy makers, recognizing this nuance is essential. Without it, well-intentioned AI deployments risk widening performance gaps, because the people who often need the most help are also the least equipped to filter and apply advice.

    How Leaders Should Implement AI Advice for Open-Ended Problems

    Our experience prototyping and launching a WhatsApp-based AI adviser shows how quickly and cheaply generative AI tools can be rolled out and made widely accessible. But a fast implementation of a GenAI tool may also raise the risk that organizations roll out open-ended AI tools without strong guardrails or evaluation. As the cost of deployment falls, AI is being applied to an ever-wider range of open-ended tasks. For example, engineers at Google now use AI coding tools in their day-to-day work, and there is evidence that the most experienced developers benefit the most from these tools. In book publishing, established authors have been able to increase their output with AI while AI-assisted entrants have flooded the market with lackluster prose. For leaders managing AI within their organizations, these findings reinforce the importance of careful design and rigorous measurement to ensure that AI does not inadvertently lead to worse performance.

    What can leaders do? First, cultivate awareness. Leaders should not assume that AI will boost performance for everyone. Evaluations that focus only on average effects can be misleading, because the mean can conceal meaningful harms for specific groups.

    Next, leaders can design for heterogeneity. For workers with experience and judgment, open-ended AI tools can have real returns. Junior or weaker performers might need tighter guardrails to avoid following harmful suggestions. One promising direction is feeding the AI tool more context about the user’s specific situation — their business data, financials, or competitive environment — so that it can better filter out generic advice that doesn’t fit their situation. Building that kind of contextual awareness into AI tools remains an open challenge that GenAI vendors are actively exploring.

    In the meantime, most people are likely to find generative AI useful for specific, narrow tasks — such as summarizing documents, writing more clearly, or reviewing code for efficiency — rather than for open-ended tasks, where judging the applicability of its output demands deep contextual knowledge and acting on it demands skill.

    Organizations should also invest in human judgment and scaffolding around AI use. For high-stakes decisions, escalation to human support is a critical safeguard, especially when advice is open-ended, context-dependent, or difficult to evaluate in advance. Organizations can build supports that make these tools safer, such as structured onboarding that elicits context, decision checklists, or warnings about margin-destroying tactics.

    Finally, leaders should audit for uneven effects by asking questions in three areas:

    • Adoption: Are some groups avoiding the tool entirely or using it far less than others?
    • The interactions themselves: Are different users asking different kinds of questions, providing different amounts of context, or receiving meaningfully different outputs?
    • What happens next: Is the tool changing real-world decisions, and are those decisions producing better results for some users than others?

    Asking those questions can help leaders pinpoint where inequality may emerge, which allows for intervention through targeted training, workflow redesign, or tighter controls.

    AI shows real potential to increase business performance at scale, but the benefits are not guaranteed. Our research results suggest that GenAI can inadvertently increase inequality in business performance by helping stronger performers more than others and, potentially, actively harming lower performers. When deploying AI tools at scale, a central design challenge is not merely to make AI available but to make its use effective, so that scaling AI does not scale inequality.
