Avoid ML Failures by Asking the Right Questions

Machine learning solutions can miss the mark when data scientists don’t check their assumptions. Adopting a beginner’s mindset in any domain can help.



  • Paul Garland

    In our collective decades of experience building, leading, and studying companies’ machine learning (ML) deployments, we have repeatedly seen projects fail because talented and well-resourced data science teams missed or misunderstood a deceptively simple piece of the business context. Those gaps create obstacles to correctly understanding the data, its context, and the intended end users — ultimately jeopardizing the positive impact ML models can make in practice.

    We have discovered that small mistakes and misunderstandings are much less likely to cascade into failed projects when development teams engage with colleagues on the business side and ask enough questions to deeply understand the process and the problem at hand. Asking questions might seem like a simple step, but it may not be part of a company’s, a team’s, or an industry’s culture. Appearing to be in command of all the information needed may be one of the ways employees signal competence in an organization. And while data scientists might possess technical mastery, they can lack the soft skills needed to reach a deep, accurate mutual understanding with business partners.

    At the same time, business partners often hesitate to ask questions themselves and don’t necessarily know what information or context would be helpful to share with a data science team. It’s hard work on both sides to have the kinds of interactions that allow everyone to surface and question assumptions, and identify the most important elements of business context.

    Setting ML projects up for success with those kinds of useful interactions requires leaders to foster a culture that normalizes and values asking questions with a beginner’s mindset, which means putting aside ego and past expertise. One data scientist we’ve worked with became very intentional about this after noticing that he makes the fewest mistakes when he is in a new context and must ask a lot of questions. But what are the right questions to ask? In this article, we present three examples where significant ML projects failed and explore how asking the right questions with a beginner’s mindset could have improved the interactions between data scientists and business partners and helped their ML deployments succeed in creating business value.

    Scenario 1: Ask ‘What is the business process?’ not ‘What is the data set?’

    Facing the first economic downturn prompted by the COVID-19 pandemic, a local finance team at a multinational retail company had a hunch that some customers would weather the storm with a little help whereas others were at risk of bankruptcy. The team wondered whether the company’s data science team could help them predict which customers were likely to file for bankruptcy each month. This information would allow the finance team to identify solvent customers and temporarily extend more credit to assist them during the downturn while limiting the company’s exposure to customers who were very likely to default. The local finance team requested this analysis from the local IT partner team, which in turn engaged the company’s data science center of excellence to produce the model. Using the data provided, this central data science team successfully developed a model that seemed to perform well: During offline tests, using historical data, the data scientists could accurately predict which customers were likely to default on payments.

    However, when the model was deployed with the local finance team, it no longer performed well. In fact, it was essentially useless for predicting customer bankruptcy each month, despite performing well during testing and prototyping.

    The missing link: Process understanding. This central data science team received and analyzed a compelling and complete data set but, having had little interaction with the team that had commissioned and would be using the model, failed to grasp the underlying business processes. They did not understand the legal process for bankruptcy in the country the finance team was concerned with, or how the timeline of bankruptcy was recorded by the company. The data scientists built the model based on a variable that flagged customers as either having defaulted or not and trained the model to detect the typical pattern of transactions right before the customer was flagged as being in default.

    There were three key events on the timeline of a customer declaring bankruptcy: the customer not meeting financial obligations, the customer then filing for bankruptcy in court, and the court finally making the bankruptcy ruling. What the data scientists did not know was that customers were not flagged as having defaulted when they began missing payments; rather, the flags were entered on their accounts only after the bankruptcy court ruling. The data scientists missed that because they were using training data in which the default had already been reported for all customers in the data set; they did not realize that live customer accounts would not be flagged as defaulted until about a year after customers started missing payments. In other words, the model used data to make a prediction that in reality would be unavailable to the prediction system when the model was run against live data — a problem that data scientists call target leakage.
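    To make target leakage concrete, here is a minimal, hypothetical sketch: a catalog records when each field actually becomes available on a live account, and a filter drops anything (such as a default flag that is set only after a court ruling) that would not yet exist at the moment the model runs. All field names are illustrative, not from the retailer’s actual system.

```python
# Hypothetical feature catalog: when does each field appear on a LIVE account?
# A flag written only after the court ruling is invisible at prediction time,
# even though it is present in historical training data.
feature_availability = {
    "missed_payments_90d": "at_prediction_time",   # observable immediately
    "transaction_volume":  "at_prediction_time",   # observable immediately
    "default_flag":        "after_court_ruling",   # written ~1 year later
}

def usable_features(catalog):
    """Keep only features that exist when the model actually runs live."""
    return [f for f, when in catalog.items() if when == "at_prediction_time"]

safe = usable_features(feature_availability)
print(safe)  # ['missed_payments_90d', 'transaction_volume']
```

    Auditing training data against such a catalog before modeling is one simple way to catch a leaked target before it inflates offline test results.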

    As Kate Jordan, a data scientist at Octagram Analytics, told us, data scientists are often trained to think in terms of the data set in front of them, as well as perhaps some other data that’s accessible and might be relevant to their analysis. By focusing their questioning on the data set, they overlook the context of the operational system that the model will be placed into. Jordan has seen other cases similar to our example above, where data scientists analyzed a data set that included all of the variables without realizing that one of those variables would not yet be recorded in the live system at the moment the model needed to read and act on it. She has seen data science teams focus on variables that they could not actually feed into the algorithm in the live operational system. She warned against teams handing data scientists “sterile” data sets and encouraged teams instead to ask “What is the process?” and “What is the system and the system flow that this model is going to be placed into?”

    One industry-standard practice that helps data science teams find answers to these questions and develop deep business understanding is to shadow the entire business process. We think that regardless of where the analytics team is located within a company — even those with a global center of excellence — a data scientist working on a problem should be temporarily deployed to the function in question. There, they should spend a meaningful amount of time observing how the job is normally done, what tools are used, and where the inefficiencies arise. Shadowing the business process and being embedded among users is not only a great way to develop a detailed understanding of the problem space itself but also a means of gaining a solid foundation for subsequent adoption of the solution. A related practice is for the teams to create a diagram that walks through the process step by step.

    Business leaders can normalize and reward these practices and this level of understanding rather than handing centralized data scientists a sterile, decontextualized data set that they must analyze without understanding the business process and operational systems.

    Scenario 2: Ask ‘Who are the decision makers, and what are their decisions and incentives?’ not just ‘What should we predict?’

    The revenue management team at the headquarters of a large multinational bank was facing a serious problem. The profit margins of its home mortgage businesses had been steadily eroding for several years in a row. As the team investigated this trend, they learned that customer-facing credit officers who worked in local branches had been offering interest rates toward the lower end of the assigned discretionary ranges. The revenue management team hypothesized that a data-driven approach to setting the terms of mortgage loans would help improve profits. They commissioned a loan price optimization system from a centralized data science team, which developed, tested, and shipped an ML system that had been shown to successfully determine profit-maximizing terms for each individual loan.

    During the initial live A/B test, the system displayed performance superior to that of most individual credit officers in terms of realized profits. However, none of those credit officers used the system after the testing was completed.

    The missing link: Competing organizational priorities. As in most companies, the bank’s executive board defines and communicates organizationwide strategy. In this case, the board’s strategic focus was profit maximization, a priority that cascaded down to the top-level functions such as retail banking and revenue management. To directly address this strategic goal, the revenue management team commissioned the development of the profit-maximizing pricing model. However, the intended users of the model were housed in the retail banking function, which had its own operational KPIs for local branches. In this case, retail banking had set several KPIs around growth, and consumer-facing credit officers in the local branches were given financial incentives to sign more loans to fuel the desired growth. The credit officers received higher bonuses if they sold more loans — regardless of the loans’ profitability — and in most cases the most straightforward way to achieve that was to apply the largest allowed discount. From a technical perspective, the data scientists had created a model that effectively optimized the given metric. However, from an organizational perspective, the sponsors’ incentives were not aligned with those of the end users. The data scientists were optimizing a metric that their sponsors had asked for but that their end users did not care about.

    To avoid this kind of failure, it’s essential for business leaders and data scientists to better understand the decision makers using the ML system, and the factors informing their decisions. Before engaging in a full-blown modeling exercise, a data scientist and their business stakeholder should make a comprehensive map of decision makers and decisions. This can be another output from shadowing and part of creating the business process diagram. They should seek to understand what decisions are under the control of the project sponsors, the intended end users, their partners, and their business customers. This might include asking users how they might act in response to the kinds of results that a data scientist can anticipate the model producing. This question can help identify gaps in understanding a problem. In the bank’s case, the data scientists’ sponsors were focused on optimizing one metric (profit), but their end users were incentivized to optimize a different metric (revenue growth) and thus did not follow the ML recommendations. This failure could have been prevented if the data scientists had sought to understand the decision makers and their incentives rather than just asking what variable to predict.

    Jordan told us that in her previous role at Zurich Insurance, she and her team would sit with users for days and ask questions as they interacted with the data, such as “What would you look at?” and “What do you do with that data insight?” They even rescued a failed project using this method, after a data scientist (who had never been to the intended users’ office) delivered a sophisticated neural net to predict fraud in a disembodied data set, and the model was never adopted by the users. As Jordan and her team questioned the intended users, they came to understand that the users were actually responsible for collecting and producing evidence of fraud that would be sufficiently substantive for regulatory or court proceedings.

    A neural net prediction did not meet the standard of evidence; the users needed to construct an account of the fraudulent activities using the actual bills that proved fraud. In other words, their decisions needed to be based on documentary proof; they could not forward cases for potential enforcement action on the basis of an analysis that simply predicted the likelihood of fraudulent behavior.

    Scenario 3: Ask ‘Who are the stakeholders, and what actions do they control?’

    At Anheuser-Busch InBev’s European operations, the team responsible for the company’s B2B e-commerce platform sought to improve conversion and repurchase rates. Online promotions were their primary tool to achieve this goal, and the team was responsible for designing the key aspects, or mechanics, of a promotion: what products it would include, how long it would run, what type of discount would be offered, and so on. Category managers at the company decided which brands would be promoted, and then promotions were typically executed in bulk each month, using a single mechanism determined separately for each brand.

    After running a number of promotions, the platform team saw signs that different customers preferred different types of promotions. This indicated that personalizing promotions to the preferences of each B2B customer might capture additional value by increasing overall conversion and repurchase rates. The e-commerce team engaged a local data science team to produce a personalized promotion model. The deployed system consisted of two layers: A model predicted, for each customer, the probability of conversion under a fixed setup of promotion mechanics, and an optimization wrapper simulated different mechanics for each customer in order to identify the combination that resulted in the biggest increase in conversion probability for that particular customer. For example, the system might recommend increasing the range of products included, with the intent of increasing the likelihood that a particular customer would make a purchase from 90% to 95%.
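    The two-layer design described above can be sketched as follows. This is a simplified, hypothetical illustration: the conversion model is a stand-in function and the mechanics options are invented for the example; a real system would call a trained model’s prediction method on engineered features instead.

```python
# Sketch of a two-layer personalization system: a conversion model plus an
# optimization wrapper that searches over promotion mechanics per customer.
from itertools import product

DISCOUNT_TYPES = ["33_percent_off", "buy_two_get_one"]
DURATIONS_DAYS = [7, 14]

def conversion_probability(customer, mechanics):
    # Stand-in for the trained model; a real system would call something
    # like model.predict_proba(features(customer, mechanics)).
    base = customer["base_rate"]
    bump = 0.05 if mechanics["discount"] == customer["preferred_discount"] else 0.0
    return min(base + bump, 1.0)

def best_mechanics(customer):
    """Optimization wrapper: simulate every mechanics combination and keep
    the one with the highest predicted conversion probability."""
    candidates = [
        {"discount": d, "duration_days": t}
        for d, t in product(DISCOUNT_TYPES, DURATIONS_DAYS)
    ]
    return max(candidates, key=lambda m: conversion_probability(customer, m))

customer = {"base_rate": 0.90, "preferred_discount": "buy_two_get_one"}
print(best_mechanics(customer))  # {'discount': 'buy_two_get_one', 'duration_days': 7}
```

    Note that the wrapper can only search over levers it is given; as the scenario shows, if the decisive lever (here, brand selection) is outside the candidate set, even a well-calibrated model cannot move the KPI.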

    However, after a series of live A/B tests, the data science team was surprised to see that the system was failing to increase customers’ conversion or repurchase rates. While the underlying model had been extremely good at estimating conversion probabilities for each customer and promotion combination, in live tests the system as a whole failed to move the needle on the salient KPIs.

    The missing link: A key variable outside of the team’s control. After investigating the output of the model, the data science team discovered that promotional mechanics — the levers that the platform team could control — were not the strongest factors determining customer purchases. Whether they were offered a direct 33% discount or a “buy two, get one free” promotion, for example, did not materially affect conversion probability for most customers. Instead, the most significant variable turned out to be which brand was promoted: If a certain customer was offered the right brand with a discount, they would convert regardless of the mechanics applied. Unfortunately, the choice of which brands to promote was made at a higher level in the organization, which meant that this insight could not be immediately operationalized. Organizational alignment and tight cross-functional collaboration, not a technological solution, had to be implemented before the overall approach could pivot to personalizing the brand offer and increasing conversion rates. From a technical perspective, data scientists were perfectly capable of modeling conversion, and they successfully identified levers that affected it. But from an organizational perspective, their direct users had limited control to act on the recommendations the model surfaced, at least during the initial iteration of the project.

    To avoid this type of failure, a best practice is for business leaders and data scientists to understand the stakeholders that are directly and indirectly involved in their work. One way to do this is to ask, “Who are the stakeholders relevant to the process at hand, and what actions do they control?” The answer should result in a clear map of the decision process, with responsibilities unambiguously attached to each junction where human input is involved. For business leaders, this helps clarify where to build relationships and whom to inform in the context of the project scope. This also creates an opportunity for data scientists to educate nontechnical partners and build support and awareness of their work. Finally, it helps both business leaders and data science leads ensure the actionability of insights delivered, by determining who controls which levers and how those tie to the data science analysis at hand.
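    One lightweight way to capture such a stakeholder map is a lever-ownership table that flags, before modeling begins, which recommended levers the intended users actually control. The team names and levers below are hypothetical, loosely modeled on this scenario.

```python
# Hypothetical lever-ownership map: which team controls which action?
lever_owners = {
    "discount_type":   "ecommerce_platform_team",
    "promo_duration":  "ecommerce_platform_team",
    "product_range":   "ecommerce_platform_team",
    "brand_selection": "category_managers",  # outside the platform team's control
}

def actionable_by(team, recommended_levers, owners):
    """Split a model's recommended levers into those the team controls
    and those that require cross-functional alignment to act on."""
    controlled = [l for l in recommended_levers if owners.get(l) == team]
    escalate = [l for l in recommended_levers if owners.get(l) != team]
    return controlled, escalate

controlled, escalate = actionable_by(
    "ecommerce_platform_team", ["discount_type", "brand_selection"], lever_owners
)
print(controlled, escalate)  # ['discount_type'] ['brand_selection']
```

    Running a check like this at project kickoff would have surfaced, before any modeling, that brand selection insights would need category managers on board to be actionable.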

    Especially when responsibilities cross functional boundaries, all relevant decision makers should be onboarded from the beginning of a data science project, regardless of which particular function the initial request originated from. The key is to ensure that from the very outset, any insight resulting from the initiative can eventually be acted upon. Moreover, there is an inherent trust-building value in bringing potential stakeholders on board at the onset of data science work rather than once it is done. Such engagement builds cross-functional trust, giving teams not only an opportunity to learn whether insights can be acted on but also the reassurance that when they are, it is with goodwill and the complete buy-in that maximizes the business returns for the organization at large.

    Foster a culture of questioning through hiring and training. Strengthen the organizational capability to think like a beginner and ask fundamental questions by reinforcing these practices with training. Zurich Insurance, for example, ran an intensive summer school for all new data science interns and hires, where they worked through templates that helped them map business processes and better understand the full set of incentives and decision makers in play.

    These practices can help managers diagnose and avoid machine learning project failures. But diagnosing issues is only one part of it. Business leaders also need to solve the problems that data scientists discover around misaligned incentives or competing priorities. This requires not only strong sponsorship but also a high level of cross-functional collaboration and alignment, which is where business leaders can excel.

