Congratulations, you just landed a new customer! Once the celebrations are over, you have to decide who to thank – was it the digital or offline ads team, or the in-store or virtual customer service reps, or the customer’s neighbor or friend, or the review of your product on YouTube, or some complex combination of all of these, or none of these? Beyond attributing credit, you need to decide what to do more or less of next time when reaching out to other customers. You have lots of potentially influencing factors, and the outcome you care about – in this case, customer acquisition – is delayed in time from all of the influencing factors. This situation occurs not only in marketing and sales, but in supply chains and operations, manufacturing, customer care operations, HR and finance, and in virtually every function of an enterprise. Traditional analytic techniques exist to try to use all the diverse data to understand what sequence of actions would lead to the best outcome, but a modern approach known as reinforcement learning promises to dramatically improve on the current state of the art.
Reinforcement learning is the machine learning approach that is behind some of the most talked about advances in AI, including robotics and computer programs that can beat humans in games like chess and Go. Rather than attempting to explicitly program the logic to guide a system’s actions, reinforcement learning algorithms are trained from data to take actions in an environment so as to maximize a reward. For example, the machine that easily beats the world’s best Go players, AlphaGo, learned to play Go through trial and error, playing lots of games, receiving a reward for good moves, no reward for others, and perhaps losing rewards for bad moves.
To date there have been very few examples of applying reinforcement learning to enterprise problems like reducing the cost of fraud, recommending the best offer, or optimizing a supply chain. These examples, and others, are complex problems for traditional machine learning models: they have multiple potential factors to optimize and involve a series of events leading to a decision or a business outcome. Both of these characteristics – diverse causal influences and delayed outcomes – make such problems well suited for reinforcement learning, which offers the opportunity to drastically outperform current approaches.
If you want to predict whether a visitor to a website will click on an ad, there is just one event, showing the ad, and one outcome to predict, clicking the ad. However, imagine predicting the best next ad to show a customer who has visited a website a number of times, interacted with an app, and even called the call center. There is now a sequence of events, across multiple customer touchpoints, each influencing how the company should engage with the customer, in highly complex and evolving ways. Reinforcement learning provides an efficient way to discover the sequence of enterprise decisions that yields the customer journey with the greatest overall business outcome, even when none of those decisions produces an immediate result like a purchase or click.
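One common way to make delayed outcomes concrete is the discounted return, which passes a reward that arrives at the end of a journey back to the earlier steps that led to it. The Python sketch below is only an illustration: the touchpoint names, the reward of 100 for a final purchase, and the discount factor of 0.9 are all assumptions made up for this example, not values from a real system.

```python
def discounted_returns(rewards, gamma=0.9):
    """Compute G_t = r_t + gamma * G_{t+1} for every step t of one journey."""
    returns = []
    running = 0.0
    # Walk the journey backwards so each step sees the value of what follows it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


# Hypothetical customer journey: only the last touchpoint produces a reward.
journey = ["web_visit", "app_session", "call_center", "offer_shown"]
rewards = [0.0, 0.0, 0.0, 100.0]

credit = discounted_returns(rewards, gamma=0.9)
for touchpoint, value in zip(journey, credit):
    print(f"{touchpoint:12s} {value:6.1f}")
```

Every touchpoint receives a nonzero value even though only the last one produced a purchase, and earlier touchpoints are discounted more heavily. The discount factor controls how far back the credit reaches.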
In many business settings, for business decisions to be guided by data intelligence, they need to be optimized for many complex and interacting factors. For example, in the case of fraud, a typical analytic model is trained to detect the first fraudulent transaction on a particular card. The performance of the model may be judged by standard measures of precision and recall. However, to optimize business benefit, a financial services company won’t want to just maximize the number of fraudulent transactions it detects; instead it will want to tune its models to reduce the overall cost of fraud and to maximize customer satisfaction. This means detecting higher-value fraudulent transactions and minimizing customer dissatisfaction from blocking legitimate transactions, especially those of VIP customers. Conditioning customer interactions to account for differences between customers and transactions can be done with traditional rules engines, but only via painstaking programming of the logic. Reinforcement learning offers a principled and automatic way of incorporating these complex details to optimize the business value of the customer interaction. Rather than myopically decreasing a loss function to train a fraud classification model, for example, reinforcement learning can find the interactions that maximize the value of transactions blocked minus the cost of customer dissatisfaction.
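The objective described above, the value of transactions blocked minus the cost of customer dissatisfaction, can be written as a reward function. The Python sketch below is a simplified illustration, not a production design: the `Transaction` fields, the friction cost of 5, and the VIP multiplier of 10 are hypothetical numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    amount: float
    is_fraud: bool
    is_vip: bool = False


def block_reward(txn, blocked, friction_cost=5.0, vip_multiplier=10.0):
    """Reward for one decision: fraud value blocked minus dissatisfaction cost."""
    if not blocked:
        return 0.0
    if txn.is_fraud:
        # Blocking fraud saves the full transaction amount.
        return txn.amount
    # Blocking a legitimate transaction annoys the customer, VIPs most of all.
    return -friction_cost * (vip_multiplier if txn.is_vip else 1.0)


def total_reward(transactions, decisions):
    return sum(block_reward(t, d) for t, d in zip(transactions, decisions))


# Hypothetical batch of transactions flagged by a classifier.
batch = [
    Transaction(500.0, is_fraud=True),
    Transaction(20.0, is_fraud=True),
    Transaction(300.0, is_fraud=False, is_vip=True),
    Transaction(40.0, is_fraud=False),
]

block_everything = [True, True, True, True]      # catches both frauds
block_selectively = [True, False, False, False]  # catches one fraud

reward_all = total_reward(batch, block_everything)
reward_selective = total_reward(batch, block_selectively)
print(reward_all, reward_selective)
```

Under these assumed costs, the policy that catches fewer fraudulent transactions earns the higher reward, because it avoids blocking the VIP customer. A measure based only on how many frauds are detected would rank the two policies the other way around.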
Teradata Emerging Practices is developing approaches to apply reinforcement learning to these types of enterprise problems, for example, using off-policy learning to overcome the challenge of not having a real-world environment in which to train the model, as you would have when training a robot in a lab. We are currently working with customers to demonstrate how reinforcement learning can be used to deliver enormous business value.
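To give a flavor of what off-policy learning means, the Python sketch below runs tabular Q-learning, a standard off-policy algorithm, over a fixed log of past interactions instead of a live environment. It is not the approach used by Teradata, which the text does not describe; the states, actions, and rewards in the log are invented for illustration.

```python
from collections import defaultdict


def q_learning_from_logs(transitions, actions, alpha=0.5, gamma=0.9, sweeps=200):
    """Estimate action values from logged (state, action, reward, next_state) tuples.

    A next_state of None marks the end of a customer journey.
    """
    q = defaultdict(float)
    for _ in range(sweeps):
        for state, action, reward, next_state in transitions:
            if next_state is None:
                target = reward
            else:
                # Off-policy step: bootstrap from the best next action,
                # not from whatever action the logged policy actually took.
                target = reward + gamma * max(q[(next_state, a)] for a in actions)
            q[(state, action)] += alpha * (target - q[(state, action)])
    return q


ACTIONS = ["email", "discount"]

# Hypothetical historical log; a discount costs 5 but can close a sale worth 25.
logged = [
    ("new", "email", 0.0, "engaged"),
    ("new", "discount", -5.0, "engaged"),
    ("engaged", "email", 0.0, None),
    ("engaged", "discount", 20.0, None),
]

q = q_learning_from_logs(logged, ACTIONS)
best = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in ("new", "engaged")}
print(best)
```

In this toy log the learned policy holds the discount back until the customer is engaged, even though no single logged journey is required to have followed that strategy. Real logged data is noisy and incomplete, so a practical system would need far more care than this sketch suggests.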