Contextual Multi-Armed Bandits: A Quick and Practical Guide

A practical walkthrough and conceptual deep-dive into the world of bandits

Photo by Joakim Honkasalo on Unsplash

Is my new product recommender better than my old one? Does my new web journey convert more customers? Which of these images gets more clicks?

The traditional way to answer these questions is with a controlled experiment such as an A/B test. For example, having built a new algorithm to power the "products you might like" section of my website, I expose a random 5% sample of my customers to the new approach, while the rest continue to see the old one. I wait a few weeks. Finally, I run a statistical test to compare the conversion rates of the two groups, and declare to my boss that the new approach is better. The result is statistically significant, in fact. After a few presentations, we all agree to go with the new algorithm.
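To make that final step concrete, here is a minimal sketch of the kind of comparison that closes out such an experiment: a two-proportion z-test on the conversion rates of the treatment and control groups. The sample sizes and conversion counts below are purely hypothetical, chosen only to mirror the 5% / 95% split described above.

```python
from math import erf, sqrt


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled conversion rate under the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value


# Hypothetical numbers: 95% of traffic on the old recommender, 5% on the new one
z, p = two_proportion_z_test(conv_a=1_900, n_a=95_000,   # old recommender
                             conv_b=130,   n_b=5_000)    # new recommender
print(f"z = {z:.2f}, p-value = {p:.4f}")
```

With these made-up numbers the new recommender converts at 2.6% versus 2.0% for the old one, and the p-value comes out well below 0.05, which is the kind of result that gets taken into the presentation.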

This works really well, which is why a big company will be running tons of these experiments at any given moment.

But there are some major downsides to this approach, the most obvious one being speed. When I present my statistically significant result, the optimist will be happy that we have a new product recommender that is proven to improve conversion. The pessimist will be annoyed that we spent the last 12…