The SO1 Supermarket Gym – Fundamental Research

In our previous blog posts, we introduced the Supermarket Gym, a tool to simulate the shopping behaviour of individual customers. So far, Sebastian, our Chief AI Officer at SO1 (Segment of One), has explained the following:

Part 1: Introduction to the Supermarket Gym – what it is and why we developed it
Part 2: How we generate data for the Supermarket Gym to accurately simulate shopping behavior – and what type of academic research is considered

Let’s now have a look at how this Supermarket Gym can be applied in SO1’s fundamental research projects (interview edited for clarity and length).

Now that we know how the Supermarket Gym works and what the background of it is, what can SO1 learn from it?

There are several things we can learn from it. Coming back to the two motivations for developing the Supermarket Gym in the first place, the Supermarket Gym provides us with a “workout space” in which we know exactly where the data comes from, so to speak. By that I mean that we define the parameters of the simulation. To give some examples: a parameter might be your brand preferences (e.g., you like Coca-Cola, you hate Pepsi Cola), or it might be your price sensitivity, i.e. whether the price affects your decisions a lot or very little. So there is a series of parameters that we need to define before we can run the simulation. A simulation’s parameters are defined by the modeler, which is us in this case.

Once we have those parameters defined, we can simulate data. Without “simulation noise” the result of our sampling will always be the same, so we systematically add noise, or uncertainty, to our simulation, and this simulation noise creates (random) variation in our data.
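
To make the role of simulation noise concrete, here is a minimal sketch in Python. It is not SO1's actual simulator: the function name `simulate_choices`, the linear utility, and the normally distributed noise are all assumptions chosen for illustration. It shows that with the noise switched off every shopping trip produces the same choice, while adding noise creates variation in the simulated data.

```python
import numpy as np

def simulate_choices(brand_preference, price_sensitivity, prices,
                     n_trips, noise_scale, seed=0):
    """Simulate which product one shopper picks on each trip (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Deterministic part of utility: what the modeler defines up front.
    deterministic_utility = brand_preference - price_sensitivity * prices
    # Simulation noise: one random draw per trip and product (assumed Gaussian).
    noise = noise_scale * rng.standard_normal((n_trips, len(prices)))
    # The shopper picks the product with the highest total utility on each trip.
    return np.argmax(deterministic_utility + noise, axis=1)

# Hypothetical shopper: likes product 0, dislikes product 1.
preferences = np.array([1.0, -1.0])
prices = np.array([1.2, 1.0])

without_noise = simulate_choices(preferences, 0.5, prices, n_trips=1000, noise_scale=0.0)
with_noise = simulate_choices(preferences, 0.5, prices, n_trips=1000, noise_scale=1.0)
print("distinct choices without noise:", np.unique(without_noise))
print("distinct choices with noise:   ", np.unique(with_noise))
```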

So we have that data now, and, as I mentioned before, one core benefit of this simulated data is that we know the true preferences. Those true preferences are what actually generated the data. Think of it in a mathematical framework: the data is the outcome of our model. Let’s denote that with Y, where Y is a function of our input X. In this case, our input X could be both (a) certain characteristics of the market, i.e. what products are available, what retailers are available, what the products’ prices are, what the categories are, etc., and (b) consumer preferences. In reality we observe the market, because we know the prices and we know the products. We might not know the categories, but that’s a caveat that we don’t need to go into right now. Overall, the market is fairly well known.

But there is one thing we never know in a real-life application: the true consumer preferences. So in modeling, what we typically do is define a model and then input the data. We input both the market data, that is, our X, and the output data Y, which we observe. And then we try to infer consumers’ preferences. This is a very typical econometric approach to modeling consumer decisions.

Now, once you have simulated data, you know not only the market state, but you also know the consumer preferences. So the only thing you don’t know is the error term of your model. It’s the uncertainty that we introduced in the simulation.
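
One way to write this down, using notation that is assumed here rather than taken from the interview: split the input X into a market part M and a preference part θ, and let ε be the error term.

```latex
\begin{align*}
Y &= f(M, \theta, \varepsilon), \qquad X = (M, \theta) \\
\text{Real data:} \quad & (M, Y) \text{ observed}, \quad \theta \text{ inferred as } \hat{\theta} \\
\text{Simulated data:} \quad & (M, \theta, Y) \text{ known}, \quad \varepsilon \text{ unknown}
\end{align*}
```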

Again: you build a simulation and run your model on it, and you don’t do that only once, but a certain number of times “n” (“n” replications). You then analyze whether the average of your model output across all replications, that is, your model’s estimate of consumer preferences, matches the true consumer preferences that you put into your simulation. This is called a Monte Carlo simulation: you run the whole thing a certain number of times, compute the average, and compare your average estimate of consumer preferences to the true consumer preferences that you simulated.
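
The Monte Carlo procedure described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a simple binary logit purchase model as a stand-in (the interview mentions probit models), and the function names, parameter values, and the Newton-Raphson estimator are hypothetical choices, not SO1's implementation.

```python
import numpy as np

def simulate_purchases(rng, true_beta, n_obs):
    """Simulate buy / no-buy decisions from known preferences (assumed logit model)."""
    prices = rng.uniform(0.5, 2.0, size=n_obs)
    X = np.column_stack([np.ones(n_obs), prices])      # intercept and price
    utility = X @ true_beta + rng.logistic(size=n_obs)  # logistic noise is the error term
    return X, (utility > 0).astype(float)

def fit_logit(X, y, n_iter=25):
    """Estimate logit coefficients by Newton-Raphson on the log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def monte_carlo(true_beta, n_replications=200, n_obs=2000, seed=0):
    """Simulate and re-estimate n times, then average the estimates."""
    rng = np.random.default_rng(seed)
    estimates = np.array([
        fit_logit(*simulate_purchases(rng, true_beta, n_obs))
        for _ in range(n_replications)
    ])
    return estimates.mean(axis=0)

# Hypothetical true preferences: baseline utility 1.0, price sensitivity -1.5.
true_beta = np.array([1.0, -1.5])
print("true parameters:   ", true_beta)
print("average estimate:  ", monte_carlo(true_beta))
```

If the average estimate lands close to the true parameters, the model is able to recover the preferences that generated the data; a systematic gap would point to a problem in the model or its implementation.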

Now, this allows you to do two things:

  1. You can understand whether your model is actually able to uncover true consumer preferences. Because you can directly compare the average over your Monte Carlo simulations, your model output average, with the true simulation, which is great.

But it also allows you to do one more thing, and that is even more important for us.

  2. We systematically deviate from the original data generating process, which is a bunch of econometric models. We mentioned the Multivariate Probit Model and the Multinomial Probit Model, for example. We deviate from those and use machine learning instead, especially deep learning. Now, with deep learning, you often don’t really know what your deep neural network learns. In our case, we can take whatever the deep neural network learns and compare that to the original parameters of our data generating process, that is category structure, consumer preferences, etc. The beauty here is that we can really analyze our deep learning approaches very, very thoroughly. We can compare what they learned, which is normally inscrutable to mere mortals, to the truth. In the real world, we can’t do that. So this goes beyond only testing whether our model was implemented correctly, which is the standard econometric approach, i.e., using simulations to test models. This goes beyond that, because it allows us to open the black box of deep learning, to look into that and to see what our deep neural networks actually learn.

And this is something we did, for example, in joint research with the Massachusetts Institute of Technology (MIT), which we presented at the Marketing Science Conference in Philadelphia in 2018. Here, we presented a discrete choice model, but not a classic discrete choice model as commonly seen in the literature, which is exactly the type of model we used to generate our simulated data. What we propose is a deep learning model that models consumer choices across a certain number of categories and products.

This work is really exciting, because what we were able to show is that the deep neural network learns various aspects of that original data-generating process without making any of the assumptions that modelers typically need to make, concerning for example, which products are substitutes, or the functional form of consumers’ price sensitivities.

The network also learned purchase incidence, i.e., the timing of purchases, very well without any parametric assumptions on how consumers build inventory and consume products.

So opening the black box is something that sets SO1 apart from other companies.

Yes. But there are two specific factors that set us apart from the competition:

  1. There is a lot of research going into “transparent ML” and “interpretable ML”. So the gym is not the only way to help us understand what the machine learning models really learn. But I believe it’s a pretty smart way.
  2. We validate our models very carefully, not only through benchmarking and A/B testing, but we dissect the models and are not merely happy with predictive performance. This also gives us trust that we identify causal effects and not mere correlations.

How does SO1 use what it learns from looking into the black box to benefit our clients?

We apply what we learn from that in several ways. Most importantly, we develop new models based on simulated data, and then when a model has a high prediction accuracy, we want to understand why the model performs well. And the “why” is something that the Supermarket Gym can tell us.

The second application is to help us, internally, to understand certain caveats with regard to our data. Remember that this whole development was basically driven by our company goal, which is providing coupon recommendations and building targeting systems. Now, targeting systems are very dynamic, because we target consumers and then we observe their reactions, and then we need to learn from those reactions. So there’s a constant action-reaction feedback loop.

That also means that the data we collect is very biased, biased by our past actions. In such systems you need to be very careful because you don’t want your recommender algorithm necessarily to internalize your historical targeting. Because if you give out coupons for certain things, you might actually converge to very specific edge cases of your solution space. So you want to understand what the long-term performance of your algorithms is, especially if you don’t just train your models on random data, but also on targeted data, which is what we do. This is another aspect of how the Supermarket Gym is very useful.

The Supermarket Gym allows us to “fast forward” through time. Since it’s a simulation, we’re not dealing with real shoppers in the real world; we’re offering coupons to simulated shoppers in a simulated environment. So it’s very easy and very cost-effective to fast forward 100 or 200 weeks in time and observe lots of shoppers, and we can see how our algorithms work in that simulated environment in the long term.
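
As a rough illustration of such a fast-forward run, the sketch below simulates a coupon policy that learns only from its own past targeting. Everything in it is an assumption made for illustration: the function `fast_forward`, the per-shopper redemption probabilities, and the simple greedy policy with an `explore_rate` share of random offers are hypothetical and not SO1's targeting system.

```python
import numpy as np

def fast_forward(n_weeks=200, n_shoppers=500, n_products=5,
                 explore_rate=0.1, seed=0):
    """Run a simulated targeting feedback loop and return the weekly redemption rate."""
    rng = np.random.default_rng(seed)
    # Hypothetical ground truth: each shopper's redemption probability per product.
    true_redeem_prob = rng.uniform(0.05, 0.6, size=(n_shoppers, n_products))
    offers = np.zeros((n_shoppers, n_products))
    redemptions = np.zeros((n_shoppers, n_products))
    shopper_idx = np.arange(n_shoppers)
    weekly_rate = np.empty(n_weeks)

    for week in range(n_weeks):
        # The policy only sees redemption rates from its own (biased) history.
        estimated = redemptions / np.maximum(offers, 1)
        greedy_offer = np.argmax(estimated, axis=1)
        random_offer = rng.integers(n_products, size=n_shoppers)
        explore = rng.random(n_shoppers) < explore_rate
        offered = np.where(explore, random_offer, greedy_offer)

        # Simulated shoppers react; the reaction feeds back into next week's data.
        redeemed = rng.random(n_shoppers) < true_redeem_prob[shopper_idx, offered]
        offers[shopper_idx, offered] += 1
        redemptions[shopper_idx, offered] += redeemed
        weekly_rate[week] = redeemed.mean()

    return weekly_rate

rates = fast_forward(n_weeks=200)
print("mean redemption rate, first 10 weeks:", rates[:10].mean())
print("mean redemption rate, last 10 weeks: ", rates[-10:].mean())
```

Running the loop with different values of `explore_rate` is one way to see how strongly a policy that trains on its own targeted data depends on also collecting some random data.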

Now, of course, that’s not the reality. And that is of course a major limitation of the simulation. It’s simulated. It doesn’t necessarily reflect what happens in reality perfectly. It is certainly a very good proxy for reality, but it’s only the first step.

So what happens after that first step?

After that there’s always a series of further tests. They usually involve applying the models to real data in in vitro tests, which basically means using historical data to evaluate models on train/test splits and assessing their predictive performance. Then, of course, comes a series of A/B tests in the real world, usually starting with low traffic and then increasing it gradually.

At some point, when we’re confident that the model did well in the simulation and in our internal benchmarking (e.g. the in vitro test), and we have demonstrated that it works well in A/B tests, we typically roll out the model or the modifications.

So the major use case for the simulation comes very early in our algorithm development. It’s used either to develop new models and new algorithms, or to analyze why and how existing algorithms (algorithms already used in production) work in a controlled environment.

In the fourth and final part of this interview series, Sebastian, our Chief AI Officer, will give an outlook on other use cases as well as new fundamental research projects at SO1:

Part 4: Applying the Supermarket Gym in software integration projects (plus what’s next on SO1’s fundamental research agenda)
