How does SO1’s AI compare to Spotify’s?
Explaining latent properties — where meaning has no name
Explaining AI is somewhat of a challenge, because for most humans outside of our development lab, the matrices of numbers don’t mean a great deal. So this post attempts a practical comparison.
At SO1 (Segment of One) we operate in the retail CPG industry, so it’s actually about nothing more than promoting tomatoes, shampoos, dog treats, and the like. If we want to recommend the right products tied to a promotion, the first step is to understand those products and how shoppers perceive them.
The deeper this understanding, the more successful we are in creating additional revenue for our partners, e.g. grocery retailers or drugstores. Each of these retailers sells thousands of products to millions of shoppers with innumerable preferences and perceptions. That’s why we’ve built an AI that is capable of understanding these products (as well as consumer preferences and how best to reach the retailers’ objectives) in great detail – without any human interaction.
In this post, we explore how SO1 achieves this by comparing it with another popular recommendation system: Spotify.
Before we dive in, I want to get two big disclaimers out there. First, I’m using music as a tangible example, but I have no validated insight whatsoever into the workings of Spotify’s AI. Still, from publicly available documentation and from simply using their platform, we can infer that they take a quite similar approach to delivering their service to the user.
Secondly, understanding products is only one part, albeit an important one, of SO1’s recipe for success. Along with our insight into the product assortment, there are user-specific features that we also take into consideration, such as purchase behaviour over time or price sensitivity.
The Spotify function that I want to draw an analogy to is the “radio”. Users can switch from a curated list of songs to a radio; think of it as a sequence of songs someone else chooses for you. Many of the main metadata elements of a song (the artist, the genre, the album, or the song itself) can be chosen by the user and will act as a backdrop to set the general style of the radio. What Spotify then does is choose more songs which are … somehow … similar to the song playing at the moment. The main thing to realize here is that it doesn’t matter at all whether the songs in the sequence can be described as similar; the only thing that’s relevant is whether the user perceives them as similar.
All I Want For Christmas Is You
Take a Mariah Carey Christmas classic that you want to build an actual radio station around. How would a music editor go about building the playlist? She would certainly turn to a couple of music libraries and look for the label “Christmas”. This label would have to have been assigned to albums by other editors or librarians. It is quite possible that the label is only accurately applied to full Christmas albums, while individual songs might not be tagged if they appear on a “best of” album, for example. This is only one problem of accurately labeling musical titles. If you wanted to take other occasions and moods into account, while also considering “beats per minute”, “genre”, “language of the lyrics” and so forth, the labeling effort and the expected error rate would grow with every property added. Moreover, it’s not even a given that similarity in these humanly understandable properties is relevant for a “good playlist”. And as the end of the previous paragraph pointed out: the listener does not really care how a radio station builds its playlist; the only thing relevant is whether the listener perceives the list as appropriately matching.
Spotify’s solution to rendering matching playlists starts with looking at what users themselves feed back to Spotify as matching playlists: when the user perceives songs as nicely matching their expectations, they can use a thumbs-up icon to give their feedback. And there is a thumbs-down icon, too. I presume that if the song is allowed to play to the end, it probably wasn’t too bad, whereas if the user skips to the next title: no more questions, Your Honor.
Let a million users do this, and the machine will gradually build an understanding of which sequences make sense to users and which don’t. Admittedly, given the multitude of radio stations that Spotify offers (the user may start one from any song, artist, album, or genre), this probably requires a few orders of magnitude more user feedback than the million mentioned. But it’s a numbers game in the end. The machine can work with whatever feedback it has, and gradually gets better over time.
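To make the numbers game a little more tangible, here is a minimal Python sketch of how such feedback could be tallied. Everything in it is an assumption for illustration: the signal names, their weights, the helper functions, and the handful of made-up feedback events. It says nothing about how Spotify actually processes feedback.

```python
from collections import defaultdict

# Hypothetical weights: explicit feedback counts more than implicit feedback.
SIGNAL_WEIGHTS = {
    "thumbs_up": 1.0,
    "played_to_end": 0.5,
    "skipped": -0.5,
    "thumbs_down": -1.0,
}

def aggregate_feedback(events):
    """Sum the weighted signals for every (seed song, candidate song) pair."""
    scores = defaultdict(float)
    for seed, candidate, signal in events:
        scores[(seed, candidate)] += SIGNAL_WEIGHTS[signal]
    return dict(scores)

def rank_candidates(scores, seed):
    """Return the candidates observed for `seed`, best match first."""
    pairs = [(cand, s) for (sd, cand), s in scores.items() if sd == seed]
    return [cand for cand, _ in sorted(pairs, key=lambda p: p[1], reverse=True)]

SEED = "All I Want For Christmas Is You"

# Made-up feedback events: (seed song, song played on the radio, signal).
events = [
    (SEED, "Last Christmas", "thumbs_up"),
    (SEED, "Last Christmas", "played_to_end"),
    (SEED, "Enter Sandman", "skipped"),
    (SEED, "Enter Sandman", "thumbs_down"),
    (SEED, "Jingle Bell Rock", "played_to_end"),
]

scores = aggregate_feedback(events)
ranking = rank_candidates(scores, SEED)
print(ranking)
```

With five events the ranking is trivial; the point is only that no label such as “Christmas” appears anywhere in the data, and the ordering still comes out sensibly.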
May I Now Have a Look?
When the machine has built its understanding, you might pop open the hood and have a look. In fact, the machine will never be done learning, yet the rate of change it applies to the whole picture diminishes with each additional data point, that is, with each piece of user feedback. Having a look means, in this case, that you let the machine list all the songs that it believes are similar in each particular property. These properties, mind you, were not predetermined by the makers of the machine! No, the machine itself chooses which properties it will use. It then scores every song on each property, expressing as a percentage how strongly the song exhibits that property, and groups the songs accordingly.
A human looking at all the songs that score very high on any one property will then be able to give the property a name, say: Christmas songs. There might be another property that you look at and say, “gosh, these are all carolers’ songs”, and yet another property where the songs rated high by the machine are definitely pop songs. Naturally, some Christmas songs also rank high in pop, like Mariah Carey’s “All I Want For Christmas Is You”, while others are Christmas carols.
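Popping the hood could look roughly like the following sketch. The scores are invented for illustration, the three properties are deliberately identified by index only, and `top_songs` is a hypothetical helper, not part of any real system.

```python
# Hypothetical "likeness" scores between 0 and 1 for three unnamed properties.
# The machine only knows property 0, 1 and 2; the names come from humans later.
loadings = {
    "All I Want For Christmas Is You": [0.95, 0.10, 0.90],
    "Silent Night": [0.97, 0.92, 0.05],
    "O Come, All Ye Faithful": [0.93, 0.88, 0.08],
    "Shake It Off": [0.02, 0.01, 0.96],
}

def top_songs(loadings, prop, threshold=0.8):
    """List the songs scoring at least `threshold` on property `prop`, highest first."""
    hits = [(song, vec[prop]) for song, vec in loadings.items() if vec[prop] >= threshold]
    return [song for song, _ in sorted(hits, key=lambda p: p[1], reverse=True)]

for prop in range(3):
    print(prop, top_songs(loadings, prop))
```

Reading the three lists, a human would probably label property 0 “Christmas”, property 1 “carols”, and property 2 “pop”, with the Mariah Carey song appearing under both 0 and 2.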
Where meaning has no name
The takeaway here is that the computer today is capable of choosing, by itself, the properties within which it wants to describe the similarity of things. And only when you open the hood and look at them, will a human be able (though not in all cases) to give a name to those properties.
Going back to the supermarket, where SO1 does its business, the same namelessness holds for the “latent properties” that our SO1 Engine discovers when looking at a myriad of shopping trips. SO1’s equivalent of thumbs-up and thumbs-down is simply which items are purchased together.
Our machine doesn’t have any notion of a potato. It does not have words for the property “organic”, nor any other property. What the machine is capable of doing is finding all products which are sufficiently similar in this regard and thousands of other properties. Of those thousands of properties, we found the top 800 to be a good compromise between resolution and clustering, which is to say, a good balance between granularity and aggregation. But, again, there are no names for these properties and the machine “does not care”.
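As a rough illustration of how “purchased together” alone can surface similarity, the sketch below describes each product by the products it shares a basket with and compares those descriptions using cosine similarity. The baskets, product names, and helper functions are made up, and this is not the SO1 Engine’s actual method; in particular, the step of compressing such raw counts into a fixed number of latent properties (such as the 800 mentioned above) is left out.

```python
from collections import defaultdict
from itertools import permutations
from math import sqrt

# Made-up shopping baskets; the code never looks inside the product names.
baskets = [
    ["organic potatoes", "organic milk", "oat drink"],
    ["organic tomatoes", "organic milk", "oat drink"],
    ["potatoes", "cola", "crisps"],
    ["tomatoes", "cola", "crisps"],
    ["organic potatoes", "organic tomatoes", "organic milk"],
    ["potatoes", "tomatoes", "cola"],
]

def co_purchase_vectors(baskets):
    """Describe each product by how often it shares a basket with every other product."""
    vectors = defaultdict(lambda: defaultdict(int))
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            vectors[a][b] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as dicts."""
    dot = sum(count * v.get(key, 0) for key, count in u.items())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

vectors = co_purchase_vectors(baskets)
sim_organic = cosine(vectors["organic potatoes"], vectors["organic tomatoes"])
sim_plain = cosine(vectors["organic potatoes"], vectors["potatoes"])
print(round(sim_organic, 2), round(sim_plain, 2))
```

In this toy data, organic potatoes end up closer to organic tomatoes than to regular potatoes, purely because of what else is in the basket. Nothing in the code knows the word “organic”.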
When a human later looks at what the machine has grouped closely together, she will easily be able to call it “organic”, “dairy-free”, “GMO-friendly” or “these wine labels all look expensive”; however, keep looking and you will encounter groups of products which cannot be described so easily. For the machine, the grouped products share a similarity even though we can’t understand it! And this is where the machine becomes more powerful than a human marketing mastermind.
What does it mean, finally, to truly understand products? By way of example, we can do arithmetic with products: we can add and subtract them. In other words, our system can take “organic potatoes” and subtract “potatoes”. For our system, the result is the “organic”-ness of products. And remember, inside the machine this is just one of the 800 unnamed properties; “organic” is merely the label a human attaches afterwards. The SO1 Engine can do the same for properties that we would never recognize on our own.
Add this “organic”-ness onto regular tomatoes and you get? Organic tomatoes.
And the same works for organic shampoos, meat, soft drinks, you name it. This may not come as a big surprise to you, but enabling a machine to “understand” this is a big deal, we think. The real magic, though, is that the SO1 Engine discovers all of this by itself: no one needs to input metadata that says “organic”, nor does anyone need to babysit, maintain, or tune the machine.
What value this brings to a grocer’s recommendations: they can recommend single-household-size convenience food made of organic potatoes to the shoppers who like it, and to those shoppers only! And this is just the beginning of what Segment of One means.
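The add-and-subtract idea can be sketched with a few hand-made vectors. The four-dimensional numbers below are invented for illustration (a real system would learn them from purchase data and use far more dimensions), and the helper names are hypothetical.

```python
from math import sqrt

# Invented 4-dimensional product vectors; the dimensions carry no names.
embeddings = {
    "potatoes": [0.9, 0.1, 0.0, 0.0],
    "organic potatoes": [0.8, 0.1, 0.9, 0.0],
    "tomatoes": [0.1, 0.9, 0.1, 0.0],
    "organic tomatoes": [0.1, 0.8, 0.9, 0.0],
    "shampoo": [0.0, 0.0, 0.1, 0.9],
    "organic shampoo": [0.0, 0.1, 0.9, 0.8],
}

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def subtract(u, v):
    return [a - b for a, b in zip(u, v)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def nearest(query, embeddings):
    """Name of the product whose vector points most nearly in the direction of `query`."""
    return max(embeddings, key=lambda name: cosine(query, embeddings[name]))

# "organic potatoes" minus "potatoes" leaves the "organic"-ness direction.
organic_ness = subtract(embeddings["organic potatoes"], embeddings["potatoes"])

tomato_result = nearest(add(embeddings["tomatoes"], organic_ness), embeddings)
shampoo_result = nearest(add(embeddings["shampoo"], organic_ness), embeddings)
print(tomato_result, "|", shampoo_result)
```

The same difference vector moves both tomatoes and shampoo to their organic counterparts, which is the sense in which the property is shared across otherwise unrelated products.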