Recommendations. Everyone’s talking about them, to paraphrase the old EastEnders slogan. I’m currently working on a pilot project looking at ways to expose the BBC’s archive content, help people find programmes they might be interested in, and clearly show when each programme was made and broadcast. Part of this work involves examining how we can improve episode-to-episode recommendations. I’ve been doing lots of thinking around this, and here’s the latest.
When it comes to recommendations, there seem to be four approaches. Each has its advantages and disadvantages, but I would argue that, until now, only three of the options have been tried in earnest.
Firstly, there’s the traditional method of hand-picked, manual ‘editorial’ recommendations. This means that staff consider each programme they’re responsible for, look around at what else is on offer, and pick out other programmes that could sensibly be recommended. The advantage of this method is that it’s often highly targeted and of good quality, basically because it’s been sense-checked. The disadvantage is that it doesn’t scale well. It requires a great deal of human effort and, equally, a potentially vast knowledge of a broadcaster’s programming output in order to reap the maximum benefit. However, until recently, it’s been the safest, if not the only, option on the cards.
The next three approaches are more to do with the reasons for recommendation. They’re often the reasons behind the manual recommendations, but as we turn to data-driven systems more and more, these reasons can inform automatic recommendations.
Production-based information – By this, I mean using production data, such as programme structure, categorisation, classification and cast/crew details, to power recommendations. In its simplest form, this can be seen on bbc.co.uk/programmes for almost any episode, where you can see the previous and next episodes in a series. Essentially, this is a recommendation as to what episodes it would make sense to consume before & after the one you’re looking at. Similarly, the genre, format and channel aggregations offer recommendations based on traditional broadcast classification structures. On the plus side, these are (relatively) easily sourced from the existing programme-making workflow. They can also provide pretty useful recommendations. However, they tend to be very general. For instance, just because something is on the same channel, or in the same genre, or indeed has the same actor in it, doesn’t automatically make it a relevant recommendation. I would even argue that just showing other episodes in the same series or brand, as is done on things like iPlayer, isn’t really the best form of recommendation, and probably shouldn’t be sold as such.
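To make the production-based idea concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the Episode fields, the identifiers and the tiny catalogue are hypothetical and are not taken from any BBC system. It shows the two kinds of link described above: previous/next within a series, and "shares a genre" across series.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Episode:
    pid: str            # hypothetical programme identifier
    series: str         # hypothetical series identifier
    position: int       # position of the episode within its series
    genres: frozenset   # broadcast genre labels


def previous_and_next(episode, catalogue):
    """Return (previous, next) episodes in the same series, or None where absent."""
    siblings = sorted(
        (e for e in catalogue if e.series == episode.series),
        key=lambda e: e.position,
    )
    index = siblings.index(episode)
    previous = siblings[index - 1] if index > 0 else None
    following = siblings[index + 1] if index + 1 < len(siblings) else None
    return previous, following


def same_genre(episode, catalogue):
    """Return episodes from other series that share at least one genre."""
    return [
        e for e in catalogue
        if e.series != episode.series and e.genres & episode.genres
    ]


# Made-up catalogue, purely for illustration.
catalogue = [
    Episode("ep_a1", "series_a", 1, frozenset({"drama"})),
    Episode("ep_a2", "series_a", 2, frozenset({"drama"})),
    Episode("ep_a3", "series_a", 3, frozenset({"drama"})),
    Episode("ep_b1", "series_b", 1, frozenset({"drama", "crime"})),
    Episode("ep_c1", "series_c", 1, frozenset({"comedy"})),
]

previous, following = previous_and_next(catalogue[1], catalogue)
genre_matches = same_genre(catalogue[1], catalogue)
```

Note how little the genre match actually says: ep_b1 is suggested only because it carries the same broad label, which is exactly the generality problem described above.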
Social-based information – Here, I’m talking about probably the most prevalent form of recommendation at the moment – or at least the one that everyone seems to be advocating. The idea is to collect data on a person’s viewing/listening habits, and use this data to suggest other programmes that they might want to see, based on a combination of the frequency/range of their consumption and the already established production-based recommendations. In addition, this is often combined with social networking information, so that recommendations can be based on what the people you are linked to have been consuming. Again, the advantages are that you can build up a fairly accurate picture of the type of audience you have, based on what they’re consuming, and this can then be used to influence both what you provide to them and what you commission. However, there are major downsides to this as well. Firstly, speaking personally, although I accept that recommendations from friends can be helpful, I don’t believe they’re the correct primary source for recommendations. Certainly, I’m not really interested in just knowing that other people have watched a particular programme – just because they watched it doesn’t mean they would recommend it. Indeed, just because they liked it doesn’t mean they would recommend it either. A recommendation, in this form at least, has to be pro-active. That is, I’d much rather a friend actively recommended a programme to me than have a computer spy on their habits and then tell me. Which brings us to the second problem – the slightly dubious ethical/moral question of whether it’s right for companies to collect detailed information about audience habits. A really thorny question, which I’m not going to delve into now.
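As a rough illustration of the consumption-based approach, the sketch below counts which programmes were watched by the same people who watched a given programme. It is a deliberately naive co-viewing count rather than any particular broadcaster's algorithm, and the viewer names, programme identifiers and the also_watched function are all hypothetical.

```python
from collections import Counter


def also_watched(target, viewing_histories, limit=3):
    """Rank other programmes by how many viewers of `target` also watched them."""
    counts = Counter()
    for history in viewing_histories.values():
        if target in history:
            counts.update(p for p in history if p != target)
    return [programme for programme, _ in counts.most_common(limit)]


# Made-up viewing histories: viewer -> set of programme identifiers.
viewing_histories = {
    "viewer_a": {"prog_1", "prog_2", "prog_3"},
    "viewer_b": {"prog_1", "prog_2"},
    "viewer_c": {"prog_2", "prog_4"},
}

suggestions = also_watched("prog_1", viewing_histories)
```

The sketch also shows the weakness discussed above: the only signal it has is that something was watched. Nothing in the data says whether anyone liked prog_2, let alone whether they would recommend it.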
Which brings us on to the final form of recommendation, the one I believe gives the greatest benefit. And surprise, surprise, yes, it’s Content-based recommendations. Here, I mean something deeper than ‘this episode is in the same brand’, something more specific than ‘this programme has something to do with the same topic’, and something less, well, creepy than ‘twelve of your friends watched the Inbetweeners, so you must too!’. I’m also not talking about just tagging content. Tagging is probably the simplest and crudest way of doing this – it’s a start, but it really isn’t the end game. I mean that it’s necessary to, as far as possible, represent the actual content of the programme as data, and then link to other programmes which utilise the same data. This provides the most accurate recommendations, because we know that the exact same thing (or at least something with a meaningful link to it) is being recommended. The downside, unfortunately, for the time being, is that it would have to be a fairly manual process. In this way, yes, it’s similar to the hand-picked, curated recommendations I mentioned earlier. The difference here, though, is twofold. Firstly, we’re capturing the reasons behind the recommendation as data itself, which leads to automatic re-use rather than constantly having to manually pick things (there would, of course, probably still need to be some form of editorial oversight to at least pick out highlights from the potential mass of auto-generated recommendations). Secondly, it can be folded into the production workflow from the very beginning, by engaging with writers & production staff, so that a separate team is not required, and the recommendations can be captured and compiled at the very source, rather than after the fact. Commonly, the people who know the content (and therefore the links) best will be the people who made the content in the first place.
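One way to picture the difference from plain tagging is to record each piece of content as a (relation, thing) pair, so that the reason for a link is itself data. The Python sketch below is only an assumption about how that might look: the relation names, identifiers and the content_links function are hypothetical, and a real model would be far richer than sets of pairs.

```python
def content_links(pid, content_index):
    """Return {other_pid: shared (relation, thing) pairs}: links plus their reasons."""
    subject = content_index[pid]
    links = {}
    for other, statements in content_index.items():
        if other == pid:
            continue
        shared = subject & statements
        if shared:
            # The shared statements are the captured reason for the link.
            links[other] = sorted(shared)
    return links


# Made-up content descriptions: programme -> set of (relation, thing) pairs.
content_index = {
    "ep_a": {("features_character", "character_x"), ("set_in", "place_y")},
    "ep_b": {("features_character", "character_x"), ("set_in", "place_z")},
    "ep_c": {("set_in", "place_z")},
}

links = content_links("ep_a", content_index)
```

Because every link carries the statements that produced it, the same data can be re-used automatically, and an editor can still pick highlights by looking at why two programmes were connected.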
This really shouldn’t be news to anyone, and yet it seems that this approach, until now, hasn’t been tried, in the main. I really can’t understand why, although the problems with, and reluctance towards, providing even enough accurate data to power the production-based recommendations perhaps provide a clue. But I don’t think I’m alone in advocating this. In the oft-quoted (but perhaps not often enough!) words of Nicholas Negroponte, in 1995’s Being Digital:
“We need those bits that describe the narrative with key words, data about the content, and forward and backward references….The bits about the bits change broadcasting totally. They give you a handle by which to grab what interests you and provide the network with a means to ship them into any nook or cranny that wants them. The networks will finally learn what networking is about.”
So that’s not just tags, but data to actually represent the content.
With all this in mind, I’ve begun to compile a mixture of production-based and content-based recommendations for traversing the BBC’s archive. The next post will provide some examples of this, and lead you through the format and choices I’ve made in representing these links in the N3 format of RDF.