Alex Iskold posted an interesting article on Rethinking Recommendation Engines on ReadWriteWeb yesterday. I like (and recommend) his crisp and clear delineation of the different types or sources of recommendations – personalized (based on your past behavior), social (based on the past behavior of others who are similar to you) and item-based (based on the recommendable items themselves) – and his emphasis on the importance of incorporating psychological principles, not just technological ones, into the design of effective recommendation engines. [I also like (and recommend) Rick MacManus's associated recommendations on 10 Recommended Recommendation Engines, but that may be biased by MyStrands' prominent placement in that list.] However, I take issue with – or at least re-rethink – some of Alex's contentions regarding the road to successful recommender systems being paved with false negatives.
First, I want to agree with Alex (and Gavin Potter, the Guy in the Garage that Alex references) about the importance of psychology in technology design and in general ("Enhancing formulas with a bit of human psychology is a really good idea") and the value of recognizing and capitalizing on human inertia. However, his characterization of inertia – the tendency of our ratings to be heavily influenced (or primed) by other recent ratings – seems more characteristic of a primacy or recency effect than of inertia (as I understand these concepts). That said, I do think that inertia plays an important role in the adoption and use (or non-adoption / non-use) of any technology – people do not tend to change much, or even expend much effort, unless or until sufficient incentive is provided.
So I think the inertia problem, with respect to recommendation engines, is more one of motivating users to rate things … and I actually think the Netflix ratings system for movies (which provides the basis for much of the article) is an outstanding example: it doesn't require much effort (you are automatically prompted for a rating whenever you log in to the site after having sent a DVD back), and the more you rate, the better the recommendations you receive, offering intrinsic rather than merely extrinsic motivation … which explains why the system has motivated millions of its users to contribute an estimated 2 billion ratings. [Aside: I see that ReadWriteWeb is offering an extrinsic incentive for comments and trackbacks – a chance to win an Amazon gift certificate – but I was already planning on adding a trackback for intrinsic reasons.] In any case, however one labels these psychological influences – inertia, priming and/or recency – they are important to incorporate into the design of recommendation engines, and the systems that use them.
Further along in the article, Alex distinguishes false positives – recommendations for things that (it later turns out) we do not like – from false negatives – recommendations against things (it would later, or perhaps likely, turn out) we do like, and correctly recommends leveraging false negatives more effectively in the design of recommendation engines. [And just to round things out, in case it isn't obvious, true positives are recommendations for things that we will / do like, and true negatives are recommendations against things that we will not / do not like … and thanks to Eric for helping me set the record straight with respect to "do likes" and "don't likes" in my description of false negatives (!)]
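To make the four outcome types concrete, here is a minimal sketch (the items, the catalog, and the user's tastes are all hypothetical, purely for illustration) that sorts a recommender's decisions into true/false positives and negatives using set arithmetic:

```python
# Minimal sketch: classifying recommendation outcomes.
# All items and preferences below are hypothetical/illustrative.

catalog = {"movie_a", "movie_b", "movie_c", "movie_d", "movie_e"}
recommended = {"movie_a", "movie_b", "movie_c"}     # system says "you'll like these"
actually_liked = {"movie_a", "movie_c", "movie_d"}  # the user's true tastes

true_pos = recommended & actually_liked              # recommended, and liked
false_pos = recommended - actually_liked             # recommended, but not liked
false_neg = (catalog - recommended) & actually_liked # withheld, but would have been liked
true_neg = catalog - recommended - actually_liked    # withheld, and not liked

print(sorted(false_neg))  # the "lost serendipity" bucket
```

The `false_neg` set is the one at issue in the article: items the system silently filtered out that the user would in fact have enjoyed.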
Unfortunately, he extends this thread to some propositions that lie beyond my comfort zone:
We do not need recommendations, because we are already over subscribed.
We need noise filters. An algorithm that says: ‘hey, you are definitely not going to like that’ and hide it. … If the machines can do the work of aggressively throwing information out for us, then we can deal with the rest on our own.
Now, on the one hand, I am sympathetic to the problem of information overload. However, as I noted in my notes from CSCW 2006, Paul Dourish pointed out that this is not a new problem:
One of the diseases of this age is the multiplicity of books; they doth so overcharge the world that it is not able to digest the abundance of idle matter that is every day hatched and brought forth into the world. — Barnaby Rich (1580-1617), writing in 1613 (!); quoted by de Solla Price in his 1963 book "Little Science, Big Science."
I’m also reminded of James Carse’s observation about evil in his marvelous (and highly recommended) book, Finite and Infinite Games:
Evil is never intended as evil. Indeed, the contradiction inherent in all evil is that it originates in the desire to eliminate evil. "The only good Indian is a dead Indian."
I think that too aggressively filtering out [presumed] false negatives can render us more easily manipulated by technology … and the people and organizations who control technology. Although there is considerable debate about what Web 2.0 is, one of its key ingredients is surely the provisioning of architectures of participation, in contrast to the "command and control" paradigm of earlier technologies (and eras). One of the beneficial side effects of the growth of Web 2.0 – for me – has been enhanced opportunities for serendipity, and allowing more false negatives is likely to yield fewer instances of serendipity. Furthermore, I believe increasing the probability – or acceptability – of false negatives may have the unfortunate consequence of moving further up the head of the long tail … and/or further down toward the lowest common denominator(s). Book burning lies at or near the extreme end of the "acceptance of false negatives" spectrum, though I do not mean to imply that any of these consequences are intended or desired by the article or author.
In earlier chapters of my career, when I was more focused on natural language processing and automatic speech recognition, I became familiar with the concept of Equal Error Rate (EER), the operating point at which the rate of false positives (the False Acceptance Rate, or FAR) equals the rate of false negatives (the False Rejection Rate, or FRR). The documentation for the BioID biometrics system SDK from HumanScan provides a nice articulation of these concepts, including a graph of the FAR and FRR curves crossing at the EER.
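To make FAR, FRR and EER concrete, here is a minimal sketch (the match scores are invented for illustration, not drawn from BioID or any real system) that sweeps a decision threshold over a set of "genuine" and "impostor" scores and picks the threshold where the two error rates come closest to equal:

```python
# Minimal sketch: FAR, FRR, and an approximate EER threshold.
# The score lists are hypothetical/illustrative.

def far_frr(genuine, impostor, threshold):
    """FAR: fraction of impostor scores accepted (score >= threshold).
       FRR: fraction of genuine scores rejected (score < threshold)."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr

def equal_error_rate(genuine, impostor):
    """Sweep candidate thresholds; return the one minimizing |FAR - FRR|."""
    candidates = sorted(set(genuine) | set(impostor))
    best = min(candidates,
               key=lambda t: abs(far_frr(genuine, impostor, t)[0]
                                 - far_frr(genuine, impostor, t)[1]))
    return best, far_frr(genuine, impostor, best)

genuine = [0.9, 0.8, 0.75, 0.6, 0.55]  # scores for true matches
impostor = [0.7, 0.5, 0.4, 0.3, 0.2]   # scores for non-matches

threshold, (far, frr) = equal_error_rate(genuine, impostor)
print(threshold, far, frr)
```

Raising the threshold trades false acceptances for false rejections; the EER is simply the point where the two curves cross.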
Perhaps the solution to the tension between false positives and false negatives in recommender systems is to incorporate some kind of control for the user to specify an acceptable balance or threshold (which may default to the EER) … although that would also require devising a solution to the tension between user inertia and input … but that simply provides additional corroboration for Alex’s primary argument that we need to incorporate more psychology into our designs of good – or better – recommender system technologies.
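As a sketch of that user-adjustable balance (the API, the scores and the default value are all hypothetical – this is not how any particular recommender works), the control could be as simple as a threshold parameter that defaults to the system's EER point but can be moved by the user:

```python
# Minimal sketch of a user-adjustable recommendation filter.
# Items, scores, and the EER-like default are hypothetical/illustrative.

def filter_recommendations(scored_items, threshold=0.6):
    """Keep items whose predicted-liking score clears the threshold.
    The default (0.6 here) stands in for the system's EER point; a user
    could raise it (fewer false positives, more false negatives) or
    lower it (the reverse) to match their own tolerance for noise."""
    return [item for item, score in scored_items if score >= threshold]

scored = [("movie_a", 0.9), ("movie_b", 0.4), ("movie_c", 0.65)]

print(filter_recommendations(scored))                  # EER-like default
print(filter_recommendations(scored, threshold=0.3))   # permissive setting
```

A serendipity-minded user would slide the threshold down, accepting more noise in exchange for fewer suppressed discoveries.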

Comments
8 responses to “Re-rethinking Recommendation Engines: Psychology and the Influence of False Negatives”
I’m particularly fond of this point:
>> I think that too aggressively filtering out [presumed] false negatives can render us more easily manipulated by technology … and the people and organizations who control technology. <<
One of the reasons I quit reading blogs was the whole recirculation and reinforcement of ideas (basically seemed true of any community or segment or lifestyle or professional set of bloggers). There has really been way too little discussion about control and controlling technologies. As you point out, it is like evil: it doesn't strike one as bad/wrong, but insidiously is. Nice post. Thanks
ken: thanks for the feedback … and reinforcement :-). I was considering the use of “insidious” in the title for the post, but decided to be a bit less incendiary in my commentary.
Your comment triggered a further analogy to FAR (false acceptance rate), FRR (false rejection rate) and EER (equal error rate) – some corners of the blogosphere seem to be echo chambers (high FAR?), while others seem to host Crossfire-style shouting matches (high FRR?). ReadWriteWeb seems to have an interesting mix of commentary (EER?) … as does this blog (in my biased opinion), thanks to commenters like you.
Good post, Joe. I think you got turned around in your definition of false negatives:
That should be recommendations against things that we do like.
Oops – thanks, Eric! I’ve fixed the description in the original post.
There are some instances (and it looks quite common in general) wherein the supposed “false positives” and “false negatives” are really controlled outputs, irrespective of what the user/subject actually has in mind. It is more related to the current (as in when the choice or opinion is given) physiological and psychological state that the body and brain are in when the opinion or like or dislike is formed. Quantifying or including the same in metrics is a huge gamble and may itself lead to unwanted “false +/-”. There is a phase where the opinion is not fully crystallized and we are in an oscillating zone; the example cited in the ReadWriteWeb article correctly grasps the same.
Praveen: I’m not quite sure what you mean by “controlled outputs” – do you mean recommendations made irrespective of any user profile or preference rating, e.g., recommending something from a broad-based current “top 10 list”? Or do you mean offering recommendations in a [preceding] sequence that are likely to influence the rating of the target item?
I’ll insert the paragraph from the ReadWriteWeb post to which I think you are referring, for local reference:
I would welcome any further clarification you can provide.
Meanwhile, I’ll respond further to the issue you raise regarding the brain and body’s differential receptivity to opinion formation. In the book “This Is Your Brain on Music”, Daniel Levitin talks about some of the psychological and physiological factors in our receptivity to music, and how younger brains are more impressionable with respect to determining what music we like and don’t like; I’m only partway through the book, but I imagine he will have more to say regarding the points you raised (or my projections onto your points). In the book “The Female Brain”, Louann Brizendine talks about how hormonal cycles have a significant influence on receptivity in female brains (although another commenter on an earlier post on content centered conversations, in which I referenced Louann’s book, suggested that some of Louann’s claims are overstated).
I’m not sure how deeply we can delve into some of these psychological and physiological dimensions of taste via technology – and again, I’m not sure that is what you meant to suggest – but it provides provocative food for thought.
I just meant that many times we are not entirely honest in giving our opinions; we are subjective, ruled by our emotions, and more so for political reasons. Many times our opinions are political in the sense that I may just rate something lower or higher due to personal prejudices, and vice versa. Click frauds are extreme examples of such cases. I was just pointing out such factors, which may or may not be taken into account.
A link about technology and psychology: http://www.iht.com/articles/2008/02/29/technology/cell.php
Praveen: thanks for the clarification and the link. I agree that social and political factors often influence ratings and reviews, in addition to personal or psychological factors. I enjoyed the article you linked to about consumer tastes and the study of human behavior in the design of cell phones … and particularly enjoyed the quotes from Jan Chipchase that illustrate the ecological and planetary factors that may influence technology design and adoption: “Are you innovating something gimmicky just to sell a product? Or is it saving the planet you are after?”
Another friend sent me a link to a fabulous presentation by Barry Schwartz at the TED conference a few years ago on The Paradox of Choice, based on his book of the same name, in which he [also] delves into a number of psychological factors regarding our preferences (and choices) … but I think I’ll save more commentary on that for a separate post.