09/24/2006

Filtering content in a Web 2.0 world

The big idea of Web 2.0 is that content production no longer comes from centralized sources, but from the collaborative contributions of users. Traditional Web sites indeed face difficult challenges in providing an attractive content publishing service. These stem from several issues:
- content production is very expensive, not to mention its maintenance
- Net users are increasingly demanding about content reliability and integrity
- the real-time nature of the Internet makes managing content production very complex
- personalization of content for each user is a true headache for all content providers
Web 2.0 collaborative approaches are supposed to solve these issues in an elegant way. By entrusting content production to end users, portals free themselves from all these constraints. Suddenly, they can combine real time, content diversity and low production costs. Combined with advanced search tools (search engines, tags), these Web 2.0 platforms demonstrate unbeatable productivity compared to traditional media approaches.
However, the thorny issue of content integrity remains. When everyone is allowed to produce content, how can one make sure that this collaborative content is not biased or misleading?
There are several ways to tackle the problem. The first is to screen all content before (or after) its publication. If the volume of content remains reasonable, this can be entrusted to a team of professional experts.
For big volumes, a very interesting alternative is to have this screening carried out directly by end users. The idea is that if a majority of users consider a piece of content poor, there is a reasonable chance that it actually is. Statistically, in the vast majority of cases, this democratic control appears surprisingly effective. This idea is behind services like Digg.
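The majority rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Digg or any other service actually works: the function name `screen_by_majority`, the `min_votes` threshold and the sample data are all hypothetical.

```python
def screen_by_majority(votes, min_votes=3):
    """Return the ids of items that a strict majority of voters rated as poor.

    `votes` maps an item id to a list of booleans (True means "poor").
    Items with fewer than `min_votes` ballots are left alone (assumed rule).
    """
    rejected = []
    for item_id, ballots in votes.items():
        if len(ballots) < min_votes:
            continue  # not enough evidence yet to judge this item
        poor = sum(ballots)  # True counts as 1, False as 0
        if poor * 2 > len(ballots):  # strict majority says "poor"
            rejected.append(item_id)
    return rejected


# Hypothetical ballots for three posts.
votes = {
    "post-a": [True, True, False, True],  # 3 of 4 voters say poor
    "post-b": [False, False, True],       # 1 of 3 voters says poor
    "post-c": [True],                     # too few votes to decide
}
print(screen_by_majority(votes))
```

Note that a single threshold applies to everyone: an item that the majority dislikes disappears for all users, whatever their individual taste.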
Still, this process tends to eliminate exotic gems. And what if you don’t think like the majority of users? Well, that is another story, one that requires a new generation of tools capable of performing one-to-one personalized collaborative filtering. But be patient, as those complex tools are more Web 2.5 than Web 2.0. ;-)

09/12/2006

Long Tail actual challenges

Welcome to the marvellous world of the “Long Tail”. The Long Tail is supposed to turn a lousy back catalogue into gold. I mean a magic way to do business without relying on blockbusters (for details on this buzzword, check the excellent Wikipedia article on the subject).
For instance, it is common knowledge that Amazon.com generates more sales from unknown books than from best sellers. The classic brick-and-mortar rule, which states that 10% of all existing products account for 90% of sales, is no longer valid on the Web. In a digital store, compared to a physical store, zero storage cost makes it possible to present an almost infinite number of products. In theory, e-commerce sites no longer have any limit on their inventory size. In particular, there is no longer any need to focus on a specific niche as in the old physical world. You just need to plug a fast and efficient keyword search tool into your huge digital catalogue and you can serve the entire universe.
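To get a feel for how much demand sits beyond the shelves of a physical store, here is a rough Python sketch. It rests on an assumption that is not taken from any real sales data: that the sales of the product ranked r are proportional to 1 / r**exponent (a Zipf-like curve). The function name `tail_share` and the numbers used are purely illustrative.

```python
def tail_share(shelf_size, catalogue_size, exponent=1.0):
    """Share of total demand that falls on products ranked below `shelf_size`.

    Assumes (hypothetically) that sales of the product ranked r are
    proportional to 1 / r**exponent.
    """
    weights = [1.0 / rank ** exponent for rank in range(1, catalogue_size + 1)]
    on_shelf = sum(weights[:shelf_size])  # demand a limited store can serve
    return 1.0 - on_shelf / sum(weights)  # demand only a huge catalogue serves


# A store limited to 1,000 titles, facing catalogues of growing size.
for size in (1_000, 10_000, 100_000):
    print(size, round(tail_share(1_000, size), 3))
```

Under this assumption, the larger the catalogue, the larger the share of demand that a store with fixed shelf space simply cannot serve.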
This looks too good to be true. Indeed, offering millions of products implies that customers know… what they are looking for! It is impossible to browse the whole catalogue. Moreover, very often, the customer has only a very vague idea of what he is looking for. He wants “a new cool shirt”, not “a yellow and pink shirt with a rasta logo on the back”. Traditional search tools are unable to look for a “cool shirt”. Cool is not only a vague notion, it is specific to each of us!
All of us have been stuck one day in the 10th sub-menu of a site “where there is everything”. In other words, to maximize the potential of this famous Long Tail, it is necessary to invent new tools for browsing those giant catalogues.
For instance, take blogs. I love to learn that there are more than 30 million blogs in cyberspace. What an incredible Long Tail! But these blogs face the same problem of relevance. In this enormous sea of posts, how can I spot those which are of real interest to me? I mean, without wasting hours browsing randomly?
Intelligent personalized filtering is definitely the next frontier of the Long Tail.

09/11/2006

How to Choose a Filtering Solution?

Recommendation engines that generate real-time personalized recommendations are based on very complex mathematical algorithms. To source such a system, several options are possible, ranging from complete in-house development to fully packaged commercial solutions like Criteo.

What are the different types of approaches?
In general, we distinguish two major types of approaches:
- “Content” approaches, based on the analysis of intrinsic product characteristics,
- “Collaborative” approaches, based on comparing user profiles with one another.

To be efficient, content approaches need every product to be fully described in advance. Unfortunately, this is hardly possible in an open environment. Moreover, results are in general very disappointing in terms of predictive accuracy. For these reasons, content approaches are losing ground on the Internet.
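A content approach can be sketched as follows in Python. This is a minimal illustration, assuming each product has been tagged by hand with a set of attributes and that similarity is measured with the Jaccard index; the catalogue, the attribute names and the helper `recommend_similar` are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard index of two attribute sets: size of overlap over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def recommend_similar(target, catalogue, top_n=2):
    """Rank the other products by attribute overlap with `target`."""
    scores = [
        (jaccard(catalogue[target], attrs), name)
        for name, attrs in catalogue.items()
        if name != target
    ]
    scores.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first
    return [name for score, name in scores[:top_n] if score > 0]


# Hand-made product descriptions (the "preliminary configuration").
catalogue = {
    "shirt-1": {"shirt", "yellow", "cotton"},
    "shirt-2": {"shirt", "pink", "cotton"},
    "hat-1": {"hat", "yellow"},
    "mug-1": set(),  # never described, so it can never be recommended
}
print(recommend_similar("shirt-1", catalogue))
```

The untagged product shows the weakness mentioned above: anything that has not been described beforehand is invisible to this kind of engine.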

On the other hand, collaborative approaches involve two major constraints:
- algorithms that are much more complex than those of content approaches,
- very heavy computing requirements.
Consequently, very few collaborative methods are capable of managing big volumes of data with acceptable response times.
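To make the computing cost concrete, here is a minimal Python sketch of one classic collaborative technique, user-based filtering with cosine similarity. It is an illustration under stated assumptions, not the algorithm of any particular vendor: ratings are assumed to be positive numbers, and the names `cosine`, `predict` and the sample users are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two rating dicts (unrated items count as 0)."""
    dot = sum(u[item] * v[item] for item in set(u) & set(v))
    norm_u = sqrt(sum(r * r for r in u.values()))
    norm_v = sqrt(sum(r * r for r in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = cosine(ratings[user], other_ratings)  # one comparison per other user
        num += sim * other_ratings[item]
        den += sim
    return num / den if den else None


# Hypothetical ratings on a 1 to 5 scale.
ratings = {
    "alice": {"a": 5, "b": 1},
    "bob": {"a": 5, "b": 1, "c": 5},
    "carol": {"a": 1, "b": 5, "c": 1},
}
print(predict(ratings, "alice", "c"))
```

Alice's predicted rating for item "c" leans towards Bob's, because her tastes are closer to his than to Carol's. Every prediction compares the user with every other user, which is why this naive form becomes expensive as the number of users and items grows.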
Conclusion: before rushing into a cheap solution, make sure you won't get stuck in the middle of your ramp-up. Otherwise, you are better off doing nothing!