Programming Collective Intelligence is a visionary book in the sense that I think it predicts a lot of what will happen to the Internet soon. I have thought and blogged several times about how we process information in the Internet age. Instead of reading magazines and newspapers, we should use blogs as our source of news. The main reason is that blogs offer a much more customized news feed. In a typical newspaper, how much of the content is of interest to a given reader? I guess half would be a generous estimate; typically it is less than that.
I start my working day by consuming two sweet drinks. One is a cup of coffee made by a Mocamaster, a trendy brand of coffee makers priced at as much as 1,000 euros. Yes, you can get a coffee maker for much less than that, but Mocamaster delivers on its promise: the coffee is really tasty. The other morning drink is a virtual information soup made of 100 blogs. I glance over most of the stories quickly and select those that interest me. I might read them in greater detail later in the day, in the evening, or on a weekend. I do not know which drink gives me more pleasure, the delicious Mocamaster product or the sweet virtual soup. I like the latter a lot because it is rich in media content: bright images, cool videos, wow-type web pages.
However, I often discover news that I wish I had found out about earlier. In other words, there are so many news sources that reading them all, or even just scanning the headlines of the major blogs, would take too much time. We need a targeted information delivery service.
This is the main idea of the book. In fact, it starts by explaining how to make recommendations given the preferences of a number of people together with your own. What are the cool things that you have not tried yet but everybody else has? The example described in the book is applied to Delicious, which does not offer recommendations yet. In the wild, such a system has been implemented in Digg and in Google Reader. I have found the recommended blogs to be quite relevant.
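To make the idea concrete, here is a minimal sketch of this kind of recommendation in Python. It is my own illustration, not the book's code: the names (prefs, similarity, recommend) and the ratings are made up, and I assume a simple distance-based similarity between people. Items I have not tried are scored by the ratings of other people, weighted by how similar each person's taste is to mine.

```python
from math import sqrt

# Hypothetical data: each person maps items (say, blogs) to a rating.
prefs = {
    "me":    {"blog_a": 5.0, "blog_b": 3.0},
    "alice": {"blog_a": 5.0, "blog_b": 3.0, "blog_c": 4.0},
    "bob":   {"blog_a": 1.0, "blog_b": 5.0, "blog_c": 1.0, "blog_d": 5.0},
}


def similarity(prefs, a, b):
    """Distance-based similarity in (0, 1]; 0.0 if nothing is shared."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)


def recommend(prefs, person):
    """Rank items the person has not rated by similarity-weighted ratings."""
    totals, weights = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, rating in prefs[other].items():
            if item in prefs[person]:
                continue  # skip things the person already knows
            totals[item] = totals.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)


recommendations = recommend(prefs, "me")
```

One known weakness of this weighting shows up even in the toy data: an item rated by a single, not very similar person can still get a high score, so a real system would probably also require a minimum number of raters.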
I often try to work out what my interests are. The blogs that I read might answer this question if one grouped them. In fact, I have done this manually, but I found that my categorization is not perfect. The book addresses this question in Chapter 3.
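As a rough sketch of what automatic grouping could look like, the Python below merges the two closest blogs (or groups) again and again until the requested number of groups remains. This is my own simplified illustration, not code from the book: the word counts are invented, and the choice of cosine distance and of a plain average when merging are assumptions.

```python
from math import sqrt

# Hypothetical word counts per blog; columns: "python", "coffee", "startup".
blogs = {
    "code_blog":   [9, 0, 1],
    "dev_notes":   [8, 1, 0],
    "espresso":    [0, 9, 1],
    "bean_review": [1, 8, 0],
}


def cosine_distance(u, v):
    """0.0 for vectors pointing the same way, 1.0 for unrelated ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def cluster(vectors, k):
    """Merge the closest pair of clusters until only k clusters remain."""
    clusters = [([name], list(vec)) for name, vec in vectors.items()]
    while len(clusters) > max(k, 1):
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cosine_distance(clusters[i][1], clusters[j][1])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        names = clusters[i][0] + clusters[j][0]
        # The merged cluster is represented by the average of the two vectors.
        centroid = [(a + b) / 2 for a, b in zip(clusters[i][1], clusters[j][1])]
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [(names, centroid)]
    return [sorted(names) for names, _ in clusters]


groups = cluster(blogs, 2)
```

With real feeds the vectors would come from counting words in each blog's posts, and picking the number of groups is itself a judgment call.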
After that the book branches into a number of additional topics such as search, neural networks, and discrete optimization. The author, Toby Segaran, has a great ability to explain difficult concepts using simple words and pictures. As most of the material was familiar to me, I kept noticing how easy each concept seemed here compared with how much time I originally spent understanding it.
After that the main melody of the book returns: the next chapter explains how to filter documents, for example to decide whether a particular news story is interesting to you or not. Then the book deviates again into decision trees, building price models, and even matching people on a dating site. However, our melody then comes back once more; this time the book explains how to extract trends from a large number of news sources, that is, to decide what people are discussing today. This feature is similar to Google News, except that in Google News the user has no control over the news sources.
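Document filtering of this kind can be sketched with a small naive Bayes classifier. The Python below is my own minimal illustration rather than the book's implementation: the class name, the labels "interesting" and "boring", and the training sentences are all made up, and I assume whitespace tokenization with add-one smoothing.

```python
from collections import defaultdict
from math import log


class NaiveBayesFilter:
    """Tiny naive Bayes text filter that only looks at word presence."""

    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))
        self.doc_counts = defaultdict(int)

    @staticmethod
    def tokenize(text):
        # Assumption: lowercase and split on whitespace; each word counts once.
        return set(text.lower().split())

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in self.tokenize(text):
            self.word_counts[label][word] += 1

    def score(self, text, label):
        """Log of P(label) times the product of P(word | label)."""
        total_docs = sum(self.doc_counts.values())
        logp = log(self.doc_counts[label] / total_docs)
        for word in self.tokenize(text):
            count = self.word_counts[label].get(word, 0)
            # Add-one smoothing so unseen words do not zero out the score.
            logp += log((count + 1) / (self.doc_counts[label] + 2))
        return logp

    def classify(self, text):
        # Needs at least one training example; picks the best-scoring label.
        return max(self.doc_counts, key=lambda label: self.score(text, label))


news_filter = NaiveBayesFilter()
news_filter.train("python machine learning tutorial", "interesting")
news_filter.train("new python library for charts", "interesting")
news_filter.train("celebrity gossip and fashion", "boring")
news_filter.train("fashion week celebrity photos", "boring")

verdict = news_filter.classify("python charts tutorial")
```

The filter only gets useful after it has seen a fair number of stories that I have marked by hand, which is exactly the kind of feedback a feed reader could collect.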
I was surprised to find out that Python is such a popular language in the scientific community. The book describes lots of libraries for dealing with numerical data or displaying various charts. The book will serve as a great introduction to the Python language, even though there are lots of introductory books available. In fact, learning Python this way is easier and more enjoyable.
After reading the book I definitely want to try out the tricks explained there and improve my information soup. This book is my virtual cookbook.