Book reading: Here Comes Everybody

Monday, September 14th, 2009

The book begins with an intriguing story about a girl who found a cellphone that had been left in a cab and then refused to return it to its owner. It happened in New York in May 2006 and was widely reported on the Internet as well as in the New York Times. Surprisingly, I had not noticed this story even though I was living on Long Island at the time. Moreover, the house in which I rented a room received the New York Times every morning.

Anyway, the story demonstrates the power of the Internet crowd. Such crowds are powerful enough to change a government's course of action. A mere 10 years earlier, such things would have been impossible.

The book is full of such examples. Other chapters tell the story of Wikipedia and its unsuccessful predecessor Nupedia, the story of Linux, several political protests, and unusual cases from American life. So the book can be read as a series of case studies. But the author goes beyond that: as an NYU professor, he sets out to explain what made such things possible.

He discusses a number of historical examples, for instance how Daniel McCallum came up with the organizational chart while working at the New York & Erie Railroad. Another example is the invention of the printing press. Before it, books were copied by hand, and no matter how many people did the copying, literacy did not spread: copying books was no way to teach people to write. What was needed was a vast increase in the number of books being read, and only after that did people begin trying to reproduce what they were reading themselves. The printing press increased literacy significantly.

The author studied the distribution of the number of contributions to Wikipedia. It turned out that most people made only a few short contributions. For example, many people started an article but were not knowledgeable enough to write the whole thing, so they left after writing only an introduction. But such small contributions, accumulated, add up to a solid encyclopedia.
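To make this kind of skew concrete, here is a minimal Python sketch that summarizes per-user contribution counts. The counts are made up for illustration, not taken from Wikipedia, and the function name summarize_contributions and the two-edit cutoff for a "small" contributor are my own assumptions.

```python
from collections import Counter


def summarize_contributions(edits_per_user, small=2):
    """Describe how skewed per-user contribution counts are.

    edits_per_user: one integer per contributor (hypothetical data).
    small: contributors with at most this many edits count as "small".
    """
    total_users = len(edits_per_user)
    total_edits = sum(edits_per_user)
    small_counts = [n for n in edits_per_user if n <= small]
    return {
        # how many contributors made exactly k edits, for each k
        "histogram": dict(sorted(Counter(edits_per_user).items())),
        # fraction of contributors who made only a few edits
        "small_user_share": len(small_counts) / total_users,
        # fraction of all edits that those small contributors made
        "small_edit_share": sum(small_counts) / total_edits,
        # fraction of all edits made by the single most active contributor
        "top_user_share": max(edits_per_user) / total_edits,
    }


# Made-up example: 100 contributors, most of whom edit once or twice.
counts = [1] * 70 + [2] * 15 + [5] * 10 + [50] * 4 + [500]
print(summarize_contributions(counts))
```

With these invented numbers, 85% of contributors are "small", and together they still account for about 12% of all edits.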

IRC is mentioned as one of the most convenient means of communication, though it is probably more of a hackers' paradise. Meetup, an Internet company, comes up in almost every chapter. I checked out the website: in Helsinki there are fewer than 5 groups with as many as 100 members, so Meetup evidently has not taken off everywhere in the world. The author also describes other companies that his students have developed.

Well written, full of examples and thought-provoking, this book will entertain IT professionals and people outside computing alike. The book greatly benefits from the author being a professor and teacher: the clarity and structure of the text are of very high quality. To me it is invaluable historical evidence of present-day changes that people will keep analyzing for a long time.

Book reading: Programming Collective Intelligence

Tuesday, October 21st, 2008

Programming Collective Intelligence is a visionary book in the sense that, I think, it predicts much of what will soon happen on the Internet. I have thought and blogged a few times about how we process information in the Internet age. Instead of reading magazines and newspapers, we should use blogs as our source of news, mainly because blogs offer a much more customized news feed. How much of a typical newspaper's content is of interest to a given reader? I would guess half is a generous estimate; typically it is less than that.

I start my working day with two sweet drinks. One is a cup of coffee made by a Moccamaster, a trendy brand of coffee makers priced as high as 1,000 Euros. Yes, you can get a much cheaper coffee maker, but Moccamaster delivers on its promise: the coffee is really tasty. The other morning drink is a virtual information soup made of 100 blogs. I glance over most of the stories quickly and select those that interest me. I might read them in greater detail later in the day, in the evening, or on a weekend. I do not know which drink gives me more pleasure, the delicious Moccamaster product or the sweet virtual soup. I like the latter a lot because it is rich in media content: bright images, cool videos, wow-type web pages.

However, I often discover news that I wish I had found earlier. The problem is that there are so many news sources that reading them all, or even just scanning the headlines of the major blogs, takes too much time. We need a targeted information delivery service.

This is the main idea of the book. In fact, it starts by explaining how to make recommendations from your own preferences combined with the preferences of many other people: what are the cool things that everybody else has tried but you have not? The book applies this example to Delicious, which does not offer recommendations yet. In the wild, such systems have been implemented in Digg and Google Reader, and I found the blogs Google Reader recommended to me quite relevant.
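The basic idea can be sketched in a few lines of Python: compare your ratings with everybody else's, then score each item you have not rated by a similarity-weighted average of other people's ratings. This is my own minimal sketch of user-based collaborative filtering, not the code from the book; the people, the bookmark names, and the 1/(1 + Euclidean distance) similarity measure are assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical ratings: person -> {item: score from 1 to 5}. Not real Delicious data.
RATINGS = {
    "alice": {"python-docs": 5.0, "hacker-news": 4.0, "xkcd": 1.0},
    "bob": {"python-docs": 5.0, "hacker-news": 4.0, "lwn": 5.0, "slashdot": 2.0},
    "carol": {"python-docs": 1.0, "xkcd": 5.0, "lwn": 1.0, "slashdot": 5.0},
}


def similarity(prefs, a, b):
    """Similarity in (0, 1] from the Euclidean distance over items both people rated."""
    shared = [item for item in prefs[a] if item in prefs[b]]
    if not shared:
        return 0.0
    dist = sqrt(sum((prefs[a][i] - prefs[b][i]) ** 2 for i in shared))
    return 1.0 / (1.0 + dist)


def recommend(prefs, person):
    """Rank items the person has not rated by similarity-weighted average score."""
    totals, sim_sums = {}, {}
    for other in prefs:
        if other == person:
            continue
        sim = similarity(prefs, person, other)
        if sim <= 0:
            continue
        for item, score in prefs[other].items():
            if item in prefs[person]:
                continue  # only recommend things the person has not tried yet
            totals[item] = totals.get(item, 0.0) + score * sim
            sim_sums[item] = sim_sums.get(item, 0.0) + sim
    ranked = [(totals[item] / sim_sums[item], item) for item in totals]
    return sorted(ranked, reverse=True)


print(recommend(RATINGS, "alice"))
```

Bob's ratings match Alice's exactly on the items they share, so his opinion dominates and lwn ranks ahead of slashdot even though Carol prefers slashdot. Dividing by the sum of similarities keeps an item from ranking high simply because many people rated it.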

I often try to pin down what my interests are. The blogs I read might answer this question if they were grouped by topic. I have done this manually, but I found that my categorization is not perfect. The book addresses this problem in Chapter 3.
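Grouping blogs can be sketched as hierarchical (agglomerative) clustering: describe each blog by its word counts, then repeatedly merge the two most similar clusters. The minimal Python sketch below uses 1 minus the Pearson correlation as the distance; the blog names, words, and counts are made up, and merging clusters by averaging their vectors is one simple choice among several.

```python
from math import sqrt


def pearson_distance(v1, v2):
    """1 - Pearson correlation: 0 for identical usage patterns, 2 for opposite ones."""
    n = len(v1)
    sum1, sum2 = sum(v1), sum(v2)
    sum1_sq = sum(x * x for x in v1)
    sum2_sq = sum(y * y for y in v2)
    p_sum = sum(x * y for x, y in zip(v1, v2))
    num = p_sum - sum1 * sum2 / n
    den = sqrt((sum1_sq - sum1 ** 2 / n) * (sum2_sq - sum2 ** 2 / n))
    if den == 0:
        return 1.0  # a constant vector has no pattern; treat it as uncorrelated
    return 1.0 - num / den


def cluster(vectors, names):
    """Merge the closest pair of clusters until one remains; return the merge tree."""
    clusters = [(list(v), name) for v, name in zip(vectors, names)]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = pearson_distance(clusters[i][0], clusters[j][0])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # The merged cluster's vector is the element-wise average of the pair.
        merged_vec = [(a + b) / 2 for a, b in zip(clusters[i][0], clusters[j][0])]
        merged = (merged_vec, (clusters[i][1], clusters[j][1]))
        del clusters[j]  # delete the higher index first so i stays valid
        del clusters[i]
        clusters.append(merged)
    return clusters[0][1]


# Hypothetical counts of the words: python, code, coffee, espresso.
NAMES = ["tech1", "tech2", "food1", "food2"]
VECTORS = [[10, 8, 1, 0], [9, 9, 0, 1], [0, 1, 9, 8], [1, 0, 8, 10]]
print(cluster(VECTORS, NAMES))  # (('tech1', 'tech2'), ('food1', 'food2'))
```

The nested tuples record the merge order, so the two tech blogs and the two food blogs end up in separate branches. Pearson correlation compares the pattern of word usage rather than raw volume, which helps when some blogs simply post more than others.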

After that, the book branches out into additional topics such as search, neural networks, and discrete optimization. The author, Toby Segaran, has a great ability to explain difficult concepts with simple words and pictures. As most of the material was familiar to me, I kept marveling at how easy each concept seemed here compared with how much time I had originally spent understanding it.

Then the main melody of the book returns: the next chapter explains how to filter documents, for example to decide whether a particular news story is interesting to you. The book then digresses again into decision trees, price models, and even matching people on a dating site. But the melody comes back once more, this time explaining how to extract trends from many news sources, that is, to determine what people are discussing today. This is similar to Google News, except that in Google News the user has no control over the news sources.
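One common way to build such a filter is a naive Bayes classifier trained on stories you have already labeled. The minimal Python sketch below is my own, not the book's code; the category names, the training sentences, and the add-one smoothing are assumptions, and for simplicity it scores only the words present in a story.

```python
import math
import re
from collections import defaultdict


def words(text):
    """Lower-cased set of words; each distinct word counts once per document."""
    return set(re.findall(r"[a-z']+", text.lower()))


class NaiveBayesFilter:
    def __init__(self):
        # category -> word -> number of training documents containing the word
        self.word_counts = defaultdict(dict)
        # category -> number of training documents
        self.doc_counts = defaultdict(int)

    def train(self, text, category):
        self.doc_counts[category] += 1
        counts = self.word_counts[category]
        for w in words(text):
            counts[w] = counts.get(w, 0) + 1

    def score(self, text, category):
        """log P(category) plus the sum of log P(word | category) over the text's words."""
        total_docs = sum(self.doc_counts.values())
        n = self.doc_counts[category]
        log_p = math.log(n / total_docs)
        for w in words(text):
            # Add-one smoothing, so unseen words do not zero out the score.
            seen = self.word_counts[category].get(w, 0)
            log_p += math.log((seen + 1) / (n + 2))
        return log_p

    def classify(self, text):
        return max(self.doc_counts, key=lambda c: self.score(text, c))


# Hypothetical training data.
filt = NaiveBayesFilter()
filt.train("new python release improves speed", "interesting")
filt.train("python tutorial on machine learning", "interesting")
filt.train("celebrity gossip and red carpet photos", "boring")
filt.train("gossip about a celebrity wedding", "boring")
print(filt.classify("python machine learning news"))  # interesting
print(filt.classify("celebrity gossip photos"))  # boring
```

Summing logarithms instead of multiplying probabilities avoids numerical underflow on long documents.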

I was surprised to find out that Python is so popular in the scientific community. The book describes many libraries for working with numerical data and drawing charts. It also serves as a great introduction to the Python language, even though many introductory books are available. In fact, learning Python this way is easier and more enjoyable.

After reading the book, I definitely want to try out the tricks it explains and improve my information soup. This book is my virtual cookbook.