WIRED magazine advertises science that deals with massive amounts of data

Saturday, August 28th, 2010




New trends in computing appear every now and then. Dealing with massive amounts of data is not new but a few interesting applications that can greatly benefit from our increased processing power are on the horizon. Wired magazine describes these applications in the following two articles that appeared recently:


Sergey Brin’s Search for a Parkinson’s Cure


What You Want: Flickr Creator Spins Addictive New Web Service

The first article deals with the problem of how new drugs are developed. It turns out that the biggest problem is not whether a drug is useful but whether it poses any danger, in other words whether it has side effects. For example, aspirin was discovered in 1899, but it was almost a century before it was noticed that patients who take aspirin regularly have a decreased risk of heart attack. In this case the side effect was positive. In many other cases it is negative. The difficulty of testing a new drug is that it takes a lot of time to establish a strong correlation between taking the drug and a certain change in the patient’s body. Increased body temperature, for example, is a possible result of many things: food, the outside environment, the people the patient talks to, and so on. A more comprehensive monitoring system is needed to take those things into account. However, working with such a multi-dimensional data set is only possible with automated tools and requires lots of processing power. This is what Google is good at.

The second article describes Hunch, a system that tries to build a psychological model of you. It asks you a number of random questions, for example whether you believe in alien abductions, and then matches your answers to those of other people. Then it can give you recommendations based on what people with similar answers like. The system goes a step further, however. It can try to guess your answers to arbitrary questions based on what people who are similar to you answered. I let Hunch learn a fair amount of my preferences – I answered over 100 questions. After that it started to make recommendations that seemed interesting. However, when it tried to predict my answers, only about 50% of its guesses were correct. So the system has not learned much yet. On the other hand, a human being is a far more complex creature whose model obviously does not fit into 100 questions.
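The matching idea described above can be sketched in a few lines. This is only an illustration under my own assumptions, not Hunch's actual algorithm: the similarity measure (share of identical answers), the majority vote among the k most similar users, and all names and sample data are hypothetical.

```python
from collections import Counter

def similarity(a, b):
    """Fraction of shared questions on which two users gave the same answer."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[q] == b[q] for q in shared) / len(shared)

def predict_answer(me, others, question, k=3):
    """Guess my answer by majority vote among the k most similar users
    who have answered the question. Returns None if nobody answered it."""
    candidates = [u for u in others if question in u]
    candidates.sort(key=lambda u: similarity(me, u), reverse=True)
    votes = Counter(u[question] for u in candidates[:k])
    return votes.most_common(1)[0][0] if votes else None

# Hypothetical answers, invented for this example.
me = {"aliens": "no", "cats_or_dogs": "cats", "coffee": "yes"}
others = [
    {"aliens": "no", "cats_or_dogs": "cats", "coffee": "yes", "camping": "no"},
    {"aliens": "no", "cats_or_dogs": "cats", "coffee": "no", "camping": "no"},
    {"aliens": "yes", "cats_or_dogs": "dogs", "coffee": "no", "camping": "yes"},
]

print(predict_answer(me, others, "camping", k=2))
```

With only a handful of answers per user the similarity scores are very coarse, which may be one reason why predictions after 100 questions are still close to a coin toss.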

Another system mentioned in the article, which I liked a lot more, is Aardvark. It uses a unique combination of technologies: messaging, tagging, social networking, etc. Its idea is simple and I guess many people have thought of it. What if you have a question but you don’t know who to ask? Most likely you head to a web forum or a mailing list and ask there. The problem is that you have to find an appropriate forum and wait a couple of days to get an answer.

The Internet does not allow you to find the right people instantly. Try typing "Who knows Chinese out there" in a search engine and see what happens. Aardvark is a kind of social search engine. Once you type your question, the system determines its topic and tags it appropriately. Then it searches its database for users who have indicated that they are experts in this particular area. In addition, Aardvark checks who is online at the moment to avoid sending your question to someone who is possibly on vacation. The expert gets an IM notification from Aardvark asking whether (s)he wants to help. If yes, the expert can type the answer immediately and even chat with the person who asked the question. The whole process happens in real time, which actually encourages people to ask questions.
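The routing steps above (tag the question, look up self-declared experts, keep only those who are online) can be sketched roughly as follows. This is a toy illustration, not Aardvark's implementation: the keyword table, the User class, and the sample users are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    name: str
    expertise: set = field(default_factory=set)  # topics the user claims to know
    online: bool = False

# Hypothetical keyword table; a real system would use a far richer classifier.
TOPIC_KEYWORDS = {
    "chinese": {"chinese", "mandarin"},
    "linux": {"linux", "bash", "kernel"},
}

def tag_question(text):
    """Return the set of topics whose keywords appear in the question."""
    words = {w.strip("?.,!").lower() for w in text.split()}
    return {topic for topic, kws in TOPIC_KEYWORDS.items() if words & kws}

def find_experts(text, users):
    """Names of users who are online and claim expertise in a tagged topic."""
    topics = tag_question(text)
    return [u.name for u in users if u.online and u.expertise & topics]

# Hypothetical users, invented for this example.
users = [
    User("li", {"chinese"}, online=True),
    User("wang", {"chinese"}, online=False),  # possibly on vacation
    User("tux", {"linux"}, online=True),
]

print(find_experts("Who knows Chinese out there?", users))
```

In the real service the selected expert would then receive an IM asking whether (s)he wants to help; that step is left out of the sketch.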

I have tried Aardvark both to ask questions and to answer Linux questions. In both cases the experience was positive. I got answers that were quite valuable. For example, when I asked how to learn Chinese I was given a link to a web site with an online language course which even had an iPhone app to facilitate learning. I could not have found it in the App Store otherwise.

Aardvark contacted me through Google Talk a few times. It is actually fun to talk to a robot because this way you can help out real people. I guess this is one cool application of artificial intelligence – robots are helping people to socialize!

WIRED magazine, what it looks like nowadays

Monday, July 5th, 2010

I used to read WIRED magazine a few years ago. Then, when I started working, I only had time to read what was absolutely necessary – Communications of the ACM.

However, this spring I decided to subscribe to Wired magazine again. I was surprised when I got a copy of the magazine – it has totally changed. To start with, nowadays it is half advertising, half content. I guess this is because of the economic crisis. In other words, it is possible to throw away half of the magazine right from the start.

What remains is also not necessarily interesting. I used to like the Wired-Tired-Expired section as well as those pesky new words from the underground that Wired was deciphering for the rest of us. Those sections are gone.

But there are still a few interesting articles. The June 2010 issue mentions a number of interesting people, including Nicholas Carr and Clay Shirky. I have read books by these authors. The magazine advertises their new books, which I definitely need to check out. There is an interesting research article as well, called Traffic Cop. It describes a brave attempt by a few people to build a model of traffic in NYC as a giant spreadsheet. By introducing a number of parameters into the model and adding the ability to fiddle with them, it becomes possible to optimize traffic in NYC. This work is so necessary for lots of cities with traffic jams!

To summarize, Wired magazine is always fresh and surprising but not necessarily what you want to read. I can only explain this as an attempt to attract more readers: casual technology enthusiasts are more interested in space junk and organ transplants, whereas computer specialists appreciate computer technology content. To me Wired looks more like Popular Mechanics nowadays; earlier it was at least more related to geek culture. On the other hand, the definition of geek is also changing.

Reading list Spring 2010

Friday, April 16th, 2010

I have read the first three 2010 issues of Communications of the ACM: January, February, and March. Overall, I have noticed that CACM is aiming at a broader scope, not only CS topics but also biology and physics. Nowadays it is therefore more like Science or Nature. But of course in every article there is a computational aspect that connects computer science with another area of knowledge. I found that cross-disciplinary articles are more engaging than purely technical ones. Nature has lots of secrets that computer science helps reveal.

Jan 2010
Rebuilding for Eternity. Bundler – open source version of Photosynth.
Automated translation of Indian Languages
New Search Challenges and Opportunities
Data in Flight. Implementation of StreamSQL. Stanford streams, MIT Aurora, SQL Stream.
Other people’s data – XIgnite

Last but not least – two articles that discuss Google’s parallel engine, MapReduce. I have noticed that CACM contains lots of articles dedicated to Google’s technology; for example, one of the following issues has an article discussing the evolution of the Google File System. At the same time there are no articles from other software giants such as Microsoft, Apple, or IBM. This is not because those companies do not innovate. Everybody knows that programmers went nuts writing iPhone apps. The reason for Google’s dominance is, I believe, the amount of sponsorship money that it gives to ACM. That is fine, Google has created lots of innovative frameworks, but other companies deserve attention as well.

MapReduce and Parallel DBMSs: Friends or Foes?
MapReduce: A Flexible Data Processing Tool

Feb 2010

The best issue I have ever read! To start with, its cover story is dedicated to a new model of computation, quantum algorithms. This topic is not new. When I was an undergraduate student in Russia in the late 1990s there was lots of buzz about how quantum algorithms could change cryptography. With their strong mathematical tradition, Russians were trying to explain quantum algorithms from the number theory point of view. To me it was totally incomprehensible. Or I should say that my mind was more inclined toward an algorithmic perspective on quantum computers. In this article CACM does a great job of explaining the notion of a quantum algorithm at the level that is most appropriate to me as a software engineer. It briefly mentions computational complexity challenges and explains how quantum algorithms might help tackle them.

Recent progress in Quantum algorithms

Type Theory Comes of Age. Aura, Jif for security. Philip Wadler
An interview with Michael Rabin

A few billion lines of code later.

Another great article in the same issue! When I was a student again, this time in graduate school in the United States, I was lucky to witness the emergence of a new technology – practical bug detection using static analysis. But I will start with a brief introduction on how research is transformed into a widely adopted, mature technology.

I have seen two such events in my life so far. More experienced people might name other cases, but here is what I can say. In the late 1990s computer graphics advanced rapidly because of increased processing power. Researchers began to experiment with massive amounts of data, in particular images. This is how light field mapping technology was developed simultaneously at several universities as well as at Microsoft and Intel. Its idea is to build a 3D model of an object from a number of images taken with an inexpensive camera. I was lucky to participate in the development of this technology as an undergraduate intern at Intel in Nizhny Novgorod in 2001-2002. It was only a research project, though, and it was soon abandoned. Yet in 2010 there is a commercialized version of this technology: Photosynth, created by Microsoft.

When I joined graduate school at Stony Brook in 2002, application security was a hot research area. Everybody was thinking about how to protect programs against viruses. This is why we created DIRA, a dynamic protection tool that instrumented programs with additional instructions that made them resilient against buffer overflow attacks. But again, the project was soon abandoned. Dawson Engler, however, was able to transform the technology landscape with his static bug finder. In this article he describes his experience of turning a research project into a commercial tool.

Software Model Checking takes off
Assessing the Changing US IT R&D Ecosystem

March 2010
Chasing the AIDS virus

The cover story is another must-read article! It explains the mechanics of the AIDS virus. I never thought that it could transform itself to avoid the medicine it is exposed to.

Making decisions based on the Preferences of Multiple Agents

This article describes various voting algorithms with applications to social networks. A very comprehensive discussion.

Engineering the web’s third decade
Orchestrating coordination in pluralistic networks
GFS: Evolution on fast-forward
Global IT management: structuring for scale, responsiveness, and innovation

Economist special reports on financial crisis and information management

Saturday, March 13th, 2010

These days I am reading every other issue of The Economist, those that contain special reports. A special report is a 10-15 page report that deals with a particular issue, for example the financial crisis. It consists of a number of articles. Each article sheds light on a specific side of the topic, and the articles are typically arranged in a logical order.

The financial crisis special report explains the multiple reasons behind the troubles that the world economy faced. It turns out that the underlying theory of risk evaluation was developed in the mid-20th century. I noticed that The Economist always does a good job of analyzing the history of a particular event. The underlying theories are always developed a long time before their applications start to make a difference.

The use and misuse of computational models to evaluate risk was the primary reason behind the crisis. A simple example of an error-prone model is when bank A owns shares of bank B and bank B owns shares of bank A. If one of them collapses then the other will collapse too. But models often ignored this domino effect. Of course, not everybody was that stupid. The problem was that as soon as one bank began announcing higher yields, others had to follow to stay competitive. Thus, the mathematicians were forced to bend their models to make them fit the desired higher yield.
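The domino effect from the bank A / bank B example is easy to show in code. This is a deliberately crude sketch under my own assumption that a bank collapses as soon as any bank whose shares it owns collapses; real risk models weigh exposures and are far more complicated. The function name and the data are hypothetical.

```python
def cascade(holdings, first_failure):
    """holdings[x] is the set of banks whose shares bank x owns.
    Assumption (crude): a bank fails as soon as any bank it holds fails.
    Returns the set of all banks that end up failing."""
    failed = {first_failure}
    changed = True
    while changed:
        changed = False
        for bank, owned in holdings.items():
            if bank not in failed and owned & failed:
                failed.add(bank)
                changed = True
    return failed

# Hypothetical cross-holdings: A and B own each other, C owns nothing.
holdings = {"A": {"B"}, "B": {"A"}, "C": set()}

print(sorted(cascade(holdings, "A")))
```

A model that evaluates each bank in isolation would report that only A fails; following the cross-holdings shows that B goes down with it while C survives.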

Another article describes how risk managers were treated. Various tricks were played to reduce their influence on borrowing decisions. One common trick was to work quietly on a proposal for weeks and show it to the risk team only a couple of hours before the approval meeting, so that they would not have time to evaluate it properly.

The rest of the report is devoted to how to avoid a repetition of the crisis. Of course, banks need to have bigger reserves in cash. They also need to prepare themselves and understand which factors lead to such a crisis. They are now playing board games in which a bank is put into a simulated crisis and the management needs to think about how they got into such a mess.

But regulators also need to do a lot. Too big to fail is one issue that they need to address. Another problem is that the central bank was kind enough to lend large amounts of money at low interest rates, which stimulated the banks’ desire to borrow. If a bank has lots of money then it starts to attract the kinds of customers it would typically not deal with. It might even promise a higher than average yield, but obviously such a good life ends as soon as the supply of cheap money stops.

Another special report that I read deals with the information deluge. Again, The Economist begins with a history lesson: in 1917 a manufacturing manager complained about the effects of the telephone. It was called a big time-waster and confusion-generator. But Craig Mundie says that big data opens new horizons for new economies. Farecast, a system that Microsoft built, estimates when to buy a flight ticket depending on the expected change in the price.

The Economist provides lots of examples of how various companies saved money using better information processing tools. For example, Nestle found that nearly 9 million of its records were either obsolete or duplicates. Another example is the Chinese company Li & Fung, which operates a supply chain. One of its most important technologies is videoconferencing, which allows buyers and manufacturers to examine the color of a material.

Another article is dedicated to Google. It managed to build a translation system using machine learning over a training set of 2 trillion words obtained through its book scanning technology. In the early 1990s IBM tried to build a French-English program but their system did not work. The reason was that IBM had only millions of documents, not billions. Therefore, big data generates big improvements. The magazine also mentions the Data Liberation Front, which reflects the attitude Google is taking towards users’ data.

The next article describes open government. On his first day in office, Barack Obama issued a presidential memorandum ordering federal agencies to make available as much information as possible. The article mentions several books on open government such as Full Disclosure and Wiki Government.

Visualizing massive amounts of information is an important and challenging task. Pat Hanrahan of Stanford University founded Tableau Software, which facilitates information manipulation. Valdis Krebs, a specialist in social interactions, was once asked to help speed up a delayed project. He mapped the e-mail conversations between the various teams and found out that they all communicated through a single manager. Connecting the teams directly was the key to saving the troubled project.
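The kind of analysis described above can be approximated with a very small script. This is only a sketch, not the method Krebs actually used: it takes a list of (sender, receiver) pairs and computes in what share of the messages each participant is involved. The function name and the sample mail log are hypothetical.

```python
from collections import Counter

def hub_share(messages):
    """For every participant, the fraction of messages in which
    they appear as sender or receiver."""
    involved = Counter()
    for sender, receiver in messages:
        involved[sender] += 1
        involved[receiver] += 1
    total = len(messages)
    return {who: n / total for who, n in involved.items()}

# Hypothetical mail log: the teams never write to each other directly.
messages = [
    ("team_a", "manager"),
    ("manager", "team_b"),
    ("team_b", "manager"),
    ("manager", "team_c"),
]

shares = hub_share(messages)
print(max(shares, key=shares.get), shares)
```

A participant whose share is close to 1.0 is a bottleneck: every conversation passes through that person, which is exactly the pattern that direct links between the teams would remove.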

But large amounts of information demand lots of energy. This is why big companies such as Google and Microsoft are building their data centers near hydroelectric plants.

What information consumes is rather obvious: it consumes the attention. Hence a wealth of information creates a poverty of attention, said Herbert Simon in 1971.

Reading interesting magazines

Saturday, January 30th, 2010

I have been reading mostly Communications of the ACM recently. I also like Wired magazine and Technology Review, but it so happens that if I don’t have a print edition in front of me then I am not going to read it.

In the local supermarket there is a wide selection of magazines. While waiting for the cashier to process your stuff you can choose something interesting. I never intended to read those magazines, but this is exactly how I bought TIME magazine and The Economist. Btw., Newsweek is available as well but I thought it was basically the same as the other two magazines.

Because I bought both magazines near New Year they were summarizing the year 2009 and of course their main concern was the global financial crisis. However, the magazines dealt with the issue in two different ways.

As is its tradition, Time magazine selected a Person of the Year, this time Ben Bernanke. Then the magazine explained how the US Federal Reserve works and interviewed Ben Bernanke. To do this Time gathered its best writers and editors. Quite traditionally, the magazine attributes each article to a particular author or a number of them.

I bought The Economist because earlier I had learned that it is one of the favorite publications of George W. Bush. The articles in this magazine have no named authors. The articles themselves are quite short, which is more typical of a newspaper. What sets them apart is that they do not deal with just one event but rather describe a trend or the development of a news story over the period that the issue spans or even longer. The coverage is also quite wide, from Europe to the US to China, but surprisingly there is no mention of Russia whatsoever. The Economist explained that the excess of cash that the Fed generated makes another bubble possible. But there are no immediate signs of one yet.

Thus, both magazines are examples of analytical reading, but Time is more into people and organizations, the big players that define what is going on in the world. The Economist describes how people live in various parts of the world, so it is more like a long tail magazine. It is an open question who shapes world history: the few top guys or ordinary people.