WIRED magazine advertises science that deals with massive amounts of data

Saturday, August 28th, 2010




New trends in computing appear every now and then. Dealing with massive amounts of data is not new but a few interesting applications that can greatly benefit from our increased processing power are on the horizon. Wired magazine describes these applications in the following two articles that appeared recently:


Sergey Brin’s Search for a Parkinson’s Cure


What You Want: Flickr Creator Spins Addictive New Web Service

The first article deals with how new drugs are developed. It turns out that the biggest problem is not whether a drug is useful but whether it poses any danger (in other words, whether it has side effects). For example, aspirin was discovered in 1899 but it was not until 100 years later that it was noticed that patients who take aspirin regularly have a decreased risk of heart attack. In this case the side effect was positive. However, in many other cases it is negative. The difficulty of testing a new drug is that it takes a lot of time to establish a strong correlation between taking the drug and a certain change in the patient’s body. An increased body temperature, for example, can result from many things: food, the outside environment, the people the patient talks to, and so on. A more comprehensive monitoring system is needed to take those things into account. However, working with such a multi-dimensional data set is only possible with automated tools and requires lots of processing power. This is what Google is good at.

The second article describes Hunch, a system that tries to build a psychological model of you. It asks you a number of random questions, for example whether you believe in alien abductions, and then matches your answers to those of other people. Then it can give you recommendations based on what people with similar answers like. The system goes a step further, however. It can try to guess your answers to arbitrary questions based on what people who are similar to you answered. I let Hunch learn a fair amount about my preferences by answering over 100 questions. After that it started to make what seemed like interesting recommendations. However, when it tried to predict my answers, only around 50% of its guesses were correct. So the system has not learned much yet. On the other hand, a human being is a far more complex creature whose model obviously does not fit into 100 questions.
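The matching idea can be sketched in a few lines of Python. This is only my guess at the general technique (a similarity-weighted vote among the most similar users), not Hunch's actual algorithm; the function names, the question keys, and the sample users are all made up for illustration.

```python
from collections import Counter


def similarity(a, b):
    """Fraction of shared questions on which two users gave the same answer."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    return sum(a[q] == b[q] for q in shared) / len(shared)


def predict_answer(me, others, question, k=3):
    """Guess my answer from the k most similar users who answered the question.

    Each of those users votes for their own answer, weighted by similarity.
    Returns None when nobody has answered the question.
    """
    candidates = [(similarity(me, o), o[question]) for o in others if question in o]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, answer in candidates[:k]:
        votes[answer] += sim
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Hypothetical data: answers keyed by made-up question identifiers.
me = {"aliens": "no", "cats_or_dogs": "cats", "coffee": "yes"}
others = [
    {"aliens": "no", "cats_or_dogs": "cats", "coffee": "yes", "sci_fi": "yes"},
    {"aliens": "yes", "cats_or_dogs": "dogs", "coffee": "no", "sci_fi": "no"},
    {"aliens": "no", "cats_or_dogs": "dogs", "coffee": "yes", "sci_fi": "yes"},
]
guess = predict_answer(me, others, "sci_fi")
```

With only a handful of shared questions the similarity scores are very coarse, which may be one reason the predictions need many answers before they become reliable.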

Another system mentioned in the article, which I liked a lot more, is Aardvark. It uses a unique combination of technologies: messaging, tagging, social networking, etc. Its idea is simple and I guess many people have thought of it. What if you have a question but you don’t know who to ask? Most likely you head to a web forum or a mailing list and ask there. The problem is that you have to find an appropriate forum and wait a couple of days to get an answer.

The underlying problem is that the Internet does not allow you to find the right people instantly. Try typing "Who knows Chinese out there" into a search engine and see what happens. Aardvark is a kind of social search engine. Once you type your question, the system determines its topic and tags it appropriately. Then it searches its database for users who have indicated that they are experts in this particular area. In addition, Aardvark checks who is online at the moment to avoid sending your question to someone who is possibly on vacation. The expert gets an IM notification from Aardvark asking whether (s)he wants to help. If yes, the expert can type the answer immediately and even chat with the person who asked the question. The whole process happens in real time, which actually encourages people to ask questions.
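The routing steps described above (tag the question, find matching experts, keep only those who are online) could look roughly like this in Python. This is a toy sketch under my own assumptions: the keyword table, the Expert class, and the function names are hypothetical, and a real system would surely use something smarter than keyword matching to detect the topic.

```python
from dataclasses import dataclass, field


@dataclass
class Expert:
    name: str
    topics: set = field(default_factory=set)
    online: bool = False


# Hypothetical keyword-to-topic table, used here instead of a real classifier.
TOPIC_KEYWORDS = {
    "linux": {"linux", "kernel", "bash", "ubuntu"},
    "chinese": {"chinese", "mandarin", "pinyin"},
}


def tag_question(question):
    """Return the set of topics whose keywords appear in the question."""
    words = {w.strip("?.,!").lower() for w in question.split()}
    return {topic for topic, keywords in TOPIC_KEYWORDS.items() if words & keywords}


def route_question(question, experts):
    """Return the names of online experts who cover at least one question topic."""
    topics = tag_question(question)
    return [e.name for e in experts if e.online and e.topics & topics]


# Made-up users: bob knows Chinese but is offline, so he is not contacted.
experts = [
    Expert("alice", {"chinese"}, online=True),
    Expert("bob", {"chinese"}, online=False),
    Expert("carol", {"linux"}, online=True),
]
recipients = route_question("How do I learn Chinese?", experts)
```

Filtering on presence before sending the notification is what makes the real-time answer possible: the question only goes to people who can respond right away.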

I have tried Aardvark both to ask questions and to answer Linux questions. In both cases the experience was positive. I got answers that were quite valuable. For example, when I asked how to learn Chinese I was given a link to a web site with an online language course which even had an iPhone app to facilitate learning. I could not have found it in the App Store otherwise.

Aardvark contacted me through Google Talk a few times. It is actually fun to talk to a robot because this way you can help out real people. I guess this is one cool application of artificial intelligence – robots are helping people to socialize!

WIRED magazine, what it looks like nowadays

Monday, July 5th, 2010

I used to read WIRED magazine a few years ago. Then when I started working I only had time to read what was absolutely necessary: Communications of the ACM.

However, this spring I decided to subscribe to Wired magazine again. I was surprised when I got my first copy: the magazine has totally changed. To start with, nowadays it is half advertising and half content. I guess this is because of the economic crisis. In other words, it is possible to throw away half of the magazine right from the start.

What remains is also not necessarily interesting. I used to like the wired-tired-expired section as well as those pesky new words from the underground that Wired was deciphering for the rest of us. Those sections are gone.

But there are still a few interesting articles. The June 2010 issue mentions a number of interesting people, including Nicholas Carr and Clay Shirky. I have read books by both authors. The magazine advertises their new books, which I definitely need to check out. There is an interesting research article as well, called Traffic Cop. It describes a brave attempt by a few people to build a model of traffic in NYC as a giant spreadsheet. By introducing a number of parameters to the model and adding the ability to fiddle with them, it becomes possible to optimize traffic in NYC. This work is so necessary for lots of cities with traffic jams!

To summarize, Wired magazine is always fresh and surprising but not necessarily what you want to read. I can only explain this as an attempt to attract more readers: casual technology enthusiasts are more interested in space junk and organ transplants, whereas computer specialists appreciate computer technology content. To me Wired looks more like Popular Mechanics nowadays; earlier it was at least more related to geek culture. On the other hand, the definition of geek is also changing.

Unboxing ceremony: my new HP TouchSmart

Wednesday, April 21st, 2010

After years and years of waiting and working on others’ laptops I decided to get my own! It is funny that I bought my previous laptop a whopping 8 years ago when I started graduate school at Stony Brook (I also own a netbook but that is used exclusively when I am on the road).

This is why I decided to get one of the best models available at the moment. I got a 12-inch tablet, the HP TouchSmart tm2. It has an Intel Core 2 Duo SU7300, 4 GB of RAM, a 320 GB hard drive, and Windows 7 with multitouch support.

In fact, it is multiple gadgets in one box. It can be used as:

  • an e-book reader, because it supports landscape and portrait orientations
  • a movie player, which is also quite convenient
  • a nice development machine, with its dual-core CPU and 4 GB of RAM.

One of the features that I like the most is handwriting recognition, which is so accurate nowadays. I only had to go through a small training session in which the computer asked me to write down a number of sentences in my own style. Multitouch gestures are another cool feature, supported for example in a few games that Microsoft has developed.

I have also installed BumpTop, a state-of-the-art 3D desktop. The only problem with it is that it does not support native input methods, which means that I always have to use the keyboard for typing.

While looking for a suitable model at Verkkokauppa I noticed that it offers lots of bigger laptops with screen sizes of up to 17 inches. Of course one can say that the 12 inches of the TouchSmart are not enough. But this is not a case where size matters. The screen has exceptional brightness, which makes it possible and even convenient to use small font sizes. On the other hand, monstrous laptops have so much unused space in the keyboard area. It looks as if I would be buying not a laptop but a pad for a coffee cup.

To summarize, the TouchSmart is a compact and lightweight (2.1 kg) yet extremely capable device! The only problem that I have experienced so far is learning which touch gesture does what, because there are so many possible gestures on a multitouch screen! But the time spent learning multitouch is not wasted, as this is undoubtedly the technology of the future.

Obligatory unboxing ceremony images:











Linux Seminar in Oulu 2010 featuring Bjarne Stroustrup

Saturday, March 20th, 2010

A few days ago I went to Oulu, a city in the north of Finland, to attend the Linux Symposium, which featured Bjarne Stroustrup, the inventor of C++.

I left Helsinki on Monday night and arrived in Oulu at 7:30 AM on the overnight train. It was so-o-o co-o-old in Oulu! We did not have such freezing temperatures during the whole winter in Helsinki. On that early spring day it was -20 degrees Celsius. Because the train arrived early, I initially planned to walk to Oulu University, which is located approximately 5 km from the train station. I actually walked there, but I froze like I had not frozen for a long time. What surprised me was that local people were walking and even riding bikes as usual. Apparently, they have got used to such temperatures. In Finland there is a special word, sisu, which means persistence and stubbornness in a good sense of the word. Now I know that the city of Oulu is the city of sisu: sisu students, sisu workers, sisu everybody.

I barely had time to warm up in the university lobby before the conference began. It had a keynote speech dedicated to the looming C++0x standard as well as two tracks: business and technical. Before the conference I spent lots of time studying the agenda, trying to decide which track to attend. But there were interesting talks in both tracks, so I needed to remember the order in which I would visit them. It turned out that there is a simple algorithm which tells you which track to go to at any moment: always switch tracks. For example, if you are listening to a talk in the technical track now, then the next interesting talk is in the business track. I followed this algorithm and enjoyed every talk that I attended.

But first came the keynote. Bjarne is a great speaker! He described his work on the standardization committee and the features that were selected for the new C++0x standard. He said that the name of the new standard comes from the year in which they wanted to get it approved, any time before 2010, but at the moment the standard is in the Final Draft phase, which means that it will probably be approved in 2012.

Bjarne pointed out several criteria that they used when selecting features for the standard. Basically, keeping it simple was the main criterion. Any extra functionality should go into a library, and the run-time should be kept as small as possible. One of the goals was to make it possible to use C++ as the first language taught in college. It is an ambitious goal, as most US universities use Java as the first language. A new set of features for writing parallel programs was also described. Mostly, it was related to locks, semaphores, etc., and to avoiding deadlocks and other problems, as well as inter-process communication. To me it sounds like pretty low-level stuff. After his presentation I asked whether the committee thinks they’ve chosen the right level of abstraction. Nowadays there are a few interesting parallel programming frameworks such as Map-Reduce and transactional memory. Bjarne said that it is too early to standardize any of those, which is probably true.

The funny thing is that the committee does not necessarily accept the features that Bjarne proposes even though he is the inventor of C++. For example, he was trying to get lexical_cast into the standard, which basically converts values to and from their string representation. But the committee voted against him because of possible problems with locales. In the picture above Bjarne is trying to persuade the audience that lexical_cast is a cool feature.

Here are the notes from a few other talks:

Sami Paihonen. Implementing cross-platform UI

The core of cross-platform UI is UI style.
Lots of research. Empty screen is the best place to start.
6 design principles: avoid clutter. Too many things on the screen. Two hands is not mobile usage.
iPhone open-source contacts app has a better UI than the official app.
UI style defines core UI identity
Smoothness and stability are most important. Steven Frei blog.
blog: dizzyhorizon.com

Mikko Välimäki, Tuxera. Open source and IP licensing

This is the guy who won Espoo half-marathon!

Tuxera is a company doing filesystems on non-Windows systems
GNU GPL – free of charge to everyone.
Is it possible to use Android UI on another hardware? Apple is suing HTC for Patent infringement.
Jonathan Schwartz blog. Bill asks royalty for every download of OO b/c of patent infringement.
Microsoft sued TomTom over usage of FAT file system.
Mixed, dual-licensed, open & proprietary models will win.

Alexander Bezprozvanny. Traditional vs agile/open source

Different roles that a person takes in multiple teams in agile.
key differences in OSS projects: no project managers. Project leaders are models.
Healthy community is the key.
Definition of healthy community, various paths that a project might take depending on how developers interact with users. Nice diagram.

Examples:

1) Too late means never. Affix and bluez bluetooth stacks. Commercial vs. open-source. A company that missed release.

2) High admission price: OpenBSD community. A success at a high price.

3) OSS contribution from software company: bureaucratic barrier too high. Disclaimer of rights is difficult to explain to management.

4) Maemo case: combining proprietary and OSS SW.

Ari Jaaksi’s speech and consequences in his blog.

Research trend of the year: Parallel Computing

Wednesday, December 30th, 2009

So what were those cool ideas this year? In the last few issues of CACM the topic of parallel computing has received lots of attention. Basically, researchers are saying that lots of time and money have been spent on parallel research but most programmers are still writing single-threaded programs or even if they are multi-threaded they do not scale with the number of processors.

Here are the articles on this topic which I found only in three issues of CACM from September through November 2009:

When I noticed the increased attention to parallel computing I started wondering whether I had encountered parallel programming before. When I was an intern at Intel I attended an introductory course on parallel computing during which we implemented standard algorithms such as sorting on a parallel computer using OpenMP. That was in 2001 or so. Since then I saw OpenMP in the literature every now and then until it suddenly disappeared in 2005. None of the subsequent articles on parallel computing that I read even mentioned OpenMP as a predecessor of whatever new framework they were dealing with. Thus I felt relieved when I read Face the Inevitable, an article by an independent writer. The experiences of that author are very similar to mine. The author attributes the lack of attention to OpenMP to its very specific applications.

A couple of years ago another parallel programming framework was extremely popular but its fate was the same: it fell into oblivion. I mean Google’s MapReduce technology and its open-source version Hadoop. The explanation of its current unpopularity is probably the same: the applications are quite limited.

The authors of the Berkeley article at least learned the lessons of the previous frameworks. Their article proposes an application-driven approach. The authors consider a number of potential killer applications of parallel computing. They use a multi-layered approach. The application writer will need to adopt a number of parallel design patterns. Then the developers of the middleware will create libraries that implement such design patterns. The target hardware on which these libraries will run is not specified yet. Possibly, it is a multi-processor computer with homogeneous or heterogeneous processors. The authors propose an FPGA architecture to facilitate flexible experimentation.

Besides the lack of a parallel killer app, the ideal parallel hardware is also a moving target. So far, success has been achieved only in special domains. For example, Anton is a special-purpose computer for molecular dynamics simulations which features long pipelines executing specialized instructions that compute forces of interaction among molecules. This is an exceptional architecture because long pipelines are considered harmful for parallel processors in general. Thus, an ideal parallel computer is something that researchers have not created yet.

To summarize, after a decade of research on parallel computing it is not clear which paradigm programmers will accept, which middleware they will use, and on which hardware the programs will be executed. We are entering a new decade with lessons learned from previous failures and lots of ideas on how to design an ideal parallel computing stack. Thus I think that in 5-10 years we will use parallel programming on a daily basis.

Enjoying my Lifetime membership at ACM

Wednesday, December 30th, 2009

This year I became an ACM Lifetime Member. The idea is that you pay a certain amount of money and your membership continues as long as you are alive. The membership includes a subscription to the print edition of Communications of the ACM, as well as various discounts. For example, I have signed up for unlimited access to Safari Library for basically half the normal price.

CACM has been transformed a lot during the last couple of years. To start with, most of its articles are now available online. The design of the magazine has been improved. As before, there is a central topic for each issue and a number of articles that deal with it. In addition, there are a number of viewpoint articles which cover assorted issues. At the end of the magazine there is a Research Highlights section which contains a number of research papers, either short or full length. Finally, the Virtual Extension of the magazine consists of papers that are available online only. Recent issues of CACM include big overview articles such as Turing lectures or the status of the P vs NP problem.

As I am working in the industry, CACM seems very interesting to me because it sheds light on computer science from an academic angle. Unfortunately, I do not have time to read every article in every issue of the magazine even though I would like to. For example, I never read Research Highlights, not to mention the Virtual Extension, which I have never even looked at. When I was a graduate student we were told to read at least 100 papers every year. I am glad though that my ACM membership allows me to keep up with the main research trends even though I do not get full exposure to the details.

CACM was always good at publishing high-level articles on a certain issue. For example, if the topic is computer security then CACM describes policies and human-computer interaction issues, not the inner workings of a particular framework. However, this trend is changing. Nowadays CACM includes lots of practical articles similar to IEEE Computer Magazine. Recently, an article analyzing the Conficker worm was published in CACM. Another example is an article describing Google Web Toolkit, which allows writing client-side applications in Java and deploying them as JavaScript.

To summarize, CACM has transformed itself from a purely academic source of information into a dynamic resource which balances cutting-edge industrial reports and innovations from the academic world.

My online library

Thursday, June 18th, 2009

As a part of ACM Professional Membership I subscribed to Safari Library with full access to any number of books at a time, as well as roughcuts and videos. When I was a student at SUNY I was subscribed to a starters edition that gave access to any 10 books in a month. In other words, when you added a book to your bookshelf it had to stay there for at least a month.

Now with a more advanced subscription I was able to look at any book of my choice. I was surprised to find out how many books were added to Safari – around 100 during one month.

Here is a list of books that I have already added to my bookshelf. Of course, I only glanced through most of them, but these are the books that I have read, almost:


  • Web 2.0 Architectures, 1st Edition
    By Duane Nickull; Dion Hinchcliffe; James Governor

  • The Google Way, 1st Edition
    By Bernard Girard

The cost saving opportunity is amazing. Given that a book costs approximately 25 USD the 30 books that I have added would cost 750 USD. But the yearly subscription with full access was only 300 USD thanks to my ACM Professional membership.

Book reading: The Google Way

Sunday, May 31st, 2009

Google is the most intriguing company ever. This book sheds light on the reasons behind its unprecedented success and describes lessons that we can learn from it.

Even though I use Google on a daily basis and think that I know this company, my perception of it changed a lot after reading the book. For example, some time ago Google released a product that had a glitch. I got disappointed and switched to a similar offering from a competitor. In the meantime Google’s product developed into something very useful while the competitor’s stayed where it was. If I had read this book earlier I would have made a better decision.

According to the author of the book there are several forces that steer Google as a company: the triumvirate of executives and the user community. Each of these parts has its share in Google’s success.

The founders challenged many traditional methods of managing a company. To start with, a governing triumvirate is very uncommon. Its biggest advantage is that the three compensate for each other when making decisions. Another difference is how Google went public: the founders used the then uncommon Dutch auction model to distribute the initial set of shares. The author analyzes the advantages and disadvantages of Google’s management style.

Over the years, many companies have accumulated devoted user bases. This was achieved in different ways, from the more traditional, such as offering a discount, to the less traditional, such as elitism. The author of the book analyzes the reasons why so many people admire Google:

  • Using Google is free. However, many people have asked Google to charge a nominal price in exchange for support.
  • Google releases beta versions of its products. It relies on a large user community to report bugs and generate improvement ideas.
  • As a company whose revenue is based on online ads, Google has simplified the process of placing an ad. The process is totally automatic: advertisers bid for certain keywords which are associated with their ads.

Google has improved the way in which customers interact with it. Another important component is the innovative environment within the company. Google differs from traditional companies in the following ways. Its HR department has a highly variable size. Google fights bureaucracy by keeping team sizes small. Instead of asking managers to evaluate the employees, Google uses peer reviews. Fellow developers evaluate the projects written under the 20% free-time rule and select the most promising ones. The ability to work on what you like allows experienced developers to advance on the career ladder without being forced into becoming managers.

In addition to the interesting content, the book has a unique style of presenting it. Blogs are cited very often. Through this book I discovered many new interesting blogs.

This book is a bridge to understanding processes going on in the Internet industry. It will help improve your own company or evaluate other Internet companies.

Book reading: Productive Projects and Teams

Thursday, April 16th, 2009

I have read a very interesting book titled Peopleware: Productive Projects and Teams. The two American authors seem to demolish the American approach to managing the computer industry entirely. As consultants they have participated in a number of projects, from those including only a few people to large-scale teams of thousands of people.

This book is a call for change. Written 20 or so years ago, it starts with a tempting statement that American managers are not good at all. I guess this is an easy way to make the book an Amazon hit, but it also means that the rest of the book should be taken with a grain of salt. I have read this kind of book before, for example Noam Chomsky’s books on politics.

The authors of this book provide a number of arguments to support their claim. In particular, they explain why the rank and file developers are unhappy and how to make them happy. I am likely to trust their claims because they have conducted numerous sociological studies in various companies.

They say that quality is an indispensable attribute of a developer’s happiness. As a developer, you are happy with your project only if its quality exceeds that of what you have developed in the past. However, quality is often added to the project only if time permits. I would say that there are a number of things that make a developer happy, including quality. But the authors seem to focus exclusively on it. On the other hand, they provide an easy solution to improve quality. Citing a famous author of the past, quality is free if you are willing to pay for it. Isn’t it amazing?

There is a saying that you should do your homework well because you cannot afford to re-do it after the teacher points out your errors. In other words, getting quality right increases your productivity. Therefore, this claim seems true. As a final argument, the authors ask the reader to name the country with the highest quality and the country with the highest productivity. Their answers were Japan and Japan, which agrees completely with their theory. My answers were Germany and China.

I would say that quality affects productivity if there is a teacher whom you cannot trick into thinking that you are good. But in real life customers are easily tricked into buying products that are not worth the money. This is capitalism with all its tricks. After discussing this book with my colleague Sami Raivio I realized that this is how the US works: companies produce a product of average quality but with a few striking features, then advertise it like crazy. After people get bored with the initial product they offer an upgrade at a discount. Thus they keep their customers around with a chain of constant upgrades.

So to me it looks like the authors are trying to undermine the foundations of capitalism and to advocate socialism. This will infuriate those people who admire the US way of living, but take a look at the present-day US economy. With all the government intervention it starts to look more and more like socialism. It was predicted that in the 21st century governments will play a bigger role in economics. The romantic 20th century, during which a single generation of people accumulated unprecedented wealth, seems to be ending.

Food for thought

Thursday, March 5th, 2009