Optimizing web pages

Sunday, February 7th, 2010

While working on Timeline Builder I noticed that loading a timeline takes a while. Timeline is implemented as a lot of JavaScript, and even though the code is minified and all files are concatenated into one, the result is still quite big.

Compressing your scripts is a low-hanging fruit in optimizing the size of your web pages. But popular hosting providers don’t implement this feature by default, quite surprisingly.

One low-hanging fruit is to use compression when delivering files from your web site. Quite surprisingly, this feature is not enabled by default if you host your web site on a popular hosting service such as GoDaddy. But a search on the Internet shows how to do it. OpensourceTutor is a blog with lots of interesting content.

One of its posts describes how to configure Apache for our purposes. Basically, it depends on the version of Apache. For version 2.x, put the following in your .htaccess file:


<IfModule mod_deflate.c>
SetOutputFilter DEFLATE
</IfModule>

This approach works with the hosting provider Logol, where I have one of my test sites. However, GoDaddy uses a different version of Apache, so another approach is needed. I tried replacing mod_deflate with mod_gzip, but that did not work.

The approach suggested at OpensourceTutor is to use the Apache rewrite engine. When an HTTP request arrives, Apache checks whether a file with the same name plus a .gz suffix exists, and if so it serves the compressed file. Therefore, the web developer needs to upload a compressed version of each file that should be optimized. The .htaccess code looks as follows:


RewriteEngine on
RewriteCond %{REQUEST_FILENAME}.gz -f
RewriteRule ^(.+) $1.gz [QSA,L]
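The compressed copies that this rule serves have to be produced somehow before uploading. Below is a minimal sketch of that step, assuming Node.js is available on the development machine; the original setup does not say which tool was used, and the names gzipContent and writeGzipTwin are hypothetical.

```javascript
// Sketch (assumption: Node.js on the development machine).
var fs = require("fs");
var zlib = require("zlib");

// Compress a string or Buffer with gzip and return the compressed Buffer.
function gzipContent(content) {
  return zlib.gzipSync(content, { level: 9 });
}

// Write "<path>.gz" next to an existing file and return the new file's path.
// The rewrite rule can then serve this file in place of the original.
function writeGzipTwin(path) {
  var gzPath = path + ".gz";
  fs.writeFileSync(gzPath, gzipContent(fs.readFileSync(path)));
  return gzPath;
}
```

Calling writeGzipTwin("timeline.js") (a made-up file name) would leave timeline.js.gz beside the original, ready to upload.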

There is a third approach explained here in detail. It works for pages written in PHP. The basic idea is to prepend <?php ob_start("ob_gzhandler"); ?> to your pages, which turns on compression.

These are only a few ideas on how to improve your web pages. There are automatic tools that analyze your web pages and make suggestions, most notably YSlow from Yahoo and Page Speed from Google. After reading their reports I learned that there are lots of ways in which I can improve my web site, including

  • Adding expires headers
  • Using entity tags
  • Minifying CSS and Javascript

These ideas mean that I need to maintain two versions of my web site – one that is publicly visible and optimized with compressed and minified scripts, and another with uncompressed content that is convenient for development. Then I will need a script that converts the development version into the production version. Even though such a development configuration seems natural, I have not found any tools that facilitate its implementation on popular hosting sites.

Page Speed can generate nice graphs like the one shown above. Each track represents one HTTP request. After I added compression the tracks became shorter, so the results of your efforts are easily measurable.

These two extensions demonstrate the flexibility of Firefox. I have noticed that these days people achieve glory not through a large software development effort such as an operating system or a word processor but through a small application such as a plugin. Typically, a plugin runs on a bigger platform along with thousands of other plugins. I read in The Economist that last fall a pizza-tycoon kind of game running on Facebook attracted several million gamers in a couple of months – an unimaginable rate for the gaming industry alone. But a social network makes it possible.

Anyway, there are plugins that are so successful that people build their own plugins on top of them. Page Speed and YSlow are examples of this, because they are built on top of another Firefox plugin, Firebug. Another example of a plugin used to develop other plugins is Greasemonkey.

Adding features to SIMILE Timeline

Sunday, January 24th, 2010

I have become interested in adding features to the original implementation of SIMILE Timeline. I have a feeling that it has great potential, but despite the years it has been available it has not become widely accepted. It has appeared on a number of cool websites, including those of educational institutions, government, etc. However, these are more like showcases of a cool technology, whereas I think a timeline is something that everybody could use on a daily basis. Given how widely social networks have been adopted, one could put a timeline there to display your friends' activity. In fact, Timeline is available as a plugin for WordPress, and I should say it is a very cool thing with lots of bells and whistles, but there is, for example, no such thing as a Timeline Google gadget.

One reason for the lack of adoption is the difficulty of sharing. The original implementation of Timeline provides a JavaScript API. In order to use Timeline on your website you have to write approximately 100 lines of JavaScript code, which is an unacceptable barrier in many cases. Even if this code is generated automatically and offered to the user as copy-and-paste, a code snippet 100 lines long is just too much. What is needed is a short solution of the form <script>bla-bla-bla</script>, and this is one of the features that I have added.

But there is another, more difficult problem. If you want to visualize RSS feeds that do not belong to your website, you run into the Ajax same-domain policy (also known as the same-origin policy). I had run into this problem earlier, and back then the solution was to implement a PHP proxy that would read the desired RSS feed from another web site and forward it to you. This approach worked fine as long as Timeline stayed exclusively on my site, but now we need to share it. If a user puts JavaScript code on his or her website and this code tries to access an RSS feed through Ajax, then it would also need to use a PHP proxy. But PHP is a server-side technology. It is not possible to upload a PHP file to a social network or your iGoogle home page. Therefore, another solution is needed.

I searched the Internet and found a few interesting approaches to the same-domain policy problem. One of them is to use AJAX through Flash, because apparently Flash is less restrictive. But I was unable to get this approach to work, apparently due to security restrictions. It looks like the destination site of your AJAX request needs some sort of modification, whereas in my case the location of the RSS feed that I want to fetch is totally arbitrary.

In the script tag you do not have to specify actual .js source. You can use any source, for example a PHP script! Just make sure that it outputs Javascript. In other words, if it looks like a duck, quacks like a duck, then it is a duck!

This problem looks totally intractable. But the power of JavaScript has always amazed me. This time the <script> tag came to the rescue. Typically, the URL of a script ends with .js, for example <script src="myscript.js"></script>, but the funny thing is that it doesn't have to! In the src attribute you can specify anything, for example a PHP script that outputs JavaScript, like this:

<script src="myscript.php"></script>

Just make sure that your PHP script outputs JavaScript code. The rest is fairly simple: think of the PHP proxy that reads an arbitrary RSS feed and wraps it into JavaScript. The PHP script should output a JavaScript variable assignment with the RSS feed, as a string, on the right-hand side of the assignment expression. Then we can use another extremely useful feature of JavaScript in the browser, which is its built-in XML parser:


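// myrss holds the feed text assigned by the PHP-generated script.
// DOMParser parses a string; Document.load() expected a URL and was Firefox-only.
var xmlDoc = new DOMParser().parseFromString(myrss, "text/xml");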

The details of this approach are described in this O'Reilly article.
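The wrapping step that the proxy performs can be sketched as follows. The original proxy is written in PHP; this is only an illustration of the same idea in JavaScript, and the function name wrapFeedAsScript is hypothetical. It relies on JSON.stringify to escape quotes, backslashes and line breaks so that the feed text becomes a valid string literal.

```javascript
// Wrap feed text in a JavaScript assignment, e.g.  var myrss = "<rss>...</rss>";
// JSON.stringify escapes quotes, backslashes and newlines inside the feed.
function wrapFeedAsScript(varName, feedXml) {
  return "var " + varName + " = " + JSON.stringify(feedXml) + ";";
}
```

A page that loads the generated script through a <script> tag then finds the feed text in the chosen variable (myrss in the snippet above) and can hand it to the XML parser.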

With this technique in use it is possible to share a Timeline with arbitrary RSS feeds! The next step is to build a number of plugins, widgets, and gadgets for all kinds of social web sites! Such a bright prospect for the Timeline!

Timeline Builder – my new project

Thursday, December 31st, 2009

Only hours before the New Year 2010 I decided to describe the project that I have been working on this year. In fact, I wanted to summarize my efforts during the whole decade, as I started my career in the computer industry in 2000, but I would rather do that later. Now I will present Timeline builder, the project that I have been working on for almost a year.

I started implementing Timeline builder at the beginning of this year. The idea was to display an RSS feed on a SIMILE timeline. Surprisingly, the original implementation of Timeline does not accept RSS directly as an event source, only JSON or XML. To implement the idea I wrote a converter from RSS to JSON in PHP. I then used a database to store the URIs of the feeds that a user wanted to display on a timeline. That is, mixing RSS feeds was a required feature from the beginning. That initial version of Timeline builder, completed in June, was working.

However, after I read a few books on JavaScript I realized that it was possible to improve the implementation, because JavaScript in the browser can handle RSS natively. Since Timeline is written in JavaScript, it was very easy to use RSS as an event source. Indeed, after loading the feed using Ajax it becomes available as a JavaScript object. The request object has a special field, responseXML, and because RSS is an XML format, the parsed feed is returned in this field as the result of the Ajax request. Thus, it was actually very easy to add RSS support to Timeline.
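To illustrate, here is a minimal sketch of turning a parsed feed into a list of events. It assumes an RSS 2.0 feed whose items carry title, link and pubDate elements; the function names and the event fields (title, link, start) are chosen for illustration and are not taken from the Timeline builder source.

```javascript
// Read the text of the first <tag> element inside node, or "" if absent.
function textOf(node, tag) {
  var found = node.getElementsByTagName(tag);
  return found.length > 0 ? found[0].textContent : "";
}

// Convert a parsed RSS document (e.g. xhr.responseXML) into plain event objects.
function rssToEvents(xmlDoc) {
  var items = xmlDoc.getElementsByTagName("item");
  var events = [];
  for (var i = 0; i < items.length; i++) {
    events.push({
      title: textOf(items[i], "title"),
      link: textOf(items[i], "link"),
      start: new Date(textOf(items[i], "pubDate"))
    });
  }
  return events;
}
```

In a browser this would be called as rssToEvents(xhr.responseXML) once the Ajax request has completed.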

Because of that, I have extended my Timeline builder with a number of features including:

  • Vertical timelines
  • Icons of arbitrary size. The traditional timeline supports only 20×20 icons which are too small.
  • Labels of arbitrary width and height

After implementing support for RSS as a JavaScript object I was able to get rid of the database. I realized that it is possible to store the data either as JavaScript or in JSON format. These features make it possible to implement next-level databases, the web databases. I read an article in CACM saying that the traditional relational model is not acceptable in many cases, for a number of reasons. Thus, using a web database in JSON format is a step forward. I am now thinking that I should implement a web service based on the timeline, which would make my web site Web 2.0 compatible.

Timeline builder is still under construction. But I have already added a timeline showing my blog posts to the front page of my website!

Google Charts Builder

Friday, January 9th, 2009

The Internet offers lots of information, far more than a normal human being can reasonably absorb. Thus, consuming information is not that easy. Researchers have found that there are two types of people – those who prefer web pages with a few highlights and those who get excited when they see lots of numbers, possibly visualized.

In other words, lack of detail is not always what people are looking for. Personally, I get excited when I see lots of detailed information. Then I bookmark such a web page with the idea that I will visit it again to read it carefully, but that rarely happens. I will revisit a web page only if it is really important. This is why I like the semantic web – it has lots of inter-linked structured information. On my web site I have lots of information represented as charts. Most of it, for example financial information, is hidden in a protected zone.

In 2005 I read an article in IEEE Spectrum. It described the idea of Gordon Bell, who worked on the MyLifeBits project. The idea is to record all possible information, everything that happens to you throughout the day, using a camera. I am trying to implement this idea without a camera, though. The example above shows the number of visits to the swimming pool per month.

I need a tool to visualize vast amounts of information. I want to try out different representations – bars, lines, circles, etc. – and I need these charts on the web. There are lots of good drawing programs but they are not web-ready.

Google Charts bridges the gap between information visualization and the web. All the data and metadata are encoded in the URL. Of course, building such URLs manually is not easy. Thus I needed a tool to build charts quickly with different parameters.
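As an illustration of what "encoded in the URL" means, here is a minimal sketch of a URL builder. It assumes the Google Chart API parameters cht (chart type), chs (size), chd (data in text format) and chl (labels, used for example by pie charts), and it assumes the endpoint http://chart.apis.google.com/chart; the function name buildChartUrl and the sample values are made up for the example.

```javascript
// Build a chart URL from a few options. All chart data travels in the query string.
// options: { type: "p", width: 300, height: 200, data: [..], labels: [..] (optional) }
function buildChartUrl(options) {
  var params = [
    "cht=" + options.type,
    "chs=" + options.width + "x" + options.height,
    "chd=t:" + options.data.join(",")
  ];
  if (options.labels) {
    // Labels are separated by "|"; each label is URL-encoded on its own.
    params.push("chl=" + options.labels.map(function (label) {
      return encodeURIComponent(label);
    }).join("|"));
  }
  return "http://chart.apis.google.com/chart?" + params.join("&");
}
```

Changing the type or the data in the options object and reloading the resulting URL as an image is enough to experiment with different representations.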

I searched the Internet, but what I found were tools that asked lots of questions. Instead, give me a chart that is almost ready; I will plug in my data and get the result. In other words, we need an example-driven tool.

Here is Google Charts Builder. A sample chart is displayed as soon as the page is opened. I tried to make it as nice as possible. Not every chart type is supported yet; I will add them as necessary.

I have used this tool to build lots of charts and it allowed me to experiment with scale, chart type, labels, colors, etc.

There are tools for other Google APIs, for example Google Maps builder.

Web APIs are very powerful

Monday, December 22nd, 2008

While implementing Weather forecast on the homescreen of a Nokia device I faced the problem of inserting entries into Google Calendar. I thought of a number of approaches, including a Twitter app called TwitterCal that connects the calendar with a Twitter user. In fact, this is quite an interesting application. Implemented as a Twitter user, it receives your messages in quick-add format and inserts them into your calendar. Unfortunately, the application is defunct. The idea was to minimize the development effort. Now I am thinking of a pipeline of Twitter robots. One robot generates weather forecasts. It is connected to TwitterCal, which delivers the weather forecasts to your Google Calendar.

But I could not avoid programming, so I started using the Google Calendar API. As I am used to programming in PHP, I wanted to use the PHP API. However, that is quite a huge package and it uses Zend; I needed a simpler implementation. Thus, I read the documentation and decided to use good old curl to interact with Google servers. With a minimalistic API that only creates and deletes entries I implemented the first version of my project.

I need to deliver a weather forecast for any city. Thus I have a search box in which the user enters the desired location. I am using the BBC Weather site to search for the location. Here is what the result looks like when I enter New York:

Weather NY

This is not appropriate for a script. In fact, there is a weather forecast just for New York on BBC Weather, but there is no direct way of getting there, only through clarifying questions. One idea is to make the script click on the first link on the above page, but I have thought of a more elegant solution – use the Google Search API!

We use Google search on a daily basis. Have you ever noticed that the results are available only in HTML format and not in RSS? Thus, automating search was always a problem. Not anymore, though. The Google AJAX Search API allows you to launch queries from within your web page using JavaScript. The idea is that there is an edit box on the web page in which the user enters a query, and a DIV element on the page into which the results are inserted. Google search is much more robust; it does not ask clarifying questions. If you search for BBC Weather New York, the first result is the desired page with the NY weather forecast. I am feeling lucky!

When a user of our project enters a city name there is no need to display all possible results. We only need the URL of the first result. Thus the trick is to hide the output element into which the Google Search API inserts the results. After the results have been added to the hidden element and the first link has been extracted, we show that URL on our web page as if we had found it on our own! The whole power behind Google search is neatly hidden on our modest web site. That's the power of web APIs.
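The extraction step can be sketched in a few lines. This is only an illustration: it assumes that the results have already been inserted as ordinary <a> elements into a container element, and the function name firstLinkUrl is hypothetical; it does not use or depend on any particular call of the search API.

```javascript
// Return the URL of the first link inside a (possibly hidden) container, or null
// when the container holds no links yet.
function firstLinkUrl(container) {
  var links = container.getElementsByTagName("a");
  return links.length > 0 ? links[0].href : null;
}
```

In a page, the container would be the hidden DIV (for example one styled with display: none), and the function would be called after the search has finished filling it.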

There is a lot of work going on right now on creating new APIs. Their capabilities are limitless. I am thinking of allocating a time slot every weekend to study the web APIs that popular web sites expose. A web API allows you to leverage the great powers of those web giants when designing your own web page. You have total control over the look and feel of the results; the external sites only supply the data.

Today I found out that Nokia has adopted the idea of implementing custom homescreen widgets! Here is a contest whose goal is to select the best homescreen widget for the upcoming Nokia N97 device. Should I submit Weather on the homescreen there once again?

Weather forecast project

Friday, December 5th, 2008

I have submitted the project to the Forum Nokia contest. While working on the project I learned a few interesting techniques. I was surprised that such a simple idea would inspire additional project ideas. In particular, after completing the project I got interested in exploring the following directions:

  • Web search. In the project it was necessary to get a weather forecast for any location that the user might specify. BBC Weather was the source of information, as it has nice RSS feeds with a 3-day forecast for any place. There is a form in which the user enters a city name and then clicks the Submit button. The BBC Weather website then searches its database of locations and gives the user the forecast for the requested place. However, we needed to use the form automatically. Obviously, when the Submit button is clicked, a certain GET/POST request is executed, so filling in the form automatically was not a problem at all. However, the search mechanism of the website was trying to do its best: it asked clarifying questions, which was too much for the artificial intelligence of a PHP script. For example, when entering New York, the BBC search engine asked whether the user meant New York City, or New York airport, or something else. But this happened only for a big city such as New York. In most cases the search delivered the required information without asking any questions. How would one adapt a PHP script to a web form that asks clarifying questions?
  • Creating screencasts and uploading them to YouTube. I am using Ubuntu and I wanted to make a screencast with narrated text. I quickly discovered Istanbul – a program for creating screencasts. It can capture the whole screen or a certain window. However, the screen of the YouTube player is smaller than the web browser window that I needed to record. I decided to show the global view of a website and then zoom in on certain areas of the screen. Thus, I needed video editing software that would allow me to zoom in on an area of a video. Open Movie Editor was the software that I was able to install. I tried a bunch of other programs but they all had library dependencies which the Ubuntu package manager was unable to resolve. Also, I had a narrated description that I needed to stitch to the video. And finally I needed to make sure that the output of the video editing software was acceptable for YouTube.

I will describe how I solved each of these problems in the next few posts. In general, I found that each problem was a lot of fun to tackle and that it developed generally useful skills. For example, while implementing a script that used a search engine I figured out how to use web APIs. These mechanisms are the future of the web, which will consist of web applications. The semantic web, which operates on structured formats, will also make heavy use of web APIs. Automating search might improve the penetration of deep search, which at the moment has very basic capabilities.

I noticed how important it is nowadays to accompany your project with a video. Internet users have short attention spans. They visit a web site, look at it, and then switch to the next web site within a minute. Very few people are able to grasp text-only information within such a short time frame. This is why multimedia content such as images and video is added to the web. While reading blogs I found out that I like those blogs that have a shiny image or a video. There is a buzz on the web that personal blogging is dead because of commercial bloggers that deliver information in larger quantities. When was it ever that quantity dominated quality? Any personal blog that delivers attractive content will become popular. The problem is that individual bloggers are writing text-only entries. They should include images as well as video. But they need tools in order to do that. In my video story I am going to describe a long chain of tools that allowed me to generate a YouTube video. I spent one week producing a one-minute video.

Weather forecast on your mobile device

Wednesday, November 12th, 2008

I am opening up a project that I have been working on in my spare time during the last month. It is called Weather forecast on the homescreen of a Nokia mobile device.

The homescreen of a mobile phone is designed to list upcoming meetings and to-do notes. However, there is extra space that is often empty. It is often necessary to get certain information such as the weather forecast. Placing it on the homescreen makes it accessible immediately.

The idea behind the implementation is to add an RSS feed with the weather forecast to the user's Google Calendar. After that, the user runs synchronization software that inserts the entries into the mobile device's calendar. They appear as to-do notes, which are displayed on the homescreen automatically.

As of now, the project generates weather forecasts only. But with this approach it is possible to display any type of information, such as train schedules, stocks, top stories, etc.

http://weather.alexeysmirnov.name

Ruby ported to Symbian OS

Monday, November 3rd, 2008

I have installed lots of applications on my Nokia phone. Nowadays I cannot imagine my life without a few of them, such as Qik and Mobile TV.

Developing for a mobile platform has always been different from desktop programming. There have been a number of efforts to ease the transition from desktop to mobile programming, for example scripting languages. Python was ported to Symbian a long time ago. Now Ruby is available as well.

The download includes binaries – the Ruby engine, the frontend, and examples. There is an example that captures a video and an image – very similar to what Qik does. I am now quite convinced that with knowledge of a scripting language mobile programming is not that difficult. It is only a matter of choice whether to use Python or Ruby.

It so happens that I have not developed in either of these languages. When I am developing for the web I use PHP and JavaScript. Obviously, PHP is the most popular language: lots of web software is written in it, and practically every hosting provider lets you use PHP on your web site. It looks like Ruby is increasing its share of the scripting languages market at a very fast pace. Therefore, I should learn Ruby to develop for desktop, web, and mobile altogether. However, there are lots of interesting applications in which Python is necessary. For example, Google Apps uses Python.

It is funny how essential these scripting languages are nowadays. They are the modern equivalent of UNIX command line tools such as sed and awk. Being able to program in those languages is an essential attribute of a geek.

Book reading: Programming Collective Intelligence

Tuesday, October 21st, 2008

Programming Collective Intelligence is a visionary book in the sense that I think it predicts a lot of what will happen to the Internet soon. I have been thinking and blogging a few times about how we process information in the Internet age. Instead of reading magazines and newspapers we should use blogs as our source of news. The main reason is that blogs offer a much more customized news feed. In a typical newspaper, how much of the content is of interest to a reader? I guess half would be a lot, and typically it is less than that.

I start my working day with consuming two sweet drinks. One drink is a cup of coffee made by Mocamaster – a trendy brand of coffee makers priced as much as 1,000 Euros. Yes, you can get a coffee maker much cheaper than that but Mocamaster delivers its promise – the coffee is really tasty. Another morning drink is a virtual information soup made of 100 blogs. I glance over most of the stories quickly and select those that I am interested in. I might read them in greater detail later on during the day, in the evening, or on a weekend. I do not know which drink gives me more pleasure – the delicious Mocamaster product or sweet virtual soup. I like the latter a lot because it is rich with media content – with bright images, cool videos, wow-type web pages.

However, I often discover news that I wish I had found out about earlier. In other words, there are so many news sources that reading them all, or even just looking at the headlines of major blogs, takes too much time. We need a targeted information delivery service.

This is the main idea of this book. In fact, it starts by explaining how to make recommendations given the preferences of a number of people and your own preferences. What are those cool things that you have not tried out yet but everybody else has? The example described in the book is applied to Delicious, which does not offer recommendations yet. In the wild, such a system has been implemented in Digg and in Google Reader. I found that the blogs they recommend are quite relevant.
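The general idea of preference-based recommendations can be sketched as follows. This is not the book's code (the book uses Python) and not necessarily its exact formulas; it is a small illustration, assuming ratings are stored as nested objects, that scores unseen items by a similarity-weighted average of other people's ratings. All names are hypothetical.

```javascript
// Similarity in (0, 1]: 1 / (1 + Euclidean distance over items both people rated).
// Returns 0 when the two people have no rated items in common.
function similarity(prefs, a, b) {
  var sumSquares = 0, shared = 0;
  for (var item in prefs[a]) {
    if (item in prefs[b]) {
      var diff = prefs[a][item] - prefs[b][item];
      sumSquares += diff * diff;
      shared++;
    }
  }
  return shared === 0 ? 0 : 1 / (1 + Math.sqrt(sumSquares));
}

// Rank items the person has not rated by the similarity-weighted average rating
// that other people gave them, best first.
function recommend(prefs, person) {
  var totals = {}, simSums = {};
  for (var other in prefs) {
    if (other === person) continue;
    var sim = similarity(prefs, person, other);
    if (sim <= 0) continue;
    for (var item in prefs[other]) {
      if (item in prefs[person]) continue;
      totals[item] = (totals[item] || 0) + sim * prefs[other][item];
      simSums[item] = (simSums[item] || 0) + sim;
    }
  }
  return Object.keys(totals)
    .map(function (item) { return { item: item, score: totals[item] / simSums[item] }; })
    .sort(function (x, y) { return y.score - x.score; });
}
```

People whose tastes match yours closely pull the score of an unseen item towards their own rating, while people with very different tastes contribute little.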

I often try to decide what my interests are. The blogs that I am reading might answer this question if one builds groups of them. In fact, I have done this manually, but I found out that this categorization is not perfect. The book answers this question in Chapter 3.

After that the book deviates into a number of additional topics such as search, neural networks, and discrete optimization. The author, Toby Segaran, has a great ability to explain difficult concepts using simple words and pictures. As most of the material was familiar to me, I kept wondering at how easy each concept seemed here compared with how much time I originally spent understanding it.

After that the main melody of the book is there again – the next chapter explains how to filter documents, for example to decide whether a particular news story is interesting to you or not. Then the book deviates again into decision trees, building price models, and even matching people on a dating site. However, there comes our melody again – this time it explains how to extract trends from a lot of news sources, that is, to decide what people are discussing today. This feature is similar to Google News except that the user has no control over the news sources.

I was surprised to find out that Python is such a popular language in the scientific community. The book describes lots of libraries for dealing with numerical data or displaying various charts. The book will serve as a great introduction to the Python language, even though there are lots of introductory books available. In fact, learning Python this way is easier and more enjoyable.

After reading the book I definitely want to try out the tricks explained there and improve my information soup. This book is my virtual cookbook.

Cell phones in medical imaging

Tuesday, May 6th, 2008

Prof. Rubinsky and his team came up with the novel idea of physically separating the data acquisition hardware, the image processing software, and a monitor so that the most complicated element – the processing software used to reconstruct the raw data into a meaningful image – can reside at an offsite location.

UCBerkeley News