Marathons mashup, continued

Sunday, March 23rd, 2008

As promised, here is a mashup displaying marathons in Europe on a map. I used this web page that lists marathons around the world. However, instead of processing it manually, which would take a lot of time, I used the very convenient Yahoo Pipes framework. Once you have learned how to use it, it lets you build a mashup in under an hour. Yahoo Pipes has its tricks, which you have to master before you start enjoying it. I think it takes half a day of playing around with a few examples to figure out how it works. After that, it becomes a lot of fun to use.

Yesterday I created a mashup of marathons in Finland by hand, and today I re-implemented it in Yahoo Pipes. The latter was certainly easier. The goal of writing programs without knowing programming has almost been achieved. Creating the mashup manually required knowledge of PHP and HTML. However, to implement the mashup in Yahoo Pipes I still needed regular expressions, which count as fairly advanced computer science knowledge.

Source code search engine

Wednesday, December 26th, 2007

We all use search engines on a daily basis. There are also specialized engines out there, for example for searching source code. Despite being a coder, I had never used such an engine before. Today I found out that searching source code really does help.

I was looking for a description of the Netscape cookie file format, which is used by various scripting libraries, libcurl in my case. Whenever it fetches a web page, the library stores cookies in a file that looks like this:

# Netscape HTTP Cookie File
# http://www.netscape.com/newsref/std/cookie_spec.html
# This file was generated by libcurl! Edit at your own risk.

.google.com	TRUE	/	FALSE	1261681111	PREF	ID=fe8qqqa672708ce5:TM=11qqq09111:LM=1198qqq111:S=TyWPAAvqqqLb50ta

The file header contains a URL that refers to a description of the format, but that page does not exist. So I wanted to work out the format directly from libcurl’s source code. Koders was the search engine of choice. After searching for a while, I found that the function get_netscape_format() produces these lines:

 aprintf(
    "%s%s\t" /* domain */
    "%s\t"   /* tailmatch */
    "%s\t"   /* path */
    "%s\t"   /* secure */
    "%" FORMAT_OFF_T "\t"   /* expires */
    "%s\t"   /* name */
    "%s",    /* value */
    /* Make sure all domains are prefixed with a dot if they allow
       tailmatching. This is Mozilla-style. */
    (co->tailmatch && co->domain && co->domain[0] != '.')? ".":"",
    co->domain?co->domain:"unknown",
    co->tailmatch?"TRUE":"FALSE",
    co->path?co->path:"/",
    co->secure?"TRUE":"FALSE",
    co->expires,
    co->name,
    co->value?co->value:"");

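Matching the source code against the example above reveals the cookie file format: each line holds seven tab-separated fields, namely the domain, the tailmatch flag, the path, the secure flag, the expiry time, the cookie name and the cookie value.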
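To make the field order concrete, here is a minimal C++ sketch that reads one such line back into its seven fields. The names CookieLine, split_tabs() and parse_cookie_line() are hypothetical and are not part of libcurl; the sketch assumes that fields never contain a tab and treats every line starting with # as a comment.

```cpp
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// One record of the cookie file; field names follow the comments in the
// aprintf() call above. This struct is illustrative, not a libcurl type.
struct CookieLine {
    std::string domain;
    bool tailmatch;
    std::string path;
    bool secure;
    long long expires;
    std::string name;
    std::string value;
};

// Split a string on tab characters, keeping empty fields.
static std::vector<std::string> split_tabs(const std::string& s) {
    std::vector<std::string> out;
    std::size_t start = 0;
    for (;;) {
        const std::size_t pos = s.find('\t', start);
        if (pos == std::string::npos) {
            out.push_back(s.substr(start));
            return out;
        }
        out.push_back(s.substr(start, pos - start));
        start = pos + 1;
    }
}

// Parse one line. Returns false for blank lines, lines starting with '#',
// and lines that do not have exactly seven fields or a numeric expiry.
bool parse_cookie_line(const std::string& line, CookieLine& out) {
    if (line.empty() || line[0] == '#') return false;
    const std::vector<std::string> f = split_tabs(line);
    if (f.size() != 7) return false;
    long long expires = 0;
    try {
        expires = std::stoll(f[4]);
    } catch (const std::exception&) {
        return false;
    }
    out.domain = f[0];
    out.tailmatch = (f[1] == "TRUE");
    out.path = f[2];
    out.secure = (f[3] == "TRUE");
    out.expires = expires;
    out.name = f[5];
    out.value = f[6];
    return true;
}
```

Feeding the .google.com line from the example file to parse_cookie_line() fills domain with .google.com, path with /, expires with 1261681111 and name with PREF.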

Again, again, again

Tuesday, November 27th, 2007

Being mostly a C/C++ developer, I was surprised by how PHP handles inheritance. Of course, it does so differently from everybody else. Imagine a child class that inherits from a parent. Typically, the child’s constructor invokes the parent’s constructor, following the hierarchy. The interesting observation is that if the parent’s constructor invokes a method, that call goes to the child under construction!
Here is an example:

<?php

class test_parent {
    // PHP 4 style constructor: a method named after the class
    function test_parent() {
        $this->func();
    }

    function func() {
        echo "func_parent\n";
    }
}

class test_child extends test_parent {
    function test_child() {
        // invoke the parent's constructor explicitly
        test_parent::test_parent();
    }

    function func() {
        echo "func_child\n";
    }
}

$tmp = new test_child();

?>

The output is func_child, which is the result of test_parent invoking func(). Here is the equivalent C++ code, which prints func_parent: func() is not virtual here, and even if it were, the child part of the object is not yet available while the parent initializes itself:

#include <stdio.h>

class test_parent {
public:
    test_parent() {
        this->func();
    }

    void func() {
        printf("func_parent\n");
    }
};

class test_child: test_parent {
public:
    test_child() {
    }

    void func() {
        printf("func_child\n");
    }
};

int main() {
    test_child *t;

    t = new test_child();
    delete t;
    return 0;
}
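The C++ example above uses a non-virtual func(), so one may wonder whether making it virtual brings back the PHP behavior. It does not. The following sketch (Parent, Child, who() and call_later() are made-up names for illustration) records which implementation the parent constructor reaches, and then calls the same method again once construction has finished:

```cpp
#include <string>

struct Parent {
    std::string seen;  // what the constructor observed

    // While this constructor runs, the object is still only a Parent,
    // so the virtual call resolves to Parent::who().
    Parent() { seen = who(); }
    virtual ~Parent() {}

    virtual std::string who() const { return "func_parent"; }

    // Called after construction, this dispatches to the most derived class.
    std::string call_later() const { return who(); }
};

struct Child : Parent {
    std::string who() const override { return "func_child"; }
};
```

For a Child object, seen holds func_parent while call_later() returns func_child. In other words, C++ deliberately refuses to dispatch to the child until the child has been constructed, whereas PHP dispatches to it right away.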

Me, the Columnist

Tuesday, November 27th, 2007

So it is probably time to start a daily column on a PHP quirk. Call it a PHP quirk a day. Here is another one.

It is well known that certain values compare equal to NULL under the loose == operator, for example an empty string or the integer zero. The strict comparison operator === is designed to deal with that. It also takes the types of the operands into account and returns false if they differ. This is particularly useful when a function returns a mixed type, another difference a rookie PHP programmer needs to learn. For example, a substring search such as strpos() returns false if nothing is found, but it may also return 0 as the index of the substring. Comparing that 0 with false using == evaluates to TRUE, which is not what we want.

Using === is a good programming habit. There are cases, though, when you rely on common sense, which tells me that an object cannot compare equal to NULL right after being created. That common sense is wrong. Consider the following example:

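<?php

class js_incomplete_object {
    // empty PHP 4 style constructor; the class has no member variables
    function js_incomplete_object() {
    }
}

$tmp = new js_incomplete_object();

// loose comparison of a freshly created object with null
if ($tmp == null) {
    echo "null\n";
} else {
    echo "not NULL\n";
}
?>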

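Yes, the answer is null. I do not know why for certain; it appears to be because PHP 4 treats an object without member variables as false when converting it to a boolean, which is what the == comparison with null does. Indeed, once you add a variable to the class, that is, write the following in the constructor: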

$this->a=1;

the answer becomes not NULL. That’s it for today.

PHP4 is a tricky language

Sunday, November 25th, 2007

I have been using PHP4 for a while, and I keep discovering new tricks in it. Most recently, I was trying to pass a reference through a global variable. References are an interesting PHP feature. They are different from the pass-by-reference found in common languages such as C++. The manual likens them to hard links in a UNIX file system, but this is not exactly true. Leaving this analogy aside, I will describe how references interact with global variables.

Consider the following example. You create a DOM tree and then you use two functions. The first chooses an arbitrary tree element, while the
second uses that element. Let us use a global variable as the means of communication between the two functions:

// global variable ("var" is only valid inside a class, so use a plain assignment)
$dom_el = null;

test1(); // call first function
test2(); // call second function

function test1() {
    global $dom_el, $Doc;

    // assign a reference to the first child node
    $dom_el = &$Doc->childNodes->item(0);
}

function test2() {
    global $dom_el;

    // $dom_el does not contain the reference! You cannot use it here.
    // I am using PHP 4.4.6
}

If we assign a reference to the desired tree node in the first function, then surprisingly the reference disappears after the first function returns. That is, the reference is not saved in the global variable. This is how the language works rather than a bug: global $dom_el makes the local name $dom_el a reference to the global variable, and assigning with =& rebinds only that local name. Specifically, I am using PHP 4.4.6. Thus, assigning a reference to a variable declared global does not pass it from one function to another.

However, if we save the reference as a field of an object, then the reference is preserved. Therefore, we need to wrap the global variable
in a class:

class Pointer {
    var $i;

    function Pointer() {
        $this->i = null;
    }

    function set(&$_i) {
        $this->i = &$_i;
    }

    function &get() {
        return $this->i;
    }
}

?>

Now the code looks like this:

$dom_el = new Pointer();

function test1() {
    global $dom_el, $Doc;

    $tmp = &$Doc->childNodes->item(0);
    // store the reference inside the wrapper object
    $dom_el->set($tmp);
}

function test2() {
    global $dom_el;

    $tmp = &$dom_el->get();
    // $tmp has the desired reference
}

Regular expressions continued

Monday, May 28th, 2007

Think of a regular expression to replace \\servername\share with servername\share, that is, to get rid of the leading \\. The pattern looks tricky because each of the two backslashes has to be escaped twice, once for the regular expression engine and once for the PHP string literal, so each backslash is quadrupled:

$text='\\\\servername\share';
echo $text."\n";

$pattern='\\\\\\\\';
echo "pattern=".$pattern."\n";

$data=ereg_replace($pattern, "", $text);
echo $data."\n";
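The same double escaping shows up in other languages. As a rough C++ illustration using std::regex (strip_leading_backslashes() is a made-up helper name, and unlike the PHP pattern above this sketch anchors the match to the start of the string with ^), a raw string literal removes one of the two escaping layers:

```cpp
#include <regex>
#include <string>

// Remove a leading pair of backslashes, as in \\servername\share.
std::string strip_leading_backslashes(const std::string& text) {
    // In a raw string literal only the regex layer of escaping remains:
    // the four backslashes below match two literal backslashes.
    static const std::regex leading(R"(^\\\\)");
    return std::regex_replace(text, leading, "");
}

// The same pattern in an ordinary string literal needs eight backslashes,
// because the C++ compiler halves them before the regex engine sees them.
const std::string escaped_pattern = "^\\\\\\\\";
```

Called on the text \\servername\share, strip_leading_backslashes() returns servername\share, and escaped_pattern holds exactly the same characters as the raw literal.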

Tricky regular expressions

Monday, May 14th, 2007

I am working with regular expressions in PHP, and there are a lot of things to take care of. Here are a few comments from my development log; they are not very polished.

In a few words: preg_match_all() generates a segfault when a regexp
like this is used on a large file:

(a|b)*

The following hack works:

(a*|b*)*.

I am including the code and the URL where you can get the big text
file that makes it crash.

Also, take a look at the following bug reports:

http://bugs.php.net/bug.php?id=41385

http://bugs.php.net/bug.php?id=41235