From C/C++ to PHP
Author: Alexey Smirnov
Please e-mail your feedback to ale…@gmail.com.
PHP is an example of a new generation of languages. Designed with ease of use in mind, it is significantly less strict than C/C++. It supports dynamic typing, which allows values of different types to be assigned to the same variable over the course of a program. There are no pointers in PHP; a mechanism of references is used instead. As for the object-oriented features of PHP, they are inferior to those of C++, at least in PHP 4.x, which we discuss in this tutorial.
We do not consider the more recent PHP 5.x branch, as most installations on production web servers use PHP 4.x due to its proven dependability.
In this tutorial we will discuss pitfalls that a typical C/C++ developer is likely to encounter when starting to code in PHP. If you are entirely unfamiliar with PHP, we suggest that you follow the links to the PHP manual provided in this tutorial and then come back to find out what you should watch out for.
We will start with comparison operators. In PHP, a function can return results of different types. For example, the function strpos, which finds the first occurrence of a substring in a string, returns false if the substring is not found and integer 0 if the string starts with the substring. Under the == operator these two results compare equal, much as it is impossible to distinguish between 0 and NULL in C/C++. Type casting also behaves differently in PHP when applied to objects.
C++ has a pass-by-reference mode. In PHP, references are used instead of pointers, but they are not as powerful, and there are cases when you cannot use them. We will discuss those special cases.
Object inheritance is available in PHP 4.x. However, overriding a function in a child class works differently from C++.
Finally, we will discuss regular expressions. Most programmers are familiar with them even though they are not part of the standard C/C++ libraries. However, theoretical knowledge is often not enough. PHP supports Perl-style regular expressions. Learning how to use regular expressions in one scripting language will help you use them in any other scripting language.
It is well known that certain values evaluate to false in a boolean context, for example an empty string or integer zero. The mechanism of type casting makes NULL, false, an empty string, and integer 0 appear the same under the == operator. In PHP, a function can return results of different types. An example is a substring search such as strpos, which returns false if the substring is not found and may return integer 0 as the index of the occurrence. When the return value of this function is compared in an if() expression using the traditional == operator, false and 0 look the same.
There is a special comparison operator === (PHP Manual) designed to deal with that. It takes into account the types of the values being compared and returns false if they are different. In our example, if we compare the return value with false using ===, then integer 0 will not match it.
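As a minimal sketch of the difference, assume the search function is strpos(), which returns the integer offset of the match or false when there is no match. The helper names found_loose and found_strict are hypothetical and exist only for this illustration:

```php
<?php
// Hypothetical helper: loose check, gives the wrong answer for a match at offset 0.
function found_loose($haystack, $needle) {
    return strpos($haystack, $needle) != false;
}

// Hypothetical helper: strict check, compares the type as well as the value.
function found_strict($haystack, $needle) {
    return strpos($haystack, $needle) !== false;
}

$text = "PHP for C programmers";
var_dump(found_loose($text, "PHP"));   // bool(false): offset 0 == false
var_dump(found_strict($text, "PHP"));  // bool(true)
var_dump(found_strict($text, "Perl")); // bool(false): no match
```

The loose version fails only when the match is at the very start of the string, which is what makes this bug easy to miss in testing.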
Using === is a good programming habit. It protects you from the bad things type casting can do. There are cases, though, when you are absolutely sure that type casting will not hurt you, or when you want to take advantage of it. If that is the case, you should use ==. However, type casting as it is implemented in PHP 4.x treats certain objects as NULL even after they have been created with the new operator. For example:
class js_incomplete_object {
    function js_incomplete_object() {
    }
}

$tmp = new js_incomplete_object();
if ($tmp == null) {
    echo "NULL\n";
} else {
    echo "not NULL\n";
}
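Yes, the answer is NULL. The likely reason is that PHP 4 converts an object with no member variables to boolean false, and a comparison with null using == converts both sides to boolean. Indeed, once you add a member variable to the class, for example by assigning to a property of $this in the constructor, the answer becomes not NULL.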
References (PHP Manual) are used instead of pointers, although the two are not the same. Often, references behave like UNIX symbolic links. However, there are situations when you cannot use references. For example, you cannot bind a global variable to a reference from inside a function.
Consider the following example. You create a DOM tree and then you use two functions. The first chooses an arbitrary tree element, while the second uses that element. Let us use a global variable as the means of communication between the two functions. It is supposed to pass the reference from one function to another.
// global variable
$dom_el = null;

test1(); // call first function
test2(); // call second function

function test1() {
    global $dom_el, $Doc;
    $dom_el = &$Doc->childNodes->item(0);
}

function test2() {
    global $dom_el;
    // $dom_el does not contain the reference! You cannot use it here.
}
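If we assign a reference to the desired tree node in the first function, then, surprisingly, the reference disappears after the first function returns. That is, the reference is not saved in the global variable. The reason is that the global statement makes the local name a reference to the global variable, and assigning another reference to it rebinds only the local name. Thus, using a global variable to pass a reference from one function to another does not work.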
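The effect can be reproduced without a DOM tree. The sketch below is an illustration only: the array $data stands in for the tree, and the names $shared, bind_by_reference and read_shared are hypothetical:

```php
<?php
$shared = null;             // global that is supposed to carry the reference
$data = array(10, 20, 30);  // stand-in for the DOM tree

function bind_by_reference() {
    global $shared, $data;
    // "global $shared" makes the local $shared a reference to the global.
    // Assigning by reference rebinds only that local name.
    $shared = &$data[0];
    return $shared;         // 10 while we are still inside the function
}

function read_shared() {
    global $shared;
    return $shared;         // still null: the global was never rebound
}

var_dump(bind_by_reference()); // int(10)
var_dump(read_shared());       // NULL
```

An ordinary assignment ($shared = $data[0];) would have stored a copy of the value in the global; it is only the reference binding that is lost.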
However, if we save the reference as a field of an object which is a global variable itself, then the reference is preserved. Therefore, we need to wrap the global variable in a class:
class Pointer {
    var $i;

    function Pointer() {
        $this->i = null;
    }

    function set(&$_i) {
        $this->i = &$_i;
    }

    function &get() {
        return $this->i;
    }
}
Now the code looks like this:
$dom_el = new Pointer();

function test1() {
    global $dom_el, $Doc;
    $tmp = &$Doc->childNodes->item(0);
    $dom_el->set($tmp);
}

function test2() {
    global $dom_el;
    $tmp = &$dom_el->get();
    // $tmp has the desired reference
}
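To check that the wrapper really preserves the reference, here is a self-contained sketch that uses a plain array in place of the DOM tree. The names $tree, $holder, choose_node and use_node are hypothetical, and the explicit constructor is omitted because the property already defaults to null:

```php
<?php
class Pointer {
    var $i = null;

    function set(&$_i) {
        $this->i = &$_i;   // bind the property to the caller's variable
    }

    function &get() {
        return $this->i;   // hand the same reference back
    }
}

$tree = array('root', 'child'); // stand-in for the DOM tree
$holder = new Pointer();        // global wrapper object

function choose_node() {
    global $holder, $tree;
    $holder->set($tree[1]);     // store a reference to the second element
}

function use_node() {
    global $holder;
    $node = &$holder->get();
    $node = 'changed';          // writes through to $tree[1]
}

choose_node();
use_node();
echo $tree[1], "\n";            // prints "changed"
```

The write in use_node() reaches the original array element, which shows that the reference survived the return from choose_node().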
Basic object-oriented concepts such as encapsulation, inheritance, and method overriding are available (PHP Manual). There are no protection mechanisms such as the public and private specifiers of C++, but the good news is that they are available in PHP 5.x.
Method overriding is implemented differently. Imagine a child class that inherits from a parent. In C++, the child's constructor invokes the parent's constructor automatically, following the hierarchy. In PHP, the parent's constructor is not invoked automatically; you have to call it manually in the child's constructor. Another interesting observation is that if the parent's constructor invokes a method, that call goes to the child under construction!
Here is an example:
class test_parent {
    function test_parent() {
        $this->func();
    }

    function func() {
        echo "func_parent\n";
    }
}

class test_child extends test_parent {
    function test_child() {
        test_parent::test_parent();
    }

    function func() {
        echo "func_child\n";
    }
}

$tmp = new test_child();
The output is "func_child", which is the result of test_parent invoking func(). Here is the equivalent C++ code:
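#include <cstdio>

class test_parent {
public:
    test_parent() { this->func(); }
    // virtual, so that func() can be overridden as in the PHP version
    virtual void func() { printf("func_parent\n"); }
    virtual ~test_parent() { }
};

class test_child : public test_parent {
public:
    test_child() { } // the parent constructor runs automatically
    void func() { printf("func_child\n"); }
};

int main() {
    test_child *t = new test_child();
    delete t;
    return 0;
}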
It prints "func_parent", as the child part of the object is not yet available at the time when the parent initializes itself.
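The two PHP behaviors described above (the parent constructor is not called unless you call it, and a method invoked by the parent constructor is dispatched to the child) can be sketched together. This sketch is an assumption-laden illustration: it uses the __construct syntax of PHP 5 and later so that it runs on current interpreters, whereas PHP 4 names the constructor after the class as in the example above. The class names Base, SilentChild and ChainedChild are hypothetical:

```php
<?php
class Base {
    var $log = array();

    function __construct() {
        $this->log[] = 'Base constructor';
        $this->hook();              // dispatched to the most derived class
    }

    function hook() {
        $this->log[] = 'Base hook';
    }
}

class SilentChild extends Base {
    function __construct() {
        // parent constructor is NOT called here, so nothing is logged
    }
}

class ChainedChild extends Base {
    function __construct() {
        parent::__construct();      // explicit call to the parent constructor
    }

    function hook() {
        $this->log[] = 'Child hook';
    }
}

$a = new SilentChild();
$b = new ChainedChild();
print_r($a->log); // empty array
print_r($b->log); // 'Base constructor', then 'Child hook'
```

Note that 'Base hook' never appears in the second log: the parent constructor ends up calling the overriding method of the child.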
Many programmers are familiar with regular expressions. However, problems emerging in practice often require going beyond theoretical knowledge. There are a number of extensions to the traditional language of regular expressions, such as Perl extended patterns. We will not cover them in this tutorial, but it is worth at least finding out what the possibilities are.
Instead, let us consider an easier example. Try to write a regular expression that replaces \\servername\share with servername\share, that is, gets rid of the leading \\. The pattern looks tricky because each of the two backslashes has to be written four times:
$text='\\\\servername\\share'; // assumed input, taken from the description above
echo $text."\n";
$pattern='\\\\\\\\';
echo "pattern=".$pattern."\n";
$data=ereg_replace($pattern, "", $text);
echo $data."\n";
The reason it is quadrupled is that the string is processed twice. First, when PHP parses the string literal, each escape sequence is converted into the character it stands for, just like in C/C++. For example, inside double quotes \n is converted to a newline character. In our example, each sequence of \\ is converted to \. Then there is another step: the regular expression analyzer also looks for escape sequences. Of course, its language includes additional constructs such as character classes [...], etc. In our case, each sequence of \\ is again converted into \. That is, an original sequence of four backslashes narrows down to just one backslash.
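The two processing steps can be observed directly. The sketch below is an illustration rather than the original code: it swaps in preg_replace() for ereg_replace() because the ereg functions were removed in PHP 7, which means the pattern needs / delimiters, and it adds a ^ anchor so that only the leading pair of backslashes is removed:

```php
<?php
// Input string: \\servername\share (two leading backslashes, one inner).
$text = '\\\\servername\\share';

// Step 1: the PHP parser turns the eight backslashes below into four.
// Step 2: the regex engine reads those four as two literal backslashes.
$pattern = '/^\\\\\\\\/';

echo strlen('\\\\\\\\'), "\n";  // 4: what is left after step 1

$result = preg_replace($pattern, '', $text);
echo $result, "\n";             // servername\share
```

Printing the pattern, as the original example does, is a handy way to see what the regular expression engine actually receives.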
A final observation is that rewriting regular expressions manually often allows you to avoid infinite loops. It is not quite clear why the following trick helps, but at least that means that hacking regular expressions is worth the effort.
The actual example is more complex than this, but in a few words preg_match_all() generates a segfault when a regular expression of the following kind is used on a large file:
The following hack helps get rid of the infinite loop:
To summarize, you need to do a lot of tweaking to make PHP do what you want it to do, but the effort is worth it in the end. Good luck with your PHP projects.