This year, as part of my annual trip to Canada and the USA, I’ve been asked to give two talks at the annual PHP Québec conference in Montréal. I haven’t been back to that city since 1993 when I graduated from University, and it will be interesting to see how it goes. (Although I suspect that while Beijing basks in nearly 20C (nearly 70F) weather every day and even Seattle and New York were closer to 10C (50F), Montréal is still hanging below freezing most days and has over a metre of snow on the ground).
I will be giving talks on internationalisation (commonly just called i18n) and giving your database servers a break with memcached. If you’re anywhere in the neighbourhood, come on by for some good fun. I’ll be getting back to regular programming content this weekend.
The slides for my presentations are here:
[Sorry for the delay in fixing the memcached link – I have had a severe flu for the last few days. It should be okay now.]
Imagine, if you will, the following scenario:
What you might not have noticed, especially if you – like me – have a few thousand rows of data, is that MySQL might have screwed you along the way and not really told you all that clearly.
After a few days of operation on my live server, I started to notice a few weird things—foreign keys weren’t being enforced properly, and there were some values in the database that probably shouldn’t have been possible. I furrowed my brows and put it on my list of stuff to investigate.
Well, yesterday, I added a new table to the database, and it went something like this:
mysql> CREATE TABLE Fudgecicles
>(
> id INTEGER AUTO_INCREMENT PRIMARY KEY,
> value VARCHAR(255) NOT NULL
>)
>ENGINE = InnoDB;
Query OK, 0 rows affected, 1 warning (0.10 sec)
Where did that warning come from?
mysql> SHOW WARNINGS;
It is here that MySQL tells me:
+---------+------+-----------------------------------------------------+
| Level | Code | Message |
+---------+------+-----------------------------------------------------+
| Warning | 1266 | Using storage engine MyISAM for table 'Fudgecicles' |
+---------+------+-----------------------------------------------------+
Augh! No! Bad! Bad, MySQL, Bad! Why on earth would it do that? I didn’t misspell InnoDB or even use “incorrect” casing in the name. There’s nothing wrong with the schema I specified and I’ve done this hundreds of times before.
Well, after some research, I then tried the following:
mysql> SHOW ENGINES;
And MySQL helpfully gave me the following:
+------------+---------+----------------------------------------------------------------+
| Engine | Support | Comment |
+------------+---------+----------------------------------------------------------------+
| MyISAM | DEFAULT | Default engine as of MySQL 3.23 with great performance |
| MEMORY | YES | Hash based, stored in memory, useful for temporary tables |
| InnoDB | DISABLED| Supports transactions, row-level locking, and foreign keys |
| BerkeleyDB | NO | Supports transactions and page-level locking |
| BLACKHOLE | NO | /dev/null storage engine (anything you write to it disappears) |
| EXAMPLE | NO | Example storage engine |
| ARCHIVE | YES | Archive storage engine |
| CSV | NO | CSV storage engine |
| ndbcluster | NO | Clustered, fault-tolerant, memory-based tables |
| FEDERATED | NO | Federated MySQL storage engine |
| MRG_MYISAM | YES | Collection of identical MyISAM tables |
| ISAM | NO | Obsolete storage engine |
+------------+---------+----------------------------------------------------------------+
12 rows in set (0.00 sec)
The InnoDB database engine had been disabled somewhere along the way and I hadn’t even noticed. It is enabled, by default, on the standard Linux and Mac OS X MySQL binaries that I’ve been downloading. So something changed along the way that made this all stop.
Far worse, I then began to worry about my existing tables, all of which were supposed to be InnoDB. Upon executing the following for each of them:
mysql> SHOW CREATE TABLE Fudgecicles;
I would see something like:
| Table | Create Table
| Fudgecicles | CREATE TABLE `Fudgecicles` (
`id` int(11) NOT NULL auto_increment,
`value` varchar(255) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
Sure enough, every single one of my tables was MyISAM, and not the InnoDB it was supposed to be.
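Checking tables one at a time gets old quickly. The following is only a sketch of how the same check could be done in bulk. It assumes you have already fetched the rows of SHOW TABLE STATUS into an array of associative arrays (with the 'Name' and 'Engine' columns that command reports); the function name findNonInnoDbTables is made up for this illustration, and the sample rows are invented.

```php
<?php
// Hypothetical helper: $statusRows is assumed to hold the rows returned by
// SHOW TABLE STATUS, each as an associative array with 'Name' and 'Engine'.
function findNonInnoDbTables(array $statusRows)
{
    $suspects = array();
    foreach ($statusRows as $row) {
        // Engine names are compared case-insensitively.
        if (strcasecmp((string) $row['Engine'], 'InnoDB') !== 0) {
            $suspects[$row['Name']] = $row['Engine'];
        }
    }
    return $suspects;
}

// Invented sample data standing in for a real query result.
$rows = array(
    array('Name' => 'Fudgecicles', 'Engine' => 'MyISAM'),
    array('Name' => 'FishSticks',  'Engine' => 'InnoDB'),
);
print_r(findNonInnoDbTables($rows));
```

Anything this returns is a table that is not running on InnoDB and probably deserves a closer look.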
The problem turns out to be that InnoDB is somewhat finicky about my.cnf settings, and will frequently refuse to operate if settings change in this file in such a way that makes any existing data incompatible with the way it would write new data.
In my case, it turns out that the changed settings for the InnoDB binary data and log file sizes were somehow incompatible with the existing binary data and log files (ibdata1 and ib_logfileX). Upon starting the server, InnoDB finds this inconsistent state and simply refuses to start up. This is the first problem.
The second, and far more serious problem, is that MySQL just switches the database persistence engine on you and only provides a little warning. If you’re loading in thousands – if not tens of thousands – of rows, those warnings are easily lost in the scroll-a-thon that ensues. This is bad behaviour. MySQL should simply refuse to create your table if your selected persistence engine is not available.
Fixing this problem involved three parts. The first, and easiest, is to get the InnoDB engine back. You have two choices:
I chose the second method.
| WARNING: Using this second method can result in data-loss if you’re not careful. Some database engines store things in the binary data and log files before writing them to the actual table files, and if they’re deleted, you might lose those changes. I only selected this path because all my tables were MyISAM, I had a full backup, and spent a good 10 minutes after deleting the files verifying that all data were correctly restored. |
You then stop and restart the MySQL database engine, and InnoDB will be back. You can verify that all is well in InnoDB land by executing:
mysql> SHOW InnoDB STATUS;
And you will receive a gloriously long and detailed set of information on how things are going.
Part two of the process involved converting my tables back to InnoDB.
mysql> ALTER TABLE FishSticks ENGINE = InnoDB;
This proves to be tedious and time consuming, as you cannot simply go from table A to table Z doing the conversion – because of the foreign keys and references, they have to be done in the right order. When one of the ALTER TABLE statements fails, you can find out what happened by re-executing the SHOW InnoDB STATUS command – it will tell you why it wouldn’t convert the table to InnoDB.
But, eventually, I got them all done. It was then that I noticed that none of the foreign keys were set up properly any more.
So, the last step of the process is to re-establish the FOREIGN KEYs. I did this by, for each foreign key in each table I had, executing the following commands:
mysql> ALTER TABLE FishSticks DROP KEY [dead foreign key name];
mysql> ALTER TABLE FishSticks ADD FOREIGN KEY (keyname) REFERENCES Table (fieldname);
The good news is that saving my database and getting back to all sorts of InnoDB goodness only took about an hour in total, for about 30 tables or so.
However, if MySQL had simply reported an error a few days earlier instead of blithely just switching table types behind the scenes, I might have avoided this whole mess in the first place. Oh well, at least I learned a few neat little commands I can play around with now! Lesson learned!
Here’s to hoping that this article helps some other folks solve the same problem a bit quicker!
Download version 0.9 of StripTags for PHP5
One of the greater dangers facing web application authors today is the Cross Site Scripting attack (given the initialism XSS, so as not to be confused with cascading style sheets). In such an attack, people filling in forms on your web site (such as a comment on a blog entry, etc.) include malicious input that, when others go to view it, can cause effects that range from the annoying (popping up advertisements) to the dangerous (redirecting you to a site that “spoofs” the current site and spies on your input).
A simple example of this would be if you implement a bulletin board-like system via which users can enter small messages of their own. A user could choose to enter in the comment body:
<script>
document.location = "http://maliciousspoofsite.com";
</script>
When they submit this page and somebody else goes to view it, they are redirected, possibly without even knowing it, to another site with all sorts of potential consequences.
Good news arrives with a very basic solution to this problem in the form of the strip_tags function in PHP. This function simply looks for any markup elements in a given string and removes them:
<?php
$str = "This is a <strong>string</strong> with
<script>document.location = 'http://moo.cow';</script>";
$str = strip_tags($str);
echo $str;
?>
This script prints out:
This is a string with
document.location = 'http://moo.cow';
While it may render output less attractive, it has effectively neutralised the danger.
Another option is the htmlspecialchars function (and its close cousin, htmlentities), which will simply convert any < or > characters into the HTML entities &lt; and &gt; respectively.
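A small sketch of what that looks like in practice. The sample string is invented, and the ENT_QUOTES flag and the explicit 'UTF-8' charset are choices made here so that the result does not depend on PHP's defaults.

```php
<?php
// Escape markup characters instead of removing them.
$comment = '<b>bold</b> & "quoted"';
$escaped = htmlspecialchars($comment, ENT_QUOTES, 'UTF-8');

// Prints: &lt;b&gt;bold&lt;/b&gt; &amp; &quot;quoted&quot;
echo $escaped, "\n";
```

The markup is shown to the reader as literal text rather than being interpreted by the browser.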
Unfortunately, these can be extremely restrictive when we are writing web applications where we want to allow some degree of user input. If we want to let users include hyperlinks, images, or other harmless types of markup, we have a problem.
The strip_tags function does have a solution to this, but only a very crude one (which the authors admit freely and warn about well in advance). You can pass a second parameter to this function which is a string of permitted tags, such as the following:
<?php
$str = "This <em>is</em> a <strong>string</strong> with
<script>document.location = 'http://moo.cow';</script>";
$str = strip_tags($str, '<em><strong>');
echo $str;
?>
The output is now:
This <em>is</em> a <strong>string</strong> with
document.location = 'http://moo.cow';
While this is a nice improvement, it opens up huge security holes for us depending on those tags we permit:
<?php
$malicious = <<<EOSTR
This is a malicious string with a picture in it:
<img src="http://url/abc.jpg"
onMouseOver="document.location = 'http://badurl';"/>
<script>
document.location = "http://badurl";
</script>
EOSTR;
$str = strip_tags($malicious, '<img>');
echo $str;
?>
While the above code will correctly filter the <script> markup element out, it will still produce the following output:
This is a malicious string with a picture in it:
<img src="http://url/abc.jpg"
onMouseOver="document.location = 'http://badurl';"/>
document.location = "http://badurl";
Effectively, the strip_tags function says: If a tag is permitted, then all possible attributes on it are also permitted.
What we would ideally like is a system that protects us not just against malicious tags, but also against malicious attributes within those tags. Even on harmless seeming div or span elements, you can include style attributes that can cause all sorts of mischief.
So, we need to write our own version of the strip_tags function that lets us not only specify which tags are permitted, but also which attributes. I have seen a number of these floating around on the Intarwebs and unfortunately they more often than not do not work properly.
Most of them search for a <, and then begin processing, assuming a tag has the following basic structure:
<tagName attribute="value"> </tagName>;
Thus, the common approach is to:
<tag attribute = "value"> </tagName>
<tag[tab][tab]attribute = 'value ' attribute2 />
<tag
attribute
= ' value' / >
</tag>
<tag attribute =' <<<<Some Attribute >>>>>' >
blah blah blah </ tag>
Changing spaces to tabs or newlines, including multiple spaces, or placing < and > characters within attribute values all break many of the algorithms based on simple string searching or regular expressions (and these regular expressions are already quite horrific).
Even worse, not a single solution I have seen thus far is UTF-8 aware, and will very likely damage or destroy any multi-byte input.
While some may retort right away that not all of these markup variants are “allowed” by various specifications, the reality is that all of these work in every web browser I have tried (well, if I replace “tag” and “attribute” with something meaningful!). Therefore, we, as application authors, have to worry about them and process them correctly.
In the end, we have no choice but to write a parser or “state machine” which keeps track of “where” we currently are, whether it is parsing an element, parsing an attribute, or speeding through the value of an attribute. We need to be able to handle all of the variations above and more.
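To give a feel for what “keeping track of where we are” means, here is a deliberately tiny sketch. It is not the StripTags implementation; the function name findTagEnd and its three states are made up for this illustration, and all it does is find the > that really closes a tag while ignoring any > characters inside quoted attribute values.

```php
<?php
// Hypothetical helper, not part of the StripTags class. Returns the byte
// offset of the '>' that closes the tag opening at $start, or -1 if the
// tag is never closed. The characters we test for are all ASCII, so
// walking the string byte by byte does not disturb UTF-8 sequences.
function findTagEnd($html, $start)
{
    $state = 'tag';              // 'tag', 'single' or 'double'
    $len = strlen($html);
    for ($i = $start + 1; $i < $len; $i++) {
        $ch = $html[$i];
        switch ($state) {
            case 'tag':
                if ($ch === '"') {
                    $state = 'double';   // entering a "..." value
                } elseif ($ch === "'") {
                    $state = 'single';   // entering a '...' value
                } elseif ($ch === '>') {
                    return $i;           // the real end of the tag
                }
                break;
            case 'double':
                if ($ch === '"') {
                    $state = 'tag';
                }
                break;
            case 'single':
                if ($ch === "'") {
                    $state = 'tag';
                }
                break;
        }
    }
    return -1;
}

$html = "<tag attribute =' <<<<Some Attribute >>>>>' >blah";
echo substr($html, findTagEnd($html, 0) + 1), "\n";   // prints: blah
```

A simple search for the first > would have stopped in the middle of the attribute value; a real parser needs many more states than this (element names, attribute names, unquoted values and so on).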
Thus, I have written the StripTags class, attached at the bottom of this article. Included within the archive is a test script which demonstrates some of the input on which I have tested it (it is actively being used in a couple of web applications) and shows some example usage.
The class is fully UTF-8 aware. All of the files in the archive are UTF, so please be careful when loading and saving them—if your editor misbehaves, it might mess things up.
To use the StripTags class, you pass to it an array. The keys are the names of the markup elements you would like to permit while the values are arrays of attributes you would like to permit on each of these. For example:
<?php
$filter = array(
'a' => array('href'),
'img' => array('src', 'border', 'alt', 'title'),
'strong' => array(),
'em' => array(),
'p' => array('align')
);
$st = new StripTags($filter);
$safer = $st->strip($some_unsafe_string);
?>
One type of XSS that we have not yet discussed is a bit more annoying:
<img src="javascript:alert('oh noes!!!')"/>
The ability to embed script in attribute values makes life very difficult for us. One might think that we can just search for and get rid of javascript: in attribute value strings, but we still would have problems with:
<img src="vbscript:alert('oh noes!!!1!!11!')"/>
<img src=javasc
ript:a
lert('X
SS')>
There are other languages than javascript, and Unicode escape sequences can be used to encode Javascript.
The StripTags class currently takes a rather basic approach to this:
If the RemoveColons property is set to TRUE (which is the default), then the StripTags function will remove any colon characters or Unicode escape sequences representing colons from attribute value strings. It will, however, let strings start with:
http:
https:
ftp:
This is a bit restrictive, but until I implement a better solution, this is the way I will leave it. You can, again, turn this off completely by setting RemoveColons = FALSE, but then I’d probably tell your users to be careful (well, I might tell them that anyway … !)
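The idea can be sketched roughly as follows. This is not the code inside StripTags; the function name neutraliseColons is invented for this illustration, and it only deals with literal colon characters, not with escape sequences that represent colons.

```php
<?php
// Hypothetical illustration of the colon rule described above: keep one
// leading http:, https: or ftp: prefix and drop every other colon.
function neutraliseColons($value)
{
    $allowed = array('http:', 'https:', 'ftp:');
    $prefix = '';
    foreach ($allowed as $scheme) {
        $n = strlen($scheme);
        // Case-insensitive comparison of the first $n bytes.
        if (strncasecmp($value, $scheme, $n) === 0) {
            $prefix = substr($value, 0, $n);
            $value = (string) substr($value, $n);
            break;
        }
    }
    return $prefix . str_replace(':', '', $value);
}

echo neutraliseColons("javascript:alert('oh noes!!!')"), "\n";
echo neutraliseColons('http://example.com/a:b'), "\n";
```

The first call loses its colon, so the browser no longer treats the value as script; the second keeps its leading http: but loses the colon later in the string, which is where the “a bit restrictive” part comes from.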
Here is version 0.9 of the StripTags class (I won’t consider it 1.0 until I come up with a robust solution to the inline attribute script attacks).
Download version 0.9 of StripTags for PHP5
Please do feel free to mail me at marcwan@chipmunkninja.com. This code will only work for PHP 5. It uses class syntax and semantics not available in prior versions. I have tested it with each version starting with PHP 5.0.2.
For those programmers coming to PHP from other languages, the distinction between the keywords break and continue is quite clear. The former is used to abort loop execution or a switch statement while the latter is used to skip to the top (or bottom) of a loop.
So, despite knowing better, I still found myself spending more time than I’d like debugging the following code:
<?php
foreach ($daysOfTheWeek as $day)
{
switch (strtolower($day))
{
case 'monday':
case 'tuesday':
case 'wednesday':
case 'thursday':
case 'friday':
$this->processWeekdayData();
case 'saturday':
case 'sunday':
// nobody works on the weekend.
continue;
}
//
// this keeps getting thrown on saturday and sunday !!! how come?
//
throw new InvalidDayException('not a valid day');
}
?>
The problem?
The continue keyword in a switch statement does not work the same way as in other languages. In PHP, continue and break are synonymous inside this construct:
<?php
switch ($months)
{
// start with vowels
case 'august':
break;
case 'april':
continue; // exactly the same as "break" !!!
default:
return 'OK';
}
throw new StartsWithVowelException('Months with vowels are creepy');
?>
While we are on the topic, these two keywords have a feature in PHP that makes them a bit more interesting and powerful than their peers in other languages.
Both can be given a numerical argument when used in a loop that indicates how many loops to continue through or break out of. For example, to abort two loops while in the inner loop:
<?php
//
// search through our 2D array until we find $value
//
$lowerval = strtolower($value);
foreach ($TwoDArray as $otherArray)
{
foreach ($otherArray as $value)
{
if (strtolower($value) == $lowerval)
{
// we found the value -- break out of both loops
break 2;
}
}
}
?>
Similarly, to restart the execution of an outer loop from within an inner one:
<?php
//
// verify that each sub array contains the given value
//
$lowerval = strtolower($value);
foreach ($TwoDArray as $otherArray)
{
foreach ($otherArray as $value)
{
if (strtolower($value) == $lowerval)
{
// we found the value -- this one definitely has it.
continue 2;
}
}
// if we've reached here, then the inner loop doesn't have the
// value. ¡aiiee!
}
?>
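The same numeric argument is one way to get the days-of-the-week loop from earlier to behave as intended. The sketch below is a variation on that code rather than a copy of it: the function name findInvalidDays is made up, and it collects the invalid days in an array instead of throwing an exception so that the result is easy to inspect.

```php
<?php
// Hypothetical rewrite of the days-of-the-week loop from earlier.
function findInvalidDays(array $daysOfTheWeek)
{
    $invalid = array();
    foreach ($daysOfTheWeek as $day) {
        switch (strtolower($day)) {
            case 'monday':
            case 'tuesday':
            case 'wednesday':
            case 'thursday':
            case 'friday':
                // weekday processing would go here, then fall through
            case 'saturday':
            case 'sunday':
                // "continue 2" counts the switch as level one and the
                // foreach as level two, so this moves on to the next day.
                continue 2;
        }
        // Only reached when no case matched.
        $invalid[] = $day;
    }
    return $invalid;
}

print_r(findInvalidDays(array('Monday', 'Sunday', 'Blursday')));
```

Only 'Blursday' ends up in the returned array; with a plain continue, all three days would have.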
PHP has a few quirks that can make programmers from other languages scratch their heads, but you will quickly find that these are very easy to get used to and make the language the wonderfully productive one that it is.
As I sat down to edit “Core Web Application Programming with PHP and MySQL”, I would sometimes find errors in the text so blindingly obvious and stupid that I would question whether or not I was truly qualified to write such a book. And yet, after talking with some other people who write books (and recalling days when I wrote huge amounts of code), it seems that this is all common and with much proof-reading and the hard work of some friendly reviewers, I was able to write a book of extremely high quality.
Of course, that just meant I would be even more devastated when the first technical errors WERE found in the book.
There have been a couple, but they’re not that killer serious.
In Chapter 21, where the book discusses writing your own output handler, the constant in the $_SERVER array to check is HTTP_ACCEPT_ENCODING, without the letter ‘S’ on the word accept.
There are a couple of errors in the source code, the most glaring of which DID get fixed, but never made it on to the shipping CD ROM (d’oh!). In the SimpleBlog sample, in the file lib/entrymanager.inc, the class DBManager is accidentally misspelled DBMananager. Just fix it and change it back to the correct spelling and the sample will compile fine under PHP 5.0.x (x <= 4).
The other problem with the samples is that some new things have appeared in newer versions of PHP 5.0.y (y >= 5). PHP now defines a class called InvalidArgumentException, which conflicts directly with the class I have defined using the same name. The easy fix for this problem is to simply change the name of the class slightly, to something like MyInvalidArgumentException or some such thing.
To save you the hassle of tracking down and fixing all of these problems, I have put a new copy of the book source code up on the chipmunkninja.com servers. You can download these from here: phpwasrcupdate_2005-12-01.zip.
As always, if you see any other problems or errors in the book, or just want to comment on it, please feel free to drop me some mail.
I remain chagrined, but I’ll get over it.
As I sit here watching “The Muppets Take Manhattan” in Spanish in the middle of a Costa Rican thunderstorm, I find my mind drifting back to a recent project where I spent a day debugging a frustratingly annoying problem: A user would visit the web application I was working on, and after a given page was loaded, all of the session data associated with their visit would be suddenly gone. The user would no longer be logged into the site, and any changes they made (which were logged in session data) were lost.
I spent tonnes of time in the debugger (while at times unreliable and frustrating on huge projects, the Zend debugger is still an invaluable aid for the PHP application developer) and kept seeing the same thing: the session data were simply being erased at some point, and the storage in the database would register '' as the data for the session. It was driving me crazy. I would sit there in the debugger and go through the same sequence each time:
In retrospect, looking at the above list, it seems blindingly obvious what I had been running into, but it was very late in a very long contract, and I blame the fatigue for missing what now seems patently obvious: A race condition.
For those unfamiliar with what exactly this is, a race condition is seen most often in applications involving multiple “threads of execution” – which include either separate processes or threads within a process – when two of these threads (which are theoretically executing at the same time) try to modify the same piece of data.
If two threads of execution that are executing more or less simultaneously (but never in exactly the same way, because of CPU load, other processes, and chance) try to write to the same variable or data storage location, the value of that storage location depends on which thread got there first. Given that it is impossible to predict which one got there first, you end up not knowing the value of the variable after the threads of execution are finished (in effect, “the last one to write, wins”) (see Figure 1).

Normally, when you write web applications in PHP, this is really not an issue, as each page request gets its own execution environment, and a user is only visiting one page at a time. Each page request coming from a particular user arrives more or less sequentially and shares no data with other page requests.
Ajax changes all of this, however: suddenly, one page visit can result in a number of simultaneous requests to the server. While the separate PHP processes cannot directly share data, a solution with which most PHP programmers are familiar exists to get around this problem: Sessions. The session data that the various requests want to modify are now susceptible to being overwritten by other ones with bad data after a given request thinks it has written out updated and correct data (See Figure 2).

In the web application I was working on, all of the Ajax requests were being routed through the same code that called session_start() and implicitly session_write_close() (when PHP ends and there is a running session, this function is called). One of the Ajax requests would, however, set some session data to help the application “remember” which data the user was browsing. Depending on the order in which the various requests were processed by the server, sometimes those data would overwrite other session data and the user data would be “forgotten”.
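The effect can be mimicked without a web server at all. The following is only a simulation under simplifying assumptions: plain arrays stand in for the stored session, both “requests” are assumed to load their copy of the session before either one saves, and the names runRequests, req1, req2 and 'browsing' are invented for this sketch.

```php
<?php
// Simulation only: each request works on its own snapshot of the stored
// session and writes the whole snapshot back when it finishes.
function runRequests(array $store, array $finishOrder)
{
    // Both requests load the session before either has written it back.
    $snapshots = array(
        'req1' => $store,
        'req2' => $store,
    );

    // req1 remembers what the user is browsing; req2 blanks a value.
    $snapshots['req1']['browsing'] = 'page 7';
    $snapshots['req2']['fudgecicle'] = '';

    // Whichever request finishes last overwrites the other's changes.
    foreach ($finishOrder as $name) {
        $store = $snapshots[$name];
    }
    return $store;
}

$initial = array('fudgecicle' => 'eeeek!');
print_r(runRequests($initial, array('req1', 'req2')));
print_r(runRequests($initial, array('req2', 'req1')));
```

In the first run the 'browsing' value set by req1 is lost and 'fudgecicle' is blank; in the second run the opposite happens. Same code, same input, different result, depending only on who finished last.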
As an example of this problem, consider the following example page, which when fully loaded, will execute two asynchronous Ajax requests to the server.
<?php
// race1.php
// part 1
session_start();
//
// We will prime the session data with a simple value here.
// One of the Ajax requests going to race2.php will clobber
// this value periodically.
//
$_SESSION['fudgecicle'] = 'eeeek!';
?>
<html>
<head>
<title>Race Condition Demo Page</title>
</head>
<script>
/* part 2 */
/**
*=-------------------------------------------------------=
* getNewHTTPObject
*=-------------------------------------------------------=
* This function is here just to create a new
* XmlHttpRequest object.
*/
function getNewHTTPObject()
{
var xmlhttp;
/** Special IE only code ... */
/*@cc_on
@if (@_jscript_version >= 5)
try
{
xmlhttp = new ActiveXObject("Msxml2.XMLHTTP");
}
catch (e)
{
try
{
xmlhttp = new ActiveXObject("Microsoft.XMLHTTP");
}
catch (E)
{
xmlhttp = false;
}
}
@else
xmlhttp = false;
@end @*/
/** Every other browser on the planet */
if (!xmlhttp && typeof XMLHttpRequest != 'undefined')
{
try
{
xmlhttp = new XMLHttpRequest();
}
catch (e)
{
xmlhttp = false;
}
}
return xmlhttp;
}
/**
*=-------------------------------------------------------=
* onLoadFunction
*=-------------------------------------------------------=
* We call this function when the page loads, and it starts
* two asynchronous Ajax requests to race2.php.
*/
var xml1;
var xml2;
function onLoadFunction()
{
xml1 = getNewHTTPObject();
xml2 = getNewHTTPObject();
xml1.open('GET', 'http://localhost/race2.php?req1', true);
xml2.open('GET', 'http://localhost/race2.php?req2', true);
xml1.onreadystatechange = handleResponse1;
xml2.onreadystatechange = handleResponse2;
xml2.send('');
xml1.send('');
}
/**
*=-------------------------------------------------------=
* handleResponse1
*=-------------------------------------------------------=
* This handles the response from the first ajax request.
* It puts the returned string in the <div> element
* with the id 'fudgecicles'
*/
function handleResponse1()
{
if (xml1.readyState == 4)
{
document.getElementById('fudgecicles').innerHTML =
xml1.responseText;
}
}
/**
*=-------------------------------------------------------=
* handleResponse2
*=-------------------------------------------------------=
* This handles the response from the second ajax request.
* We don't actually care about this, so we just ignore
* the results.
*/
function handleResponse2()
{
// we don't care about this response
}
</script>
<!--
part 3
This page just executes the onLoadFunction() call and
contains a single <div> where the result will be
placed.
-->
<body onLoad='onLoadFunction();'>
<div id='fudgecicles'>Waiting for response from Server</div>
</body>
</html>
The code is divided into three main sections:
Part 1 calls session_start() and opens the HTML headers.
Part 2 is the JavaScript: getNewHTTPObject() is used to create new objects. The onLoadFunction() is executed when the page finishes loading and starts the ball rolling, while the other two functions are simply used to wait for and handle the responses and results from the asynchronous requests.
Part 3 is the <body> section of the document, which contains a single <div> element to hold the results and an attribute on the <body> element to make sure that the onLoadFunction() is called when the document finishes loading.
The asynchronous Ajax requests are then made to race2.php and are processed by the following code, which can handle two different Ajax work requests:
<?php
// race2.php
session_start();
//
// if they ask for req1, then just return the current value
// of the $_SESSION['fudgecicle'] data. If they ask for
// req2, then have this session simulate a bogus value by
// setting $_SESSION['fudgecicle'] to ''.
//
// NOTE: the two for loops are to simulate the various
// Ajax processes doing lots of stuff, and force
// the processor to interrupt the processes
// periodically to let the other work. This helps
// the race condition occur more.
//
if (isset($_GET) and isset($_GET['req1']))
{
// waste time
for ($x = 0; $x < 10000; $x++)
strpos('asdfasdf', 'tttttttttttttttasdfioap');
if (isset($_SESSION['fudgecicle'])
and $_SESSION['fudgecicle'] != '')
{
$retString = $_SESSION['fudgecicle'];
}
else
$retString = '(empty)';
}
else if (isset($_GET) and isset($_GET['req2']))
{
// waste time
for ($x = 0; $x < 10000; $x++)
strpos('asdfasdf', 'tttttttttttttttasdfioap');
$retString = '';
$_SESSION['fudgecicle'] = '';
}
header('Content-Type: text/plain');
echo $retString;
?>
This PHP script handles the two request types differently, and creates the race condition by having the second request type, req2, set the session data to ''. (In a real world application, you might have accidentally had this request set some value you thought was meaningful).
If you install the two files race1.php and race2.php on your server, and then load race1.php into your browser, you will see that sometimes the test string is shown after the page is completely loaded, and other times it will be “(empty)”, indicating that the second Ajax request has clobbered the value.
Now that we are aware of this problem and how it can manifest itself, the next question is, of course, how do we solve it? Unfortunately, I think this is one of those problems best solved by avoiding it. Building in logic and other things into our web application to lock the threads of execution (i.e. individual requests) would be prohibitively expensive and eliminate much of the fun and many of the benefits of asynchronous requests via Ajax. Instead, we will avoid modifying session data when we are executing multiple session requests.
Please note that this is much more specific than saying simply that we will avoid modifying session data during any Ajax request. Indeed, this would be a disaster: In a Web 2.0 application, we are most likely using Ajax for form submission and updating the state of the user data (i.e. session data) as the data are processed and we are responding to the changes. However, for those requests we are using to update parts of pages dynamically, we should be careful to avoid modifying the session data in these, or at least do so in a way that none of the other requests are going to see changes in their results depending on these session data.
Ajax requests and session data do not have to be problematic when used together: With a little bit of care and attention, we can write web applications that are powerful, dynamic, and not plagued by race condition-type bugs.
Got other suggestions or solutions to this problem? Add a comment or mail me, and I’ll update the article.
Download version 1.0 of StripTags for PHP5
After some further development over the last couple of weeks, I have released version 1.0 of the StripTags class for PHP.
This class is designed to replace the strip_tags function in PHP, which does not work particularly well. It serves to help website authors avoid cross-site-scripting (XSS) attacks in user-created content, for sites such as blogs or forums where users can enter entries, articles, or comments.
You can read more about the class and XSS in general in the following article: Helping Prevent XSS Attacks in PHP5
The big new feature change in this version of the class is the ability to find XSS attacks injected via Unicode-encoded attributes, such as:
<IMG SRC=javascript
:alert('XSS')>
We now successfully find these and neutralise them by inserting extra junk in the attribute string so that they are not processed by client browsers.
Please note that this class is not a 100% complete solution to XSS. We do not handle all of the ways that XSS can be achieved through CSS and other forms of style (and thus always recommend that you not permit users to enter STYLE elements or “style” attributes on other elements). Solving this problem requires a significant amount of work and effort, and I believe that if you want to give users that degree of input control, you should have them use a Wiki-language engine such as Textile.
The README and INSTALL documents have full information on how to use the class as well as what it does and does not do.
As always, please feel free to email me with any questions, comments, or bug reports. I’ll fix the latter as quickly as I can.
The RSS feeds on the Chipmunk Ninja site have been changed slightly. For compatibility, the old rss/tech.rss and rss/personal.rss feeds will continue to work, but there are now three new feeds:
| rss/tech | Tech Articles |
| rss/personal | Personal Articles and Stories |
| rss/all | All Chipmunk Ninja Blog Entries |
I have also fixed up the RSS output, particularly the XML headers and the way tags are exposed through the <category> element.
As always, please let me know if there are any problems.
I am happy to announce the immediate availability of Payjacks, currently at version 0.2.0. Payjacks is a PHP/Ajax web application framework I’ve written using the object-oriented features in PHP5+.
Payjacks can be downloaded here:
http://chipmunkninja.com/download/payjacks-0.2.0.tar.gz
Payjacks is an object oriented PHP-Ajax web application framework I’ve written to help write robust and organised web applications. It was designed to require a minimal amount of effort to get your own web application up and running, while helping with such tasks as accessing a (MySQL, currently) database or providing a framework for sending asynchronous Ajax requests back to the server.
Payjacks uses many of the new object-oriented features in PHP 5 to do its work, and handles most of the details required to run a robust web application.
Some of the main features are:
I use Payjacks for almost all of my “web page writing” these days. You can use the SimplePage class to just throw out a single PHP/HTML page, or you can use the WebApplication class and template off one of the samples to create more robust web applications.
Since Payjacks takes care of so many of the details that many web application authors simply neglect (error handling, URL processing, security details), using it gives you nearly everything you want except for the actual HTML of your pages.
If you want to use asynchronous Ajax requests, Payjacks helps to make this very easy too.
Payjacks does not help with generation of well-styled HTML or page content. It merely gives you the framework in which to place your content. There is nothing preventing you from designing and developing perfectly secure and robust yet hideously ugly web applications using Payjacks. It is for this reason that graphic designers earn their paycheques. Even my blog site, http://chipmunkninja.com is very rudimentary and clearly shows the limits of my design skills.
In order to run Payjacks, you need:
On Apache, this is mod_rewrite, and can easily be compiled into your Apache build. I have tested and run this on Apache 1.3.xx and 2.0.yy.
For IIS, there are a number of mod_rewrite clones, ranging from very free and open source to very expensive and not open source.
Payjacks is currently version 0.2.0 and should be considered development quality only right now. I plan on making some minor changes to function signatures in the next few versions (mostly to get rid of some parameters that I thought I would need for localisation, but have since realised were the wrong way to go about the problem).
If you have any bugs, comments, or questions, please do not hesitate to contact me at marcwan@chipmunkninja.com.
PHP is a sufficiently rich programming environment that it is not common that I truly need to execute external programs on the server on which it executes. However, every once in a while, this situation does come along, and for these, it is important to understand the options that PHP provides, what their differences are, and their relative strengths and weaknesses.
There are four primary choices for executing external programs in PHP:
The system function.
The exec function.
The shell_exec function or its syntactic analogue, the backtick operator ( ` ).
The passthru function.
There is also a family of functions prefixed with proc_, such as proc_open and proc_close, but these are quite advanced and beyond the scope of this article.
We will first discuss the usage of these four functions in their most basic forms.
The system function in PHP takes a string argument with the command to execute as well as any arguments you wish passed to that command. This function executes the specified command, and dumps any resulting text to the output stream (either the HTTP output in a web server situation, or the console if you are running PHP as a command line tool). The return of this function is the last line of output from the program, if it emits text output.
For example:
// Windows Users replace "ls ..." with "dir ..."
$lastLine = system('ls /Users/marcwan');
echo "<br/>LastLine: $lastLine<br/>\n";
The output of this command on my system is:
Desktop Documents Download Library Movies Music Pictures
Public Sites bin blah.php books make1.png make1.tiff media src
yici1.png yici1.tiff
LastLine: yici1.tiff
The return value of the function will be FALSE if the function completely fails to execute.
In addition to the return value of the system function, however, there is also the question of the return status of the program being executed. Just as all functions in PHP have return values (NULL if you don’t explicitly choose one), all commands executed in most operating systems also have return values, typically an integer value. For those programs whose return status you wish to know, you can pass a second argument to the system function, which tells PHP where to put this value.
For example:
// define this variable here so that it can actually be used in the system fn
$returnValue = -1;
system('ls /Users/marcwan', $returnValue);
echo "Return Value: $returnValue<br/>\n";
The output of this program will be:
Desktop Documents Download Library Movies Music Pictures
Public Sites bin blah.php books make1.png make1.tiff media src
yici1.png yici1.tiff
Return Value: 0
For most modern operating systems, a return value of 0 is considered a success, and other values indicate failure. Some programs will have a range of values to indicate the exact way in which they failed while others will just return 1 or -1 to indicate that they were unsuccessful.
The system function is quite useful and powerful, but one of the biggest problems with it is that all resulting text from the program goes directly to the output stream. There will be situations where you might like to format the resulting text and display it in some different way, or not display it at all.
For this, the exec function in PHP is perfectly adapted. Instead of automatically dumping all text generated by the program being executed to the output stream, it gives you the opportunity to put this text in an array returned in the second parameter to the function:
// you define this variable here so that it exists for the call to exec
$output = null;
// Windows users: 'dir c:\\' or something similar
exec('ls /Users/marcwan', $output);
echo "<pre>" . var_export($output, TRUE) . "</pre>\n";
Once again, on my machine, the following output is printed:
array (
0 => 'Desktop',
1 => 'Documents',
2 => 'Download',
3 => 'Library',
4 => 'Movies',
5 => 'Music',
6 => 'Pictures',
7 => 'Public',
8 => 'Sites',
9 => 'bin',
10 => 'blah.php',
11 => 'books',
12 => 'make1.png',
13 => 'make1.tiff',
14 => 'media',
15 => 'src',
16 => 'yici1.png',
17 => 'yici1.tiff',
)
Of course, instead of just directly emitting this output, we could instead choose to process it and only emit those files with a ’.png’ extension, for example:
foreach ($output as $fileName)
{
    if (strtolower(substr($fileName, -4)) == '.png')
        echo "<br/>$fileName\n";
}
And once again, you can get the return code from the exec function as follows:
$output = null;
$returnValue = -1;
exec('ls /Users/marcwan', $output, $returnValue);
echo "Return Value: $returnValue<br/>\n";
echo "<pre>" . var_export($output, TRUE) . "</pre>\n";
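The status check can be folded into a small helper. What follows is only a sketch, written for a current PHP version (the type declarations need PHP 7 or later): runCommand is a hypothetical name of mine, not something PHP provides, and treating every non-zero status as a failure is an assumption that will not suit every program. It starts from an empty array on each call because exec appends to an array that already contains elements.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): runs a command with exec and
 * returns its output lines, throwing if the exit status is non-zero.
 */
function runCommand(string $command): array
{
    $output = [];        // start empty: exec appends to existing arrays
    $returnValue = -1;
    $lastLine = exec($command, $output, $returnValue);

    if ($lastLine === false || $returnValue !== 0) {
        throw new RuntimeException(
            "Command failed with status $returnValue: $command"
        );
    }
    return $output;
}

// 'echo' exists in both Unix shells and the Windows command prompt.
$lines = runCommand('echo hello');
echo "<pre>" . var_export($lines, true) . "</pre>\n";
```

Throwing an exception keeps a failed command from being silently ignored, which is easy to do when the status lands in a variable nobody reads.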
Most of the programs we have been executing thus far have been, more or less, real programs (see footnote 1 at the end of this article). However, the environment in which Windows and Unix users operate is actually much richer than this. Windows users have the option of using the Windows Command Prompt program, cmd.exe (See Figure 1). This program is known as a command shell.
Figure 1— The Windows Command Prompt
Similarly, Unix (and Mac OS X) users can run a command shell such as sh, bash, or csh (see Figure 2). These are typically quite advanced interactive systems that allow elaborate scripts such as for x in *.png; do echo $x; done to be executed.
Figure 2— bash on Mac OS X
On both Windows and Unix, the commands for the various command shells can be put into files and executed by these programs. On Windows, convention has it that these script files have the extension .cmd while on Unix it is not uncommon to see .sh.
For those situations where we want to run these shell scripts from within PHP, we can use the shell_exec function, as follows:
// lists all .bat files anywhere in the system
$output = shell_exec('FOR /R C:\\ %x IN (*.bat) DO echo %x'); // Windows
// lists all .png files in /Users/marcwan only
$output = shell_exec('for x in /Users/marcwan/*.png; do echo $x; done'); // Unix
echo "<pre>" . var_export($output, TRUE) . "</pre>\n";
On my system this produces:
'/Users/marcwan/make1.png
/Users/marcwan/yici1.png
'
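Because shell_exec hands everything back as one string, trailing newline included, you will often want to split it into lines before using it. A minimal sketch, assuming a Unix-style shell for the sample command; outputToLines is a hypothetical helper name, not a PHP function:

```php
<?php
/**
 * Hypothetical helper: turns the single string returned by shell_exec
 * into an array of non-empty lines.
 */
function outputToLines($output): array
{
    // shell_exec can return null or false instead of a string.
    if (!is_string($output)) {
        return [];
    }
    // Accept both Unix (\n) and Windows (\r\n) line endings.
    $lines = preg_split('/\r\n|\n/', trim($output));
    return array_values(array_filter($lines, 'strlen'));
}

$output = shell_exec('for x in one two; do echo $x; done'); // Unix shell
echo "<pre>" . var_export(outputToLines($output), true) . "</pre>\n";
```

This gives you the same array shape that exec fills in, so the two functions can feed the same processing code.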
PHP also has a syntactic operator called the backtick operator which does the exact same thing as the shell_exec function. Instead of calling the shell_exec function, you simply wrap the command in the backtick character, `, which is very different from the single quote character '.
// lists all .bat files anywhere in the system
$output = `FOR /R C:\\ %x IN (*.bat) DO echo %x`; // Windows
// lists all .png files in /Users/marcwan only
// (\$x is escaped so that PHP does not expand it as one of its own variables)
$output = `for x in /Users/marcwan/*.png; do echo \$x; done`; // Unix
echo "<pre>" . var_export($output, TRUE) . "</pre>\n";
It is worth noting that I avoid this operator at all times as it is not easy to spot in code, and I like potential security trouble spots to be as visible as humanly possible.
One fascinating function that PHP provides similar to those we have seen so far is the passthru function. This function, like the others, executes the program you tell it to. However, it then proceeds to immediately send the raw output from this program to the output stream with which PHP is currently working (i.e. either HTTP in a web server scenario, or the shell in a command line version of PHP).
When would this be useful? When you are working with image files and want to execute a program to show or otherwise manipulate these files. A most simple example would be as follows:
/**
* Windows:
*/
header('Content-Type: image/bmp');
passthru('type c:\\Windows\\zapotec.bmp');
/**
* Unix
*/
header('Content-Type: image/png');
passthru('cat /Users/marcwan/yici1.png');
The result will be an image in your browser. This is extremely powerful if you want to use some programs to resize or otherwise manipulate image files and then just dump the output to the client. There are a number of interesting utilities that come with various JPEG source code distributions that do exactly this and are powerful tools.
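Like system and exec, passthru accepts a second argument that receives the program's return status. Here is a sketch of the image case that only sends the header once the file is known to exist. The path is the example one from above, sendPng is a hypothetical name, and the cat command assumes a Unix system:

```php
<?php
/**
 * Hypothetical helper: streams a PNG file to the client with passthru
 * and reports whether the external command succeeded.
 */
function sendPng(string $path): bool
{
    if (!is_file($path)) {
        return false;               // nothing sent; the caller can show an error
    }
    if (!headers_sent()) {
        header('Content-Type: image/png');
    }
    $returnValue = -1;
    // escapeshellarg quotes the path so it is passed as a single argument.
    passthru('cat ' . escapeshellarg($path), $returnValue);   // Unix
    return $returnValue === 0;
}

sendPng('/Users/marcwan/yici1.png');
```

In real use you would replace cat with whatever program resizes or converts the image, and keep the same status check.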
Armed with this basic knowledge, we have one major thing left to deal with before we go off into the real world and begin using these functions: Security.
Allowing users to specify any portion of the string you pass to the exec, system, shell_exec, or passthru functions is a huge security risk, and you need to plan carefully in advance before allowing any of this.
For example, the following code is simply a recipe for disaster:
$output = shell_exec('cat /webserver/info/user_files/' . $_POST['user_info_file']);
echo $output;
If a malicious user were to set the value of the user_info_file post variable to bob; cat /etc/passwd, the output of both files would be dumped, which is definitely not in your best interests.
There are two key ways in which we will get around this:
X:, where X could be a valid drive letter on the system (Windows only, of course).
Use the escapeshellarg function to further make the parameters safer, i.e. $output = shell_exec("cat /web/users/info/" . escapeshellarg($processedUserInfoFile));
A way in which we are helped by the operating system is that most web servers operate as a user with restricted permissions, and thus can only do things that user is permitted to do.
One last security note we should mention here is that if your php.ini file has safe_mode set to On, then the shell_exec function and the analogous backtick operator are not available.
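The kind of input checking and escaping discussed above can be sketched as follows. The allowed-character pattern, the directory, and the function name safeUserFileCommand are all assumptions for illustration; a real application should tighten the whitelist to whatever its file names actually look like.

```php
<?php
/**
 * Hypothetical helper: builds a "cat" command for a user-supplied file
 * name, or returns null if the name looks suspicious.
 */
function safeUserFileCommand(string $userFile, string $baseDir): ?string
{
    // Whitelist: letters, digits, underscore, dash and dot only. This
    // rejects ';', '/', '\', ':', spaces and other shell metacharacters.
    if (!preg_match('/^[A-Za-z0-9_.-]+$/', $userFile)) {
        return null;
    }
    // Refuse names starting with a dot, which covers '.' and '..'.
    if ($userFile[0] === '.') {
        return null;
    }
    $path = rtrim($baseDir, '/') . '/' . $userFile;

    // escapeshellarg quotes the argument as a further layer of defence.
    return 'cat ' . escapeshellarg($path);
}

$file = isset($_POST['user_info_file']) ? (string) $_POST['user_info_file'] : '';
$command = safeUserFileCommand($file, '/webserver/info/user_files');
if ($command === null) {
    echo "Invalid file name.\n";
} else {
    echo shell_exec($command);
}
```

Rejecting bad input outright, rather than trying to repair it, means the bob; cat /etc/passwd value from earlier never reaches the shell at all.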
We have covered a lot of ground in this article, but we should now have a solid understanding of the various execution functions available to us in PHP, as well as how they differ. While we should be extremely concerned about the security of running programs external to our web server, we can do so in a reasonably safe and controlled manner, which can only be a good thing for our web applications and their customers.
—
1 It is worth noting that dir, type, and many other commands on Windows are not programs, but actually commands built into the cmd.exe program. Fortunately the system execution functions are smart enough to know to execute cmd.exe for you if you ask to execute one of them.
Thanks to Keith from the UK for pointing out something odd in my book that doesn’t seem to work as it did in earlier versions of PHP:
If you have a regular expression (I use the POSIX ones almost exclusively since they’re UTF-8 aware whereas the Perl ones were not when last I inquired), and you want to set a range for the number of matches on a particular expression you can use the syntax:
$expr = '[a-zA-Z]{5,50}'; // matches between 5 (incl) and 50 (incl) letters
Now, the problem is: what if you want to have the number of characters in the range be PHP variables that you can set in a configuration file or some such thing? Your first attempt, and what I used in my book, might be:
$expr = "[a-zA-Z]\{$min,$max}"; // double quotes for var expansion
And you would get a wonderfully annoying error message from the PHP engine:
Parse error: syntax error, unexpected ',', expecting '}' in Filename on line 5
No amount of backslashes will fix this problem. It turns out that the PHP parser consumes { and } characters when performing complex variable expansion, so all you have to do is add an extra set around each of the variables you wish to expand. PHP leaves the other two alone:
$expr = "[a-zA-Z]{{$min},{$max}}"; // extra { }s are consumed.
And what you are left with is a wonderfully working regular expression.
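To see the expansion at work, the sketch below builds the expression both ways. It checks the result with preg_match rather than the POSIX functions, which is my substitution here because the ereg family was removed in PHP 7. The sprintf variant is an alternative that sidesteps brace handling altogether.

```php
<?php
$min = 5;
$max = 50;

// The outer braces stay literal; {$min} and {$max} are expanded.
$expr = "[a-zA-Z]{{$min},{$max}}";

// Alternative: build the same string without any variable expansion.
$alt = sprintf('[a-zA-Z]{%d,%d}', $min, $max);

var_dump($expr === $alt);                          // are the two strings identical?
var_dump(preg_match('/^' . $expr . '$/', 'hello')); // five letters
var_dump(preg_match('/^' . $expr . '$/', 'hi'));    // only two letters
```

If the limits come from a configuration file, the %d in the sprintf version also forces them to integers, which keeps stray characters out of the pattern.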
I never really intended for it to spiral out of control like that. I had just started writing my book (a programming book on designing and writing web applications using outrageously nerdy technologies – truly the “Great American Novel”), and found myself frequently needing a little pick-me-up that only mind-altering substances could provide. You wouldn’t think that writing a book would be all that hard – you either know a lot about something or learn about it, and then sit down and write about it in your chosen language (which I am currently pretending still qualifies as “English”). The material to be covered should be planned out well in advance and one can thus sit down and write, write, Write!
Alas, it does not seem to work out that way. You sit down – and draw a complete blank. The notes that you have so carefully researched, planned out, and written up sit uselessly at your side while you wonder why the hell you’re writing a book in the first place. What possible qualifications do you, a computer nerd with no previous writing experience, have to write a six to eight hundred page book that people will use to write serious applications and perform non-trivial financial transactions?
And so, my writing day would begin. I actually do know what I’m talking about, and I know what I need to do. It can just take a bit of a kick to the derrière (definitely not English) to get going. It is here where a little chemical boost comes in handy.
I started out with maybe one hit a day. I would dope myself up, sit down, and churn out a good thirty-odd pages of writing. At the end of the day, I would go home, having not earned a single penny, but still possessed with a feeling of accomplishment – I was one chapter closer to the end of the book. Some days (it eventually grew to be “most days”) I would give myself a reward before going home, so that I still had some energy for the evening.
Then came the difficult patches. Chapters for which I was not entirely prepared when I sat down to write. I had done the research, and planned out a sequence of topics, but I would still find myself strangely requiring more time to fully convert them into the pages of text they would need to become. On these days, a little extra boost would be just what I needed to keep me going throughout the day. I was up to a good three times a day now, and the book was proceeding. Chapters 1-12 flew off my fingertips into the laptop.
I started running around the lake nearby. Spending most of the last decade in front of computers had left me a little less athletic than I would have liked. I learned that the huffing and puffing would suck ever so slightly less if I gave myself a little “present” before tying up the cross-trainers. Running around the lake became just another excuse to drug myself up at some point during the day.
It was in late November or early December that the earthquakes started. Seattle is known to lie on a number of earthquake faults, and is supposed to be only slightly less dangerous a place in which to live than California. One can only imagine the glee with which the local news media trumpet this fact on a weekly basis. When we are not being told that our neighbours are axe murdering child molesters, we are being warned that if the city does not first fall into the ocean because of a monstrous quake causing horrible loss of life (at least a couple hundred people, including women and children!), the big pretty mountain nearby – which last erupted some zillions of years ago, thereby still qualifying it in the most vague of definitions as an “active volcano” – will suddenly explode and bury us all in hot ash and lava while we sleep, and asphyxiate those who survive in a cloud of poison gas.
The earthquakes we had were all pretty mild, and would typically occur in the wee hours of the morning, perhaps around three or so. I would awaken to the bed shaking and a strong feeling of tremors. I would always look around to see what the cats were up to, and it was with some surprise that I would see them still fast asleep, usually on my wife. Her refusal to move them no matter how uncomfortable they made her sleep was always infinitely more pleasing to them than my insistence on chucking them off of me if they were sitting on my head. Looking outside the bedroom window, I would see no swinging of lampposts, and the trees were usually still in the night calm.
In the morning, I would ask my wife, “did you feel the earthquake, honey?” To which I would get a quizzical look and the inevitable “No. There was an earthquake? Are you sure?”. I was. At least for the first three or four over the first week. It was during the second week of regular quakes that she looked at me one morning at breakfast and announced: “I know what it is: I’m bouncing around in the bed and you’re feeling it shaking.” True, she does move around a lot when she sleeps (part of the self-contortions required to keep two cute cats happily sleeping), and our bed is an Ikea wooden bed that tends to send any energy sent its way straight back at the occupants.
So, for the next two or three quakes, I found myself not worrying much about them. Still, to calm my nervous mind, I started looking up as soon as I felt the quakes, and noticed that neither the wife nor the cats demonstrated even the slightest bit of evidence that she had been shifting around. Instead, I saw the usual slumbering mass that shares my bed. Something wasn’t quite right.
It was not too long after the quakes that the dizzy spells began. Every day – without fail – I started feeling dizzy around eleven-thirty, twelve o’clock. I would find myself unable to concentrate much and further hits to try and improve my attention proved completely useless. I grew worried. Having had such good fortune in the early stages of writing, I had committed myself to a rather aggressive schedule with the publisher. Losing momentum was simply not an option.
Despite my best hopes to the contrary, the jogging (if you can really call it that – lumbering or lurching might be entirely more appropriate) didn’t seem to help much. The earthquakes, which were now clearly something I could self-diagnose as “night tremors”, and the dizzy spells continued unabated. My slouching over in uncomfortable chairs to type on my laptop sitting on wobbly, crappy tables that barely held much else was also causing my body to express some dismay with me in the form of a muscular stitch in my side.
I was dying. It was clear to me that my addiction was ballooning out of control and that it, combined with the other abuse I was heaping on my body – awful slouching and hour after hour in front of the most unforgiving laptop – was killing me methodically and painfully. I hadn’t been to the doctor for a physical in a good ten years (I just never got around to it) and it was patently obvious that I was sporting at least a dozen grapefruit sized tumours in various places in my body that were squeezing the life out of me.
In a desperate attempt to save myself and identify the source of the problem (instead of just doing the smart thing and going and seeing my damn doctor), I started trying to cut things out of my life to see what was doing the serious damage. I quit the drugs cold turkey, and while I couldn’t stop writing the book – this was a long-time dream on the cusp of being realised that I did not wish to jeopardise – I could at least stop hunching in such a quasimodoesque fashion and work on better posture.
To be sure, I decided to cut down on any sugar in my life (since that’s bad for you as well), and resolved to drink a glass of red wine (since that’s extremely good for you and only a fool doesn’t know that). If there was a magic recovery, I could work to identify the real culprit and if there was no recovery, I clearly was dying and would truly need to see the doctor.
I slept ¡como las muertas! for the next two weeks. Fifteen hours a day was insufficient. I would go to bed at ten in the evening, wake up approaching noon, and would usually take a nap sometime in the afternoon. During the few waking hours left in my day, I would write – now with better posture – and still exercise, since clearly that was not the source of my problems (although it sure did suck that much more without any chemical assistance). The classic symptoms of chemical withdrawal were playing out before me – extreme fatigue, lack of focus, and general dissatisfaction with life. I had officially become … a junkie.
Slowly, the dizzy spells went away. As did the “earthquakes”. After two weeks, I was back to feeling mostly normal, although I had resolved that a trip to the doctor would probably still be a good idea.
For the holiday season, we travelled to Germany to visit some dear friends; the husband is a doctor and the wife, a former executive, now just enjoys the good life being married to a doctor. We mentioned my recent lifestyle to her, at which point her eyes nearly popped out of her head.
“You drank HOW many doppio espressos every day?” The rise in her voice expressed both shock and consternation. “My brother went to the doctor recently with many health problems such as shivering, dizziness, and tremors, and the doctor told him to stop drinking coffee or he would be dead inside of three months.”
Awesome. I had officially taken grabbing a quick coffee to a whole new level – one that threatened my health and my life. The sad part is, I don’t even really like coffee. I used to avoid it entirely, disliking the bitter taste immensely. Three years ago, however, while studying Italian in Rome, I found myself extremely fatigued one morning and couldn’t resist heading to the local “bar” for un caffè, a lovely pick me up that gave me the worst migraine for hours. To this date, I still cannot drink drip coffee, but a good Italian espresso is hard to beat for flavour and stimulation.
After coming back from Germany, I returned to the café where I did most of my writing. I did, however, cut down significantly on the caffeine. Instead of three or more double espressos or entire pots of tea, I would have a cup of decaffeinated tea or something else equally innocuous. I went and saw my doctor, who, after much poking, prodding, and drawing of all sorts of blood with various diabolical looking needles, told me there was nothing wrong with me, and if I’d just lose a few more pounds, would be the very model of a modern major general – or at least a healthy geek.
The book was finished after six total months of writing. A few chapters were quite rough – I caught the flu for two weeks in January (more withdrawal?) which was reflected quite clearly in the writing that occurred during that timeframe – but great feedback from reviewers and lots of time spent proofreading and reviewing helped me write a book that, while not perfect, is something of which I am quite proud.
I, however, find myself continually having to be careful with the caffeine. In such small and concentrated doses, it is entirely too easy to throw back multiple espressos without even realising it. Why the American government spends billions “proving” that marijuana is bad for you (“wow, if you zap mice with painful electrodes whenever they don’t take the drugs, you can get them to consistently prefer taking drugs to getting zapped!”) while doing nothing about the caffeine addiction pandemic sweeping the nation is entirely unclear to me.
Which is why I will likely spend more of my life in cafés than in the corridors of power. At least I have my “escape” mechanism.
License:BSD