Content Tagged with User:marc + XSS

Helping Prevent XSS Attacks in PHP5

Download version 0.9 of StripTags for PHP5

One of the greater dangers facing web application authors today is Cross Site Scripting (abbreviated XSS, so as not to be confused with Cascading Style Sheets). In such an attack, people filling in forms on your web site (such as a comment on a blog entry) include malicious input that, when others view it, can cause effects ranging from the annoying (popping up advertisements) to the dangerous (redirecting you to a site that “spoofs” the current site and spies on your input).

A simple example: suppose you implement a bulletin-board-like system in which users can post short messages of their own. A user could enter the following as the comment body:

<script>
document.location = "http://maliciousspoofsite.com";
</script>

When they submit this and somebody else views the page, that visitor is redirected, possibly without even knowing it, to another site, with all sorts of potential consequences.

Good news arrives with a very basic solution to this problem in the form of the strip_tags function in PHP. This function looks for any markup tags in a given string and removes them, leaving the text between the tags in place:

<?php

  $str = "This is a <strong>string</strong> with
<script>document.location = 'http://moo.cow';</script>";

  $str = strip_tags($str);

  echo $str;
?>

This script prints out:

  This is a string with
document.location = 'http://moo.cow';

While it may render output less attractive, it has effectively neutralised the danger.

Another option is the htmlspecialchars function (and its close cousin, htmlentities), which converts characters such as < and > into the HTML entities &lt; and &gt; respectively, so the browser displays the markup as text rather than interpreting it.
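As a minimal sketch of the escaping approach, the helper below wraps htmlspecialchars. The name escape_for_html is made up for this example; the ENT_QUOTES flag and the explicit 'UTF-8' charset are choices assumed here, not something the article prescribes.

```php
<?php
// Hypothetical helper: escape user input before echoing it into an HTML page.
// ENT_QUOTES asks htmlspecialchars to escape single quotes as well as double
// quotes; 'UTF-8' names the encoding the input is assumed to be in.
function escape_for_html($input)
{
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

$comment = "<script>document.location = 'http://moo.cow';</script>";

// The browser now shows the markup as literal text instead of running it.
echo escape_for_html($comment), "\n";
```

Nothing is removed from the input here; the script is merely displayed rather than executed.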

Unfortunately, these can be extremely restrictive when we are writing web applications where we want to allow some degree of user input. If we want to let users include hyperlinks, images, or other harmless types of markup, we have a problem.

The strip_tags function does have a solution to this, but only a very crude one (which the authors admit freely and warn about well in advance). You can pass a second parameter to this function which is a string of permitted tags, such as the following:

<?php
  $str = "This <em>is</em> a <strong>string</strong> with
<script>document.location = 'http://moo.cow';</script>";

  $str = strip_tags($str, '<em><strong>');

  echo $str;
?>

The output is now:

  This <em>is</em> a <strong>string</strong> with
document.location = 'http://moo.cow';

While this is a nice improvement, it opens up huge security holes for us, depending on which tags we permit:

<?php

$malicious = <<<EOSTR

This is a malicious string with a picture in it:

<img src="http://url/abc.jpg"
     onMouseOver="document.location = 'http://badurl';"/>

<script>
  document.location = "http://badurl";
</script>
EOSTR;

$str = strip_tags($malicious, '<img>');

echo $str;
?>

While the above code will correctly filter the <script> markup element out, it will still produce the following output:

This is a malicious string with a picture in it:

<img src="http://url/abc.jpg"
     onMouseOver="document.location = 'http://badurl';"/>

  document.location = "http://badurl";

Effectively, the strip_tags function says: If a tag is permitted, then all possible attributes on it are also permitted.

What we would ideally like is a system that protects us not just against malicious tags, but also against malicious attributes within those tags. Even on harmless-seeming div or span elements, you can include style attributes that can cause all sorts of mischief.

So, we need to write our own version of the strip_tags function that lets us specify not only which tags are permitted, but also which attributes. I have seen a number of these floating around on the Intarwebs, and unfortunately, more often than not, they do not work properly.

As they parse through the string, they look for opening tags, <, and then begin processing assuming a tag has the following basic structure:

<tagName attribute="value"> </tagName>

Thus, the common approach is to:
  • Get the opening <
  • Extract the tagName that comes right after and verify that it is permitted.
  • Skip the space character after the tag name.
  • Get the attribute name, which is the text up until the = sign
  • Get the value of the attribute, which is enclosed in double quotes
  • Get the closing >
Unfortunately, for most of the code I have seen, once you stray outside of the most basic of definitions of markup, the algorithm breaks. Consider the following markup:


<tag      attribute     = "value"> </tag>

<tag[tab][tab]attribute = 'value  '     attribute2 />

<tag
   attribute
= ' value' /   >
</tag>

<tag attribute =' <<<<Some Attribute >>>>>' >
       blah blah blah </      tag>

Changing spaces to tabs or newlines, including multiple spaces, or placing < and > characters within attribute values all break many of the algorithms based on simple string searching or regular expressions (and these regular expressions are already quite horrific).
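To make the failure concrete, here is a deliberately naive matcher of the kind described above, run against variants of the sample markup. The function name naive_match_tag and the regular expression are invented for this illustration and are not taken from any particular library.

```php
<?php
// A deliberately naive matcher: it assumes exactly one space after the tag
// name, a single attribute, no space around '=', and a double-quoted value.
function naive_match_tag($markup)
{
    return preg_match('/^<(\w+) (\w+)="([^"<>]*)">/', $markup) === 1;
}

$samples = array(
    'basic'        => '<tag attribute="value">',
    'extra spaces' => '<tag      attribute     = "value">',
    'tabs'         => "<tag\t\tattribute = 'value  '     attribute2 />",
    'newlines'     => "<tag\n   attribute\n= ' value' /   >",
    'angle value'  => "<tag attribute =' <<<<Some Attribute >>>>>' >",
);

foreach ($samples as $label => $markup) {
    $result = naive_match_tag($markup) ? 'recognised' : 'MISSED';
    echo str_pad($label, 14), $result, "\n";
}
```

Only the first sample is recognised as a tag. A filter built on such a pattern would never inspect the other four, yet a browser would still treat them as markup.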

Even worse, not a single solution I have seen thus far is UTF-8 aware, so they will very likely damage or destroy any multi-byte input.

While some may retort right away that not all of these markup variants are “allowed” by various specifications, the reality is that all of these work in every web browser I have tried (well, if I replace “tag” and “attribute” with something meaningful!). Therefore we, as application authors, have to worry about them and process them correctly.

In the end, we have no choice but to write a parser or “state machine” which keeps track of “where” we currently are, whether it is parsing an element, parsing an attribute, or speeding through the value of an attribute. We need to be able to handle all of the variations above and more.
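The following is a rough sketch of such a state machine, not the code from the StripTags archive. The function parse_tag, its state names, and its return format are all assumptions made for this example. It handles a single opening tag and tracks whether it is reading the tag name, an attribute name, or an attribute value.

```php
<?php
// Sketch only: parse_tag() is a hypothetical helper, not the StripTags code.
// Parses one opening tag that starts at $s[0] === '<'. Returns the lower-cased
// tag name, its attributes (null for valueless ones) and the number of bytes
// consumed, or null if the tag is never closed or has no name.
function parse_tag($s)
{
    $len = strlen($s);
    if ($len === 0 || $s[0] !== '<') {
        return null;
    }
    $state = 'tag_name';
    $name = $attr = $value = $quote = '';
    $attrs = array();

    for ($i = 1; $i < $len; $i++) {
        $c = $s[$i];

        // Inside quotes everything, including < and >, belongs to the value.
        if ($state === 'quoted_value') {
            if ($c === $quote) {
                $attrs[$attr] = $value;
                $state = 'before_attr';
            } else {
                $value .= $c;
            }
            continue;
        }

        // Outside quotes, '>' ends the tag; flush any attribute in progress.
        if ($c === '>') {
            if ($state === 'attr_name' || $state === 'after_attr_name') {
                $attrs[$attr] = null;
            } elseif ($state === 'before_value' || $state === 'bare_value') {
                $attrs[$attr] = $value;
            }
            if ($name === '') {
                return null;
            }
            return array('name' => strtolower($name), 'attributes' => $attrs, 'length' => $i + 1);
        }

        $space = strpos(" \t\r\n\f", $c) !== false;

        switch ($state) {
            case 'tag_name':
                if ($space || $c === '/') { $state = 'before_attr'; }
                else { $name .= $c; }
                break;
            case 'before_attr':
                if (!$space && $c !== '/') { $attr = strtolower($c); $state = 'attr_name'; }
                break;
            case 'attr_name':
            case 'after_attr_name':
                if ($c === '=') { $value = ''; $state = 'before_value'; }
                elseif ($space) { $state = 'after_attr_name'; }
                elseif ($c === '/') { $attrs[$attr] = null; $state = 'before_attr'; }
                elseif ($state === 'attr_name') { $attr .= strtolower($c); }
                else { $attrs[$attr] = null; $attr = strtolower($c); $state = 'attr_name'; }
                break;
            case 'before_value':
                if ($c === '"' || $c === "'") { $quote = $c; $state = 'quoted_value'; }
                elseif (!$space) { $value = $c; $state = 'bare_value'; }
                break;
            case 'bare_value':
                if ($space) { $attrs[$attr] = $value; $state = 'before_attr'; }
                else { $value .= $c; }
                break;
        }
    }
    return null; // input ended before the tag was closed
}

$tag = parse_tag("<tag attribute =' <<<<Some Attribute >>>>>' >");
print_r($tag['attributes']);
```

The sketch walks the string one byte at a time. That does not corrupt UTF-8 text, because every delimiter it looks for is an ASCII character and the bytes of a multi-byte UTF-8 sequence never coincide with ASCII bytes. A real filter would also need to handle closing tags, comments, and the text between tags.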

Thus, I have written the StripTags class, attached at the bottom of this article. Included within the archive is a test script which demonstrates some of the input on which I have tested it (it is actively being used in a couple of web applications) and shows some example usage.

The class is fully UTF-8 aware. All of the files in the archive are UTF-8 encoded, so please be careful when loading and saving them. If your editor misbehaves, it might mess things up.

To use the StripTags class, you pass to it an array. The keys are the names of the markup elements you would like to permit while the values are arrays of attributes you would like to permit on each of these. For example:

<?php
  $filter = array(
    'a' => array('href'),
    'img' => array('src', 'border', 'alt', 'title'),
    'strong' => array(),
    'em' => array(),
    'p' => array('align')
  );

  $st = new StripTags($filter);

  $safer = $st->strip($some_unsafe_string);

?>
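To illustrate how a filter array of this shape can be consulted, here is a small sketch. The helpers is_tag_allowed and is_attribute_allowed are hypothetical and are not methods of the StripTags class; the case-insensitive comparison is likewise an assumption.

```php
<?php
// Hypothetical helpers showing one way to read the filter array; they are
// not part of the StripTags class.
function is_tag_allowed(array $filter, $tag)
{
    return isset($filter[strtolower($tag)]);
}

function is_attribute_allowed(array $filter, $tag, $attribute)
{
    $tag = strtolower($tag);
    // An attribute is allowed only if its tag is listed AND the attribute
    // appears in that tag's list.
    return isset($filter[$tag])
        && in_array(strtolower($attribute), $filter[$tag], true);
}

$filter = array(
    'a'      => array('href'),
    'img'    => array('src', 'border', 'alt', 'title'),
    'strong' => array(),
);

var_dump(is_attribute_allowed($filter, 'IMG', 'src'));         // listed on img
var_dump(is_attribute_allowed($filter, 'img', 'onMouseOver')); // not listed
var_dump(is_tag_allowed($filter, 'script'));                   // tag not listed
```

An empty array, as for strong above, permits the element itself but none of its attributes.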

One type of XSS that we have not yet discussed is a bit more annoying:

<img src="javascript:alert('oh noes!!!')"/>

The ability to embed script in attribute values makes life very difficult for us. One might think that we can just search for and get rid of javascript: in attribute value strings, but we still would have problems with:

<img src="vbscript:alert('oh noes!!!1!!11!')"/>
<img src=&#106;&#97;&#118;&#97;&#115;&#99;
&#114;&#105;&#112;&#116;&#58;&#97;
&#108;&#101;&#114;&#116;&#40;&#39;&#88;
&#83;&#83;&#39;&#41;>

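There are other scripting languages than JavaScript, and the script itself can be disguised with numeric character references (&#106; is “j”, &#58; is “:”), so a search for the literal text javascript: finds nothing.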
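To see what the encoded example above actually contains, the references can be decoded with PHP's html_entity_decode. This is only a demonstration of the disguise, with the variable names chosen for this example.

```php
<?php
// The src value from the example above, joined onto one logical string.
$encoded = '&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;&#58;'
         . '&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;';

// ENT_QUOTES makes html_entity_decode decode &#39; (the single quote) too.
$decoded = html_entity_decode($encoded, ENT_QUOTES, 'UTF-8');

echo $decoded, "\n"; // javascript:alert('XSS')
```

As far as I know, html_entity_decode only decodes references that end with a semicolon, whereas browsers have historically tolerated some that do not, so decoding alone should not be relied on to catch every variant.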

The StripTags class currently takes a rather basic approach to this:

If the RemoveColons property is set to TRUE (which is the default), then the StripTags class will remove any colon characters, or character references representing colons, from attribute value strings. It will, however, let strings start with:

http:
https:
ftp:

This is a bit restrictive, but until I implement a better solution, this is the way I will leave it. You can, again, turn this off completely by setting RemoveColons = FALSE, but then I’d probably tell your users to be careful (well, I might tell them that anyway … !)
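The colon rule can be approximated as follows. This is a sketch of the behaviour as described in the text, not the code inside StripTags: the function neutralise_colons is hypothetical, and decoding the value with html_entity_decode before looking for colons is an assumption about how encoded colons might be caught.

```php
<?php
// Hypothetical sketch of the colon rule; not the StripTags implementation.
function neutralise_colons($value)
{
    // Decode character references first so that &#58; is seen as ':'.
    $decoded = html_entity_decode($value, ENT_QUOTES, 'UTF-8');

    // Keep one leading scheme if it is on the short allow list.
    $prefix = '';
    foreach (array('http:', 'https:', 'ftp:') as $scheme) {
        if (strncasecmp($decoded, $scheme, strlen($scheme)) === 0) {
            $prefix  = substr($decoded, 0, strlen($scheme));
            $decoded = substr($decoded, strlen($scheme));
            break;
        }
    }

    // Every remaining colon is dropped, which is what makes the rule
    // restrictive: a port number such as :8080 is lost as well.
    // The result is decoded text and must be escaped again before output.
    return $prefix . str_replace(':', '', $decoded);
}

echo neutralise_colons("javascript:alert('oh noes!!!')"), "\n";
echo neutralise_colons('http://example.com:8080/abc.jpg'), "\n";
```

Without its colon, a value such as javascript:alert(...) is no longer a script URL, and the browser treats it as an ordinary (broken) relative address.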

Here is version 0.9 of the StripTags class (I won’t consider it 1.0 until I come up with a robust solution to the inline attribute script attacks).

Download version 0.9 of StripTags for PHP5

Please do feel free to mail me at marcwan@chipmunkninja.com. This code will only work with PHP 5, as it uses class syntax and semantics not available in prior versions. I have tested it with each version starting with PHP 5.0.2.

User:marc: Chipmunk Ninja Technical Articles

StripTags 1.0 Released

Download version 1.0 of StripTags for PHP5

After some further development over the last couple of weeks, I have released version 1.0 of the StripTags class for PHP.

This class is designed to replace the strip_tags function in PHP, which does not work particularly well. It serves to help website authors avoid cross-site scripting (XSS) attacks in user-created content, for sites such as blogs or forums where users can submit entries, articles, or comments.

You can read more about the class and XSS in general in the following article:

Helping Prevent XSS Attacks in PHP5

The big new feature in this version of the class is the ability to find XSS attacks injected via attribute values encoded as Unicode character references, such as:


<IMG SRC=&#106;&#97;&#118;&#97;&#115;&#99;&#114;&#105;&#112;&#116;
      &#58;&#97;&#108;&#101;&#114;&#116;&#40;&#39;&#88;&#83;&#83;&#39;&#41;>

We now successfully find these and neutralise them by inserting extra junk in the attribute string so that they are not processed by client browsers.

Please note that this class is not a 100% complete solution to XSS. We do not handle all of the ways that XSS can be achieved through CSS and other forms of style (and thus always recommend that you not permit users to enter STYLE elements or “style” attributes on other elements). Solving this problem requires a significant amount of work and effort, and I believe that if you want to give users that degree of input control, you should have them use a Wiki-language engine such as Textile.

The README and INSTALL documents have full information on how to use the class as well as what it does and does not do.

As always, please feel free to email me with any questions, comments, or bug reports. I’ll fix the latter as quickly as I can.

Download version 1.0 of StripTags for PHP5
