Bitquark

The unexpected dangers of preg_replace()

Published

I was reminded by a recent article of an obscure and dangerous property of PHP's preg_replace() function which can lead to code execution in some not-all-that-uncommon circumstances. I recently found some code vulnerable to this attack in the wild, so I thought I'd put together a quick writeup for pentesters and PHP coders who may not be familiar with the danger.

Let's start with a code example:

<?php
$in = 'Somewhere, something incredible is waiting to be known';
echo preg_replace($_GET['replace'], $_GET['with'], $in);
?>

The code will take a user-supplied regular expression and replace whatever it matches with a user-supplied string. So if we were to call preg_replace.php?replace=/Known/i&with=eaten, the script would perform a case-insensitive regex search (the i modifier) and echo Somewhere, something incredible is waiting to be eaten. Seems safe enough, right?

You've just been owned

The above code is vulnerable to code injection as it fails to account for dangerous PCRE modification flags in the input string. Most modifiers are quite harmless and let you do things like case-insensitive and multi-line searches, however one modifier, "e" will cause PHP to execute the result of the preg_replace() operation as PHP code. Let me just restate:

Setting the e regex modifier will cause PHP to execute the replacement value as code.

Why does PHP have this option? I have no idea, and it's actually been deprecated in later revisions (PHP >= 5.5.0) because of its recklessly insecure nature. Many people are still using PHP 5.2 or moving onto PHP 5.3, and even when deprecated the option will still work (it'll generate a warning at a log level turned off by default), so the issue will be around for a while yet.

Exploiting the code

To exploit the code, all the attacker has to do is provide some PHP code to execute, generate a regular expression which replaces some or all of the string with the code, and set the e modifier on the regular expression:

An attack on preg_replace()

That's all there is to it. In this example the complete command output is displayed when the system() call is executed, followed by the string with the match replaced with the last line of output (which is what system() returns).

The null byte problem

It's common to find code in which the coder manually sets the regex delimiters and any modifiers they want to use:

<?php
$in = 'Somewhere, something incredible is waiting to be known';
echo preg_replace('/' . $_GET['replace'] . '/i', $_GET['with'], $in);
?>

This code seems safe because the attacker can no longer end the regular expression with their own modifier. Safety is an illusion, however, because of the way preg_replace() handles null bytes. By passing in a "spoofed" end delimiter and e modifier followed by a null byte chaser, the end delimiter and modifiers in the code never get processed.

A more sophisticated attack on preg_replace()

Instead, the attacker's string - with the execution modifier - terminates the expression and leads to code execution.

Securing the code

PHP provides a handy function named preg_quote() which will quote any nasty characters in the input string and prevent code injection. Even when using this function, be aware that your custom regex delimiter my fall outside of the scope of the function. Use the the second parameter to specify the delimiter which you've chosen:

<?php
$in = 'Somewhere, something incredible is waiting to be known';
echo preg_replace('#' . preg_quote($_GET['replace'], '#') . '#', $_GET['with'], $in);
?>

Using preg_quote() renders all regex characters inert, so if you need to allow some access to use regular expressions, you'll need to escape your delimitation character by hand. Be very careful though, this approach is error prone; you'll need to escape the escape character as well, otherwise the attacker can just escape your escaping with their own escape character.

The implications of this issue stretch far and wide. Its subtle yet deadly nature make it an easy vulnerability to miss when developing and reviewing code. Be careful out there, and always think about how you use your input.

Update: Note that if searching for issues related to regex replacement in your code, also check for other vulnerable function variants such as ereg_replace(), eregi_replace(), mb_ereg_replace() and mb_eregi_replace()!