
Convert an image placeholder into an HTML <img> tag using preg_replace()


I want to replace square-braced image placeholders with valid HTML markup.

A sample placeholder might look like this:

[img:http://example.com/_data/025_img.jpg]

What I want is to replace the [img: ... ] part with an <img> tag and get a result like this:

<img src='http://example.com/_data/025_img.jpg' border='0' />

Additional information about the user-uploaded images that is relevant to this task:

  1. Users upload images to their profile.
  2. The image names are stored in the database.
  3. The images are listed next to a form that contains a textarea.
  4. While typing, the user should be able to include one or more of their images by adding an [img: ... ] tag, where ... is the link that gets copied when they click one of the images listed from their gallery.
  5. I'm using CodeIgniter: the textarea content is passed from the view to the controller and then to the model, where a helper sanitizes it for all sorts of things (SQL, quotes, etc.). CodeIgniter's XSS filtering is also enabled.
  6. Then I would like to scan the text for [img: ... ] tags, replace each one with an <img> tag, and render the post with the images shown in place, followed by the rest of the text.

So the actual input from the user will be something along the lines of:

The brown fox jumped over foo bar [img:http://example.com/_data/025_img.jpg] and then went to bed [img:http://example.com/_data/0277_img.jpg] while thinking about [img:http://example.com/_data/1115_img.jpg]

That is why I asked about preg_replace() rather than preg_match(): preg_match() only finds the placeholders and doesn't substitute anything back into the text, so the images wouldn't end up followed by the rest of the text.


Solution

  • Let's get the easy thing out of the way first.

    /\[img:([^\]]+)\]/
    

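    That is: the literal text [img:, then one or more characters that are not a closing square bracket (captured as group 1), then a literal ].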

    Run this through preg_match and element 1 in the match array will very likely be an image URL that you can easily insert into an img tag.
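    A minimal sketch of that first step, assuming the sample post from the question. It uses preg_match_all rather than preg_match so that every placeholder is found, not just the first; the variable names are illustrative. It only extracts the captured URLs and does no validation yet.

```php
<?php
// Sample post text from the question.
$text = 'The brown fox jumped over foo bar [img:http://example.com/_data/025_img.jpg] '
      . 'and then went to bed [img:http://example.com/_data/0277_img.jpg] '
      . 'while thinking about [img:http://example.com/_data/1115_img.jpg]';

// Same pattern as above: capture everything between "[img:" and the next "]".
$pattern = '/\[img:([^\]]+)\]/';

// preg_match_all finds every placeholder. With the default
// PREG_PATTERN_ORDER, $matches[1] holds all the captured group-1 values.
preg_match_all($pattern, $text, $matches);
$urls = $matches[1];

foreach ($urls as $url) {
    // Unvalidated user input: do NOT put this into HTML as-is.
    echo $url, PHP_EOL;
}
```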

    But you shouldn't. Not right away.

    First, this is insecure as heck. What's going to happen when I write this?

    [img:javascript:alert(document.cookie);]
    

    Uh-oh. That's not going to be good.

    You're probably going to want to make sure that the thing the user claims is a URL really is a URL. You can try doing this by calling parse_url, which gives you back an array of URL components. Make sure the result has a host and a path, and that its scheme is http or https.
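    A minimal sketch of that check. The helper name has_http_host_and_path is hypothetical; it relies only on PHP's built-in parse_url, and it checks the structure of the string only. It does not make the string safe to print into HTML.

```php
<?php
// Hypothetical helper: true only if $url parses as an absolute http(s) URL
// with both a host and a path. Structural check only, not an XSS filter.
function has_http_host_and_path(string $url): bool
{
    $parts = parse_url($url);
    if ($parts === false) {
        return false; // parse_url gave up on a seriously malformed URL.
    }

    // Rejects javascript:, data:, scheme-less/relative URLs, etc.
    $scheme = strtolower($parts['scheme'] ?? '');
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    // Both components must be present and non-empty.
    return !empty($parts['host']) && !empty($parts['path']);
}
```

    Because parse_url splits strings up rather than validating them, a string like the one in the next example may still get through this check.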

    Okay, but what happens when the user enters this?

    [img:http://www.example.com/foo.jpg" onmouseover="alert(document.cookie)"]
    

    That's a valid...ish... URL that will be successfully deconstructed by parse_url and may well pass basic checks for well-formedness. Filtering out spaces and quotes (single and double) will be a good starting point, but there are still more things to worry about.
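    A minimal sketch of that starting point, assuming you reject the candidate URL outright rather than strip characters from it. The helper name has_forbidden_chars is hypothetical, and the character set (whitespace plus single and double quotes) is just the minimum suggested here, not a complete defence.

```php
<?php
// Hypothetical helper: true if $url contains whitespace (\s covers spaces,
// tabs and newlines) or a single or double quote.
function has_forbidden_chars(string $url): bool
{
    // In this single-quoted PHP string, \' is a literal quote and \s is
    // passed through to PCRE unchanged, so the pattern is /[\s"']/.
    return preg_match('/[\s"\']/', $url) === 1;
}
```

    Rejecting the whole URL, instead of deleting the offending characters and carrying on, avoids turning a malicious string into a different but still unexpected one.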

    The bottom line is that markup like this is a vector for XSS (cross-site scripting) attacks.

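    You can probably mitigate some of the threat by passing the URL through htmlspecialchars. That will at least encode angle brackets and double quotes (and single quotes too, with ENT_QUOTES, which is part of the default flags since PHP 8.1), and it's hard to be nasty with those taken care of. It does nothing about a javascript: scheme, though, so keep the scheme check as well. Just watch out for character-set silliness: some non-UTF-8 encodings can represent quotes or brackets in ways a filter expecting a different charset won't recognize, so make sure the charset you pass to htmlspecialchars matches the one your page actually uses.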
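    Putting these checks together for the preg_replace() use case in the question, here is a minimal sketch using preg_replace_callback, so each match can be validated before it becomes HTML. The function name render_img_placeholders is hypothetical, and its rules (http/https only, host and path required, no whitespace or quotes, htmlspecialchars with ENT_QUOTES and UTF-8) are assumptions drawn from the checks discussed above, not a complete XSS defence. It only transforms the placeholders; the rest of the post still needs its own escaping or filtering.

```php
<?php
// Hypothetical helper: replaces each [img:...] placeholder with an <img> tag,
// or with an empty string if the URL fails the checks below.
function render_img_placeholders(string $text): string
{
    return preg_replace_callback(
        '/\[img:([^\]]+)\]/',
        function (array $m): string {
            $url   = $m[1];
            $parts = parse_url($url);
            $scheme = is_array($parts) ? strtolower($parts['scheme'] ?? '') : '';

            // http(s) only, host and path present, no whitespace or quotes.
            $ok = ($scheme === 'http' || $scheme === 'https')
                && !empty($parts['host'])
                && !empty($parts['path'])
                && preg_match('/[\s"\']/', $url) !== 1;

            if (!$ok) {
                return ''; // Drop suspicious placeholders entirely.
            }

            // ENT_QUOTES encodes both ' and "; state the charset explicitly.
            $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
            return "<img src='" . $safe . "' border='0' />";
        },
        $text
    );
}
```

    Silently dropping a rejected placeholder is one choice; you could instead leave the original placeholder text in place and escape it along with the rest of the post.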

    You probably want to use a real markup language for this (even if it's just Markdown), and you probably want to run the result through a whitelist-based HTML filter like HTML Purifier. This will help protect you from some levels of insanity.

    Remember, you're only paranoid if they aren't out to get you. The web is full of people that are so stupid that they're malicious, and people that are so malicious that it's stupid.