I'm trying to build a robust system that lets users paste any mixture of BBCode and HTML into an input, which I then sanitize and strip exactly the way I want.
The content is copied from forums, and the problem is that they all seem to use different markup. Some display more than one;
some use a self-closing br tag. Others use [URL=...], and others just use [URL]URL[/URL], etc.
So far, I use HTML Purifier to strip everything except img tags.
As far as I can see, HTML Purifier doesn't remove BBCode. So, given a string like this:
[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]
How can I remove the URL tags and leave just the IMG tags?
I want to remove everything belonging to the URL tag, i.e. both the URL it points to and the link text, which may prove difficult.
I've got quite far by converting [IMG] tags etc. with regex, which works, but I feel there are too many variants to hardcode.
Any suggestions for a more efficient (or at least workable) way to remove the URL tags?
Option 1
If you just want to remove tags such as [URL=http://awebsite.com] and [/URL], leaving the content inside, the regex is simple:
Search: \[/?URL[^\]]*\]
Replace: Empty string
In JavaScript:
replaced = string.replace(/\[\/?URL[^\]]*\]/g, "");
In PHP:
$replaced = preg_replace('%\[/?URL[^\]]*\]%', '', $str);
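Since the question worries about hardcoding too many variants, the Option 1 pattern can be wrapped in a small function that takes the tag name as a parameter. This is a minimal JavaScript sketch, not a tested library: stripBBTag is a hypothetical name, the tag name is assumed to contain only letters and digits (it is inserted into the regex unescaped), and two changes to the pattern are my own additions: the i flag, so lowercase [url] also matches, and a requirement that the tag name be followed by ], = or whitespace, so that stripping I does not also eat [IMG].

```javascript
// Hypothetical helper: removes the opening and closing forms of one BBCode
// tag ([TAG], [TAG=...], [/TAG]) but keeps whatever sits between them.
// Assumes tagName contains only letters/digits, so it is not escaped.
function stripBBTag(str, tagName) {
  // Option 1 pattern generalized to any tag name. The optional group only
  // allows extra characters after "=" or whitespace, so "I" won't hit "[IMG]".
  // The "i" flag (an addition) also catches lowercase tags.
  const pattern = new RegExp("\\[/?" + tagName + "(?:[=\\s][^\\]]*)?\\]", "gi");
  return str.replace(pattern, "");
}

const input = "[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]";
console.log(stripBBTag(input, "URL"));
// My Link [IMG]imagelink.png[/IMG]

console.log(stripBBTag("[url]http://awebsite.com[/url]", "URL"));
// http://awebsite.com
```

The same idea carries over to PHP's preg_replace; if the tag name could ever contain regex special characters, pass it through preg_quote first.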
Option 2: Also removing content such as My Link
Here, we also remove the text that directly follows [URL...], up to (but not including) the next tag.
Search: \[URL[^\]]*\][^\[\]]*|\[/URL[^\]]*\]
Replace: Empty string
JavaScript:
replaced = string.replace(/\[URL[^\]]*\][^\[\]]*|\[\/URL[^\]]*\]/g, "");
PHP:
$replaced = preg_replace('%\[URL[^\]]*\][^\[\]]*|\[/URL[^\]]*\]%', '', $str);
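To see what Option 2 does with the different variants mentioned in the question, here is a small JavaScript sketch. It uses the Option 2 pattern unchanged except for the added i flag (an assumption that some forums write lowercase tags); removeUrlTags is a hypothetical helper name. The third sample shows a limitation: because [^\[\]]* stops at the first [, any text that comes after a nested tag inside the link is kept.

```javascript
// Option 2 pattern, with the "i" flag added on the assumption that some
// forums write the tags in lowercase ([url] ... [/url]).
const URL_WITH_TEXT = /\[URL[^\]]*\][^\[\]]*|\[\/URL[^\]]*\]/gi;

// Removes [URL...] plus the plain text directly after it, and every [/URL].
function removeUrlTags(str) {
  return str.replace(URL_WITH_TEXT, "");
}

const samples = [
  // Link text before the image: the text is removed, the image kept.
  "[URL=http://awebsite.com]My Link [IMG]imagelink.png[/IMG][/URL]",
  // Bare [url]URL[/url] form: the whole thing disappears.
  "[url]http://awebsite.com[/url]",
  // Text after the image is NOT removed: the pattern stops at the first "[".
  "[URL=http://awebsite.com][IMG]imagelink.png[/IMG] caption[/URL]",
];

for (const s of samples) {
  console.log(JSON.stringify(removeUrlTags(s)));
}
// "[IMG]imagelink.png[/IMG]"
// ""
// "[IMG]imagelink.png[/IMG] caption"
```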