I have a piece of code that turns single words or phrases from a given list into clickable internal links. The code is supposed to do this only if the word or phrase is not linked yet. It works very well except for one point: the code also matches names inside the src attribute of images.
So,
<img src="img/xiaomi.jpg" />
is outputting
<img src="img/<a href="site.com/tag/xiaomi">Xiaomi</a>.jpg" />
As you can see, the regex is probably too greedy and matches what it should not.
The code is simplified here, but it is used as follows:
$content = 'All post content itself with all html tags a site can have. <p>Blabla</p> <img src="img/xiaomi.jpg" /> <p>Bliblibli</p> <p>Lorem ipsum xiaomi</p>';
$contentCopy = 'All post content itself with all html tags a site can have. <p>Blabla</p> <img src="img/xiaomi.jpg" /> <p>Bliblibli</p> <p>Lorem ipsum xiaomi</p>';
$list = $this->cache->get('wordsList');
$text = $contentCopy; // start from the copy so each replacement builds on the previous one
foreach ($list as $word) {
    $var = $word->word;
    $text = preg_replace('/<a[\S\s]+?<\/a>(*SKIP)(*FAIL)|\b'.$var.'\b/i', '<a href="'.base_url('site/tag/'.url_title($var)).'" target="_blank" title="'.ucfirst($var).'">$0</a>', $text);
}
$content = str_replace($contentCopy,$text,$content);
Can you guys please help to improve this code?
Apparently the problem is only in image tags.
I use this snippet to automatically create internal links for stored pages and help with the site's SEO.
You may replace <a[\S\s]+?<\/a> with (?:<a[\S\s]+?<\/a>|<img\b[^>]*>). Here is a variation that uses . with the s modifier instead of [\s\S]:
'~(?:<a.*?</a>|<img\b[^>]*>)(*SKIP)(*FAIL)|\b'.$var.'\b~si'
Quick details:
- (?:<a.*?</a>|<img\b[^>]*>) - <a, any 0+ chars as few as possible, </a>, or <img, word boundary, any 0+ chars other than > and then >
- (*SKIP)(*FAIL) - PCRE verbs that make the current match fail at the current index and start the next match search from the index where the failure occurred
- | - or
- \b...\b - whole word $var (only works if it contains just word chars, else you need to preg_quote($var, "~") and use other boundaries).
See the regex demo.
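To make the last point concrete, here is a minimal sketch of the loop using the pattern above together with preg_quote() and lookaround boundaries. The function name linkWords, the $baseUrl parameter, and the example.com URL are hypothetical stand-ins for illustration; the original code builds the URL with base_url() and url_title() instead.

```php
<?php
// Hypothetical helper, not part of the original code: wraps every listed word
// in a link, skipping text inside existing <a>...</a> elements and <img> tags.
function linkWords(string $html, array $words, string $baseUrl): string
{
    foreach ($words as $word) {
        // Escape regex metacharacters and the ~ delimiter in the word.
        $quoted = preg_quote($word, '~');

        // (?<!\w) and (?!\w) stand in for \b so that words starting or ending
        // with a non-word character (such as "c++") still match as whole terms.
        $pattern = '~(?:<a.*?</a>|<img\b[^>]*>)(*SKIP)(*FAIL)|(?<!\w)' . $quoted . '(?!\w)~si';

        // Assumed URL scheme; swap in base_url()/url_title() in a real project.
        $href = $baseUrl . rawurlencode(strtolower($word));

        // A callback avoids any special handling of $ or \ in the replacement.
        $html = preg_replace_callback(
            $pattern,
            function (array $m) use ($href): string {
                return '<a href="' . $href . '">' . $m[0] . '</a>';
            },
            $html
        );
    }
    return $html;
}

$content = '<p>Blabla</p> <img src="img/xiaomi.jpg" /> <p>Lorem ipsum xiaomi</p>';
echo linkWords($content, ['xiaomi'], 'https://example.com/tag/');
```

With this input the img tag should come back untouched, and only the xiaomi in the last paragraph is wrapped in a link. Links inserted on an earlier pass are themselves skipped on later passes, so words from the list are not linked twice.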