I'm fooling around with Yahoo! Pipes and I'm hitting a wall with a regular expression. I'm familiar with regular expressions from Perl, but the rules seem to be different in Yahoo! Pipes.
What I'm doing is fetching a page and trying to turn it into a feed. My regex for stripping out the link from the HTML works fine, but the title, which I want to be whatever was inside the <i> tags, just outputs the original text.
Sample text that matches in Perl and on this online regexp tester:
<a rel="nofollow" target="_blank" HREF="http://changed.to/protect/the-guilty.html"><i>"Fee Fi Fo Fun" (English Man)</i></a> (See also this other site <a rel="nofollow" target="_blank" href="http://stackoverflow.com">Nada</a>) Some other text here
RegEx for the title:
(?i).*?<i>([^<]*).* [ ] g [x] s [ ] m [ ] i
RegEx for the link:
(?i).*?href="([^"]*).* [ ] g [x] s [ ] m [ ] i
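For reference, here is a rough sketch of how I'd expect the two patterns to behave outside of Pipes. It uses Python's `re` module as a stand-in for Perl, and it assumes the ticked "s" box corresponds to `re.DOTALL` and that Pipes replaces the whole match with the first capture group. The names `TITLE_RE`, `LINK_RE` and `extract` are just mine for illustration.

```python
import re

# Sample text from above, split across lines only for readability.
sample = (
    '<a rel="nofollow" target="_blank" '
    'HREF="http://changed.to/protect/the-guilty.html">'
    '<i>"Fee Fi Fo Fun" (English Man)</i></a> '
    '(See also this other site <a rel="nofollow" target="_blank" '
    'href="http://stackoverflow.com">Nada</a>) Some other text here'
)

# The two patterns exactly as entered in the Regex module.
TITLE_RE = r'(?i).*?<i>([^<]*).*'
LINK_RE = r'(?i).*?href="([^"]*).*'


def extract(pattern, text):
    """Replace the whole match with capture group 1.

    re.DOTALL mirrors the ticked "s" checkbox (an assumption about
    how Pipes maps its checkboxes to flags).
    """
    return re.sub(pattern, r'\1', text, count=1, flags=re.DOTALL)


title = extract(TITLE_RE, sample)
link = extract(LINK_RE, sample)
print(title)
print(link)
```

Run this way, the title comes out as just the text inside the <i> tags and the link as the first URL, which is what I expected from Pipes as well.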
Somehow the case-insensitive checkbox seems broken. Luckily, you can substitute it with (?i), which works nicely.
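To illustrate why the inline flag matters for this sample, here is a small sketch in Python rather than Pipes, so treat it as an approximation of the behaviour and not as what Pipes does internally. The shortened `html` string and the variable names are made up for the example. An inline `(?i)` at the start of the pattern should behave like a separate case-insensitive flag, and without either one the uppercase `HREF` is skipped:

```python
import re

# Shortened stand-in for the sample text: first link uses HREF, second href.
html = (
    '<a HREF="http://changed.to/protect/the-guilty.html">x</a> '
    '<a href="http://stackoverflow.com">Nada</a>'
)

# Inline modifier, as typed into the pattern itself.
with_inline = re.sub(
    r'(?i).*?href="([^"]*).*', r'\1', html, count=1, flags=re.DOTALL
)

# Same pattern, with case-insensitivity passed as a separate flag
# (the equivalent of a working "i" checkbox).
with_flag = re.sub(
    r'.*?href="([^"]*).*', r'\1', html, count=1,
    flags=re.DOTALL | re.IGNORECASE,
)

# No case-insensitivity at all: the uppercase HREF no longer matches,
# so the lazy .*? runs on to the second, lowercase href.
case_sensitive = re.sub(
    r'.*?href="([^"]*).*', r'\1', html, count=1, flags=re.DOTALL
)

print(with_inline)
print(with_flag)
print(case_sensitive)
```

The first two results are the same URL, while the case-sensitive version picks up the second link instead.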
Here is a nice web2.0-ish tool to test regular expressions with: RegExr. But for some reason it's still beta. ;-)