http://www.example.com/books?_pop=mheader
What regular expression would match this URL, and any other URL that contains "books"? The site has a books category with various sub-categories under it. How do I crawl down through the site and find all the URLs for books?
require 'anemone'

Pattern = %r[(\/books)*]

Anemone.crawl("http://www.example.com/") do |anemone|
  anemone.on_pages_like(Pattern) do |page|
    puts page.url
  end
end
http://rubular.com/ is a useful tool to test regex for Ruby.
The regex would be simple: /http:\/\/.+(books)/. It matches http:// as well, to help ensure it is a URL. Here is a rubular test against http://www.example.com/reference-books-2300.
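If you would rather check this locally than on rubular, here is a rough sketch using only plain Ruby (no anemone, no network). It compares the pattern from the question with the one above. The constant names, the `matching_urls` helper, and the `/music` URL are made up for illustration. Note that the question's pattern ends in `*`, which allows zero repetitions, so it matches every string and would not filter anything.

```ruby
# Pattern from the question: `*` permits zero repetitions of "/books",
# so this matches any string at all, including URLs without "books".
QUESTION_PATTERN = %r[(\/books)*]

# Pattern from this answer: literal "http://", at least one character,
# then "books" somewhere later in the string.
ANSWER_PATTERN = /http:\/\/.+(books)/

SAMPLE_URLS = [
  "http://www.example.com/books?_pop=mheader",
  "http://www.example.com/reference-books-2300",
  "http://www.example.com/music" # hypothetical non-book URL for contrast
].freeze

# Hypothetical helper: returns the URLs that the given pattern matches.
def matching_urls(urls, pattern)
  urls.select { |url| url =~ pattern }
end

puts "Question pattern matches:"
puts matching_urls(SAMPLE_URLS, QUESTION_PATTERN)

puts "Answer pattern matches:"
puts matching_urls(SAMPLE_URLS, ANSWER_PATTERN)
```

The same Regexp should be usable as the argument to on_pages_like in the crawl from the question. Keep in mind that it requires a literal http://, so https:// URLs would not match unless you change it to something like /https?:\/\/.+(books)/.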