I'm using Anemone to save the URLs of pages that on_pages_like matches against a certain pattern. Anemone is doing its thing, and records are being created that store the URLs, but there are two things I want to change.
First, I want to use find_or_create_by_url instead of create!, so I'm not duplicating records each time.
Second, I want to save the URL as a plain string. Currently the URL is being saved to the DB like:
--- !ruby/object:URI::HTTP
scheme: http
user:
password:
host: www.a4apps.com
port: 80
path: /Websites/SampleCalendar/tabid/89/Default.aspx
query:
opaque:
registry:
fragment:
parser:
I want it like:
http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx
The reason I'm saving to a Postgres table is that I want another task to later modify that table using the URL of each record. I'm also kind of new to this and was a little overwhelmed by the thought of adding a second DB, as suggested on the Anemone site.
I tried tweaking the basic code over the past few days but haven't found the solution yet.
This is my Rake task:
namespace :db do
  desc "Fetch a4apps urls"
  task :fetch_a4apps => :environment do
    require 'anemone'
    Anemone.crawl("http://www.a4apps.com/") do |anemone|
      anemone.on_pages_like(/\/SampleCalendar\/[^?]*$/) do |page|
        Calendarparts.create!(:url => page.url)
      end
    end
  end
end
My view does nothing other than output the data onto a webpage:
<% @calendar.each do |part| %>
  <tr valign="top">...
    <td><%= part.url %> </td>...
  </tr>
<% end %>
My controller:
class CalendarController < ApplicationController
  def cainventory
    @calendar = Calendarparts.all
  end
end
OK, so I think I figured it out. I don't know if it's the ideal or correct way, but I am pulling the path part out of the URL and prepending the original domain to it.
namespace :db do
  desc "Fetch a4apps urls"
  task :fetch_a4apps => :environment do
    require 'anemone'
    website = 'http://www.a4apps.com'
    Anemone.crawl(website) do |anemone|
      anemone.on_pages_like(/\/SampleCalendar\/[^?]*$/) do |page|
        Calendarparts.find_or_create_by_url(:url => website + page.url.path)
      end
    end
  end
end
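As a possible alternative to rebuilding the string by hand, the serialized record shown earlier suggests page.url is a URI::HTTP object, and Ruby's standard URI#to_s turns such an object back into a plain string. The sketch below uses only the standard library and compares the two approaches. The variable page_url is a stand-in for what Anemone yields, and the second URL with a query string is made up purely for illustration.

```ruby
require 'uri'

# Stand-in for page.url inside the on_pages_like block (assumed to be a URI object).
page_url = URI.parse('http://www.a4apps.com/Websites/SampleCalendar/tabid/89/Default.aspx')

# URI#to_s serializes the whole URI back into a plain string.
full_url = page_url.to_s

# The approach from the task above: base site plus the path component.
website = 'http://www.a4apps.com'
rebuilt_url = website + page_url.path

puts full_url
puts rebuilt_url
puts full_url == rebuilt_url

# Hypothetical URL with a query string: URI#path leaves the query out,
# so the rebuilt string and to_s no longer agree.
with_query = URI.parse('http://www.a4apps.com/Websites/SampleCalendar/Default.aspx?page=2')
puts with_query.to_s
puts website + with_query.path
```

If this holds for Anemone's page objects, passing page.url.to_s as the :url value in the rake task should store the plain string without hard-coding the domain. The regex in the task already skips URLs containing a query, so both approaches would give the same result for the pages it matches.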