require 'open-uri'
require 'nokogiri'

def scrape(url)
  html = URI.open(url).read          # Kernel#open no longer opens URLs in Ruby 3.0+
  nokogiri_doc = Nokogiri::HTML(html)
  final_array = []
  nokogiri_doc.search("a").each do |element|
    final_array << element.text      # keep only the visible text of each link
  end
  final_array.each_with_index do |text, index|
    puts "#{index}: #{text}"
  end
  final_array                        # return the link texts to the caller
end

scrape('http://www.infranetsol.com/')
With this I'm only getting the text of the a tags, but I need to get the email IDs and phone numbers into an Excel file.
All you have is text, so what you can do is keep only the strings that look like an email address or a phone number.
For instance, if you store the result in an array:
a = scrape('http://www.infranetsol.com/')
You can get the elements containing an email address (strings with an '@'):
a.select { |s| s.match(/.*@.*/) }
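Selecting with /.*@.*/ keeps the whole link text. If a link's text contains more than the address, a minimal sketch like the one below pulls out only the address-like substrings with String#scan. The sample array and the deliberately loose EMAIL_PATTERN are assumptions for illustration; proper email validation is considerably more involved.

```ruby
# Hypothetical link texts, standing in for the array returned by scrape(url)
links = ["Home", "Contact: info@example.com", "Call 98765 43210", "sales@example.co.in"]

# Loose pattern (an assumption): a local part, '@', then dot-separated domain labels
EMAIL_PATTERN = /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/

# scan returns every match in a string; flat_map merges the per-string arrays
emails = links.flat_map { |s| s.scan(EMAIL_PATTERN) }
p emails
```

Because the group is non-capturing, scan returns the full matched addresses rather than the group contents.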
You can get the elements containing a phone number (strings with at least 5 consecutive digits):
a.select { |s| s.match(/\d{5}/) }
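The five-digit heuristic is deliberately loose. This small sketch runs the same test on made-up link texts (assumed sample data, not taken from the real page) to show what it keeps and what it misses:

```ruby
# Hypothetical link texts; not scraped from the real site
links = ["Home", "Call 98765 43210", "Postcode 400001", "Tel: 022-2345"]

# Same test as the select above: at least five consecutive digits anywhere
phones = links.select { |s| s.match?(/\d{5}/) }
p phones
```

Here the postcode is kept as a false positive and the short hyphenated number is missed, so the pattern will probably need tuning to the number format the real page uses.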
The whole code:
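require 'open-uri'
require 'nokogiri'

def scrape(url)
  html = URI.open(url).read          # Kernel#open no longer opens URLs in Ruby 3.0+
  nokogiri_doc = Nokogiri::HTML(html)
  final_array = []
  nokogiri_doc.search("a").each do |element|
    final_array << element.text      # keep only the visible text of each link
  end
  final_array.each_with_index do |text, index|
    puts "#{index}: #{text}"
  end
  final_array                        # return the link texts so they can be filtered
end
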
a = scrape('http://www.infranetsol.com/')
email = a.select { |s| s.match(/.*@.*/) }
phone = a.select { |s| s.match(/\d{5}/) }

# For the page in your example, you will get two email addresses in email,
# but unfortunately a more complex string in phone.
# Use scan to extract the phone numbers from that text, and flat_map
# (rather than map) to get a flat array instead of an array of sub-arrays.
# Keep in mind that this regex is tailored to this particular text.
phones = phone.flat_map { |elt| elt.scan(/\d[\d ]*/) }.map(&:strip)  # strip trailing spaces
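The question also asks for the results in an Excel file. A minimal sketch, assuming a CSV file is acceptable (Excel opens .csv files directly), uses Ruby's csv library. The file name contacts.csv, the Type/Value layout, and the sample arrays are placeholders; a native .xlsx file would need a third-party gem instead.

```ruby
require 'csv'

# Placeholder data standing in for the email and phone arrays computed above
emails = ["info@example.com", "sales@example.co.in"]
phones = ["98765 43210"]

# One row per value plus a column naming its kind, so unrelated emails and
# phone numbers are not forced onto the same row
CSV.open("contacts.csv", "w") do |csv|
  csv << ["Type", "Value"]
  emails.each { |e| csv << ["email", e] }
  phones.each { |ph| csv << ["phone", ph] }
end

puts File.read("contacts.csv")
```

To export the scraped values, replace the two placeholder arrays with the email addresses and phone numbers extracted from the page.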