I have a script that scrapes HTML article pages of a webshop. I'm testing with a set of 22 pages, of which 5 have a product description and the others don't.
This code puts the right info on screen:
if doc.at_css('.product_description')
  doc.css('div > .product_description > p').each do |description|
    puts description
  end
else
  puts "no description"
end
But now I'm stuck on how to collect the found product descriptions in an array, from which I write them to a CSV file.
I have tried several options, but none of them has worked so far.
If I replace puts description with @description << description.content, then all the descriptions end up in the top rows of the CSV, even though they do not belong to the articles in those rows.
When I also replace puts "no description" with @description = "no description", the first 14 rows in my CSV receive one letter of "no description" each. It looks funny, but it is not exactly what I need.
If more code is needed, just shout!
This is the CSV code I use in the script:
CSV.open("artinfo.csv", "wb") do |row|
  row << ["category", "sub-category", "sub-sub-category", "price", "serial number", "title", "description"]
  (0..@prices.length - 1).each do |index|
    row << [
      @categories[index],
      @subcategories[index],
      @subsubcategories[index],
      @prices[index],
      @serial_numbers[index],
      @title[index],
      @description[index]]
  end
end
It sounds like your data isn't lined up properly. If it were, you should be able to do:
CSV.open("artinfo.csv", "w") do |csv|
  csv << ["category", "sub-category", "sub-sub-category", "price", "serial number", "title", "description"]
  [@categories, @subcategories, @subsubcategories, @prices, @serial_numbers, @title, @description].transpose.each do |row|
    csv << row
  end
end
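Note that transpose raises an IndexError when the arrays differ in length, so every array needs exactly one entry per page. In your loop, a page without a description pushes nothing and a page with several paragraphs pushes several entries, which is probably why the descriptions drift to the top rows. Below is a minimal sketch of one way to get a single entry per page. The helper name description_for and the sample data are my own assumptions, not part of your script; in the real scraper the paragraph texts would presumably come from something like doc.css('div > .product_description > p').map(&:content).

```ruby
require "csv"

# Hypothetical helper: takes the paragraph texts found on ONE page and
# returns exactly one string for that page, so the arrays stay aligned.
def description_for(paragraphs)
  return "no description" if paragraphs.empty?
  paragraphs.map(&:strip).join(" ")
end

# Stand-in data for three scraped pages (assumed, not from the real shop).
pages = [
  ["First paragraph.", "Second paragraph."],
  [],
  ["Only paragraph."]
]
titles = ["Article A", "Article B", "Article C"]

# One push per page keeps descriptions[i] aligned with titles[i].
descriptions = []
pages.each { |paragraphs| descriptions << description_for(paragraphs) }

# Why the CSV showed single letters: assigning with = replaces the array
# with a String, and String#[] with an integer returns one character.
broken = "no description"
first_cell = broken[0] # a single character, not a whole description

# Build the CSV in memory; transpose turns the column arrays into rows.
csv_text = CSV.generate do |csv|
  csv << ["title", "description"]
  [titles, descriptions].transpose.each { |row| csv << row }
end
puts csv_text
```

The important part is that << is used once per page, in both branches, and that = is never used on the array itself.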