I need to parse a website that contains many <p> tags. I want to extract them and put them in a CSV file, all in the same column.
After testing, I see that the paragraphs do not end up in the same column. This is because of the <br> elements inside the <p> tags. Example:
HTML:
<div class="text">
<p> hello <br> friends </p>
<p> parsing is cool <br> using <br> simpleHTMLdom </p>
</div>
When I parse the HTML above, I get both <p> elements, but not in the same CSV column.
My code:
if ($html_book_page->find('.text')) {
    foreach ($html_book_page->find('div[class=text] p') as $bookPreview) {
        array_push($book, $bookPreview->plaintext);
    }
}
$book is the array containing all the text, and I write $book to the CSV like this:
fputcsv($open_csv, array_values($book), ',', ' ');
Is there any way to get a CSV with the header TEXT and a single cell containing "Hello friends parsing is cool using simpleHTMLdom"? At the moment I get "Hello" in one column, and "friends", "parsing is cool", "using", and "simpleHTMLdom" each in other columns.
Thank you all
Why don't you remove the <br> elements before your CSV insert? If you are working with the page in the browser, jQuery's .remove() would do it. Something like this:
$('.text p').find('br').remove()
If you don't want to permanently remove the <br> tags from the page, you could do something like this in your foreach loop:
foreach ($html_book_page->find('div[class=text] p') as $bookPreview) {
    // Replace <br>, <br/> and <br /> with a space, then strip any remaining tags.
    $text = preg_replace('/<br\s*\/?>/i', ' ', $bookPreview->innertext);
    array_push($book, trim(strip_tags($text)));
}
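Removing the <br> tags still leaves one array element per <p>, and fputcsv writes each array element as its own column. To end up with a single TEXT cell, the paragraphs also need to be joined before writing. Below is a minimal sketch of that idea. It assumes PHP's built-in DOM extension is available and uses DOMDocument rather than simple_html_dom so that it runs on its own. The helper name paragraphsToSingleCell and the php://memory stream are made up for illustration; in your script you would write to $open_csv instead.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension instead of simple_html_dom
// so the example runs without any extra files.

/**
 * Hypothetical helper: collects the text of every <p> inside div.text,
 * treats <br> as a space, and joins everything into one string.
 */
function paragraphsToSingleCell(string $html): string
{
    // Turn <br>, <br/> and <br /> into spaces before parsing.
    $html = preg_replace('/<br\s*\/?>/i', ' ', $html);

    $doc = new DOMDocument();
    // Suppress warnings about imperfect real-world markup.
    @$doc->loadHTML($html);

    $xpath = new DOMXPath($doc);
    $parts = [];
    foreach ($xpath->query('//div[@class="text"]//p') as $p) {
        // Collapse runs of whitespace (including newlines) into one space.
        $parts[] = trim(preg_replace('/\s+/', ' ', $p->textContent));
    }

    // One string means one CSV cell.
    return implode(' ', $parts);
}

$html = '<div class="text">'
      . '<p> hello <br> friends </p>'
      . '<p> parsing is cool <br> using <br> simpleHTMLdom </p>'
      . '</div>';

$cell = paragraphsToSingleCell($html);

// Write a header row and one data row, each with a single column.
// Delimiter, enclosure and escape are passed explicitly.
$csv = fopen('php://memory', 'w+');
fputcsv($csv, ['TEXT'], ',', '"', '\\');
fputcsv($csv, [$cell], ',', '"', '\\');
rewind($csv);
$output = stream_get_contents($csv);
fclose($csv);

echo $output;
```

Note that the original call passed ' ' (a space) as the enclosure argument to fputcsv. The sketch uses the usual double quote instead, so that a cell containing spaces is quoted and stays in a single column.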