This is how far I have gotten scraping with simple_html_dom, going by the examples:
<?php
require 'simple_html_dom.php';
$html = file_get_html('https://nitter.absturztau.be/chillartaholic');
$title = $html->find('title', 0);
$image = $html->find('img', 0);
echo $title->plaintext."<br>\n";
echo $image->src;
?>
However, instead of retrieving a title and an image, I'd like to get all lines in the target page that begin with:
<a class="tweet-link"
and display the scraped lines, in their entirety, top to bottom.
The first scraped line would then be:
> <a class="tweet-link"
> href="/ChillArtaholic/status/1413973360841744390#m"></a>
Is this possible with simple_html_dom, or are there limits on the number of lines that can be scraped at all?
Strangely enough, the answer from yesterday is gone.
This was the consensus that works (although their answer offered several other approaches):
<?php
// Fetch the page before parsing it.
$url = 'https://nitter.absturztau.be/chillartaholic';
$html = file_get_contents($url);

// Parse the markup; @ suppresses warnings about malformed HTML.
$dom = new DOMDocument();
@$dom->loadHTML($html);

// Select every <a> whose class attribute is exactly "tweet-link".
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//a[@class="tweet-link"]');
foreach ($nodes as $node) {
    echo $node->nodeValue; // link text (empty for these anchors)
    echo $node->getAttribute('href'), '<br>';
}
?>
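The snippet above prints only the `href` values. To get each matching tag in its entirety, as the question asks, a minimal sketch could pass each node to `DOMDocument::saveHTML()`. The inline sample markup, the `$lines` array and the class-matching XPath expression are assumptions added for illustration; in practice `$html` would come from `file_get_contents($url)` as before.

```php
<?php
// Sample markup standing in for the fetched page (assumption for illustration).
$html = <<<HTML
<div class="timeline-item">
  <a class="tweet-link" href="/ChillArtaholic/status/1413973360841744390#m"></a>
  <a class="other" href="/about">About</a>
</div>
HTML;

// Collect parser warnings internally instead of silencing them with @.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Match "tweet-link" even when the element carries additional classes.
$xpath = new DOMXPath($dom);
$query = '//a[contains(concat(" ", normalize-space(@class), " "), " tweet-link ")]';

// saveHTML($node) serializes the whole element, not just one attribute.
$lines = [];
foreach ($xpath->query($query) as $node) {
    $lines[] = $dom->saveHTML($node);
}

// Escape the tags so a browser shows them as text instead of rendering them.
echo htmlspecialchars(implode("\n", $lines)), "\n";
```

Nodes come back in document order, so the output runs top to bottom. With simple_html_dom the rough equivalent should be `$html->find('a.tweet-link')` together with each element's `outertext` property, though that variant is not tested here.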