I use this code to crawl a website, but I want the links as a separate result. Specifically, I want the tags separated from the artists so I can put each into its own variable.
<?php
require 'vendor/autoload.php';
use Symfony\Component\DomCrawler\Crawler;
$client = new \GuzzleHttp\Client();
$url = 'https://hentaifox.com/gallery/58091/';
$res = $client->request('GET', $url);
$html = (string) $res->getBody();
$crawler = new Crawler($html);
foreach ($crawler->filter('#content .left_content .info .artists') as $domElement) {
    $domElement = new Crawler($domElement);
    $manga_tag = $domElement->html();
    print_r($manga_tag);
    echo "<br>";
}
I don't know how to do this with Symfony's DomCrawler, but PHP has decent built-in tools for parsing HTML, namely DOMDocument and DOMXPath. With DOMDocument it would look like this:
// Create an instance first: calling loadHTML() statically is not supported on PHP 8.
// "@" suppresses libxml warnings about malformed real-world markup.
$domd = new DOMDocument();
@$domd->loadHTML($html);
$xp = new DOMXPath($domd);
$tags = array();
$artists = array();
// The first <span> inside each matching link is used as the name (array key);
// the link's href, made absolute, is the value.
foreach ($xp->query("//a[contains(@href,'/tag/')]/span[1]") as $tag) {
    $tags[trim($tag->textContent)] = merge_relative_absolute_urls('https://hentaifox.com/gallery/58091/', $tag->parentNode->getAttribute("href"));
}
foreach ($xp->query("//a[contains(@href,'/artist/')]/span[1]") as $artist) {
    $artists[trim($artist->textContent)] = merge_relative_absolute_urls('https://hentaifox.com/gallery/58091/', $artist->parentNode->getAttribute("href"));
}
print_r([
    'artists' => $artists,
    'tags' => $tags
]);
function merge_relative_absolute_urls(string $base_url, string $relative_url): string
{
    // Strip "?query" and "#fragment" from the base URL; they are not carried
    // over when resolving a relative path
    $base_url = substr($base_url, 0, strcspn($base_url, "?#"));
    // Drop the last path segment ("file.php" in "/dir/file.php"), keeping the trailing slash
    $pos = strrpos($base_url, "/");
    if (false !== $pos) {
        $base_url = substr($base_url, 0, $pos + 1);
    }
    // Already absolute: return unchanged
    if (0 === stripos($relative_url, "http://") || 0 === stripos($relative_url, "https://")) {
        return $relative_url;
    }
    $info = parse_url($base_url);
    // Protocol-relative ("//host/path"): reuse the base URL's scheme (https if it has none)
    if (0 === strpos($relative_url, "//")) {
        return ($info['scheme'] ?? "https") . ":" . $relative_url;
    }
    // Root-relative ("/path"): scheme + host (+ port) of the base URL, then the path
    if (substr($relative_url, 0, 1) === "/") {
        $url = ($info['scheme'] ?? "") . "://" . $info['host'];
        if (isset($info['port'])) {
            $url .= ":" . $info['port'];
        }
        return $url . $relative_url;
    }
    // Plain relative path: append to the base URL's directory
    return $base_url . $relative_url;
}
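To try the extraction without requesting the site, here is a minimal offline sketch of the same DOMXPath approach. The HTML snippet is hypothetical (it only mimics links whose first <span> holds the name, as the queries above assume), and collect_links() is an illustrative helper, not part of any library. The hrefs are left relative here; pass them through merge_relative_absolute_urls() to make them absolute.

```php
<?php
// Hypothetical stand-in markup, NOT the site's real HTML.
$html = <<<'HTML'
<div class="info">
  <a href="/artist/example-artist/"><span>Example-artist</span><span>3</span></a>
  <a href="/tag/example-tag/"><span>Example-tag</span><span>120</span></a>
  <a href="/tag/other-tag/"><span>Other-tag</span><span>45</span></a>
</div>
HTML;

$dom = new DOMDocument();
libxml_use_internal_errors(true); // collect libxml warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($dom);

/**
 * Hypothetical helper: maps the text of the first <span> in every <a> whose
 * href contains $segment to that link's (still relative) href.
 */
function collect_links(DOMXPath $xp, string $segment): array
{
    $out = [];
    foreach ($xp->query("//a[contains(@href,'$segment')]/span[1]") as $span) {
        $out[trim($span->textContent)] = $span->parentNode->getAttribute('href');
    }
    return $out;
}

// Each call returns its own array, so tags and artists land in separate variables.
$tags    = collect_links($xp, '/tag/');
$artists = collect_links($xp, '/artist/');

print_r(['artists' => $artists, 'tags' => $tags]);
```

Only the first <span> is read, so trailing spans in the sample (which stand in for something like a count) are ignored.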
Output:
$ php wtf3.php
Array
(
    [artists] => Array
        (
            [Sahara-wataru] => https://hentaifox.com/artist/sahara-wataru/
        )

    [tags] => Array
        (
            [Big-breasts] => https://hentaifox.com/tag/big-breasts/
            [Sole-male] => https://hentaifox.com/tag/sole-male/
            [Nakadashi] => https://hentaifox.com/tag/nakadashi/
            [Blowjob] => https://hentaifox.com/tag/blowjob/
            [Full-color] => https://hentaifox.com/tag/full-color/
            [Big-ass] => https://hentaifox.com/tag/big-ass/
            [Blowjob-face] => https://hentaifox.com/tag/blowjob-face/
        )

)
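The //a[contains(@href,...)] queries search the whole document, so if links with matching hrefs also appear elsewhere on the page (a sidebar, for example), they would be collected too. One option in that case is to translate the question's CSS selector #content .left_content .info .artists into XPath and run a relative query inside each matching container. This is a sketch under assumptions: the markup is hypothetical, and class_pred() is an illustrative helper that builds the usual XPath equivalent of a CSS class match.

```php
<?php
// Hypothetical markup: one artist link inside the targeted container and one
// in an unrelated sidebar that should NOT be collected.
$html = <<<'HTML'
<div id="content">
  <div class="left_content">
    <div class="info">
      <div class="artists"><a href="/artist/a/"><span>A</span><span>1</span></a></div>
    </div>
  </div>
  <div class="sidebar"><a href="/artist/b/"><span>B</span></a></div>
</div>
HTML;

// XPath predicate matching elements whose class list contains $name (CSS ".name").
function class_pred(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

$dom = new DOMDocument();
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xp = new DOMXPath($dom);

// Same path as the CSS selector "#content .left_content .info .artists".
$containerPath = "//*[@id='content']"
    . '//*[' . class_pred('left_content') . ']'
    . '//*[' . class_pred('info') . ']'
    . '//*[' . class_pred('artists') . ']';

$artists = [];
foreach ($xp->query($containerPath) as $container) {
    // Relative query (leading "."): only links inside this container are visited.
    foreach ($xp->query('.//a/span[1]', $container) as $span) {
        $artists[trim($span->textContent)] = $span->parentNode->getAttribute('href');
    }
}
print_r($artists);
```

The second argument to DOMXPath::query() is the context node, which is what keeps the inner .// query from escaping the container.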