symfony domcrawler

DomCrawler is removing part of the html


When I get the content without DomCrawler, I get the HTML with custom attributes like @click, but when I use $this->crawler->filter('something')->html(), DomCrawler removes my @click attributes.

Here is an example without using DomCrawler:

[screenshot not preserved]

And here is using DomCrawler:

[screenshot not preserved]

As you can see, DomCrawler is removing all the @click attributes. How can I stop this?


Solution

  • Unfortunately, you can't. DomCrawler uses DOMDocument under the hood, and its HTML parser does not accept "@click" as an attribute name, so the attribute is dropped. Also:

    The DomCrawler will attempt to automatically fix your HTML to match the official specification.

    The libxml option to disable this would be LIBXML_HTML_NOIMPLIED, which is not used in the addHtmlContent method of DomCrawler:

    //... Symfony\Component\DomCrawler\Crawler.php
    $dom->loadHTML($content);
    // ...
    

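    and even calling @$dom->loadHTML($content, LIBXML_HTML_NOIMPLIED); would not work in your case: it stops the implied wrapper elements from being added, but the "@click" attribute is still removed.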

    Example:

    $html = <<<TEST
    <html>
        <div class="test" @click="something"></div>
    </html>
    TEST;
    dump($html);
    //<html>\n
    //    <div class="test" @click="something"></div>\n
    //</html>
    
    // Symfony Crawler
    $crawler = new \Symfony\Component\DomCrawler\Crawler();
    $crawler->addHtmlContent($html);
    dump($crawler->html());
    //<body>\n
    //    <div class="test"></div>\n
    //</body>
    
    // Custom crawler with LIBXML_HTML_NOIMPLIED
    // (\MyCrawler\Crawler is a custom class, not part of Symfony)
    $crawler = new \MyCrawler\Crawler();
    $crawler->addHtmlContent($html);
    dump($crawler->html());
    //  <div class="test"></div>
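
The same behaviour can be checked without Symfony. What follows is only a minimal sketch, assuming the dom extension is loaded and a libxml2 build that rejects "@" in attribute names, as in the example above. Newer libxml2 releases may parse attribute names differently, so no particular output is asserted here. It calls DOMDocument::loadHTML directly with LIBXML_HTML_NOIMPLIED, prints any parser errors, and reports whether the attribute survived:

```php
<?php
// Minimal reproduction using only ext-dom; no Symfony classes involved.
$html = '<html><div class="test" @click="something"></div></html>';

// Collect libxml parse errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
// LIBXML_HTML_NOIMPLIED only suppresses the implied wrapper elements.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED);

// Print whatever the parser complained about (may be empty).
foreach (libxml_get_errors() as $error) {
    echo trim($error->message), PHP_EOL;
}
libxml_clear_errors();

$div = $dom->getElementsByTagName('div')->item(0);

// Reports whether the "@click" attribute survived parsing.
var_dump($div->hasAttribute('@click'));

// Serialise just the <div> to see what the parser kept.
echo $dom->saveHTML($div), PHP_EOL;
```

If var_dump prints false on your setup, the attribute was already lost inside DOMDocument, before DomCrawler's own code had any say in it.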