I'm using crawler4j and jsoup to crawl a website, and it works fine for the HTML text. However, some important content has default values that are hard-coded in CSS and then set dynamically with JavaScript. For example, I have an element whose width value I need: in CSS it is hard-coded as 10px, but JavaScript changes it to, let's say, 5px.
Is there a way to get this value without using another crawler? Or a simple alternative? I already have quite a lot of code, so I don't want to rewrite everything if it is possible to do this with crawler4j.
I hope my question is clear enough and thank you in advance for your help!
This is not possible with crawler4j, nor with jsoup. They both handle only static HTML content.
There are several open issues related to dynamic JavaScript execution on the official GitHub repository: #49, #197 and #220.
To achieve your objective, you would need to build a stack based on Selenium, CasperJS and/or PhantomJS, which can execute JavaScript and could then be used for advanced crawling.
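As a rough illustration of the Selenium route, here is a minimal sketch that loads a page in headless Chrome and reads the computed width of one element after the scripts have run. It assumes Selenium 4 (`selenium-java`) and a local Chrome installation; the class name `RenderedWidth`, the URL and the `#my-element` selector are placeholders of mine, not anything from your site.

```java
import java.time.Duration;

import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.support.ui.ExpectedConditions;
import org.openqa.selenium.support.ui.WebDriverWait;

public class RenderedWidth {

    // Converts a computed CSS length such as "5px" into a number of pixels.
    public static double parsePx(String cssValue) {
        String v = cssValue.trim();
        if (!v.endsWith("px")) {
            throw new IllegalArgumentException("Not a pixel length: " + cssValue);
        }
        return Double.parseDouble(v.substring(0, v.length() - 2));
    }

    // Opens the URL in a real (headless) browser so that JavaScript runs,
    // then returns the computed "width" of the first matching element.
    public static String computedWidth(String url, String cssSelector) {
        ChromeOptions options = new ChromeOptions();
        options.addArguments("--headless=new"); // assumption: a recent Chrome
        WebDriver driver = new ChromeDriver(options);
        try {
            driver.get(url);
            // Wait up to 10 seconds for the element to appear in the DOM.
            WebElement element = new WebDriverWait(driver, Duration.ofSeconds(10))
                    .until(ExpectedConditions.presenceOfElementLocated(
                            By.cssSelector(cssSelector)));
            return element.getCssValue("width");
        } finally {
            driver.quit(); // always shut the browser down
        }
    }

    public static void main(String[] args) {
        // Placeholder URL and selector; replace with your own.
        String width = computedWidth("https://example.com/", "#my-element");
        System.out.println("Computed width: " + width);
    }
}
```

You could probably keep crawler4j for link discovery and call something like `computedWidth(page.getWebURL().getURL(), ...)` from your `visit(Page page)` method, so that only the pages where you need the dynamic value go through the browser. Note that waiting for the element to be present does not guarantee that the script has already changed the width; if the value is set late, you may need to wait for a more specific condition.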