I'm trying to scrape a bunch of local HTML files. Each one has a piece of JavaScript embedded in it, with a different window.open path, like so:
<script>
function goTo() {
if (document.getElementById('somedomain').checked) {
window.open("http://www.somedomain.com");
}
if (document.getElementById('visit').checked) {
window.open("http://extract-this-url.com/?somevar=12345&anothervar=59305&etc=etc");
}
}
</script>
I'm trying to extract that second URL. It will be different for each file (as will the first 'somedomain' URL).
I've been looking at SimpleHTMLDOM, but it doesn't look like it can handle JavaScript that's embedded in an HTML file.
Is there any decent way of doing this?
Just use a regexp:
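// $text holds the contents of one HTML file, e.g. from file_get_contents()
preg_match('#visit.*?window\.open\("(.*?)"#is', $text, $matches);
print_r($matches); // $matches[1] is the captured URL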
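To run this over a whole folder of files, something like the following might work. It's only a sketch: the extractVisitUrl() helper name and the pages/ directory are made up for illustration, and the pattern is a slightly stricter variant of the one above. It anchors on getElementById('visit') rather than the bare word "visit" (which could also appear elsewhere in the page text), and it accepts single or double quotes around the URL.

```php
<?php
// Hypothetical helper: returns the URL opened in the 'visit' branch,
// or null if the file does not contain a matching script.
function extractVisitUrl(string $html): ?string
{
    // Find getElementById('visit'), then the first window.open("...") after it.
    // The s flag lets .*? span newlines; the i flag ignores case.
    $pattern = '#getElementById\(\s*[\'"]visit[\'"]\s*\).*?window\.open\(\s*[\'"]([^\'"]+)[\'"]#is';
    if (preg_match($pattern, $html, $matches) === 1) {
        return $matches[1];
    }
    return null;
}

// Assumed location of the saved pages; adjust the glob pattern as needed.
foreach (glob(__DIR__ . '/pages/*.html') ?: [] as $file) {
    $html = file_get_contents($file);
    if ($html === false) {
        continue; // unreadable file, skip it
    }
    $url = extractVisitUrl($html);
    echo $file, ' => ', $url ?? '(no match)', PHP_EOL;
}
```

If the scripts vary more than the sample suggests (for example, the URL is built from variables), a regular expression like this will not be enough and the pattern would need adapting.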