I'm working on a small scraper for fun, and when I grab some image URLs from certain sites they come back really weird.
For example:
scraped url:
https:\/\/cdn1.vox-cdn.com\/thumbor\/zN9XawbQJgFPkuAcA2JEGgqApm8=\/cdn0.vox-cdn.com\/uploads\/chorus_asset\/file\/3700712\/tomorrowland54fdf04f23efb_2040.0.jpg
desired url:
https://cdn1.vox-cdn.com/thumbor/zN9XawbQJgFPkuAcA2JEGgqApm8=/cdn0.vox-cdn.com/uploads/chorus_asset/file/3700712/tomorrowland54fdf04f23efb_2040.0.jpg
It's adding unnecessary backslashes, so the URL doesn't work when you follow it; it just gives an error.
I tried using the stripslashes function since that seems like its purpose, but it didn't work; the URL just stayed the same.
(edit) Here's the code I'm using to grab the URLs:
function GetImages($page_dom) {
    $found_links = [];

    // Collect the src attribute of every <img> element in the document
    $images = $page_dom->getElementsByTagName('img');
    foreach ($images as $image) {
        $img_src = $image->getAttribute('src');
        $found_links[] = $img_src;
    }

    return $found_links;
}
When you call json_encode, pass the JSON_UNESCAPED_SLASHES option to prevent it from escaping slashes.
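For example, a minimal sketch (the $urls array here is just illustrative):

$urls = [
    'https://cdn1.vox-cdn.com/thumbor/zN9XawbQJgFPkuAcA2JEGgqApm8=/cdn0.vox-cdn.com/uploads/chorus_asset/file/3700712/tomorrowland54fdf04f23efb_2040.0.jpg',
];

// Default behaviour: every / is escaped as \/
echo json_encode($urls), PHP_EOL;

// With the flag: the slashes are left alone
echo json_encode($urls, JSON_UNESCAPED_SLASHES), PHP_EOL;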
But this shouldn't really be necessary. If you're outputting JSON, you should be sending it to a program that parses JSON, and the JSON parser will translate \/ back to /.
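For instance, if the \/ sequences you're seeing come from JSON embedded in the scraped page, decoding gives back the clean URL (a sketch, reusing the value from the question):

// The scraped value as a JSON string literal, with escaped slashes
$raw = '"https:\/\/cdn1.vox-cdn.com\/thumbor\/zN9XawbQJgFPkuAcA2JEGgqApm8=\/cdn0.vox-cdn.com\/uploads\/chorus_asset\/file\/3700712\/tomorrowland54fdf04f23efb_2040.0.jpg"';

// json_decode() undoes the \/ escaping (and any other JSON escapes)
echo json_decode($raw), PHP_EOL;
// https://cdn1.vox-cdn.com/thumbor/zN9XawbQJgFPkuAcA2JEGgqApm8=/cdn0.vox-cdn.com/uploads/chorus_asset/file/3700712/tomorrowland54fdf04f23efb_2040.0.jpg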