[SOLVED] Get image from RSS Feeds with no image URL

I would like to know how other developers reliably extract the first image from the main content of a blog post, given the post URL in an RSS feed. This is the approach I came up with, since the RSS feed doesn't include an image URL for each post/blog item. I do keep seeing

<img src="http://feeds.feedburner.com/~r/CookingLight/EatingSmart/~4/sIG3nePOu-c" />

but it's only a 1px image. Does it have any relevance to the feed item, or can I convert it into the actual image? Here's the RSS feed: http://feeds.cookinglight.com/CookingLight/EatingSmart?format=xml

Anyway, here's my attempt to extract the image using the url in the feeds:

function extract_first_image( $url ) {
  $content = file_get_contents($url);
  if ($content === false) {
    return null; // The page could not be fetched.
  }

  // Narrow the html to get the main div with the blog content only.
  // source: http://stackoverflow.com/questions/15643710/php-get-a-div-from-page-x
  $PreMain = explode('<div id="main-content"', $content);
  if (!isset($PreMain[1])) {
    return null; // The page has no main-content div.
  }
  $main = explode("</div>", $PreMain[1]);

  // Index 12 is hard-coded for this particular page layout.
  if (!isset($main[12])) {
    return null;
  }

  // Regex that finds matches with img tags.
  if (!preg_match_all('/<img[^>]+src=[\'"]([^\'"]+)[\'"][^>]*>/i', $main[12], $matches)) {
    return null; // No img tag found.
  }

  // Return the img in html format.
  return $matches[0][0];
}

$url = 'http://www.cookinglight.com/eating-smart/nutrition-101/foods-that-fight-fat'; // Sample URL from the feed.
echo extract_first_image($url);

Obvious downside of this function: it only works if <div id="main-content" is found in the HTML. A feed whose pages use a different structure would need its own explode for that as well. It's very static.

It's also worth mentioning the load time. When I loop through all the items in the feed, it takes even longer.

I hope I've made my points clear. Feel free to drop in any ideas that could help optimize the solution.

Solution

The image URLs are in the RSS file, so you can get them just by parsing the XML. Each <item> element contains a <media:group> element that contains a <media:content> element. The URL of the image for that item is in the "url" attribute of the <media:content> element. Here is some basic code (PHP) for extracting the image URLs into an array:

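$xml = simplexml_load_file("http://feeds.cookinglight.com/CookingLight/EatingSmart?format=xml");
if ($xml === false) {
    exit("Could not load the feed.");
}

$imageUrls = array();

foreach($xml->channel->item as $item)
{
    // <media:group> and <media:content> live in the "media" namespace.
    $media = $item->children('media', true);
    if (isset($media->group->content)) {
        $imageUrls[] = (string)$media->group->content->attributes()->url;
    }
}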

Keep in mind, though, that the media doesn't necessarily have to be an image. It can be a video or an audio recording. There might even be more than one <media:group>. You can check the "type" attribute of the <media:content> element to see what it is.
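
As a rough sketch of that check, the following walks every <media:group> in each item and keeps only the <media:content> entries whose "type" starts with "image/". The inline feed, the example.com URLs, and the collect_image_urls helper are made up for illustration. It also assumes the feed actually sets a "type" attribute, which a real feed may not always do.

```php
<?php
// Hypothetical sample feed, trimmed to the elements discussed above.
$rss = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>First post</title>
      <media:group>
        <media:content url="http://example.com/a.jpg" type="image/jpeg" />
        <media:content url="http://example.com/a.mp4" type="video/mp4" />
      </media:group>
    </item>
    <item>
      <title>Second post</title>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns the URLs of all image media in a feed.
function collect_image_urls(SimpleXMLElement $xml) {
    $imageUrls = array();
    foreach ($xml->channel->item as $item) {
        // An item may hold zero, one, or several <media:group> elements.
        foreach ($item->children('media', true)->group as $group) {
            foreach ($group->children('media', true)->content as $content) {
                $attributes = $content->attributes();
                // Keep only entries whose MIME type starts with "image/".
                if (strpos((string)$attributes->type, 'image/') === 0) {
                    $imageUrls[] = (string)$attributes->url;
                }
            }
        }
    }
    return $imageUrls;
}

$imageUrls = collect_image_urls(simplexml_load_string($rss));
print_r($imageUrls);
```

With this sample, only the JPEG URL ends up in the array: the video entry is filtered out and the item without any media is skipped.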