Tags: php, xml, rss, feed, xml-namespaces

How to get rid of the @attributes key in a JSON response when reading an XML media:content attribute?


Does anyone know how I can remove the @attributes key from a JSON API response?

I'm trying to read the url attribute of the media:content element in this RSS feed: https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml

To do this I had to use the following code:

$feed_item['image'] = $xml->channel->item[$index]->children('media', True)->content->attributes()['url'];

But when I return the JSON, an @attributes key appears in the output. I only need the url.

private $rss_urls = array(
    'BBC' => ['url' => 'http://feeds.bbci.co.uk/news/world/us_and_canada/rss.xml'],
    'NPR' => ['url' => 'https://www.npr.org/rss/rss.php?id=1001'],
    'NYT' => ['url' => 'https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml', 'image_tag' => 'media:content'],
);
function fetch_rss(&$feed, $url, $source_name, $image_tag = null)
{
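    $xml = simplexml_load_file($url);
    // simplexml_load_file() returns false on failure, so check before touching $xml.
    if ($xml === false)
        return false;
    unset($xml->attributes()->domain);

    foreach ($xml->channel->xpath('//item') as $index => $xml_item) {
        // Start from an empty array; assigning keys to false is deprecated since PHP 8.1.
        $feed_item = [];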
        $feed_item['title'] = strip_tags(trim($xml_item->title));
        $feed_item['description'] = strip_tags(trim($xml_item->description));
        $feed_item['link'] = strip_tags(trim($xml_item->link));
        $feed_item['date'] = strtotime($xml_item->pubDate);
        $feed_item['source'] = $source_name;
        if ($image_tag) {
            $feed_item['image'] = $xml->channel->item[$index]->children('media', True)->content->attributes();
        }
        $current_date = strtotime('today');
        if (date('Y-m-d', $feed_item['date']) === date('Y-m-d', $current_date)) {
            $feed[] = $feed_item;
        }
    }
    return $feed;
}

Solution

  • Every access to elements and attributes in SimpleXML returns an object with a bunch of overloaded behaviour, rather than just a string of the content. If you serialise that object directly to JSON, it tries to capture additional context, which in this case isn't what you wanted.

    In order to get just the content, you have to cast the object to string, most clearly by using the (string) operator:

    $feed_item['image'] = (string)$xml->channel->item[$index]->children('media', True)->content->attributes()['url'];
    

    The reason your other elements didn't have this problem is that they are used in contexts that force them to be strings: title, description and link are passed through trim(), and pubDate through strtotime(). Still, you could safely get into the habit of always adding the explicit cast.
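
To make the difference concrete, here is a minimal sketch that encodes the same attribute with and without the cast. The inline XML and the example.com URL are made-up stand-ins for one feed item, not the real NYT feed, and the variable names are only for illustration.

```php
<?php
// Hypothetical sample data standing in for a single feed item.
$xml = simplexml_load_string(
'<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Example headline</title>
      <media:content url="https://example.com/photo.jpg" medium="image"/>
    </item>
  </channel>
</rss>'
);

$content = $xml->channel->item[0]->children('media', true)->content;

// Encoding the whole attributes() object: this is where the @attributes key comes from.
$rawAll = json_encode(['image' => $content->attributes()]);

// Encoding a single attribute without a cast still serialises an object, not a string.
$raw = json_encode(['image' => $content->attributes()['url']]);

// Casting first means json_encode() only ever sees a plain string.
$clean = json_encode(['image' => (string) $content->attributes()['url']]);

echo $rawAll, PHP_EOL;
echo $raw, PHP_EOL;
echo $clean, PHP_EOL;
```

Only the last line prints the URL as a plain JSON string; the first two show how json_encode() treats the SimpleXMLElement objects.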


    Two asides:

    1. If I'm reading your code correctly, this line can be simplified to $feed_item['image'] = (string)$xml_item->children('media', True)->content->attributes()['url'];
    2. It's not a good idea to rely on the "local alias" for namespaces, because the code generating the XML might change and pick a different alias for the same namespace. Instead, look for the xmlns:media attribute to find the permanent namespace identifier to use with the children method, e.g. ->children('http://example.com/media-thing') instead of ->children('media', true)
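
Both asides can be combined in a short sketch. It assumes the feed binds its media prefix to http://search.yahoo.com/mrss/ (the Media RSS namespace; check the xmlns:media attribute on the root element of your own feed). The helper name media_url, the constant MEDIA_NS and the sample XML are hypothetical. The sample deliberately uses the alias m instead of media to show that the lookup no longer depends on the prefix.

```php
<?php
// Assumed namespace URI; confirm it against the xmlns declaration in the actual feed.
const MEDIA_NS = 'http://search.yahoo.com/mrss/';

// Hypothetical helper: returns the url attribute of the first content element
// in the media namespace, or an empty string if the item has none.
function media_url(SimpleXMLElement $item): string
{
    // Select children by namespace URI, independent of the prefix the feed uses.
    $media = $item->children(MEDIA_NS);
    if (!isset($media->content)) {
        return '';
    }
    // Cast so callers (and json_encode) receive a plain string.
    return (string) $media->content->attributes()['url'];
}

// Made-up sample: same namespace, different alias, and one item without an image.
$xml = simplexml_load_string(
'<rss version="2.0" xmlns:m="http://search.yahoo.com/mrss/">
  <channel>
    <item><title>A</title><m:content url="https://example.com/a.jpg"/></item>
    <item><title>B</title></item>
  </channel>
</rss>'
);

$urls = [];
foreach ($xml->channel->item as $item) {
    // Work on the loop variable directly instead of re-indexing $xml->channel->item.
    $urls[] = media_url($item);
}

echo json_encode($urls), PHP_EOL;
```

In the original function this would replace the assignment inside the if ($image_tag) branch with something like $feed_item['image'] = media_url($xml_item);.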