
get info from streaming radio


Is there a standard way to query a streaming radio service for the currently playing song?

I'm currently implementing this differently for each station, e.g., SomaFM:

  # Backticks in scalar context return the whole page as one string.
  my $wg = `wget -q -O - https://somafm.com/secretagent/songhistory.html`;
  if ($wg =~ /\(Now\).*?>([^<]*)<\/a><\/td><td>([^<]*)/s) {
      print "Secret Agent\n$1\n$2\n";
  }

or Radio Svizzera Classica:

  my $wg = `wget -q -O - http://www.radioswissclassic.ch/en`;
  if ($wg =~ /On Air.*?titletag">([^<]*).*?artist">([^<]*)/s) {
      print "Radio Svizzera Classica\n$1\n$2\n";
  }

Is there a standardized alternative to parsing HTML pages, which can break when layouts change?


Solution

  • For SHOUTcast/Icecast style stations with ICY metadata (which make up the bulk of internet radio stations), the best thing to do is get this data from the stream itself.

    First, you need a URL to the actual stream. If you go to SomaFM's Secret Agent page at http://somafm.com/secretagent/, you'll see links to listen in other players. As an example, let's use the 128k AAC link, which points at http://somafm.com/secretagent130.pls. This isn't the actual stream... it's a playlist file that contains links to the actual stream. Open it in your favorite text or code editor to see what I mean:

    [playlist]
    numberofentries=2
    File1=http://ice1.somafm.com/secretagent-128-aac
    Title1=SomaFM: Secret Agent (#1  ): The soundtrack for your stylish, mysterious, dangerous life. For Spies and PIs too!
    Length1=-1
    File2=http://ice2.somafm.com/secretagent-128-aac
    Title2=SomaFM: Secret Agent (#2  ): The soundtrack for your stylish, mysterious, dangerous life. For Spies and PIs too!
    Length2=-1
    Version=2
    

    Internet radio stations typically include multiple servers here for failover. If the listener gets disconnected from one, the player will usually roll to the next item. This is also useful when one server reaches its listener limit... the player will (hopefully) eventually hit another server that's active.
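    If you want to automate this step, note that a .pls file is INI-style, so a minimal Python sketch can lean on the standard configparser module. The function name parse_pls is hypothetical, and it assumes the [playlist] section and FileN= keys shown above; real-world playlists vary, so treat it as a starting point. It returns the URLs in File1..FileN order so you can try them one after another for failover.

```python
import configparser


def parse_pls(text: str) -> list[str]:
    """Return the stream URLs from a .pls playlist, ordered File1..FileN."""
    # interpolation=None leaves any '%' in titles alone; strict=False
    # tolerates duplicate keys instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case, e.g. "File1" rather than "file1"
    parser.read_string(text)
    # The section is normally [playlist]; match it case-insensitively.
    name = next((s for s in parser.sections() if s.lower() == "playlist"), None)
    if name is None:
        raise ValueError("no [playlist] section found")
    entries = []
    for key, value in parser[name].items():
        index = key[4:]
        if key.lower().startswith("file") and index.isdigit():
            entries.append((int(index), value.strip()))
    # Sort numerically so File10 comes after File2, then drop the indices.
    return [url for _, url in sorted(entries)]
```

    A player-like client would loop over the returned list and move on to the next URL when a connection fails or a server is full.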

    Anyway, fire up a copy of Wireshark or some other packet sniffer. Hit one of the URLs in your audio player, and inspect the traffic. The first thing we'll look at is the request and response.

    GET /secretagent-128-aac HTTP/1.1
    Host: ice1.somafm.com
    User-Agent: VLC/2.2.4 LibVLC/2.2.4
    Range: bytes=0-
    Connection: close
    Icy-MetaData: 1
    
    HTTP/1.0 200 OK
    Content-Type: audio/aacp
    Date: Sat, 20 May 2017 20:43:56 GMT
    icy-br:128
    icy-genre:Various
    icy-name:Secret Agent from SomaFM [SomaFM]
    icy-notice1:<BR>This stream requires <a href="http://www.winamp.com/">Winamp</a><BR>
    icy-notice2:SHOUTcast Distributed Network Audio Server/Linux v1.9.5<BR>
    icy-pub:0
    icy-url:http://SomaFM.com
    Server: Icecast 2.4.0-kh3
    Cache-Control: no-cache, no-store
    Pragma: no-cache
    Access-Control-Allow-Origin: *
    Access-Control-Allow-Headers: Origin, Accept, X-Requested-With, Content-Type
    Access-Control-Allow-Methods: GET, OPTIONS, HEAD
    Connection: Close
    Expires: Mon, 26 Jul 1997 05:00:00 GMT
    icy-metaint:45000
    

    These internet radio servers are either HTTP (in the case of Icecast and others) or really close to it (legacy SHOUTcast, which answers with an ICY 200 OK status line instead of HTTP/1.0 200 OK), and accept normal GET requests. In this case, my player (VLC) makes a GET request for /secretagent-128-aac, which is the path to the actual stream.

    My player also includes one key request header:

    Icy-MetaData: 1
    

    This Icy-MetaData header asks the server to mux metadata with the audio stream data. That is, the "now playing" track information will be periodically injected into the stream.
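    If you'd rather build this request yourself, here's a minimal Python sketch that writes it by hand. I'm using a raw request rather than an HTTP library because legacy SHOUTcast answers with an ICY 200 OK status line, which Python's http.client (for one) rejects with BadStatusLine. The function name build_icy_request and the default User-Agent string are hypothetical. I'm also assuming HTTP/1.0 is acceptable to the server, since that keeps it from using chunked transfer encoding on the response.

```python
from urllib.parse import urlsplit


def build_icy_request(stream_url: str, user_agent: str = "icy-sketch/0.1") -> bytes:
    """Build a raw GET request that asks the server to interleave ICY metadata."""
    parts = urlsplit(stream_url)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    lines = [
        f"GET {path} HTTP/1.0",  # HTTP/1.0: no chunked encoding in the reply
        f"Host: {parts.netloc}",
        f"User-Agent: {user_agent}",
        "Icy-MetaData: 1",  # the key header: ask for in-band metadata
        "Connection: close",
        "",  # blank line terminates the request head
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

    You would send these bytes over a plain TCP socket (e.g. socket.create_connection((host, 80))) and read the response head yourself.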

    In the server response headers, there's another key header:

    icy-metaint:45000
    

    This tells us two things... the first is that the server agreed to send metadata. The second is that the metadata interval is 45,000 bytes: after every 45,000 bytes of audio data, the server will inject a chunk of metadata. (The interval counts audio bytes only; the metadata chunks themselves aren't included in it.) Let's go back to our packet sniffer, and see what this looks like:

    [Image: ICY metadata hex dump]

    The very first byte of the metadata chunk, 0x06 here, tells us how long the rest of the chunk is. Take the value of that byte, multiply it by 16, and you'll have the length of the metadata in bytes. So 0x06 tells us that the next 96 bytes will be metadata, before returning to regular stream data. Note that this means the entire metadata chunk is 97 bytes... 1 byte for the length indicator, and then 96 bytes (in this case) for the rest. The text is padded with null bytes to fill out the multiple of 16, and a length byte of 0x00 means there's no metadata in that chunk at all.
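    Here's a minimal Python sketch of that demuxing loop. It assumes you already have the response body (everything after the headers) as a binary file-like object, for example from socket.makefile("rb"), and that you've already read icy-metaint from the headers. The names read_exact and demux are hypothetical.

```python
import io
from typing import BinaryIO, Iterator, Tuple


def read_exact(stream: BinaryIO, n: int) -> bytes:
    """Read exactly n bytes, raising EOFError if the stream ends first."""
    chunks = []
    while n > 0:
        chunk = stream.read(n)  # may return fewer than n bytes
        if not chunk:
            raise EOFError("stream ended mid-block")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def demux(stream: BinaryIO, metaint: int) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (audio, raw_metadata) pairs from the body of an ICY response.

    Each cycle is: `metaint` bytes of audio, one length byte L, then L * 16
    bytes of NUL-padded metadata. L == 0 means "no metadata this time", and
    raw_metadata is then b"". A trailing partial block is dropped.
    """
    while True:
        try:
            audio = read_exact(stream, metaint)
            length = read_exact(stream, 1)[0] * 16
            meta = read_exact(stream, length)
        except EOFError:
            return
        yield audio, meta


if __name__ == "__main__":
    # Synthetic body with metaint=8: one metadata block, then an empty one.
    body = (b"A" * 8 + b"\x02" + b"StreamTitle='x';".ljust(32, b"\x00")
            + b"B" * 8 + b"\x00")
    for audio, meta in demux(io.BytesIO(body), 8):
        print(audio, meta.rstrip(b"\x00"))
```

    The audio half of each pair is what you'd hand to your decoder; the metadata half still needs its null padding stripped and its text decoded.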

    Now, let's get into the actual text metadata format:

    StreamTitle='Buscemi - First Flight To London';StreamUrl='http://SomaFM.com/secretagent/';
    

    It looks pretty straightforward: key='value', delimited by semicolons (;). There are some big catches with this, though. For example... there's no truly standard method for escaping the single quote. If the metadata value needs to contain a single quote, sometimes it's \', sometimes it's '''. Sometimes it's not escaped at all!
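    One way to cope with the quoting mess is a tolerant heuristic rather than a strict parser. In this Python sketch (the names _FIELD and parse_icy_metadata are hypothetical), a value is assumed to run until the first '; that is followed by another key=' or by the end of the text. That keeps unescaped quotes such as Guns N' Roses intact, and \' is unescaped afterwards. A title that itself contains something like ';Foo=' would still fool it.

```python
import re

# Heuristic: a value ends at the first "';" that is followed by another
# key=' or by the end of the text (allowing trailing whitespace/NUL padding).
_FIELD = re.compile(r"(\w+)='(.*?)';(?=\w+='|[\s\x00]*$)", re.S)


def parse_icy_metadata(text: str) -> dict[str, str]:
    """Parse StreamTitle='...';StreamUrl='...'; style text into a dict."""
    fields = {}
    for key, value in _FIELD.findall(text):
        # Undo backslash-escaped quotes; other escaping styles are left as-is.
        fields[key] = value.replace("\\'", "'")
    return fields
```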

    Additionally, not all servers use the same character encoding. You can probably safely assume UTF-8, but do expect some servers to use something else, or to simply be broken in their metadata encoding.
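    A pragmatic approach is to try UTF-8 first and fall back to something that can't fail. In this Python sketch (decode_metadata is a hypothetical name), Latin-1 is the assumed fallback. It accepts any byte sequence because every byte maps to a code point, but it will produce mojibake if the server actually used some other legacy encoding.

```python
def decode_metadata(raw: bytes) -> str:
    """Strip NUL padding from a raw metadata block and decode it to text."""
    raw = raw.rstrip(b"\x00")
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: Latin-1 never raises, but may be the wrong guess.
        return raw.decode("latin-1")
```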

    Anyway, now that you know how all of this works, you can implement it. If you'd like, I have some code you can license. One is a Node.js API server which, when given a stream URL, will return the metadata for you, doing all the buffering and parsing server-side. The other is a client-side player based on Media Source Extensions (MSE)... note though that this only works with servers that support CORS, and as far as I know, only my own servers (AudioPump CDN) do that today. If you're interested in any of this code, feel free to e-mail me at brad@audiopump.co. If you have questions about my answer here on Stack Overflow, post a comment here.
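    For a do-it-yourself starting point, here is a rough end-to-end Python sketch, not production code. It sends a GET with Icy-MetaData: 1, accepts either an HTTP/1.x or an ICY status line, reads icy-metaint, skips the audio, and returns the first non-empty StreamTitle. The function names are hypothetical. It assumes plain http:// URLs (no TLS, redirects, or playlist handling) and uses a simple regex that does not handle the quote-escaping quirks described above.

```python
import re
import socket
from typing import BinaryIO, Optional
from urllib.parse import urlsplit


def read_stream_title(reader: BinaryIO, max_blocks: int = 5) -> Optional[str]:
    """Read a raw ICY response (status, headers, body) from a binary reader
    and return the first non-empty StreamTitle, or None."""
    status = reader.readline().decode("latin-1").strip()
    if " 200" not in status:  # accepts "HTTP/1.0 200 OK" and "ICY 200 OK"
        raise ConnectionError(f"unexpected status line: {status!r}")
    headers = {}
    while True:
        line = reader.readline().decode("latin-1").strip()
        if not line:  # blank line (or EOF) ends the header block
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    metaint = int(headers.get("icy-metaint", "0"))
    if metaint <= 0:
        return None  # the server did not agree to send metadata
    for _ in range(max_blocks):
        audio = reader.read(metaint)  # audio bytes are discarded here
        length_byte = reader.read(1)
        if len(audio) < metaint or not length_byte:
            return None  # stream ended early
        raw = reader.read(length_byte[0] * 16).rstrip(b"\x00")
        if not raw:
            continue  # length byte 0: no metadata in this chunk
        try:
            text = raw.decode("utf-8")
        except UnicodeDecodeError:
            text = raw.decode("latin-1")
        match = re.search(r"StreamTitle='(.*?)';", text, re.S)
        if match and match.group(1):
            return match.group(1)
    return None


def fetch_stream_title(stream_url: str, timeout: float = 10.0) -> Optional[str]:
    """Connect to an http:// ICY stream URL and return its current title."""
    parts = urlsplit(stream_url)
    path = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    request = (f"GET {path} HTTP/1.0\r\nHost: {parts.netloc}\r\n"
               "Icy-MetaData: 1\r\nConnection: close\r\n\r\n")
    with socket.create_connection((parts.hostname, parts.port or 80),
                                  timeout=timeout) as sock, \
            sock.makefile("rb") as reader:
        sock.sendall(request.encode("ascii"))
        return read_stream_title(reader)
```

    Usage would look like fetch_stream_title("http://ice1.somafm.com/secretagent-128-aac"), using a stream URL taken from the playlist rather than the .pls URL itself. Splitting out read_stream_title keeps the parsing testable against an in-memory buffer.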