I'm building a web app to run on my home LAN, which will stream video files from a central server to whichever device I feel like watching on.
These video files (.mp4, .mkv, etc.) often contain multiple audio streams/tracks, which may or may not share a codec (AAC, AC3, MP3, etc.).
My question is: how can I predict which audio stream will actually be used when the video is played in an HTML5 `<video>` element? (I have a feeling this might vary from browser to browser, so if I have to pick one for the sake of the question, let's go with Google Chrome.)
I've been able to figure out some of this by trial and error, but I haven't found any actual documentation. For example, I know that Google Chrome refuses to play AC3 audio: when presented with a video file containing only an AC3 audio stream, it plays the video silently (no sound), and when presented with a video containing both an AC3 stream and an AAC stream, it plays the AAC stream (the only one it supports).
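As an aside: bare codec support, at least, can be probed from script instead of by trial playback. A minimal sketch; I believe these RFC 6381 codec strings are right for AAC-LC and AC-3, but treat them as assumptions:

```js
// canPlayType() returns "", "maybe", or "probably";
// MediaSource.isTypeSupported() returns a boolean.
const probe = document.createElement('video');
console.log(probe.canPlayType('video/mp4; codecs="mp4a.40.2"')); // AAC-LC
console.log(probe.canPlayType('video/mp4; codecs="ac-3"'));      // AC-3
console.log(MediaSource.isTypeSupported('video/mp4; codecs="mp4a.40.2"'));
```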
However, I don't know what method it uses to, say, pick between two AAC streams. Does it just pick the one with the lowest stream index? Does it attempt to filter based on the tagged language of the stream? Does it evaluate the connected hardware and take channels into account (e.g., selecting a 5.1 stream when appropriate hardware is connected)?
I'd appreciate it if anyone could link to documentation on the above; I've been searching for hours and haven't found any. Short of that, please share your experience. Ideally, I'd like to structure my audio streams so that my preferred streams are used when supported, with progressively less-preferred streams as fallbacks for less capable browsers.
You'll want to take a look at the HTML spec: https://html.spec.whatwg.org/multipage/media.html#media-element
> A media resource can have multiple audio and video tracks. For the purposes of a media element, the video data of the media resource is only that of the currently selected track (if any) as given by the element's `videoTracks` attribute when the event loop last reached step 1, and the audio data of the media resource is the result of mixing all the currently enabled tracks (if any) given by the element's `audioTracks` attribute when the event loop last reached step 1.
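Note the mixing language: every *enabled* audio track is audible at once, not just one. Assuming the `audioTracks` attribute is actually exposed (Safari ships it; in Chrome it sits behind the experimental web platform features flag, to my knowledge), you can inspect the tracks like so:

```js
// Enumerate the audio tracks once metadata (and thus the track list) is known.
// video.audioTracks is undefined in browsers that don't expose the API.
const video = document.querySelector('video');
video.addEventListener('loadedmetadata', () => {
  const tracks = video.audioTracks;
  for (let i = 0; i < tracks.length; i++) {
    const t = tracks[i]; // AudioTrack: id, kind, label, language, enabled
    console.log(t.id, t.kind, t.label, t.language, t.enabled);
  }
});
```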
One way of selecting which track plays is with a media fragment:
```html
<audio src="test.webm#track=comments"></audio>
```
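In practice, though, browser support for the `track` dimension of Media Fragments is sparse (the temporal `#t=` form is much more widely honored). Where the `audioTracks` API is exposed, a script-side equivalent might look like this; `selectTrackByLabel` is a hypothetical helper of mine, not a platform API:

```js
// Enable only the track whose label matches. Enabling one track does not
// implicitly disable the rest -- all enabled tracks are mixed together --
// so each flag is set explicitly.
function selectTrackByLabel(video, label) {
  const tracks = video.audioTracks;
  for (let i = 0; i < tracks.length; i++) {
    tracks[i].enabled = (tracks[i].label === label);
  }
}

selectTrackByLabel(document.querySelector('video'), 'comments');
```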
If no track selection is made, the first audio track ends up enabled:
> If the media resource is found to have an audio track
>
> 1. Create an `AudioTrack` object to represent the audio track.
> 2. Update the media element's `audioTracks` attribute's `AudioTrackList` object with the new `AudioTrack` object.
> 3. Let *enable* be unknown.
> 4. If either the media resource or the URL of the current media resource indicate a particular set of audio tracks to enable, or if the user agent has information that would facilitate the selection of specific audio tracks to improve the user's experience, then: if this audio track is one of the ones to enable, then set *enable* to true, otherwise, set *enable* to false.
>
>    This could be triggered by media fragment syntax, but it could also be triggered e.g. by the user agent selecting a 5.1 surround sound audio track over a stereo audio track.
> 5. If *enable* is still unknown, then, if the media element does not yet have an enabled audio track, then set *enable* to true, otherwise, set *enable* to false.
> 6. If *enable* is true, then enable this audio track, otherwise, do not enable this audio track.
> 7. Fire an event named `addtrack` at this `AudioTrackList` object, using `TrackEvent`, with the `track` attribute initialized to the new `AudioTrack` object.
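The practical takeaway: absent a media fragment or a user-agent heuristic, step 5 means the first audio track in the file is the one that gets enabled, and per your AC3/AAC observation Chrome additionally appears to skip tracks it cannot decode. Ordering streams from most- to least-preferred in the container is therefore a reasonable strategy. You can verify which track a given browser actually enabled by listening for `addtrack` (again assuming `audioTracks` is exposed):

```js
// Log each track as the browser announces it, along with whether the
// browser's default-selection logic enabled it.
const video = document.querySelector('video');
video.audioTracks.addEventListener('addtrack', (event) => {
  const t = event.track; // TrackEvent.track is the newly added AudioTrack
  console.log(`track ${t.id} (${t.language || 'und'}): enabled=${t.enabled}`);
});
```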