Tags: websocket, jetty, netty, grizzly

WebSocket frame fragmentation in an API


Would exposing WebSocket frame fragmentation have any value in a client-side API?

Reading RFC 6455, I became convinced that a non-continuation frame doesn't guarantee anything about its semantics. One shouldn't rely on frame boundaries; it's just too risky. The spec addresses this explicitly:

Unless specified otherwise by an extension, frames have no semantic
meaning.  An intermediary might coalesce and/or split frames, if no
extensions were negotiated by the client and the server or if some
extensions were negotiated, but the intermediary understood all the
extensions negotiated and knows how to coalesce and/or split frames
in the presence of these extensions.  One implication of this is that
in absence of extensions, senders and receivers must not depend on
the presence of specific frame boundaries.

Thus, receiving a non-continuation frame of type Binary or Text doesn't mean it is something atomic and meaningful sent from the other side of the channel. Similarly, a sequence of continuation frames doesn't mean that coalescing them will yield a meaningful message. And, what's even more upsetting, a single non-continuation frame may be the result of coalescing many other frames.

To sum up, groups of bytes sent over a WebSocket may be received regrouped in pretty much any way, as long as the byte order is preserved (in the absence of extensions, of course).
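To illustrate the consequence for a receiver, here is a minimal sketch using the standard javax.websocket (JSR-356) API rather than any particular container: it accumulates partial text fragments and only acts once the final fragment arrives, since the number and sizes of the fragments it observes carry no meaning.

import javax.websocket.Endpoint;
import javax.websocket.EndpointConfig;
import javax.websocket.MessageHandler;
import javax.websocket.Session;

// Sketch: a receiver treats fragment boundaries as arbitrary and only
// acts on the reassembled message once the final fragment arrives.
public class CoalescingEndpoint extends Endpoint {

    @Override
    public void onOpen(Session session, EndpointConfig config) {
        StringBuilder buffer = new StringBuilder();

        session.addMessageHandler(new MessageHandler.Partial<String>() {
            @Override
            public void onMessage(String fragment, boolean last) {
                // 'fragment' may correspond to one frame, part of a frame,
                // or several coalesced frames -- its size is meaningless.
                buffer.append(fragment);
                if (last) {
                    handleCompleteMessage(buffer.toString());
                    buffer.setLength(0);
                }
            }
        });
    }

    private void handleCompleteMessage(String message) {
        // Application logic operates on the whole message only.
        System.out.println("Received: " + message);
    }
}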

If so, is it useful to introduce this concept at all? Maybe it's better to hide it as an implementation detail? I wonder whether WebSocket users have found it useful in products such as Netty, Jetty, Grizzly, etc. Thanks.


Solution

  • Fragmentation is not a boundary for anything.

    It's merely a way for the implementation to manage its own behavior based on memory, WebSocket extensions, performance, etc.

    A typical scenario would be a client endpoint sending text that is passed through the permessage-deflate extension, which compresses it and generates fragments according to its deflate memory configuration, writing those fragments to the remote endpoint as it accumulates a buffer of compressed data to write (some implementations will only write once the buffer is full or the message has received its final byte).

    While exposing access to the fragments in an API has happened (Jetty has 2 core WebSocket APIs, both of which support fragment access), it's really only useful for those wanting lower-level control in streaming applications (think video / VoIP, where you want to stream with quality adjustments, dropping data if need be, not writing too fast, etc.); see the sketch after this list for what fragment-level write access can look like.
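As a rough illustration of that lower-level control, here is a sketch using the standard javax.websocket (JSR-356) partial-send API, not Jetty's own fragment API: a sender emits one logical message in explicit fragments so a streaming application can flush data as it becomes available instead of buffering the whole message. The `chunks` source is hypothetical; intermediaries may still re-split or coalesce these fragments on the wire.

import java.io.IOException;
import java.util.Iterator;
import javax.websocket.RemoteEndpoint;
import javax.websocket.Session;

// Sketch: sending one logical message as several fragments, e.g. to stream
// data as it is produced. Only the final reassembled message content is
// guaranteed to survive intact; the fragment boundaries are not.
public class StreamingSender {

    // 'chunks' stands in for a hypothetical incremental data source.
    public void streamText(Session session, Iterable<String> chunks) throws IOException {
        RemoteEndpoint.Basic remote = session.getBasicRemote();
        Iterator<String> it = chunks.iterator();
        while (it.hasNext()) {
            String chunk = it.next();
            boolean last = !it.hasNext();
            // isLast == false marks a non-final fragment of the same message.
            remote.sendText(chunk, last);
        }
    }
}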