I am using Undertow to run a WebSocket server that sends cursor positions. I am running Undertow 2.3.5.Final on OpenJDK 11.0.22 (2024-01-16).
Problem: If I open two tabs, eventually I get a corrupt message.
Here's what a successful message looks like:
{"op":"refresh-presence","room-id":"123","data":{"5c494daf-3c68-4ef6-b649-69177cc10105":{"peer-id":"5c494daf-3c68-4ef6-b649-69177cc10105","user":null,"data":{}},"b4eb6d8a-562a-4b8c-8e44-ecd43cba3621":{"peer-id":"b4eb6d8a-562a-4b8c-8e44-ecd43cba3621","user":null,"data":{}},"88f03150-696d-4e09-afbd-1381a2fa2f7e":{"peer-id":"88f03150-696d-4e09-afbd-1381a2fa2f7e","user":null,"data":{"cursors-space-default--main-123":{"x":723,"y":345,"xPercent":56.484375,"yPercent":52.43161094224924}}}}}
And here is a failure:
ors-space-default--main-123":{"x":562,"y":12,"xPercent":43.9ab521,"r56nt":1b3273a68bf-fefe7d735"}
^ It looks like the beginning of the message is cut off. xPercent also looks weird: 43.9ab521. It's as if pieces of the UUIDs have contaminated the values.
And another example:
This is a successful message:
{"op":"set-presence-ok","room-id":"123","client-event-id":"9093a7cc-96ea-42e5-943e-e7c7962bbfcc"}
And this is a failure:
{"op":"set-presence-ok","room-id":"123","client-event-id":"6186021","u44-45ser"265-211b8cu44315"}
It looks like the UUID here got cut up.
Setup
Here's how I set up the websocket connection:
(defn ws-request [^HttpServerExchange exchange ^IPersistentMap headers ^WebSocketConnectionCallback callback]
  (let [handler (-> (WebSocketProtocolHandshakeHandler. callback)
                    (.addExtension (PerMessageDeflateHandshake.
                                    true ; client?
                                    6    ; deflaterLevel: is a number from 0 (no compression) to 9 (maximum compression)
                                    )))]
    (when headers
      (set-headers (.getResponseHeaders exchange) headers))
    (.handleRequest handler exchange)))
And here's how I send messages:
(defn send-json!
  "Serializes `obj` to json, and sends over a websocket."
  [obj ws-conn]
  (let [obj-json (->json obj)
        p (promise)
        _ (WebSockets/sendText
           ^String obj-json
           ^WebSocketChannel ws-conn
           (proxy [WebSocketCallback] []
             (complete [ws-conn context]
               (deliver p nil))
             (onError [ws-conn context throwable]
               (deliver p throwable))))
        ret @p]
    (when (instance? Throwable ret)
      (throw ret))))
Potential culprit: PerMessageDeflateHandshake
If I remove the PerMessageDeflateHandshake extension, I can no longer reproduce the corrupt messages.
I am not sure what the best way to approach this problem is.
The problem was that WebSockets.sendText was being called from multiple threads at the same time.
According to this post, WebSocketChannel should be thread-safe. My guess is that when it is combined with PerMessageDeflate, it is no longer thread-safe.
Adding a lock around sendText fixed this for me.
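For anyone who wants to try the same fix, here is a minimal sketch of one way to put a lock around the send. It is not necessarily how the lock was added in my code. `serialized-sender` is a hypothetical helper name, and the sketch assumes you create one wrapped function per connection and route every send for that connection through it. It uses a private lock object rather than locking on the channel itself, so it does not depend on how Undertow uses the channel's own monitor.

```clojure
(defn serialized-sender
  "Hypothetical helper: returns a function that forwards its arguments to
  `send-fn` while holding a private lock, so that at most one thread at a
  time is inside `send-fn`."
  [send-fn]
  (let [lock (Object.)] ; private lock, not shared with any library code
    (fn [& args]
      (locking lock
        (apply send-fn args)))))
```

With the `send-json!` from the question, a per-connection sender could be created once with `(serialized-sender (fn [obj] (send-json! obj ws-conn)))` and then called as `(send! obj)` from any thread. Because `send-json!` blocks until the completion callback fires, the lock is held until each message has been fully handed off, which trades some throughput for safety.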