ruby-on-rails json actionpack

Why is an apostrophe crashing the Rails 5.1.5 request parsing?


Our production Rails server receives a POST request (a webhook) from an external service (SparkPost) in the following format:

data: {
...
"subject": "Your RedvanlyCategory: Men’s,<br>Redvanly Kent Pant, XL in Estate Blue arrived!",
...
}

Note the apostrophe character: it is ’ (U+2019), not '. This breaks the Rails request stack at:

[4ce93af4ed8b] [28a6b892-2c45-409b-90cf-3d1b4fa9b5f7] no implicit conversion of nil into String excluded from capture: DSN not set
[4ce93af4ed8b] [28a6b892-2c45-409b-90cf-3d1b4fa9b5f7]   
[4ce93af4ed8b] [28a6b892-2c45-409b-90cf-3d1b4fa9b5f7] ActionDispatch::Http::Parameters::ParseError (no implicit conversion of nil into String):
[4ce93af4ed8b] [28a6b892-2c45-409b-90cf-3d1b4fa9b5f7]   
[4ce93af4ed8b] [28a6b892-2c45-409b-90cf-3d1b4fa9b5f7] actionpack (5.1.7) lib/action_dispatch/http/parameters.rb:115:in `rescue in parse_formatted_parameters'

Why does this happen? The apostrophe seems to be a valid Unicode character. Replacing ’ with ' stops the Rails stack from breaking.
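As a quick sanity check on the claim that the apostrophe is valid, here is a minimal sketch using Ruby's standard json library on a shortened, hypothetical payload. It only exercises JSON.parse on a UTF-8 string, not the Rails parameter parser or the raw bytes SparkPost actually sent, so it narrows the problem down rather than explaining the crash.

```ruby
require 'json'

# Hypothetical, shortened payload; "\u2019" is the curly apostrophe from the question.
payload = "[{\"subject\":\"Men\u2019s\"}]"

# Parse with the standard library and pull the subject back out.
parsed  = JSON.parse(payload)
subject = parsed.first['subject']

puts subject
puts subject.encoding
```

If this parses cleanly, the curly apostrophe on its own is unlikely to be the problem, and other bytes in the request body are worth inspecting.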

The request headers are:

Accept  application/json
Accept-Encoding gzip
Content-Length  3971
Content-Type    application/json
Host    4ce93af4ed8b.ngrok.io
User-Agent  SparkPost
X-Forwarded-For 52.37.3.48
X-Forwarded-Proto   http

Edit: the curl command to reproduce:

curl --location --request GET 'http://localhost:3000/receive_sparkpost_hooks' \
--header 'Content-Type: application/json' \
--data-raw '[{"subject":"Your RedvanlyCategory: Men’s,<br>Redvanly Kent Pant, XL in Estate Blue arrived!"}]'

Previously we've seen some Unicode characters that were crashing the Rails stack (SparkPost and JSON should both support UTF-8), and we filtered them out using:

encode('ASCII', 'binary', invalid: :replace, undef: :replace, replace: '')
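To make the effect of that filter concrete, here is a small sketch on a hypothetical sample string (the sample is an assumption, not data from the hook). It shows that the call drops every non-ASCII byte, including characters that are perfectly valid UTF-8.

```ruby
# Hypothetical sample containing a curly apostrophe (U+2019) and an accented e (U+00E9).
sample = "Men\u2019s caf\u00E9"

# The filter from the question: read the bytes as binary and drop anything outside ASCII.
ascii_only = sample.encode('ASCII', 'binary', invalid: :replace, undef: :replace, replace: '')

puts ascii_only             # non-ASCII characters are gone
puts sample.valid_encoding? # the original string was valid UTF-8 to begin with
```

Because the filter is lossy for any legitimate non-ASCII text, it may be worth applying it only where a string is known to be malformed.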

The characters looked like this: (image not available)

I suspect we're dealing with this the wrong way and could use some advice on how to feed data to the service so that it doesn't, in turn, send badly formatted Unicode text to the Rails API.


Solution

  • The cause was a non-breaking space (U+00A0, https://unicode-table.com/en/00A0/) present in the string, which the filtering shown in the question had not removed.
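One way to handle that character specifically, sketched under the assumption that replacing it with a regular space is acceptable for your data. The constant NBSP and the helper normalize_nbsp are hypothetical names introduced for illustration, not part of Rails or SparkPost.

```ruby
# U+00A0, the non-breaking space named in the solution.
NBSP = "\u00A0"

# Hypothetical helper: swap every non-breaking space for a regular space
# and leave all other characters (including the curly apostrophe) untouched.
def normalize_nbsp(text)
  text.gsub(NBSP, ' ')
end

# Hypothetical sample string with one non-breaking space.
subject = "Men\u2019s,\u00A0XL"

puts subject.include?(NBSP)
puts normalize_nbsp(subject)
```

Unlike the ASCII filter, this keeps the rest of the text intact and only touches the one character identified as the cause.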