java json jackson objectmapper

Jackson error "Illegal character... only regular white space allowed" when parsing JSON


I am trying to retrieve JSON data from a URL but get the following error:

Illegal character ((CTRL-CHAR, code 31)):
only regular white space (\r, \n,\t) is allowed between tokens

My code:

final URI uri = new URIBuilder(UrlConstants.SEARCH_URL)
      .addParameter("keywords", searchTerm)
      .addParameter("count", "50")
      .build();
node = new ObjectMapper().readTree(new URL(uri.toString())); // <-- throws the error

The constructed URL is, e.g., https://www.example.org/api/search.json?keywords=iphone&count=50

What is going wrong here? And how can I parse this data successfully?


Imports:

import com.google.appengine.repackaged.org.codehaus.jackson.JsonNode;
import com.google.appengine.repackaged.org.codehaus.jackson.map.ObjectMapper;
import com.google.appengine.repackaged.org.codehaus.jackson.node.ArrayNode;
import org.apache.http.client.utils.URIBuilder;

Example response:

{
    meta: {
        indexAllowed: false
    },
    products: {
        products: [ 
            {
                id: 1,
                name: "Apple iPhone 6 16GB 4G LTE GSM Factory Unlocked"
            },
            {
                id: 2,
                name: "Apple iPhone 7 8GB 4G LTE GSM Factory Unlocked"
            }
        ]
    }
}

Solution

  • The message should be pretty self-explanatory:

    There is an illegal character (in this case character code 31, i.e. the control code "Unit Separator") in the JSON you are processing.

    In other words, the data you are receiving is not proper JSON.


    Background:

    The JSON spec (RFC 7159) says:

    2. JSON Grammar

    A JSON text is a sequence of tokens. The set of tokens includes six structural characters, strings, numbers, and three literal names.

    [...]

    Insignificant whitespace is allowed before or after any of the six structural characters.

    ws = *(
            %x20 /              ; Space
            %x09 /              ; Horizontal tab
            %x0A /              ; Line feed or New line
            %x0D )              ; Carriage return

    In other words: JSON may contain whitespace between the tokens ("tokens" meaning the parts of the JSON, e.g. brackets, strings, numbers), but "whitespace" is defined to mean only the characters Space, Tab, Line feed and Carriage return.

    Your document contains something else (code 31) where only whitespace is allowed, and is therefore not valid JSON.
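
To see where the offending character sits, you could scan the raw text before handing it to Jackson. The following is only a minimal sketch using the JDK alone; the class and method names (JsonControlCharCheck, firstIllegalControlChar) are made up for illustration. It is a coarse check: it ignores token structure and simply reports the first control character below 0x20 that is not one of the whitespace characters listed above.

```java
final class JsonControlCharCheck {

    /** True for the four characters RFC 7159 accepts as whitespace between tokens. */
    static boolean isJsonWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    /**
     * Returns the index of the first control character (code below 0x20)
     * that is not JSON whitespace, or -1 if there is none.
     */
    static int firstIllegalControlChar(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 && !isJsonWhitespace(c)) {
                return i;
            }
        }
        return -1;
    }
}
```

If the reported index is 0, the very first byte of the response is already not JSON, which suggests the body is something other than plain JSON text rather than JSON with a stray character in it.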


    To parse this:

    Unfortunately, the Jackson library you are using does not offer a way to parse this malformed data. To parse this successfully, you will have to filter the JSON before it is handled by Jackson.

    You will probably have to retrieve the (pseudo-)JSON yourself from the REST service over plain HTTP, using, e.g., java.net.HttpURLConnection. Then suitably filter out the "bad" characters and pass the resulting string to Jackson. How exactly to do this depends on how you use Jackson.

    Feel free to ask a separate question if you are having trouble :-).
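
The fetch-then-filter approach could look roughly like the sketch below. It uses only the JDK; the class and method names (JsonBodyCleaner, fetch, toCleanJson) are hypothetical. One assumption goes beyond what the question shows: byte 31 (0x1F) followed by 0x8B is the gzip magic number, so if the body starts with those two bytes the sketch decompresses it first instead of just deleting characters. Whether your service actually sends gzip is something you would have to verify yourself. UTF-8 is assumed as the encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

final class JsonBodyCleaner {

    /** Downloads the raw response body; no parsing happens here. */
    static byte[] fetch(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (InputStream in = conn.getInputStream()) {
            return readAll(in);
        } finally {
            conn.disconnect();
        }
    }

    /** Turns raw bytes into text without stray control characters. */
    static String toCleanJson(byte[] body) {
        // Assumption: a body starting with 0x1F 0x8B is gzip-compressed.
        if (body.length >= 2 && (body[0] & 0xFF) == 0x1F && (body[1] & 0xFF) == 0x8B) {
            try (InputStream gz = new GZIPInputStream(new ByteArrayInputStream(body))) {
                body = readAll(gz);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        // Assumption: the service sends UTF-8.
        String text = new String(body, StandardCharsets.UTF_8);
        // Remove control characters below 0x20, keeping tab, line feed and carriage return.
        return text.replaceAll("[\\x00-\\x08\\x0B\\x0C\\x0E-\\x1F]", "");
    }

    /** Reads a stream to the end and returns its bytes. */
    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

The cleaned string can then go to Jackson, e.g. new ObjectMapper().readTree(JsonBodyCleaner.toCleanJson(JsonBodyCleaner.fetch(url))). Note that the filter removes control characters everywhere, including inside string values, so it can alter data if the service really embeds such characters in its strings.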