jsf utf-8 primefaces xml-parsing

Browser behavior on a UTF-8 page containing a character like U+FFFE: Ajax fails with "XML Parsing Error: not well-formed"


We are developing some JSF web applications with PrimeFaces.

In an inputText, a user can copy/paste text containing a non UTF-8 character such as U+FFFE (here is the character) and save the page. But the rendered HTML then contains an error:

Error: Forbidden code point U+fffe.

The browser doesn't show the error. It hides it and displays the non UTF-8 character as if everything were normal!

But when this page is sent from the server to the browser in an Ajax response, the browser shows an error in the console, and the user can see the error!

Console message:

XML Parsing Error: not well-formed

What is the best way to handle this error?

I have a lot of bad ideas. :)

  1. Validate input before saving!

    • Client-side validation with JavaScript can be generic, but it is not secure.
    • Replacing bad characters with '' can confuse the user: many bad characters look just like correctly accented characters.
    • I can write a JSF validator, but I would have to register it on every inputText component one by one.
  2. Perhaps I can find the JavaScript that parses Ajax responses and patch it.
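For the "validate before saving" idea, the worry about silently replacing characters could be addressed by telling the user exactly which code point was rejected. Below is a minimal sketch, assuming the rule that matters is the XML 1.0 `Char` production (because the Ajax response is XML); the class and method names are hypothetical and not part of JSF or PrimeFaces.

```java
import java.util.ArrayList;
import java.util.List;

public class InvalidCharReport {

    // XML 1.0 "Char" production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: lists every rejected code point as "U+XXXX at position N"
    // (N is the UTF-16 index), so a validation message can name the exact character
    // instead of silently replacing it.
    static List<String> describeInvalidChars(String s) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i); // a lone surrogate comes back as itself and is rejected
            if (!isXml10Char(cp)) {
                problems.add(String.format("U+%04X at position %d", cp, i));
            }
            i += Character.charCount(cp);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println(describeInvalidChars("abc\uFFFEdef")); // [U+FFFE at position 3]
    }
}
```

The message text could then be put into a `FacesMessage` by whatever validator is used; that wiring is left out here.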

'Minimal' reproduction:

GitHub: https://github.com/halcsi19790320/ajax-utf8-problem

  1. Download the source.
  2. Run mvn clean package.
  3. Download WildFly 32.0.1.Final.
  4. Copy the WAR from the project's target directory to wildfly/standalone/deployments.
  5. Start WildFly.
  6. Open http://localhost:8080/utf8/main.xhtml
  7. Paste the bad character into the input and click the button: the request fails.

I think the main problem is that Ajax responses are XML, and there are a lot of characters that UTF-8 can encode but that aren't valid in XML.
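The hypothesis can be checked outside the browser with the JDK's built-in DOM parser: the same document parses fine without U+FFFE and is rejected as not well-formed with it. This is a sketch only; the root element name merely mirrors the `<partial-response>` root of a Faces Ajax response, and the exact error text depends on the parser (it will not match the browser's wording).

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class XmlRejectsFffe {

    // Parses the given XML text; returns null on success, or the parser's message on failure.
    static String tryParse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Rethrow errors instead of letting the default handler print "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Ordinary text content: parses fine, prints "null".
        System.out.println(tryParse("<partial-response>ok</partial-response>"));
        // Same document with U+FFFE in the text: the parser reports a well-formedness error.
        System.out.println(tryParse("<partial-response>ok\uFFFE</partial-response>"));
    }
}
```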

How can I quickly check in Java whether a string contains any invalid characters?
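One fast approach is a single pass over the `char`s that tests each one against the XML 1.0 `Char` production, without allocating anything. A minimal sketch, assuming XML 1.0 is the target (XML 1.1 has different rules); `indexOfInvalidXmlChar` is a hypothetical name.

```java
public class XmlCharCheck {

    // Returns the UTF-16 index of the first character not allowed by the
    // XML 1.0 Char production, or -1 if the whole string is valid XML 1.0 text.
    static int indexOfInvalidXmlChar(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c <= 0xD7FF) continue;         // by far the most common case
            if (c == 0x9 || c == 0xA || c == 0xD) continue; // tab, LF, CR
            if (c >= 0xE000 && c <= 0xFFFD) continue;       // excludes U+FFFE and U+FFFF
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                i++;                                        // valid pair = U+10000..U+10FFFF
                continue;
            }
            return i; // control character, lone surrogate, U+FFFE or U+FFFF
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(indexOfInvalidXmlChar("Gr\u00FC\u00DFe \uD83D\uDE00")); // -1
        System.out.println(indexOfInvalidXmlChar("abc\uFFFE"));                   // 3
        System.out.println(indexOfInvalidXmlChar("a\u0002b"));                    // 1
    }
}
```

Returning the index rather than a boolean lets the caller point the user at the offending position.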

Some other observations

There is some strange behavior: the U+0002 character is removed by the JSF writer.

  1. I type U+0002 and click the button.
  2. The server-side bean property is set.
  3. But in the Render Response phase, the HTML writer logic removes this character.
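The removal of U+0002 suggests output-side filtering. As an illustration of that idea only (this is NOT Mojarra's actual writer code, and which ranges Mojarra really filters is not shown here), a helper that drops every code point outside the XML 1.0 `Char` production before writing could look like this; the names are hypothetical.

```java
public class XmlTextFilter {

    // XML 1.0 Char production (see the spec's section on characters).
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Hypothetical helper: copies s, dropping every code point that XML 1.0 forbids
    // (control characters, lone surrogates, U+FFFE, U+FFFF).
    static String stripInvalidXmlChars(String s) {
        StringBuilder out = new StringBuilder(s.length());
        s.codePoints()
         .filter(XmlTextFilter::isXml10Char)
         .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(stripInvalidXmlChars("a\u0002b\uFFFEc")); // abc
    }
}
```

Filtering at output time keeps the response well-formed, but as noted in the list above, silently dropping characters may confuse users, so it works better as a safety net than as the only check.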

The U+FFFE character behaves differently. I wrote a validator that throws a ValidatorException, but the U+FFFE character still appeared in the Render Response phase.

I think this Mojarra bug may be the same issue:

https://github.com/eclipse-ee4j/mojarra/issues/4516


Solution

  • I've reproduced it. The problem is twofold. This is indeed a bug in Mojarra, which in turn exposed another bug in the PrimeFaces Ajax handler. I've created a new issue on the Mojarra side: https://github.com/eclipse-ee4j/mojarra/issues/5464.

    I think this mojarra bug can be very identical bug:
    https://github.com/eclipse-ee4j/mojarra/issues/4516

    Yes indeed. I fixed a similar issue in the Mojarra implementation a couple of years ago; however, it concerned a different Unicode range.

    You can for the time being work around it by using MyFaces instead of Mojarra (I've confirmed that it works fine in MyFaces), or by using Standard Faces Ajax instead of PrimeFaces Ajax as follows:

    <p:commandButton value="Click" action="#{hello.ajax}">
        <f:ajax execute="@form" render="@form" />
    </p:commandButton>
    

    That said,

    a non UTF-8 character

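    If it really weren't a UTF-8 character (the U stands for Unicode), it wouldn't have been listed on a website dedicated to Unicode characters ;) U+FFFE is a valid Unicode code point (a so-called noncharacter) that UTF-8 encodes just fine; it is merely not allowed in XML 1.0.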
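    To illustrate that U+FFFE is an ordinary citizen as far as UTF-8 is concerned, here is a small sketch using only the JDK: it encodes the character with `StandardCharsets.UTF_8` and prints the bytes, which are the usual three-byte form of a BMP code point. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class FffeInUtf8 {

    // Returns the UTF-8 bytes of a string as uppercase hex separated by spaces.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+FFFE is a noncharacter, yet UTF-8 encodes it like any other BMP code point.
        System.out.println(utf8Hex("\uFFFE")); // EF BF BE
        // Decoding those bytes gives the same string back.
        String decoded = new String("\uFFFE".getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFE")); // true
    }
}
```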

    Unrelated to the concrete problem, I noticed that the XHTML template in your reproducer https://github.com/halcsi19790320/ajax-utf8-problem/blob/main/src/main/webapp/main.xhtml has a lot of unnecessary tags/attributes which you most probably tried to use to fix the problem.

    <?xml version='1.0' encoding='UTF-8' ?>
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html ...accept-charset="utf-8">
        <h:head>
            <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
        </h:head>
    

    I can assure you that none of these are relevant. The XML prolog is unnecessary for Facelets as well as the average IDE. The XHTML doctype is unnecessary if the client ultimately retrieves a plain HTML page. The legacy accept-charset attribute is ignored by modern browsers (the last one which took it seriously was Internet Explorer). The <meta http-equiv> is ignored when the HTML page is served via the http(s):// protocol instead of via e.g. file://. The following HTML5 template contains less noise and works just fine:

    <!DOCTYPE html>
    <html ...>
        <h:head>
    
        </h:head>
    

    See also "JavaServer Faces 2.2 and HTML5 support, why is XHTML still being used" and the KISS principle.