We can set the default character encoding used for reading request bodies with ServletContext#setRequestCharacterEncoding (since Servlet 4.0).
I think that the character encoding for HttpServletRequest#getReader can be set using ServletContext#setRequestCharacterEncoding (*).
However, the reader that HttpServletRequest#getReader returns does not seem to decode characters using the encoding set by ServletContext#setRequestCharacterEncoding.
My questions are:
Why does ServletContext#setRequestCharacterEncoding not have an effect on HttpServletRequest#getReader (although it does have an effect on HttpServletRequest#getParameter)?
Is there any specification that explains this behavior of ServletContext#setRequestCharacterEncoding and HttpServletRequest#getReader? (I read the Servlet Specification Version 4.0, but I could not find anything about such behavior.)
I have created a simple war application and tested ServletContext#setRequestCharacterEncoding.
[Env]
[index.html]
<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
</head>
<body>
  <form action="/SimpleWarApp/app/simple" method="post">
    <!-- The value is Japanese character '\u3042' -->
    <input type="text" name="hello" value="あ"/>
    <input type="submit" value="submit!"/>
  </form>
  <button type="button" id="the_button">post</button>
  <script>
    document.getElementById('the_button').addEventListener('click', function() {
      var xhttp = new XMLHttpRequest();
      xhttp.open('POST', '/SimpleWarApp/app/simple');
      xhttp.setRequestHeader('Content-Type', 'text/plain');
      // The body content is Japanese character '\u3042'
      xhttp.send('あ');
    });
  </script>
</body>
</html>
[InitServletContextListener.java]
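import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

@WebListener
public class InitServletContextListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
        // Set the web-app-wide default request encoding (Servlet 4.0).
        sce.getServletContext().setRequestCharacterEncoding("UTF-8");
    }
}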
[SimpleServlet.java]
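import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet("/app/simple")
@SuppressWarnings("serial")
public class SimpleServlet extends HttpServlet {
    @Override
    protected void doPost(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // Uncommented only in Case 3:
        // req.setCharacterEncoding("UTF-8");
        System.out.println("requestCharacterEncoding : " + req.getServletContext().getRequestCharacterEncoding());
        System.out.println("req.getCharacterEncoding() : " + req.getCharacterEncoding());
        String hello = req.getParameter("hello");
        if (hello != null) {
            // Form submission: print the decoded parameter.
            System.out.println("hello : " + req.getParameter("hello"));
        } else {
            // text/plain POST: print the first line of the body as decoded by the reader.
            System.out.println("body : " + req.getReader().readLine());
        }
    }
}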
I don't have any servlet filters. The above three are all the components of this war application. (GitHub)
Case 1: When I submit the form with a parameter 'hello', the value of 'hello' is successfully decoded as follows.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
hello : あ
Case 2:
When I click 'post' and send text content, the request body is not decoded correctly, as shown below.
(I have confirmed that the request body is encoded in UTF-8: E3 81 82.)
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
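To illustrate what Case 2 might look like at the byte level, here is a minimal sketch in plain Java, with no servlet container involved. It decodes the same three bytes, E3 81 82, once as UTF-8 and once as ISO-8859-1, which is the default the Servlet specification prescribes when no request encoding has been specified. The class BodyDecodingDemo and the helper readFirstLine are hypothetical names used only for this illustration, and it is an assumption (not something confirmed by the output above) that Tomcat fell back to ISO-8859-1 here.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class BodyDecodingDemo {

    // Hypothetical helper: reads the first line of a body using the given charset,
    // similar in spirit to getReader().readLine() once an encoding has been chosen.
    public static String readFirstLine(byte[] body, Charset charset) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(body), charset))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // UTF-8 bytes of U+3042, as observed on the wire in Case 2.
        byte[] body = {(byte) 0xE3, (byte) 0x81, (byte) 0x82};

        String asUtf8 = readFirstLine(body, StandardCharsets.UTF_8);
        String asLatin1 = readFirstLine(body, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.length());   // 1
        System.out.println(asLatin1.length()); // 3
    }
}
```

Decoding as ISO-8859-1 yields three characters instead of one, which would be consistent with the three '?' characters printed in Case 2 if the console cannot display them.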
Case 3:
When I also set the encoding using HttpServletRequest#setCharacterEncoding at the first line of the servlet's 'doPost' method, the request body is decoded successfully.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
Case 4:
When I use xhttp.setRequestHeader('Content-Type', 'text/plain; charset=UTF-8'); in the JavaScript, the request body is decoded successfully.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : あ
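Case 4 presumably works because the charset then travels with the request itself, as a parameter of the Content-Type header. As a rough sketch of how such a parameter can be extracted, the plain-Java helper below returns the charset value or null when none is present. ContentTypeCharset and charsetOf are hypothetical names for this illustration; this is not Tomcat's actual parser and it ignores edge cases such as semicolons inside quoted values.

```java
public class ContentTypeCharset {

    private static final String KEY = "charset=";

    // Hypothetical helper: returns the charset parameter of a Content-Type value,
    // or null if the header is missing or carries no charset parameter.
    public static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        String[] parts = contentType.split(";");
        // parts[0] is the media type itself; parameters start at index 1.
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.regionMatches(true, 0, KEY, 0, KEY.length())) {
                String value = param.substring(KEY.length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/plain"));                // null
        System.out.println(charsetOf("text/plain; charset=UTF-8")); // UTF-8
    }
}
```

With the header from Case 2 the helper finds no charset, while with the header from Case 4 it finds UTF-8.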
Case 5:
When I do not call req.getParameter("hello"), the request body is not decoded correctly.
requestCharacterEncoding : UTF-8
req.getCharacterEncoding() : UTF-8
body : ???
Case 6:
When I do not call ServletContext#setRequestCharacterEncoding in InitServletContextListener.java, no character encoding is set.
requestCharacterEncoding : null
req.getCharacterEncoding() : null
body : ???
[NOTE]
(*) I think so because:
HttpServletRequest#getReader says "The reader translates the character data according to the character encoding used on the body".
HttpServletRequest#getCharacterEncoding says "Returns the name of the character encoding used in the body of this request".
HttpServletRequest#getCharacterEncoding also says "The following methods for specifying the request character encoding are consulted, in decreasing order of priority: per request, per web app (using ServletContext.setRequestCharacterEncoding, deployment descriptor)".
ServletContext#setResponseCharacterEncoding works fine. When I use ServletContext#setResponseCharacterEncoding, the writer that HttpServletResponse#getWriter returns encodes the response body using the character encoding set by it.
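The lookup order quoted in the note above (per request first, then per web app) can be sketched in plain Java as follows. RequestEncodingResolver and resolve are hypothetical names for this illustration only; the final fallback to ISO-8859-1 follows the default that the Servlet specification gives for requests with no encoding specified, and the sketch makes no claim about how any particular container implements the lookup.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RequestEncodingResolver {

    // Hypothetical helper: picks the charset a reader would be expected to use.
    // perRequest: encoding set on the request itself (setCharacterEncoding or Content-Type charset), or null.
    // perWebApp:  encoding set via ServletContext#setRequestCharacterEncoding or the deployment descriptor, or null.
    public static Charset resolve(String perRequest, String perWebApp) {
        if (perRequest != null) {
            return Charset.forName(perRequest); // highest priority
        }
        if (perWebApp != null) {
            return Charset.forName(perWebApp);  // web-app-wide default
        }
        return StandardCharsets.ISO_8859_1;     // specification default when nothing is set
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, "UTF-8"));     // UTF-8
        System.out.println(resolve("UTF-16", "UTF-8")); // UTF-16
        System.out.println(resolve(null, null));        // ISO-8859-1
    }
}
```

Under this reading, Case 2 and Case 5 correspond to resolve(null, "UTF-8"), so the reader would be expected to decode the body as UTF-8.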
It is an Apache Tomcat bug (specific to getReader()) that will be fixed in 9.0.21 onwards thanks to your report on the Tomcat users mailing list.
For the curious, here is the fix.