
What is the proper charset for encoding System#lineSeparator()?


Let us say we need to verify the following method.

    /**
     * Prints {@code hello, world}, to the {@link System#out}, followed by a system dependent line separator.
     *
     * @param args command line arguments.
     */
    public static void main(String... args) {
        System.out.printf("hello, world%n"); // %n !!!!!!
    }

Now we can verify that the method prints hello, world.

    /**
     * Verifies that the {@link HelloWorld#main(String...)} method prints {@code hello, world}, to the
     * {@link System#out}, followed by a system-dependent line separator.
     *
     * @author Jin Kwon <onacit_at_gmail.com>
     * @see System#lineSeparator()
     */
    @DisplayName("main(args) prints 'hello, world' followed by a system-dependent line separator")
    @Test
    public void main_PrintHelloWorld_() {
        final var out = System.out;
        try {
            // --------------------------------------------------------------------------------------------------- given
            final var buffer = new ByteArrayOutputStream();
            System.setOut(new PrintStream(buffer));
            // ---------------------------------------------------------------------------------------------------- when
            HelloWorld.main();
            // ---------------------------------------------------------------------------------------------------- then
            final var actual = buffer.toByteArray();
            final var expected = ("hello, world" + System.lineSeparator()).getBytes(StandardCharsets.US_ASCII);
            Assertions.assertArrayEquals(expected, actual); // JUnit expects (expected, actual)
        } finally {
            System.setOut(out);
        }
    }

The questionable part is the .getBytes(StandardCharsets.US_ASCII).

I don't think it is wrong to assume that the system-dependent line separator can be encoded with US_ASCII.

Or is Charset#defaultCharset() the right choice for the %n?


Solution

  • You should use the same encoding that encoded the byte array returned by buffer.toByteArray().

    It is the PrintStream's job to turn strings into bytes, so which encoding does your PrintStream use? You created the PrintStream by calling the PrintStream(OutputStream) constructor, whose documentation says:

    Characters written to the stream are converted to bytes using the default charset, or where out is a PrintStream, the charset used by the print stream.

    So you should use Charset.defaultCharset() to encode the expected string into a byte array.

    Also consider passing your own Charset to the PrintStream, for example with the PrintStream(OutputStream, boolean, Charset) constructor (available since Java 10), and using the same charset to encode the expected string. This makes it explicit that both sides use the same charset.
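
The explicit-charset suggestion can be sketched roughly as follows, assuming Java 10 or later for the PrintStream(OutputStream, boolean, Charset) constructor. The class name ExplicitCharsetCapture and the helper captureStdout are hypothetical names introduced only for this illustration, and UTF-8 is an arbitrary choice; any charset works as long as the same one is used on both sides.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ExplicitCharsetCapture {

    // Hypothetical helper: runs the task while System.out is redirected to an
    // in-memory buffer, and returns the bytes that were written to it.
    static byte[] captureStdout(Runnable task, Charset charset) {
        final PrintStream original = System.out;
        final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // The PrintStream encodes characters with the charset passed here,
        // not with the platform default.
        try (PrintStream capture = new PrintStream(buffer, true, charset)) {
            System.setOut(capture);
            task.run();
        } finally {
            System.setOut(original); // always restore the real System.out
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        final Charset charset = StandardCharsets.UTF_8; // arbitrary choice
        final byte[] actual =
                captureStdout(() -> System.out.printf("hello, world%n"), charset);
        // Encode the expected text with the same charset the PrintStream used.
        final byte[] expected =
                ("hello, world" + System.lineSeparator()).getBytes(charset);
        if (!Arrays.equals(expected, actual)) {
            throw new AssertionError("captured bytes differ from expected bytes");
        }
    }
}
```

For plain ASCII text and the usual line separators, most common charsets produce identical bytes, which is why the US_ASCII version tends to pass in practice. Passing the charset explicitly removes the dependence on that coincidence.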