java apache parsing mbox mime4j

Mime4j: DefaultMessageBuilder fails to parse mbox content


I downloaded the mime4j 0.8.0 snapshot from Subversion and built it with Maven. The relevant jars I generated can be found here.

Now I am trying to parse a toy mbox file from the mime4j tests.

I use this sample code. Briefly:

final File mbox = new File("c:\\mbox.rlug");
int count = 0;
for (CharBufferWrapper message : MboxIterator.fromFile(mbox).charset(ENCODER.charset()).build()) {
    System.out.println(messageSummary(message.asInputStream(ENCODER.charset())));
    count++;
}
System.out.println("Found " + count + " messages");

together with this helper method:

private static String messageSummary(InputStream messageBytes) throws IOException, MimeException {
    MessageBuilder builder = new DefaultMessageBuilder();
    Message message = builder.parseMessage(messageBytes);
    return String.format("\nMessage %s \n" +
            "Sent by:\t%s\n" +
            "To:\t%s\n",
            message.getSubject(),
            message.getSender(),
            message.getTo());
}

The output is:

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Message null Sent by: null To: null

Found 5 messages

There are indeed 5 messages, but why are all fields null?


Solution

  • I found the problem.

    DefaultMessageBuilder fails to parse mbox files that use the Windows line separator \r\n. After replacing every \r\n with the UNIX line separator \n, the parsing works.

    This is a critical issue, since the mbox files downloaded from Gmail use \r\n.
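    If editing the file by hand is not practical, the replacement can be done in code before the file is handed to MboxIterator. The following is only a minimal sketch using the JDK alone: the class name LineEndings and the methods toUnix and convertFile are hypothetical names chosen for illustration and are not part of mime4j. It reads the whole file into memory, so it assumes the mbox is small enough for that.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of mime4j.
public class LineEndings {

    /** Returns a copy of the input in which every \r\n pair is replaced by \n. */
    public static byte[] toUnix(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (int i = 0; i < input.length; i++) {
            // Drop a \r only when it is immediately followed by \n;
            // a lone \r is copied through unchanged.
            boolean crBeforeLf = input[i] == '\r'
                    && i + 1 < input.length
                    && input[i + 1] == '\n';
            if (!crBeforeLf) {
                out.write(input[i]);
            }
        }
        return out.toByteArray();
    }

    /** Writes a copy of source with UNIX line endings to target (loads the whole file). */
    public static void convertFile(Path source, Path target) throws IOException {
        Files.write(target, toUnix(Files.readAllBytes(source)));
    }
}
```

    With this in place, one would call LineEndings.convertFile on the original mbox first and then pass the converted copy to MboxIterator.fromFile instead. I have only confirmed that converting the file fixes the parsing; I have not checked whether normalizing each message stream individually would also be enough.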