I have a JavaMail (javax.mail) application that receives e-mails and searches for a text in the attached PDF and DOCX files. If the file name contains UTF-8 characters, it looks like this:
=?utf-8?Q?Sz=C3=A1mla.docx?=
In the function below, it comes from this line: String fileName = part.getFileName();
private boolean findInAttachment(Message message, String filterText) throws MessagingException, IOException, Exception {
    boolean attachContains = false;
    if (message.getContentType().contains("multipart")) {
        Multipart multiPart = (Multipart) message.getContent();
        int numberOfParts = multiPart.getCount();
        for (int partCount = 0; partCount < numberOfParts; partCount++) {
            MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
            if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
                // this part is an attachment
                String fileName = part.getFileName();
                String fullName = AnalyzeEmailsApp.getUserDir() + File.separator + fileName;
                if (isFileExtension(fileName, "docx")) {
                    part.saveFile(fullName);
                    attachContains = findInDocx(fullName, filterText);
                    DeleteAttachmentFile(fullName);
                } else if (isFileExtension(fileName, "pdf")) {
                    part.saveFile(fullName);
                    attachContains = findInPdf(fullName, filterText);
                    DeleteAttachmentFile(fullName);
                }
            } // if
        } // for
    } // if
    return attachContains;
}
How can I get the right file name (Számla.docx)?
You can use MimeUtility.decodeText() to decode the MIME encoded-word format:
import javax.mail.internet.MimeUtility;

String fileName = part.getFileName();
if (fileName != null) {
    fileName = MimeUtility.decodeText(fileName);
}
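The string =?utf-8?Q?Sz=C3=A1mla.docx?= uses the RFC 2047 encoded-word syntax: =?charset?encoding?encoded-text?=. MimeUtility.decodeText() handles both the Q encoding (similar to quoted-printable) and the B encoding (base64). It throws UnsupportedEncodingException, which is an IOException, so your existing throws clause already covers it.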
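To make the format less mysterious, here is a minimal sketch of what decoding an encoded-word involves, using only the JDK. This is not JavaMail's implementation; the class name EncodedWordSketch and its methods are made up for illustration, and it ignores details such as dropping whitespace between adjacent encoded-words. In real code, stick with MimeUtility.decodeText().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
public class EncodedWordSketch {

    // Matches =?charset?encoding?encoded-text?= where encoding is Q or B.
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([bBqQ])\\?([^?]*)\\?=");

    public static String decode(String input) {
        Matcher m = ENCODED_WORD.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy any plain text that precedes this encoded-word.
            out.append(input, last, m.start());
            Charset charset = Charset.forName(m.group(1));
            byte[] bytes = m.group(2).equalsIgnoreCase("B")
                    ? Base64.getDecoder().decode(m.group(3))
                    : decodeQ(m.group(3));
            out.append(new String(bytes, charset));
            last = m.end();
        }
        // Copy whatever follows the last encoded-word.
        out.append(input, last, input.length());
        return out.toString();
    }

    // Q encoding: "_" means space, "=XX" is a hex-encoded byte,
    // everything else is a literal ASCII character.
    private static byte[] decodeQ(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '_') {
                buf.write(' ');
            } else if (c == '=' && i + 2 < text.length()) {
                buf.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                buf.write(c);
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // The two bytes C3 A1 are the UTF-8 encoding of "á".
        System.out.println(decode("=?utf-8?Q?Sz=C3=A1mla.docx?="));
    }
}
```

The same file name in B encoding would be =?utf-8?B?U3rDoW1sYS5kb2N4?=, and it decodes to the same string.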
Applied to the loop from the question:
MimeBodyPart part = (MimeBodyPart) multiPart.getBodyPart(partCount);
if (Part.ATTACHMENT.equalsIgnoreCase(part.getDisposition())) {
    String fileName = part.getFileName();
    if (fileName != null) {
        // Turn =?utf-8?Q?Sz=C3=A1mla.docx?= into Számla.docx
        fileName = MimeUtility.decodeText(fileName);
    }
    String fullName = AnalyzeEmailsApp.getUserDir() + File.separator + fileName;
    // ... the rest
}
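As an alternative to calling decodeText() yourself, JavaMail documents a system property, mail.mime.decodefilename, that makes getFileName() decode the name for you. A minimal sketch follows. The class and method names are hypothetical, and I am assuming the property has to be set before JavaMail's MIME classes are loaded, so do it as early as possible at startup and check the behaviour against your JavaMail version.

```java
// Hypothetical startup helper, for illustration only.
public class DecodeFileNameProperty {

    // Assumption: call this before any javax.mail.internet class is used,
    // because the property may be read only once, when those classes load.
    public static void enable() {
        System.setProperty("mail.mime.decodefilename", "true");
    }

    public static void main(String[] args) {
        enable();
        System.out.println(System.getProperty("mail.mime.decodefilename"));
    }
}
```

The same setting can be passed on the command line with -Dmail.mime.decodefilename=true. The explicit decodeText() call above has the advantage of not depending on global configuration.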