I am using the Google Cloud Storage Java SDK (v2.20.1) to upload files to my bucket. I am trying to set the Content-Type of the file, which I am using Apache Tika to detect. The issue is that if I use the Content-Type returned by Tika, even though it is correct, the uploaded file is corrupted and I cannot view it. If I manually set the Content-Type to the same value that Tika returned, the file uploads and I can view it without issue.
This code does not work. I have verified that the content type exactly matches application/pdf, but the file is corrupted on upload and I cannot view it.
    Tika tika = new Tika();
    String contentType = tika.detect(inputStream);
    System.out.println(contentType); // prints "application/pdf"
    if ("application/pdf".equals(contentType)) {
        return bucket.create(Utilities.formatDirectoryName(directory) + name, inputStream, contentType);
    } else {
        System.out.println("INVALID TYPE");
        return null;
    }
This code does work by manually setting the Content-Type. The file uploads and I can view it without issue.
    String contentType = "application/pdf";
    System.out.println(contentType); // prints "application/pdf"
    if ("application/pdf".equals(contentType)) {
        return bucket.create(Utilities.formatDirectoryName(directory) + name, inputStream, contentType);
    } else {
        System.out.println("INVALID TYPE");
        return null;
    }
When I view the object in the Cloud Storage UI, everything (Content-Type, size, etc.) shows correctly for both of the methods listed above. The difference is that when I download the file to view it, one is corrupted and the other opens correctly.
I have run this test multiple times to ensure it wasn't just a weird upload glitch, but it's consistent every time. I have also tried this with different types of files, such as PowerPoint files, and I get the same result when using Tika vs. manually setting the Content-Type. This is driving me crazy, please help!
Turns out that Tika's detect reads from the InputStream and disturbs its position (marker), so once I run detect I can't reuse that InputStream for the upload.
So instead I read the InputStream into a byte[], which I can then use both for detecting the type and for saving:
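    // Read the whole stream into memory once (InputStream.transferTo needs Java 9+).
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    inputStream.transferTo(baos);
    byte[] byteData = baos.toByteArray();
    // Detect the type from the bytes, then upload those same bytes.
    Tika tika = new Tika();
    String contentType = tika.detect(byteData);
    return bucket.create(Utilities.formatDirectoryName(directory) + name, byteData, contentType);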
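For anyone who wants to see the effect without Tika or Cloud Storage on the classpath, here is a minimal sketch using only the JDK. It assumes the stream does not support mark/reset, which is one way the position can end up advanced. The `sniff` method is a hypothetical stand-in for a detector that peeks at the first few bytes, not Tika's actual implementation, and `withoutMark` and `readAll` are helper names made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamReuseDemo {

    /** Wraps the data in a stream that reports no mark/reset support (assumption for this demo). */
    static InputStream withoutMark(byte[] data) {
        return new FilterInputStream(new ByteArrayInputStream(data)) {
            @Override public boolean markSupported() { return false; }
            @Override public void mark(int readLimit) { /* not supported, do nothing */ }
            @Override public void reset() throws IOException {
                throw new IOException("mark/reset not supported");
            }
        };
    }

    /** Hypothetical detector: reads the first 4 bytes and checks for the PDF magic number. */
    static String sniff(InputStream in) throws IOException {
        byte[] header = new byte[4];
        int n = in.readNBytes(header, 0, header.length);
        String magic = new String(header, 0, n, StandardCharsets.US_ASCII);
        return magic.equals("%PDF") ? "application/pdf" : "application/octet-stream";
    }

    /** Drains whatever is left in the stream into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        in.transferTo(out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "%PDF-1.7 example body".getBytes(StandardCharsets.US_ASCII);

        // Problem: detecting first consumes the header, so fewer bytes remain for the upload.
        InputStream stream = withoutMark(original);
        String type = sniff(stream);
        byte[] leftover = readAll(stream);
        System.out.println(type + ": " + leftover.length + " of " + original.length + " bytes left");

        // Fix: buffer once, then detect and upload from the same byte array.
        byte[] buffered = readAll(withoutMark(original));
        String bufferedType = sniff(new ByteArrayInputStream(buffered));
        System.out.println(bufferedType + ": " + buffered.length + " bytes available");
    }
}
```

Buffering the whole file in memory is simple but may not suit very large uploads. In that case, wrapping the stream so that it supports mark/reset before detection is probably worth looking at instead.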