Current Problem :- I have a GCS (Google Cloud Storage) setup to which I upload files such as doc, docx, and pdf. The default metadata of each file is uploaded along with it. The files are stored as Blobs, and when I read one back I get an InputStream, from which I cannot delete the metadata directly.
What I want :- I want to remove the default metadata (which may reveal personal information about the users who uploaded the files) while uploading the file to, or downloading it from, the GCS server.
What problems am I facing? When downloading, the file comes back as a Blob, or I get it as an InputStream, from which I cannot delete the metadata directly.
What steps do I need to follow to remove the metadata from the files while downloading and uploading?
How can I read the file metadata from an InputStream and delete it?
Tools and programming languages used :- Kotlin, http4k, Apache POI, PDFBox
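import org.apache.poi.openxml4j.opc.OPCPackage

val opc = OPCPackage.open("demoDox.docx")
val pp = opc.packageProperties
println(pp.creatorProperty)
pp.setCreatorProperty("Shubham") // core properties can be updated like this
println(pp.creatorProperty)
opc.close() // for a package opened from a path, close() saves the change back to the file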
This lets me change the docx metadata, but only when I know the file path. Right now all I get from GCS is an InputStream.
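For what it's worth, OPCPackage.open also has an overload that takes an InputStream, so a file path should not be strictly required. Below is a minimal sketch of that idea, assuming poi-ooxml is on the classpath. The helper name stripDocxCoreProperties is hypothetical, and it only blanks the creator and last-modified-by fields; any other properties (for example the extended or custom properties parts) are left untouched.

```kotlin
import org.apache.poi.openxml4j.opc.OPCPackage
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical helper (not part of POI): reads a .docx from a stream,
// blanks two identifying core properties, and returns the result as bytes.
fun stripDocxCoreProperties(input: InputStream): ByteArray {
    val out = ByteArrayOutputStream()
    OPCPackage.open(input).use { opc ->
        val props = opc.packageProperties
        // An empty string is stored as "no value", so the element is omitted on save.
        props.setCreatorProperty("")
        props.setLastModifiedByProperty("")
        // A package opened from a stream has no backing file, so save it explicitly.
        opc.save(out)
    }
    return out.toByteArray()
}
```

The returned bytes could then be wrapped in a ByteArrayInputStream and uploaded or served in place of the original stream.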
Solved:
I was able to solve the problem using the code below:
import org.apache.poi.hwpf.HWPFDocument

val doc = HWPFDocument(response.body.stream)
println("Current author = ${doc.summaryInformation.author}")
doc.summaryInformation.removeAuthor() // returns nothing, so there is no result to assign
println("Removed author = ${doc.summaryInformation.author}")
doc.close()
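One caveat: the snippet above changes only the in-memory document, so the bytes that came from GCS are still unchanged. To pass the cleaned file on, the document would have to be written back out. A rough sketch, assuming poi-scratchpad is on the classpath and the input is a binary .doc file; stripDocAuthor is a hypothetical helper name, and removing the last author as well is my own addition:

```kotlin
import org.apache.poi.hwpf.HWPFDocument
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Hypothetical helper (not part of POI): removes author fields from a
// binary .doc read from a stream and returns the cleaned file as bytes.
fun stripDocAuthor(input: InputStream): ByteArray {
    val out = ByteArrayOutputStream()
    HWPFDocument(input).use { doc ->
        // summaryInformation can be null if the file carries no summary stream.
        doc.summaryInformation?.removeAuthor()
        doc.summaryInformation?.removeLastAuthor()
        // Serialize the modified document, including its updated properties.
        doc.write(out)
    }
    return out.toByteArray()
}
```

With http4k, the call would presumably look like stripDocAuthor(response.body.stream), with the resulting bytes used for the re-upload or the download response.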