I'm building a web application that, among other things, converts files from doc to pdf format.
I've been using LibreOffice installed on the same server as my web application. By shelling out to the `libreoffice` binary from my web app's code, I can convert documents successfully.
The problem: when my web application receives several HTTP requests for doc->pdf conversion within a very short period (e.g. milliseconds), calling `libreoffice` fails to start multiple instances at once. As a result, some files are converted successfully while others are not.
The solution to this problem, as I see it, would be to:
1. start the `libreoffice` service once and make sure it accepts connections;
2. send requests to the `libreoffice` service asking it to perform the file format conversion (e.g. as `libreoffice` API requests to a port or socket file).

After a bit of research, I found a CLI tool called jodconverter. With its `jodconverter-cli` I can convert the files. The conversion works, but unfortunately jodconverter stops the `libreoffice` server once the conversion is done (there's an open issue about that), and I don't see a way to turn this behavior off.
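Starting a single long-running listener and checking that it accepts connections (the first part of the plan above) can be sketched in bash as follows. This is a sketch under assumptions: the `libreoffice` launcher is on `PATH`, host `127.0.0.1` and port `2002` are arbitrary choices, and `start_listener`/`wait_for_port` are hypothetical helper names. `--accept` is LibreOffice's option for listening for UNO connections; a named pipe (`pipe,name=<name>;urp;`) can be used instead of a TCP socket.

```bash
#!/usr/bin/env bash
# Sketch: start ONE headless LibreOffice that listens for UNO connections,
# then wait until its socket accepts connections.
# Assumptions: `libreoffice` is on PATH; 127.0.0.1:2002 is an arbitrary choice.
LO_HOST=127.0.0.1
LO_PORT=2002

# Launch the listener in the background; the caller can read its PID from $!.
start_listener() {
  libreoffice --headless --invisible --norestore \
    "--accept=socket,host=${LO_HOST},port=${LO_PORT};urp;StarOffice.ComponentContext" &
}

# Return 0 as soon as something accepts TCP connections on host:port,
# or 1 after `tries` attempts spaced 0.2 s apart (default 50, about 10 s).
wait_for_port() {
  local host=$1 port=$2 tries=${3:-50}
  while (( tries-- > 0 )); do
    # /dev/tcp/HOST/PORT is a bash-only pseudo-path: redirecting from it
    # opens a TCP connection and fails if nothing is listening.
    if (: < "/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    sleep 0.2
  done
  return 1
}
```

Typical use: `start_listener; LO_PID=$!; wait_for_port "$LO_HOST" "$LO_PORT" || echo "listener did not come up" >&2`. Unlike the jodconverter-cli behavior described above, this listener keeps running until you stop it (e.g. `kill "$LO_PID"`). Sending the actual conversion requests still requires a UNO client, which this sketch does not cover.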
Alternatively, I'm considering the following options:
- queue all conversion requests in my web app; this obviously defeats concurrency, i.e. my users will have to wait for their files to be converted;
- research further and use something called UNO; however, there's no binding for the language I'm using (Elixir), and I can't see a way to construct a UNO payload manually.
How can I use `libreoffice` as a service via UNO?
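I ended up following a suggestion to start multiple `libreoffice` instances in parallel instead. Concurrent launches collide because they all share the default user profile, so each instance gets its own profile directory via the `-env:UserInstallation=file:///tmp/...` command-line option (`#{timestamp}` below is Elixir string interpolation):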
```
libreoffice -env:UserInstallation=file:///tmp/delete_me_#{timestamp} \
  --headless \
  --convert-to pdf \
  --outdir /tmp \
  /path/to/my_file.doc
```
I found this advice in a long discussion on a GitHub issue titled "Parallel conversions and synchronization".
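The same idea, applied to several files at once, can be sketched in bash as below, assuming the `libreoffice` launcher is on `PATH`; `convert_one` and `convert_all` are hypothetical helper names. One deliberate change: instead of a timestamp, it uses `mktemp -d`, which creates a new, uniquely named directory on every call, so two requests arriving within the same timestamp tick still get separate profiles. Each profile is deleted after use.

```bash
#!/usr/bin/env bash
# Sketch: convert several documents to PDF concurrently, giving every
# LibreOffice process its own throwaway user profile so the instances
# do not collide. Helper names are illustrative, not from any library.

# Convert one document: $1 = input file, $2 = output directory.
convert_one() {
  local input=$1 outdir=$2 profile status=0
  # Unique profile directory per call. Assumes $TMPDIR contains no
  # characters that would need URL-encoding (e.g. spaces) in file://.
  profile=$(mktemp -d "${TMPDIR:-/tmp}/lo_profile.XXXXXX") || return 1
  libreoffice "-env:UserInstallation=file://${profile}" \
    --headless \
    --convert-to pdf \
    --outdir "$outdir" \
    "$input" || status=$?
  # Remove the temporary profile whether or not the conversion succeeded.
  rm -rf "$profile"
  return "$status"
}

# Convert all remaining arguments in parallel: $1 = output directory.
# Returns non-zero if any conversion failed.
convert_all() {
  local outdir=$1 failed=0 pid input
  shift
  local -a pids=()
  for input in "$@"; do
    convert_one "$input" "$outdir" &   # one background job per file
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || failed=$((failed + 1))
  done
  if (( failed > 0 )); then
    echo "convert_all: ${failed} conversion(s) failed" >&2
    return 1
  fi
}
```

For example, `convert_all /tmp/pdfs report1.doc report2.doc` starts two LibreOffice processes side by side. Each job pays the cost of starting LibreOffice and initialising a fresh profile, so under heavy load you may want to cap how many run at once rather than starting one per request.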