Tags: javascript, node.js, docker, libreoffice, libreoffice-basic

LibreOffice CLI Highlighting Words in Generated PDF using Macros


I want to build a generic feature in my application that takes a file of a readable type, say a text document, and converts it to a PDF.

My research has led me to the command-line interface of LibreOffice, which can do exactly that: convert a file of a given type to PDF. This is the command I found:

libreoffice --headless --convert-to pdf path/to/file

If the file is convertible, this outputs a .pdf that, hopefully, contains all of the data.

Now, the end goal is to be able to specify a list of words to be highlighted within that document, specifically with a yellow background.

This has led me to some things called Macros and I've noticed I could write one in JavaScript.

I created my JavaScript Macro for LibreOffice

function highlight( words ) {
    var document = XSCRIPTCONTEXT.getDocument();
    var descriptor = document.createSearchDescriptor();

    // Match whole words only, respecting case.
    descriptor.setPropertyValue( "SearchCaseSensitive", true );
    descriptor.setPropertyValue( "SearchWords", true );

    for( let i = 0; i < words.length; i++ ) {
        descriptor.setSearchString( words[ i ] );

        let found = document.findFirst( descriptor );
        // Loop for as long as a match is found.
        while( found ) {
            // Set the color on the matched range itself, as an RGB integer.
            found.setPropertyValue( "CharBackColor", 0xFFFF00 ); // Yellow
            found = document.findNext( found.getEnd(), descriptor );
        }
    }
}

So I tried to modify my command from above,

libreoffice --headless --norestore --invisible --convert-to pdf path/to/file macro:///home/macros.highlight(${words})

Note that I run this command inside a container, where I have created a mount point that contains all my macros.

I construct the command shown above programmatically in my Node.js application.

Here is what I think are the problems.

  1. I don't understand when the macro runs: before or after the file has been converted to a PDF? Does it actually do anything?
  2. Is my macro actually loaded or not?
  3. Does the command make sense at all? I can't find many useful resources online.

Solution

  • Your post raises a lot of issues, so I'll try to give an overview of as much as I can cover in a single answer.

    First of all, I don't think you should use both --convert-to and macro:/// at the same time. The good news is that a macro should be able to do the conversion to PDF, so you wouldn't need --convert-to at all. Macros can also be run headless, although you had better first make sure the macro works correctly, because when running headless there is no way to see what went wrong.
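
To illustrate the idea of letting the macro do the conversion itself, here is a minimal sketch in Python. It assumes the document is a Writer (text) document, since it uses the `writer_pdf_Export` filter; the names `pdf_url_for`, `export_pdf`, and `convert_current_document` are hypothetical, and `XSCRIPTCONTEXT` is only available when the code runs as a macro inside LibreOffice.

```python
import posixpath


def pdf_url_for(source_url):
    """Replace the extension of a file:// URL with .pdf (hypothetical helper)."""
    return posixpath.splitext(source_url)[0] + ".pdf"


def export_pdf(doc, out_url):
    """Store an already-loaded Writer document as a PDF at out_url."""
    # Imported lazily: this module only exists in LibreOffice's Python.
    from com.sun.star.beans import PropertyValue

    pdf_filter = PropertyValue()
    pdf_filter.Name = "FilterName"
    pdf_filter.Value = "writer_pdf_Export"  # assumes a Writer document
    doc.storeToURL(out_url, (pdf_filter,))


def convert_current_document(*args):
    """Macro entry point: export the active document next to its source."""
    doc = XSCRIPTCONTEXT.getDocument()  # global provided to macros by LibreOffice
    export_pdf(doc, pdf_url_for(doc.getURL()))


# Only the listed functions are offered as runnable macros.
g_exportedScripts = (convert_current_document,)
```

Any highlighting would have to happen before the call to `storeToURL`, so that the exported PDF reflects the modified document.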

    As for passing arguments, perhaps the most straightforward way is to create a Basic macro that takes the words list and then invokes the macro, as in my answer at https://stackoverflow.com/a/37347633/5100564 using Python.

    A JavaScript macro is the whole script file rather than a function that gets invoked, so I don't see any way to pass arguments. There is an obvious solution: use Python instead, or Basic or Java. Those languages work well for LibreOffice macros, while JavaScript support is only experimental.
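
As a rough idea of what the highlighting routine from the question might look like in Python, here is a sketch that takes the document as a parameter. It assumes `doc` is a Writer document obtained elsewhere (for example from `XSCRIPTCONTEXT.getDocument()`); the function name and the `color` parameter are illustrative only.

```python
YELLOW = 0xFFFF00  # CharBackColor expects an RGB integer, not a "#FFFF00" string


def highlight(doc, words, color=YELLOW):
    """Give every whole-word, case-sensitive match a background color.

    Returns the number of matches that were highlighted.
    """
    descriptor = doc.createSearchDescriptor()
    descriptor.SearchCaseSensitive = True
    descriptor.SearchWords = True

    count = 0
    for word in words:
        if not word:
            continue  # skip empty entries instead of searching for ""
        descriptor.SearchString = word
        found = doc.findFirst(descriptor)
        while found is not None:         # findFirst/findNext return None when done
            found.CharBackColor = color  # set on the matched range itself
            count += 1
            found = doc.findNext(found.End, descriptor)
    return count
```

Because the function only touches `doc` through its search methods, it can be tried out against a visible LibreOffice instance first and reused unchanged when running headless.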

    Alternatively, you could put the list of words to highlight in a file, and then somehow have your macro read the file, which could be done in JavaScript or a more supported macro language. For example, the word list could be put into a CSV file and then opened in Calc, and then the macro could find the list in the open Calc window.
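
A simpler variant of the file idea, which skips Calc entirely, is to let a Python macro read a plain text file with one word per line. This is only a sketch: the function name is hypothetical, and the file location would be whatever your application and the macro agree on.

```python
from pathlib import Path


def load_words(path):
    """Return the non-empty, stripped lines of a UTF-8 text file as a list."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

The Node.js side would then write the word list to that file before starting LibreOffice, which avoids having to escape the words on the command line.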

    You mentioned creating a mount point for your macros. But I think what you need to do is put the macros in one of the locations that LibreOffice refers to as user, application, or document. The user location would be something like $HOME/libreoffice, and the application location would be something like /share/libreoffice for all users (those are only to give you an idea; you would need to look up the exact locations). If neither of these is what you want, it's possible to unzip a document and embed the scripts into it.

    By the way, I experimented with going from the command line directly to a macro that isn't Basic. For example, this seemed to work (on a Unix shell the URI needs single quotes so that $ and & are not interpreted by the shell):

    soffice 'vnd.sun.star.script:hello.py$say_hi?language=Python&location=user&words=apple,banana'
    

    A Python macro can unpack the URI to get the words argument using something like this:

    url = args[0]
    wordsString = url.split("&words=")[1]
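
A slightly more defensive way to unpack the same URI is to parse its query string with the standard library, so that the position of `words` among the parameters does not matter. This sketch assumes, as above, that the macro receives the full URI as a string; the function name is hypothetical.

```python
from urllib.parse import parse_qs


def words_from_script_uri(uri):
    """Extract the comma-separated `words` parameter from a script URI."""
    query = uri.partition("?")[2]          # everything after the first "?"
    params = parse_qs(query)               # e.g. {"words": ["apple,banana"], ...}
    raw = params.get("words", [""])[0]
    return [word for word in raw.split(",") if word]
```

A missing or empty `words` parameter yields an empty list rather than an IndexError.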
    

    Another note: The JavaScript interpreter for LibreOffice is Rhino, not Node.js, so be aware of any differences if you are not as familiar with that dialect.

    Now, most of the above tries to follow the ideas in your question as closely as possible. However, I wonder whether what you really want is not a macro that gets run from within LibreOffice, but rather code that connects to a headless instance of LibreOffice listening on a socket. Then you can send it UNO commands. It's noticeably slower, but the code will likely be easier to develop, and errors can be printed to the console, making it easy to debug.

    If you do that, then it should be a lot easier to use your preferred language, whether that be JavaScript, Delphi, C#, or many others. I'd still recommend Python, but it's not as necessary. To get an idea of how connecting through a socket works, check out the tutorial at http://christopher5106.github.io/office/2015/12/06/openoffice-libreoffice-automate-your-office-tasks-with-python-macros.html, for example, where it says to first play with the shell to get familiar.
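
To give a flavor of the socket approach, here is a minimal Python sketch. It assumes LibreOffice was started separately with something like `soffice --headless --norestore --accept="socket,host=localhost,port=2002;urp;"`, and that the script runs with a Python that can import `uno` (typically the one shipped with LibreOffice). The port number and both function names are arbitrary choices for illustration.

```python
def uno_connection_url(host="localhost", port=2002):
    """Build the UNO URL matching the --accept string used to start soffice."""
    return f"uno:socket,host={host},port={port};urp;StarOffice.ComponentContext"


def open_document(path, host="localhost", port=2002):
    """Connect to a listening soffice and load the file at `path`, hidden.

    `path` should be an absolute path as seen by the LibreOffice process.
    """
    # Imported lazily: these modules ship with LibreOffice's Python.
    import uno
    from com.sun.star.beans import PropertyValue

    local_ctx = uno.getComponentContext()
    resolver = local_ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.bridge.UnoUrlResolver", local_ctx)
    ctx = resolver.resolve(uno_connection_url(host, port))
    desktop = ctx.ServiceManager.createInstanceWithContext(
        "com.sun.star.frame.Desktop", ctx)

    hidden = PropertyValue()
    hidden.Name = "Hidden"
    hidden.Value = True
    return desktop.loadComponentFromURL(
        uno.systemPathToFileUrl(path), "_blank", 0, (hidden,))
```

The returned document object exposes the same search and store methods a macro would use, so the highlighting and the PDF export can be driven from this script, with any exception showing up directly in the console.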