
GitHub Action with Pandoc & GNU Parallel


I have the following GitHub Actions job, which uses pandoc to convert Markdown files to several output formats. The job condition_check_files runs on a push to main and returns the names of added and modified Markdown files:

  conditional_pandoc:
    runs-on: 'ubuntu-22.04'
    needs: [ condition_check_files ]
    if: needs.condition_check_files.outputs.bool_files_changed == 'True'
    env:
      list_changed_files: ${{ needs.condition_check_files.outputs.list_changed_files }}
    steps:
      - name: install_pandoc
        uses: pandoc/actions/setup@v1
        with:
          version: 3.5
      - name: run_pandoc
        run: >
          # List types which will only use --standalone. You can easily add more extensions if you're fine with this setting
          $ONLY_STANDALONE_OUTPUT_TYPES = "latex pdf html docx odt"';' \
          parallel --jobs 0 \
            # Pandoc creates an AST (Abstract Syntax Tree); reuse this by saving/reading from .ast
            pandoc --from markdown {} --to native -o '{.}.ast' ';'\
            for i in $ONLY_STANDALONE_OUTPUT_TYPES';' do \
              pandoc --from native '{.}.ast' --standalone -o '{.}.$i' ';' \
            done';' \
            # You can easily add individual conversion rules by using pandoc after the 'done' part. Keep in mind to finish all non-comment lines with backslash
            rm '{.}.ast' ::: $env:list_changed_files

but when I push a modified .md file to the repo, I get the following error for this job:

pandoc: {}: withBinaryFile: does not exist (No such file or directory)
Error: Process completed with exit code 1.

I really can't seem to find an explanation on how to solve this on the web, so any help would be greatly appreciated.
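
A plausible reading of this error, assuming the `run: >` folded scalar is the culprit: YAML folding joins lines of equal indentation into a single line, so the first `#` comment absorbs the assignment and the `parallel` call that follow it. The more-indented `pandoc` line keeps its own line and is executed directly, with `{}` passed as a literal file name. The sketch below imitates that in plain bash; `fake_pandoc` and the shortened script text are hypothetical stand-ins, not the real workflow.

```shell
#!/bin/bash
# Hypothetical stand-in for pandoc: prints the input path it was given.
fake_pandoc() { printf 'input=%s\n' "$1"; }

# Roughly what the shell sees after folding: line 1 is one long comment
# (the trailing backslash inside a comment does not continue the line),
# line 2 is a plain command that no longer runs under parallel.
folded_script='# List types ... $TYPES = "latex pdf" ; \ parallel --jobs 0 \
fake_pandoc {} --to native'

# Only the second line executes, and {} reaches the command unchanged.
result=$(eval "$folded_script")
echo "$result"
```

Because nothing replaces `{}` when `parallel` is not the program being run, the command receives the two characters `{}` as its input path, which matches the message above.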

Edit I

After @Benjamin W. advised removing the comments, I did so and got the following error:

Run $ONLY_STANDALONE_OUTPUT_TYPES = "latex pdf html docx odt"';' \ parallel --jobs 0 \
/home/runner/work/_temp/57d44aa5-ea88-4a42-b677-caf9210aa3af.sh: line 1: =: command not found
Error: Process completed with exit code 127.
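
The `=: command not found` message is what bash reports when an assignment is written with a leading `$` and spaces around `=`: the unset variable expands to nothing and `=` is then taken as the command name. Likewise, `$env:name` is PowerShell syntax, while bash reads an environment variable as `$name`. A minimal sketch of the bash form, with an illustrative variable name and file stem:

```shell
#!/bin/bash
# Bash assignment: no leading `$` on the name and no spaces around `=`.
only_standalone_output_types="latex pdf html docx odt"

# The expansion is left unquoted on purpose so the list splits on whitespace.
count=0
for ext in $only_standalone_output_types; do
  count=$((count + 1))
  echo "would write: example.$ext"
done
echo "types: $count"
```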

Edit II

With the helpful comments and a bit of experimenting I got the following code:

      - name: run_pandoc
        env:
          Only_Standalone_Output_Types: "latex pdf html docx odt"
        run: |
          parallel --jobs 0 \
            pandoc --from markdown --to native {} -o '{.}.ast' ';' \
            for i in $Only_Standalone_Output_Types';' do \
              pandoc --from native '{.}.ast' --standalone -o '{.}.$i' ';' \
            done';' \
            rm '{.}.ast' ::: $list_changed_files

and now the Action finishes without an error! However, the output is nowhere to be found. Since there is no error message, I don't know where to look for a solution. For reference, this bash script works locally (you need to add some example Markdown files for it to work on):

#!/bin/bash

test_files="test.md test/beispiel.md"
OUTPUT_TYPES="latex pdf html docx odt"

parallel --jobs 0 \
    pandoc --from markdown --to native {} -o '{.}.ast' ';' \
    for i in $OUTPUT_TYPES';' do \
        pandoc --from native '{.}.ast' --standalone -o '{.}.$i' ';' \
    done';' \
    rm '{.}.ast' ::: $test_files
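
To see where the output should land: parallel's `{.}` is the input path with its extension removed, so every converted file is written next to its source. On a GitHub-hosted runner those files are discarded when the job ends unless they are committed or uploaded, and the job as shown has no checkout step. The sketch below only prints the expected output paths so they can be checked with `ls`; `expected_outputs` is a hypothetical helper, and `${f%.*}` is used here as a bash approximation of `{.}`.

```shell
#!/bin/bash
# Print the output paths the conversion would produce, one per line.
# $1: space-separated input files, $2: space-separated output types.
expected_outputs() {
  local files="$1" types="$2" f t
  for f in $files; do
    for t in $types; do
      # ${f%.*} strips the last extension, mirroring parallel's {.}
      printf '%s.%s\n' "${f%.*}" "$t"
    done
  done
}

expected_outputs "test.md test/beispiel.md" "html pdf"
```

Piping this list into `xargs ls -l` inside the job would be one way to confirm whether the files exist on the runner before the job finishes.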

Solution

  • After a lot of tinkering and trial and error, here is a version that gets the basic idea working. The LaTeX-to-PDF conversion still throws an error because of missing packages. Instead of adding all the needed packages (listed here), I intend to get the Docker image working. I will report back if this works, or edit this answer with a complete list of packages.

      conditional_pandoc:
        runs-on: 'ubuntu-22.04'
        needs: [ condition_check_files ]
        if: needs.condition_check_files.outputs.bool_files_changed == 'true'
        env:
          list_changed_files: ${{ needs.condition_check_files.outputs.list_changed_files }}
        steps:
          - uses: actions/checkout@v4  # In order to find the script pandoc.sh
            with:
              fetch-depth: 2
          - uses: pandoc/actions/setup@v1
            with:
              version: 3.5
          - uses: teatimeguest/setup-texlive-action@v3  # To convert to pdf (from latex)
            with:
              packages: |
                scheme-basic
                hyperref
                xcolor
                iftex
                #TODO
          - name: Run Pandoc
            run: |
              echo "$list_changed_files"
              bash ./pandoc.sh "$list_changed_files"
          - name: Commit files # transfer the new files back into the repository
            run: |
              git config --local user.name "GH_Action_Bot"
              git add ./content
              git commit -m "GH Action: Pandoc | New output for changed files"
              git push -f origin main
    

    The important part is the script pandoc.sh that the workflow calls:

    #!/bin/bash
    # List types which will only use --standalone. You can easily add more extensions if you're fine with this setting
    Only_Standalone_Output_Types="latex pdf html docx odt"
    # Use GNU Parallel to work each file on its own cpu core.
    # Pandoc creates an AST (Abstract Syntax Tree); reuse this by saving/reading from .ast
    # You can easily add individual conversion rules by using pandoc after the 'done' part. Keep in mind to finish all non-comment lines with backslash
    parallel --jobs 0 \
        pandoc --from markdown --to native './{}' -o './{.}.ast' ';'\
        for i in "$Only_Standalone_Output_Types"';' do \
            pandoc --from native './{.}.ast' --standalone -o './{.}.$i' ';' \
        done';' \
        rm './{.}.ast' ::: $1  # unquoted on purpose: a space-separated list becomes one input per file
    
    - uses: actions/upload-artifact@v4  # Stores the files as a workflow artifact (this does not commit them to the repo)
      with:
        name: pandoc-artifact
        path: content/
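
Returning to pandoc.sh: the workflow hands the whole file list to the script as a single string in `$1`, and parallel treats each argument after `:::` as one input. Whether `$1` is quoted at that point therefore decides how many jobs are started. A small sketch of the difference, assuming the list is space-separated; `count_args` and the example paths are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

# Example value, assumed to be space-separated like the workflow output.
list_changed_files="content/a.md content/b.md"

count_args "$list_changed_files"   # quoted: the list stays one argument
count_args $list_changed_files     # unquoted: split on whitespace into two
```

Unquoted splitting breaks on file names that contain spaces, so a newline-separated list read from standard input would be the sturdier design if such names can occur.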