python · github-actions · github-release

Upload a GitHub release with file size above 2GB


I have a CI/CD workflow on GitHub Actions. The pipeline runs tests and builds, and it should upload a release file when a tag is created.

However, there's a problem: after the build, release.zip is 3.5GB, and each file attached to a GitHub release must be 2GB or less. The main reason for the large size is a single 3.3GB file, which must be available locally on every machine that installs the application.

Here are some details about the app:

Here is the relevant job from the workflow:

  build:
    needs: test
    runs-on: windows-latest

    steps:
      # Checkout
      - uses: actions/checkout@v4
        with:
          lfs: true

      # LFS
      - name: Install Git LFS
        run: |
          choco install git-lfs

      - name: Pull lfs files
        run: |
          git lfs pull

      # Python install
      - name: Set up Python 3.10
        uses: actions/setup-python@v3
        with:
          python-version: "3.10"

      # PyInstaller build
      - name: Build
        run: |
          cd ..
          mkdir build
          cd build
          python -m pip install --upgrade pip
          pip install pyinstaller
          python -m pip install -r ..\requirements.txt
          pyinstaller ..\prod.spec
          xcopy /E /I .\dist\gui\ ..\gui\
          cd ..\
          dir

      # Release
      - name: Extract Tag Name
        id: extract_tag_name
        run: |
          $tagName = $Env:GITHUB_REF -replace 'refs/tags/', ''
          Write-Host "::set-output name=tag_name::$tagName"

      # Zip the output folder
      - name: Zip folder
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        run: |
          7z a -r release.zip ./gui/

      - name: Create Release
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        id: create_release
        uses: actions/create-release@v1
        with:
          tag_name: ${{ steps.extract_tag_name.outputs.tag_name }}
          release_name: Release ${{ steps.extract_tag_name.outputs.tag_name }}
          body: Release ${{ steps.extract_tag_name.outputs.tag_name }} created automatically by GitHub Actions.
          token: ${{ secrets.ACTION_RELEASE }}
          draft: false
          prerelease: false
        env:
          GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}

      - name: Upload Release Assets
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        id: upload_asset
        uses: actions/upload-release-asset@v1
        with:
          upload_url: ${{ steps.create_release.outputs.upload_url }}
          asset_path: ./release.zip
          asset_name: release.zip
          asset_content_type: application/zip
        env:
          GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}

Several solutions have been attempted, but none have resolved the problem:


Solution

  • Unfortunately, this failed during the workflow because actions/upload-release-asset@v1 does not support multiple files.

    The README of actions/upload-release-asset states that it is no longer maintained and suggests using softprops/action-gh-release instead, which does support multiple files:

    For example, to upload everything under a 'dist' directory:

    # ...
          uses: softprops/action-gh-release@v1
          with:
            files: "dist/*"
            # ... use other parameters as needed
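
    If the oversized archive is split into chunks first (the splitting itself is sketched after the Python code further down), a single softprops/action-gh-release step can replace both the Create Release and the Upload Release Assets steps and publish all the chunks in one go. A minimal sketch, assuming chunk names matching release.zip.part* plus a release.zip.sha256 checksum file; both names are assumptions, so adjust them to whatever your split step produces:

      - name: Create Release and Upload Chunks
        if: startsWith(github.ref, 'refs/tags/')
        uses: softprops/action-gh-release@v1
        with:
          tag_name: ${{ steps.extract_tag_name.outputs.tag_name }}
          name: Release ${{ steps.extract_tag_name.outputs.tag_name }}
          body: Release ${{ steps.extract_tag_name.outputs.tag_name }} created automatically by GitHub Actions.
          # files is a newline-delimited list of globs; each matching file becomes a separate asset
          files: |
            release.zip.part*
            release.zip.sha256
        env:
          GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}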
    

  • Moreover, attempting to split files in this manner on my local machine resulted in corrupted files.

    Files are just sequences of bytes, so any file can be split into smaller chunks and reassembled byte for byte. On Linux systems, the split command does precisely this; see one such example described here. Consider generating and publishing hashes as well, to help verify integrity before and after upload.

    Here's a Python implementation of splitting and joining arbitrary binary files to and from chunks:

    import os
    from pathlib import Path
    from hashlib import sha256
    
    def split_file(infile: str | Path, outdir: str | Path | None = None, n_chunks: int = 5):
        infile = Path(infile).absolute()
        if outdir is None:
            outdir = infile.parent
        else:
            outdir = Path(outdir).absolute()
    
        outfile_pattern = f'{os.path.basename(infile)}.chunk{{}}'
    
        inhash = sha256()
        file_size = os.stat(infile).st_size
        assert file_size >= n_chunks, f'file too small ({file_size}) to chunk into {n_chunks}'
        assert n_chunks >= 2
        chunk_size = (file_size // n_chunks) or 1
        outpaths = []    
        with open(infile, 'rb') as in_f:
            for i in range(1, n_chunks + 1):
                outfile_name = outfile_pattern.format(i)
                outfile_path = outdir / outfile_name
                outpaths.append(outfile_path)
                with open(outfile_path, 'wb') as out_f:
                    if i == n_chunks:
                        chunk_contents = in_f.read()
                    else:
                        chunk_contents = in_f.read(chunk_size)
                    out_f.write(chunk_contents)
                    inhash.update(chunk_contents)
        return outpaths, inhash.hexdigest()
    
    
    def join_files(chunk_paths, outfile, expected_sha_256_digest=None):
        new_hash = sha256()
        with open(outfile, 'wb') as out:
            for fp in chunk_paths:
                with open(fp, 'rb') as f:
                    chunk_contents = f.read()
                    out.write(chunk_contents)
                    new_hash.update(chunk_contents)
        if expected_sha_256_digest is not None:
            assert expected_sha_256_digest == new_hash.hexdigest()
        return new_hash.hexdigest()
    
    # splitting usage:
    source_file = 'myfile.bin'
    chunk_files, digest = split_file(source_file)
    print(f'split {source_file} (digest: {digest}) into chunks:', *chunk_files, sep='\n\t')
    
    # joining usage:
    new_hash = join_files(chunk_files, 'myfile-again.bin', digest)
    print(f'Created myfile-again.bin with matching digest ({new_hash})')
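
    In the workflow itself, the split can run right after the Zip folder step, so that only sub-2GB pieces (plus a checksum) are left to upload. A rough sketch using the coreutils split command mentioned above, which is available through Git Bash on the windows-latest runner; the 1900M chunk size and the file names are assumptions, and the split_file function above would work just as well:

      - name: Split release archive
        if: startsWith(github.ref, 'refs/tags/')
        shell: bash
        run: |
          # produces release.zip.part-aa, release.zip.part-ab, ... each below the 2GB asset limit
          split -b 1900M release.zip release.zip.part-
          # record a checksum so the re-joined file can be verified after download
          sha256sum release.zip > release.zip.sha256

    On the installing machine, the pieces can be re-joined with cat release.zip.part-* > release.zip (or copy /b on Windows, or the join_files function above) and checked against the published checksum.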
    

  • Maximum compression: used the zip -mx9 option for maximum compression.

    It is unclear whether this will get you to the size you need, but you can try a stronger compression mechanism. 7-Zip with specific options, for example the 7z format with LZMA2, may give noticeably better results than standard zip compression, depending on the nature of your data.
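
    If you want to try that in the existing Zip folder step, one possible variant switches to the 7z container with LZMA2 at the maximum level; the flags below are just one reasonable combination, not a tuned recommendation:

      - name: Zip folder
        if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
        run: |
          # 7z container with LZMA2 at maximum compression instead of plain zip
          7z a -t7z -m0=lzma2 -mx=9 release.7z ./gui/

    Note that the later upload steps would then need to reference release.7z instead of release.zip, and if the 3.3GB file is already compressed or mostly random data, no codec will shrink it much, so splitting remains the more reliable route.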