I have a CI/CD workflow on GitHub Actions. The pipeline runs tests and builds, and it should upload a release file when a tag is created.
However, there's a problem: the release.zip produced by the build is 3.5GB, and each file in a GitHub Release must be 2GB or less. The main reason for the large size is a single 3.3GB file, which must be available locally on every machine that installs the application.
Here are some details about the app:
- It's a Python application for Windows desktop.
- The build process uses PyInstaller.
- The build bundles the problematic file into the app.
- The app does not connect to any server or cloud.
- The problematic file was originally about 9GB; after compression it is 3.3GB.
- The problematic file is stored with Git LFS.
Here is the relevant job from the workflow:
build:
  needs: test
  runs-on: windows-latest
  steps:
    # Checkout
    - uses: actions/checkout@v4
      with:
        lfs: true
    # LFS
    - name: Install Git LFS
      run: |
        choco install git-lfs
    - name: Pull lfs files
      run: |
        git lfs pull
    # Python install
    - name: Set up Python 3.10
      uses: actions/setup-python@v3
      with:
        python-version: "3.10"
    # PyInstaller build
    - name: Build
      run: |
        cd ..
        mkdir build
        cd build
        python -m pip install --upgrade pip
        pip install pyinstaller
        python -m pip install -r ..\requirements.txt
        pyinstaller ..\prod.spec
        xcopy /E /I .\dist\gui\ ..\gui\
        cd ..\
        dir
    # Release
    - name: Extract Tag Name
      id: extract_tag_name
      run: |
        $tagName = $Env:GITHUB_REF -replace 'refs/tags/', ''
        Write-Host "::set-output name=tag_name::$tagName"
    # Zip the output folder
    - name: Zip folder
      if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
      run: |
        7z a -r release.zip ./gui/
    - name: Create Release
      if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
      id: create_release
      uses: actions/create-release@v1
      with:
        tag_name: ${{ steps.extract_tag_name.outputs.tag_name }}
        release_name: Release ${{ steps.extract_tag_name.outputs.tag_name }}
        body: Release ${{ steps.extract_tag_name.outputs.tag_name }} created automatically by GitHub Actions.
        token: ${{ secrets.ACTION_RELEASE }}
        draft: false
        prerelease: false
      env:
        GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}
    - name: Upload Release Assets
      if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
      id: upload_asset
      uses: actions/upload-release-asset@v1
      with:
        upload_url: ${{ steps.create_release.outputs.upload_url }}
        asset_path: ./release.zip
        asset_name: release.zip
        asset_content_type: application/zip
      env:
        GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}
Several solutions have been attempted, but none have resolved the problem:
- GitHub LFS: Attempted to upload the release.zip to LFS during the workflow run, but the push failed. There are also concerns about whether pushing to LFS from within the workflow is feasible at all.
- Splitting the zip: Tried splitting the zip into partial zip files. This failed during the workflow because actions/upload-release-asset@v1 does not support multiple files. Moreover, splitting the files this way on my local machine resulted in corrupted files.
- Maximum compression: Used the zip -mx9 option for maximum compression, but it did not help; the output zip still exceeded 3GB.
> This failed during the workflow because actions/upload-release-asset@v1 does not support multiple files.
The README of actions/upload-release-asset states that it is no longer maintained and suggests using softprops/action-gh-release instead, which does support multiple files. For example, to upload everything under a 'dist' directory:
# ...
uses: softprops/action-gh-release@v1
with:
files: "dist/*"
# ... use other parameters as needed
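Combining that with archive splitting, your zip and upload steps could be collapsed into something like the sketch below (untested). It assumes 7-Zip's -v volume switch to keep each part under the 2GB limit, reuses your existing tag condition and ACTION_RELEASE secret, and lets softprops/action-gh-release both create the release and attach all the parts:
    # Sketch: split the archive into sub-2GB volumes, then upload every part
    - name: Zip folder (split into volumes)
      if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
      run: |
        # -v1900m makes 7-Zip emit release.zip.001, release.zip.002, ... each below 1900 MB
        7z a -r -v1900m release.zip ./gui/
    - name: Create Release and Upload Assets
      if: steps.extract_tag_name.outputs.tag_name != null && steps.extract_tag_name.outputs.tag_name != 'refs/heads/main'
      uses: softprops/action-gh-release@v1
      with:
        tag_name: ${{ steps.extract_tag_name.outputs.tag_name }}
        name: Release ${{ steps.extract_tag_name.outputs.tag_name }}
        files: |
          release.zip.*
      env:
        GITHUB_TOKEN: ${{ secrets.ACTION_RELEASE }}
Users (or your installer) would then need to download all volumes into the same folder and either extract starting from the .001 file with 7-Zip or concatenate the parts back into a single archive first.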
> Moreover, splitting the files this way on my local machine resulted in corrupted files.
Files are just sequences of bytes, so you can always split any file into smaller chunks and reassemble it later. On Linux systems, the split command does precisely this; see one such example described here. Consider generating and publishing hashes as well, so integrity can be verified before and after upload.
Here's a Python implementation of splitting and joining arbitrary binary files to and from chunks:
import os
from pathlib import Path
from hashlib import sha256

def split_file(infile: str | Path, outdir: str | Path | None = None, n_chunks: int = 5):
    infile = Path(infile).absolute()
    if outdir is None:
        outdir = infile.parent
    else:
        outdir = Path(outdir).absolute()
    outfile_pattern = f'{os.path.basename(infile)}.chunk{{}}'
    inhash = sha256()
    file_size = os.stat(infile).st_size
    assert file_size >= n_chunks, f'file too small ({file_size}) to chunk into {n_chunks}'
    assert n_chunks >= 2
    chunk_size = (file_size // n_chunks) or 1
    outpaths = []
    with open(infile, 'rb') as in_f:
        for i in range(1, n_chunks + 1):
            outfile_name = outfile_pattern.format(i)
            outfile_path = outdir / outfile_name
            outpaths.append(outfile_path)
            with open(outfile_path, 'wb') as out_f:
                if i == n_chunks:
                    chunk_contents = in_f.read()
                else:
                    chunk_contents = in_f.read(chunk_size)
                out_f.write(chunk_contents)
                inhash.update(chunk_contents)
    return outpaths, inhash.hexdigest()

def join_files(chunk_paths, outfile, expected_sha_256_digest=None):
    new_hash = sha256()
    with open(outfile, 'wb') as out:
        for fp in chunk_paths:
            with open(fp, 'rb') as f:
                chunk_contents = f.read()
                out.write(chunk_contents)
                new_hash.update(chunk_contents)
    if expected_sha_256_digest is not None:
        assert expected_sha_256_digest == new_hash.hexdigest()
    return new_hash.hexdigest()

# splitting usage:
source_file = 'myfile.bin'
chunk_files, digest = split_file(source_file)
print(f'split {source_file} (digest: {digest}) into chunks:', *chunk_files, sep='\n\t')

# joining usage:
new_hash = join_files(chunk_files, 'myfile-again.bin', digest)
print(f'Created myfile-again.bin with matching digest ({new_hash})')
> Maximum compression: Used the zip -mx9 option for maximum compression
It's unclear whether this will get you down to the size you need, but you can try a better compression mechanism. 7-Zip with specific options, zlib, or LZMA2 may give better results than plain zip, depending on the nature of your data.
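For reference, a rough sketch of what a higher-compression archive step could look like, using 7-Zip's 7z container with LZMA2 at maximum compression (the switches below are standard 7-Zip options; whether this actually helps is uncertain here, since your 3.3GB file has already been compressed down from about 9GB):
    - name: Zip folder (7z container, LZMA2 maximum compression)
      run: |
        # -t7z selects the 7z format, -m0=lzma2 the LZMA2 codec, -mx=9 maximum compression
        7z a -t7z -m0=lzma2 -mx=9 release.7z ./gui/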