Tags: python, github, machine-learning, pickle, streamlit

Is there any way other than Git LFS to push large pickle files (bigger than 100 MB) to a GitHub repo? (For my ML project)


I am a beginner in ML. As I understand it, we generally store a trained model in a pickle file (when working in Python). However, GitHub has a file size upload limit of 100 MB, and for files larger than that, developers generally use Git LFS. But Git LFS has limited free-tier usage. Is there any other way to push large pickle files (bigger than 100 MB) to a GitHub repo without using Git LFS?

I used Git LFS for my project and it worked fine for a few days. However, I ran out of free-tier usage, and now my app (hosted on Streamlit Cloud) no longer works. I tried Googling other solutions, but to no avail. How do I fix this? Any help would be appreciated.


Solution

  • You can split your file into parts, compress each part with a function, and then join the parts back together at execution time.

    I had the same problem and was only able to find this solution after a lot of research.

    I used the following two functions (hope this helps!):

    import gzip
    
    def compress_file_into_two_parts(input_file_path, output_part1_path, output_part2_path):
        # Read the whole file into memory.
        with open(input_file_path, 'rb') as f_in:
            data = f_in.read()
    
        # Split the raw bytes roughly in half.
        mid_point = len(data) // 2
        part1_data = data[:mid_point]
        part2_data = data[mid_point:]
    
        # Compress each half independently so each part stays small.
        compressed_part1 = gzip.compress(part1_data)
        compressed_part2 = gzip.compress(part2_data)
    
        # Write the two compressed parts to disk.
        with open(output_part1_path, 'wb') as f_out1:
            f_out1.write(compressed_part1)
    
        with open(output_part2_path, 'wb') as f_out2:
            f_out2.write(compressed_part2)
    
    
    def decompress_two_parts_to_file(input_part1_path, input_part2_path, output_file_path):
        # Read both compressed parts.
        with open(input_part1_path, 'rb') as f_in1:
            compressed_part1 = f_in1.read()
    
        with open(input_part2_path, 'rb') as f_in2:
            compressed_part2 = f_in2.read()
    
        # Decompress each part, then concatenate in the original order.
        decompressed_part1 = gzip.decompress(compressed_part1)
        decompressed_part2 = gzip.decompress(compressed_part2)
    
        combined_data = decompressed_part1 + decompressed_part2
    
        # Write the reassembled file back to disk.
        with open(output_file_path, 'wb') as f_out:
            f_out.write(combined_data)
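
    For completeness, here is a minimal usage sketch. The file names (model.pkl, model_part1.gz, model_part2.gz) are placeholders for your own paths: run the compression step once locally before committing the parts, and run the decompression step at app startup before loading the model.

    import pickle
    
    # One-time, locally: split and compress the trained model,
    # then commit the two .gz parts instead of the large pickle.
    compress_file_into_two_parts('model.pkl', 'model_part1.gz', 'model_part2.gz')
    
    # At app startup (e.g., in the Streamlit script): rebuild the pickle
    # from the committed parts, then load the model as usual.
    decompress_two_parts_to_file('model_part1.gz', 'model_part2.gz', 'model.pkl')
    
    with open('model.pkl', 'rb') as f:
        model = pickle.load(f)

    Note that this assumes each compressed part comes out under GitHub's 100 MB limit; for very large models you may need to split into more than two parts.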