Tags: python, github, machine-learning, pickle, streamlit

Is there any way other than Git LFS to push large pickle files (bigger than 100 MB) to a GitHub repo? (For my ML project)


I am a beginner in ML. As I understand it, we generally store a trained model in a pickle file (when working in Python). However, GitHub has a file size upload limit of 100 MB, and for files larger than that, developers generally use Git LFS. But Git LFS has limited free-tier usage. Is there any other way to push large pickle files (bigger than 100 MB) to a GitHub repo without using Git LFS?

I used Git LFS for my project and it worked fine for a few days. However, I ran out of free-tier usage, and now my app (hosted on Streamlit Cloud) no longer works. I tried Googling other solutions, but to no avail. How do I fix this? Any help would be appreciated.


Solution

  • You can split your file into parts, compress each part with a function, and then join the parts back together at execution time.

    I had the same problem and was only able to find this solution after a lot of research.

    I used the following two functions (hope this helps!):

    import gzip
    
    def compress_file_into_two_parts(input_file_path, output_part1_path, output_part2_path):
        # Read the whole file into memory.
        with open(input_file_path, 'rb') as f_in:
            data = f_in.read()
    
        # Split the raw bytes roughly in half.
        mid_point = len(data) // 2
        part1_data = data[:mid_point]
        part2_data = data[mid_point:]
    
        # Compress each half independently so each part stays small.
        compressed_part1 = gzip.compress(part1_data)
        compressed_part2 = gzip.compress(part2_data)
    
        # Write the two compressed parts to disk.
        with open(output_part1_path, 'wb') as f_out1:
            f_out1.write(compressed_part1)
    
        with open(output_part2_path, 'wb') as f_out2:
            f_out2.write(compressed_part2)
    
    
    def decompress_two_parts_to_file(input_part1_path, input_part2_path, output_file_path):
        # Read both compressed parts.
        with open(input_part1_path, 'rb') as f_in1:
            compressed_part1 = f_in1.read()
    
        with open(input_part2_path, 'rb') as f_in2:
            compressed_part2 = f_in2.read()
    
        # Decompress each part, then concatenate in the original order.
        decompressed_part1 = gzip.decompress(compressed_part1)
        decompressed_part2 = gzip.decompress(compressed_part2)
    
        combined_data = decompressed_part1 + decompressed_part2
    
        # Write the reassembled file back to disk.
        with open(output_file_path, 'wb') as f_out:
            f_out.write(combined_data)
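
    For completeness, here is a minimal usage sketch. The file names (model.pkl, model_part1.gz, model_part2.gz) are placeholders for your own paths: run the compression step once locally before committing the parts, and run the decompression step at app startup before loading the model.

    import pickle
    
    # One-time, locally: split and compress the trained model,
    # then commit the two .gz parts instead of the large pickle.
    compress_file_into_two_parts('model.pkl', 'model_part1.gz', 'model_part2.gz')
    
    # At app startup (e.g., in the Streamlit script): rebuild the pickle
    # from the committed parts, then load the model as usual.
    decompress_two_parts_to_file('model_part1.gz', 'model_part2.gz', 'model.pkl')
    
    with open('model.pkl', 'rb') as f:
        model = pickle.load(f)

    Note that this assumes each compressed part comes out under GitHub's 100 MB limit; for very large models you may need to split into more than two parts.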