I am a beginner in ML. As per my understanding, we generally store a trained model in a pickle file (if we are working in Python). However, GitHub has a file size upload limit of 100 MB. For files larger than 100 MB, developers generally use Git LFS, but Git LFS has limited free-tier usage. Is there any other way to push large pickle files (bigger than 100 MB) to a GitHub repo without using Git LFS?
I used Git LFS for my project and it worked fine for a few days. However, I ran out of my free-tier usage, and now my project app (hosted on Streamlit Cloud) no longer works. I tried Googling other solutions, but to no avail. How do I fix this? Any help would be appreciated.
You can split your file into parts and compress them with a helper function, then join and decompress them at execution time.
I had the same problem and was only able to find this solution after a lot of research.
I used the following two functions (hope this helps!):
import gzip

def compress_file_into_two_parts(input_file_path, output_part1_path, output_part2_path):
    # Read the original file and split it roughly in half
    with open(input_file_path, 'rb') as f_in:
        data = f_in.read()
    mid_point = len(data) // 2
    part1_data = data[:mid_point]
    part2_data = data[mid_point:]

    # Compress each half separately so each piece stays smaller
    compressed_part1 = gzip.compress(part1_data)
    compressed_part2 = gzip.compress(part2_data)

    # Write the two compressed parts to disk
    with open(output_part1_path, 'wb') as f_out1:
        f_out1.write(compressed_part1)
    with open(output_part2_path, 'wb') as f_out2:
        f_out2.write(compressed_part2)

def decompress_two_parts_to_file(input_part1_path, input_part2_path, output_file_path):
    # Read both compressed parts
    with open(input_part1_path, 'rb') as f_in1:
        compressed_part1 = f_in1.read()
    with open(input_part2_path, 'rb') as f_in2:
        compressed_part2 = f_in2.read()

    # Decompress and stitch the halves back together in the original order
    decompressed_part1 = gzip.decompress(compressed_part1)
    decompressed_part2 = gzip.decompress(compressed_part2)
    combined_data = decompressed_part1 + decompressed_part2

    # Write out the reconstructed original file
    with open(output_file_path, 'wb') as f_out:
        f_out.write(combined_data)
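For context, here is a minimal sketch of how these functions might be wired up; the file names (model.pkl, model_part1.gz, model_part2.gz) are just placeholders for whatever your project uses. You run the split step once locally and commit the two parts instead of the large pickle, and the join step runs inside the app before loading the model:

import os
import pickle

# One-time step on your machine before pushing to GitHub:
# compress_file_into_two_parts('model.pkl', 'model_part1.gz', 'model_part2.gz')
# then commit model_part1.gz and model_part2.gz instead of model.pkl.

# At app start-up (e.g. in your Streamlit script): rebuild the pickle once,
# then load it as usual.
if not os.path.exists('model.pkl'):
    decompress_two_parts_to_file('model_part1.gz', 'model_part2.gz', 'model.pkl')

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)

Note that if either compressed half still exceeds 100 MB, you would need to split the file into more than two parts using the same idea.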