Sorry if I'm wording this wrong. Below is my script; I'm trying to figure out why, when I review the archive file that I created, I only see 9874 lines when the file being opened/read has 10000. I guess I'm trying to understand why some iterations are missing. I've tried it a few times and that number always varies. What am I doing wrong?
import multiprocessing
import hashlib
from tqdm import tqdm

archive = open('color_archive.txt', 'w')

def generate_hash(yellow: str) -> str:
    b256 = hashlib.sha256(yellow.encode()).hexdigest()
    x = ' '.join([yellow, b256])
    archive.write(f"{x}\n")

if __name__ == "__main__":
    listofcolors = []
    with open('x.txt') as f:
        for yellow in tqdm(f, desc="Generating..."):
            listofcolors.append(yellow.strip())

    cpustotal = multiprocessing.cpu_count() - 1
    pool = multiprocessing.Pool(cpustotal)
    results = pool.imap(generate_hash, listofcolors)
    pool.close()
    pool.join()
    print('DONE')
This script executes fine; however, when I look at the archive file, some lines are missing. For example, an input file with 10000 lines only wrote 9985 lines to the new file. What am I doing wrong?
The lines go missing because every worker process writes to its own inherited copy of the same buffered file object; those unsynchronized, buffered writes can interleave and overwrite one another, which is why a varying number of lines is lost on each run. Here's another way to think about the problem: each process does its work and returns the value to the main process, and the main process is the only one that writes to the file. This is like doing a "Queue" without explicitly using a queue (a sketch with the queue made explicit follows the code below).
import multiprocessing
import hashlib
from tqdm import tqdm

def generate_hash(yellow: str) -> str:
    # Workers only compute and return a line; they never touch the file.
    b256 = hashlib.sha256(yellow.encode()).hexdigest()
    return yellow + " " + b256 + "\n"

def main():
    with open('x.txt') as f:
        listofcolors = [s.strip() for s in f]

    cpustotal = multiprocessing.cpu_count() - 1
    pool = multiprocessing.Pool(cpustotal)
    with open('color_archive.txt', 'w') as archive:
        # imap yields results in input order as the workers finish them;
        # only the main process writes to the file.
        for s in tqdm(pool.imap(generate_hash, listofcolors),
                      total=len(listofcolors), desc="Generating..."):
            archive.write(s)
    pool.close()
    pool.join()

if __name__ == "__main__":
    main()
    print('DONE')
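If you do want the queue to be explicit, here is a minimal sketch of the same idea: the workers put finished lines on a shared queue and a single dedicated writer process drains it, so only one process ever opens the output file. The generate_hash_to_queue and writer helpers are just illustrative names, not anything from your script, and a Manager queue is used because a plain multiprocessing.Queue can't be pickled into pool task arguments.

import multiprocessing
import hashlib

def generate_hash_to_queue(yellow: str, q) -> None:
    # Worker: compute the hash and hand the finished line to the queue.
    b256 = hashlib.sha256(yellow.encode()).hexdigest()
    q.put(yellow + " " + b256 + "\n")

def writer(q) -> None:
    # The writer is the only process that ever touches the output file.
    with open('color_archive.txt', 'w') as archive:
        while True:
            line = q.get()
            if line is None:  # sentinel: no more results are coming
                break
            archive.write(line)

def main():
    with open('x.txt') as f:
        listofcolors = [s.strip() for s in f]

    with multiprocessing.Manager() as manager:
        q = manager.Queue()
        w = multiprocessing.Process(target=writer, args=(q,))
        w.start()

        with multiprocessing.Pool(multiprocessing.cpu_count() - 1) as pool:
            pool.starmap(generate_hash_to_queue,
                         [(color, q) for color in listofcolors])

        q.put(None)  # tell the writer it can stop
        w.join()

if __name__ == "__main__":
    main()

In practice the imap version above is simpler and usually what you want; the explicit queue only pays off when the writer has to do more than append lines in whatever order they arrive.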