The procedure is as follows:

1. Filter a huge File.txt file (FASTQ file format, if you are interested) line by line through file streaming in C.
2. After each filtering pass, write the output to a filtered_i.txt file.
3. Repeat steps 1-2 with 1000 different filters.

The expected result is 1000 filtered_i.txt files, with i running from 1 to 1000.
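For context, each filter is conceptually just a streaming loop over the shared input. Here is a minimal sketch of one such filter, assuming a hypothetical predicate keep_line() standing in for the real filtering logic, with filter index 1 hard-coded for brevity:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate standing in for filter i's real logic:
   here we simply keep lines containing "ACGT". */
static int keep_line(const char *line) {
    return strstr(line, "ACGT") != NULL;
}

int main(void) {
    FILE *in  = fopen("File.txt", "r");        /* shared, read-only input */
    FILE *out = fopen("filtered_1.txt", "w");  /* per-filter output */
    if (!in || !out) return 1;

    char line[4096];
    while (fgets(line, sizeof line, in))       /* stream line by line */
        if (keep_line(line))
            fputs(line, out);

    fclose(in);
    fclose(out);
    return 0;
}
```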
The question is: can I run these filtering processes in parallel? My concern is that multiple read buffers would be open on File.txt at the same time. Is it safe to do? Are there any potential drawbacks?
There is no single best answer to your problem, but here are some points to take into consideration. Opening File.txt read-only from several processes at once is safe: each process gets its own file descriptor and its own read position, so the readers cannot corrupt the file or interfere with each other. The real constraint is I/O: 1000 processes reading the same huge file simultaneously would mostly compete for disk bandwidth and memory, so running the filters in moderate batches is usually faster than launching them all at once. As with all optimisation problems, you should test different approaches and measure performance.
Here is a simple script that runs the 1000 filters as 20 parallel streams of 50 sequential jobs each:
```bash
#!/bin/bash
# 20 background streams, each running 50 filters in sequence (indices 1..1000)
for i in {0..19}; do
    ( for j in {0..49}; do ./filter_$(( j*20 + i + 1 )); done ) &
done
wait  # block until all 20 streams have finished
```
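If you would rather not hand-roll the batching, tools such as xargs -P or GNU parallel can cap the number of concurrent jobs and start the next filter as soon as a slot frees up, which keeps all slots busy even when individual filters take different amounts of time.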