How to split an mbox file into chunks of at most n MB using the terminal?


So I've read through this question on SO, but it doesn't quite help me. I want to import a Gmail-generated mbox file into another webmail service, but the problem is that it only accepts files of up to 40 MB per import.

So I somehow have to split the mbox file into files of at most 40 MB and import them one after another. How would you do this?

My initial thought was to use the other script (formail) to save each mail as a single file and afterwards run a script to combine them into files of up to 40 MB, but I still wouldn't know how to do this using the terminal.

I also looked at the split command, but I'm afraid it would cut mails off in the middle. Thanks for any help!


Solution

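  • If your mbox is in standard format, each message will begin with a line that starts with "From" followed by a space (in the common mboxo/mboxrd variants, a body line beginning with "From " is written as ">From ", so it cannot be mistaken for the start of a new message):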

    From someone@somewhere.com
    

    So, you could COPY YOUR MBOX TO A TEMPORARY DIRECTORY and try using awk to process it, on a message-by-message basis, only splitting at the start of any message. Let's say we went for 1,000 messages per output file:

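    # close() finishes each chunk file before the next one starts (some awks limit how many files can be open)
    awk 'BEGIN{chunk=1} /^From /{if(msgs==1000){close(out);msgs=0;chunk++} msgs++} {out="chunk_" chunk ".txt"; print > out}' mbox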
    

    then you will get output files called chunk_1.txt to chunk_n.txt each containing up to 1,000 messages.

    If you are unfortunate enough to be on Windows, whose cmd.exe shell does not treat single quotes as quoting characters, you will need to save the following in a file called awk.txt:

    # Start a new chunk file every 1000 messages
    BEGIN{chunk=1} /^From /{if(msgs==1000){close(out);msgs=0;chunk++} msgs++} {out="chunk_" chunk ".txt"; print > out}
    

    and then type

    awk -f awk.txt mbox
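The figure of 1,000 messages per file in the awk command is arbitrary. As a rough way to pick a count that should keep chunks under 40 MB, you could divide the limit by the average message size. The sketch below does that. `msgs_per_chunk` is a hypothetical helper name, and treating "40 MB" as 40,000,000 bytes is an assumption (it is the more conservative reading if the service actually means MiB). An average is only a guide: a chunk that happens to collect several large attachments can still exceed the limit, so check the chunk sizes (for example with `ls -l chunk_*.txt`) before importing.

```shell
# msgs_per_chunk MBOX [LIMIT_BYTES]
# Hypothetical helper: prints how many messages of *average* size fit into
# LIMIT_BYTES (default 40000000, assumed to mean 40 MB). This is an estimate,
# not a guarantee that every chunk will stay under the limit.
msgs_per_chunk() {
  total=$(wc -c < "$1")            # size of the mbox in bytes
  count=$(grep -c '^From ' "$1")   # number of messages (separator lines)
  if [ "$count" -eq 0 ]; then
    echo "no lines starting with 'From ' in $1" >&2
    return 1
  fi
  # Integer arithmetic rounds down, which errs on the side of smaller chunks.
  echo $(( ${2:-40000000} * count / total ))
}
```

For example, `msgs_per_chunk mbox` prints a number you could use in place of 1000.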
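If you would rather cap each chunk by size, as the question asks, the same idea of cutting only at "From " lines can be driven by byte counts instead of message counts. The following is a minimal sketch under a few assumptions: the function name `split_mbox_by_size` is hypothetical, the default limit of 40,000,000 bytes stands in for "40 MB", and it has only been reasoned through, not tested against a real Gmail export. It reads the mbox twice: the first pass records the size of every message, and the second pass writes whole messages, opening a new `chunk_N.txt` whenever the next message would push the current chunk over the limit. A single message that is larger than the limit on its own still ends up alone in an oversized chunk.

```shell
# split_mbox_by_size MBOX [LIMIT_BYTES]
# Hypothetical helper (a sketch, not a tested tool): writes chunk_1.txt,
# chunk_2.txt, ... into the current directory, starting a new chunk only at a
# "From " line so that no message is cut in half.
split_mbox_by_size() {
  # LC_ALL=C so that length() counts bytes rather than multibyte characters.
  LC_ALL=C awk -v limit="${2:-40000000}" '
    BEGIN { n = 0 }
    # Pass 1 (first copy of the file): add up the byte size of every message.
    FNR == NR {
      if (/^From /) n++
      size[n] += length($0) + 1    # +1 for the newline that awk strips
      next
    }
    # Pass 2 (second copy): set up the first chunk.
    # size[0] covers any stray lines before the first "From " line.
    FNR == 1 { chunk = 1; out = "chunk_1.txt"; used = size[0] + 0; m = 0 }
    # At each message start, open a new chunk if this message would not fit.
    /^From / {
      m++
      if (used > 0 && used + size[m] > limit) {
        close(out)
        chunk++
        out = "chunk_" chunk ".txt"
        used = 0
      }
      used += size[m]
    }
    { print > out }
  ' "$1" "$1"
}
```

For example, run `split_mbox_by_size mbox 40000000` inside the temporary directory. Reading the file twice avoids holding a whole message (and its attachments) in memory, at the cost of scanning the mbox a second time.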