I have a quarantine folder that I periodically have to download and split by recipient inbox or, even better, split into one text file per message. I get about 10,000 mails per day and I'm coding something with fetchmail and procmail. The problem is that I can't figure out how to split message by message in procmail; they all end up in the same inbox.
I tried to pass every message to a script via a recipe like:
:0
| script_processing_messages.sh
Which contained
read varname
echo "$varname" > test_file
This was to see if I could obtain a single message in the $varname variable, but nope, I only get a single line of a message each time.
Right now I use
fetchmail --keep
where .fetchmailrc is
poll mail.mymta.my protocol pop3 username "my@inbox.com" password "****" mda "procmail /root/.procmailrc"
and .procmailrc is
VERBOSE=0
DEFAULT=/root/inbox.quarantine
I would like to obtain a file for each message, so:
1.txt
2.txt
3.txt
[...]
10000.txt
I have many recipients and many domains, so I can't, let's say, write 5000 rules to match every recipient. It would be good if there was some kind of
^To: $USER
that redirects to
/$USER.inbox
so that procmail itself takes care of creating these inboxes dynamically.
I'm no expert in fetchmail and procmail recipes. I'm trying hard, but I'm not getting very far.
You seem to have two or three different questions; proper etiquette on Stack Overflow would be to ask each one separately - this also helps future visitors who have just one of your problems.
First off, to split a Berkeley mbox file containing multiple messages and run Procmail on each separately, try
formail -s procmail -m <file.mbox
You might need to read up on the mailbox formats supported by Procmail. A Berkeley mailbox is a single file which contains multiple messages, simply separated by a line beginning with "From " (with a space after the four alphabetic characters). This separator has to be unique, and so a message which contains those five characters at the beginning of a line in the body will need to be escaped somehow (typically by writing a > before From).
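To make the separator rule concrete, here is a minimal sketch that splits an mbox file on lines beginning with "From " using only awk, writing 1.txt, 2.txt, and so on. The function name split_mbox and its two arguments are made up for this illustration, and it assumes body lines have already been escaped as described above; formail is still the more robust tool for real mail.

```shell
#!/bin/sh
# split_mbox MBOX OUTDIR  (hypothetical helper, not part of procmail/formail)
# Writes each message of a Berkeley mbox to OUTDIR/1.txt, OUTDIR/2.txt, ...
split_mbox() {
    mbox=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    awk -v dir="$outdir" '
        /^From / {                     # a separator line starts a new message
            if (out != "") close(out)  # finish the previous message file
            n++
            out = dir "/" n ".txt"
        }
        out != "" { print > out }      # copy the line to the current message file
    ' "$mbox"
}
```

Calling split_mbox file.mbox out would leave one file per message in out/. Escaped lines such as ">From ..." are copied verbatim, not unescaped.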
To save each message in a separate file, choose a different mailbox format than the single-file Berkeley format. Concretely, if the destination is a directory, Procmail will create a new file in that directory. How exactly the new file is named depends on how the directory is specified: a trailing slash selects Maildir format (the new file is created in the new subdirectory, next to tmp and cur, in accordance with Maildir naming conventions), a trailing slash and dot selects MH format (sequentially numbered files), and a plain directory name without either selects mail directory format.
Saving to one mailbox per recipient has a number of pesky corner cases. What if the message was sent to more than one of your local recipients? What if the recipient address is not visible in the headers? (The Procmail Mini-FAQ has a section about this, in the context of virtual hosting of a domain, which this is basically a variation of.) But if we simply ignore these, you might be able to pull it off with something like
:0 # whitespace before ] is a literal tab
* ^TO_\/[^ @ ]+@(yourdomain\.example|example\.info)\>
{
# Trim domain part from captured MATCH
:0
* MATCH ?? ^\/[^@]+
./$MATCH/
}
This will capture into $MATCH the first address which matches the regex, then perform another regex match on the captured string to capture just the part before the @ sign. This obviously requires that the addresses you want to match are all in a set of specific domains (here, I used yourdomain.example and example.info; obviously replace those with your actual domain names) and that capturing the first matching address is sufficient (so if a message was To: alice@yourdomain.example and Cc: bob@example.info, whichever one of those is closer to the top of the message will be picked out by this recipe, and the other one will be ignored).
In some more detail, the \/ special token causes Procmail to copy the text which matched the regex after this point into the internal variable MATCH. As this recipe demonstrates, you can then perform a regex match on that variable itself to extract a substring of it (or, in other words, discard part of the captured match).
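If it helps to see the two-stage extraction outside Procmail, the following plain-shell approximation mimics it: the first stage picks the first address in one of the listed domains, the second discards everything from the @ onwards. The function name first_local_part is invented for this sketch, it uses grep -o (available in GNU and BSD grep), and it does not reproduce the ^TO_ header matching or the trailing word-boundary check, so treat it as an illustration of the idea only.

```shell
#!/bin/sh
# first_local_part  (hypothetical helper)
# Reads header text on stdin and prints the local part of the first
# address found in yourdomain.example or example.info.
first_local_part() {
    # Stage 1: capture candidate addresses, keep the first one (like MATCH).
    grep -E -o '[^[:space:]@]+@(yourdomain\.example|example\.info)' |
        head -n 1 |
        # Stage 2: trim the domain part, like the MATCH ?? ^\/[^@]+ condition.
        sed 's/@.*//'
}
```

For headers containing To: alice@yourdomain.example followed by Cc: bob@example.info, this prints alice, matching the "first address wins" behavior described above.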
The action ./$MATCH/ uses the captured string in MATCH as the name of the folder to save into. The leading ./ specifies the current directory (which is equal to the value of the Procmail variable MAILDIR) and the trailing / selects Maildir format.
If your expected recipients cannot be constrained to be in a specific set of domains or otherwise matched by a single regex, my recommendation would be to ask a new question with more limited scope, and enough details to actually identify what you want to accomplish.