I work with Postfix and Dovecot for SMTP and IMAP. Both are at the latest versions available for CentOS 7, and the messages are stored in Maildir format.
We have made an agreement with Google and our mailboxes will be transferred to them very soon.
We have had this mail server infrastructure since the nineties, so some messages have an old "Date" header format with the year field as "yy". Since Google demands it to be "yyyy", they told me that I need to convert this information in every affected message before the IMAP import to their cloud. This follows RFC 2060 and RFC 3501.
This is a university and these old messages contain research data that should be preserved.
Here is an example:
date: Thu, 20 Apr 17 15:45:15 +0000
should be:
date: Thu, 20 Apr 2017 15:45:15 +0000
I've been looking for a script that performs this fix: keep the header and the date, change only the year in every affected file, and leave the file timestamp unchanged (some mail clients use it for sorting). But I haven't found any.
So, is there anyone that can help me?
Thank you.
You cannot modify a file without changing its time stamp, but you can save the original time stamp and apply it back with touch, as indicated in a separate answer.
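As a minimal sketch of the save-and-restore idea, the snippet below copies the time stamp to a reference file with touch -r and applies it back after the edit. It assumes GNU coreutils (stat -c, touch -d, touch -r) and GNU sed; the temporary file, the sample message, and the placeholder edit are made up for the demonstration.

```shell
# Sketch: preserve a file's modification time across an in-place edit.
# Assumes GNU coreutils and GNU sed; works only on a throwaway temp file.
tmpdir=$(mktemp -d)
msg="$tmpdir/msg"
ref="$tmpdir/ref"

printf 'date: Thu, 20 Apr 17 15:45:15 +0000\n\nbody\n' > "$msg"
touch -d '2017-04-20 15:45:15 UTC' "$msg"   # pretend this is an old message

before=$(stat -c %Y "$msg")                 # mtime as seconds since the epoch
touch -r "$msg" "$ref"                      # copy the time stamps to a reference file
sed -i 's/ 17 / 2017 /' "$msg"              # placeholder edit; this resets the mtime
touch -r "$ref" "$msg"                      # put the original time stamps back
after=$(stat -c %Y "$msg")

echo "before=$before after=$after"
rm -rf "$tmpdir"
```

Using a reference file avoids parsing a printed date string, at the cost of one extra temporary file per message.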
Finding the broken Date: headers isn't too hard, either, especially if the messages were sent by a small set of clients which are all uniformly broken in the same way. You can find many, many different violations of the RFCs in the wild, though, so it is probably wise to do a test run first and extract all Date: headers, so that you can spot the ones which aren't in one of the expected formats before you go ahead with modifications.
find Maildir -type f -exec sh -c 'for f; do
sed -n "/^\$/q;/^[Dd][Aa][Tt][Ee]:/p" "$f"; done' _ {} +
The -exec ... + form is specified by POSIX (and supported by GNU find); it mimics xargs in that it will pass as many of the found files as possible as arguments to each process started by -exec.
You can augment the regex after [Dd][Aa][Tt][Ee]: to search for date headers matching a particular erroneous Date: format.
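For example, the following sketch narrows the search to Date: headers where the month name is followed by exactly two digits and a space. The temporary Maildir and the two sample messages are invented for the demonstration; against a real store you would point find at the actual Maildir instead.

```shell
# Sketch: list only Date: headers whose year has two digits.
# The sample directory and messages are made up for the demonstration.
maildir=$(mktemp -d)
mkdir -p "$maildir/cur"
printf 'From: a@example.org\ndate: Thu, 20 Apr 17 15:45:15 +0000\n\nbody\n' \
  > "$maildir/cur/1"
printf 'From: b@example.org\nDate: Fri, 21 Apr 2017 09:00:00 +0000\n\nDate: Mon, 01 Jan 99 00:00:00 +0000\n' \
  > "$maildir/cur/2"

# Quit at the first empty line (end of headers); print a Date: header only if
# a three-letter month is followed by two digits and then a space.
suspects=$(find "$maildir" -type f -exec sh -c 'for f; do
  sed -n "/^\$/q;/^[Dd][Aa][Tt][Ee]: .* [A-Z][a-z][a-z] [0-9][0-9] /p" "$f"; done' _ {} +)

printf '%s\n' "$suspects"
rm -rf "$maildir"
```

Only the header of the first message is reported. The two-digit date in the body of the second message is ignored because sed quits at the blank line that ends the headers.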
If you can verify that all the erroneous messages are similar to your sample,
sed -i '1,/^$/!b;s/^\([Dd][Aa][Tt][Ee]: [A-Z][a-z][a-z], [ 0-3][0-9] [A-Z][a-z][a-z] \)\([7-9][0-9] \)/\119\2/;s/^\([Dd][Aa][Tt][Ee]: [A-Z][a-z][a-z], [ 0-3][0-9] [A-Z][a-z][a-z] \)\([01][0-9] \)/\120\2/'
might be at least a good start towards fixing the erroneous messages.
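Before running anything with -i, it may help to try the substitutions on sample text. The sketch below wraps the same two substitutions in a hypothetical helper function, fix_year, which reads from standard input and touches no files. Note the pivot implied by the regexes: years 70-99 become 19yy, years 00-19 become 20yy, and anything else is left alone.

```shell
# Sketch: dry-run the year substitutions on stdin (no -i, no files modified).
# fix_year is a hypothetical helper name, not part of any tool.
fix_year() {
  sed '1,/^$/!b
s/^\([Dd][Aa][Tt][Ee]: [A-Z][a-z][a-z], [ 0-3][0-9] [A-Z][a-z][a-z] \)\([7-9][0-9] \)/\119\2/
s/^\([Dd][Aa][Tt][Ee]: [A-Z][a-z][a-z], [ 0-3][0-9] [A-Z][a-z][a-z] \)\([01][0-9] \)/\120\2/'
}

out_2017=$(printf '%s\n' 'date: Thu, 20 Apr 17 15:45:15 +0000' | fix_year)
out_1999=$(printf '%s\n' 'Date: Mon, 01 Jan 99 00:00:00 +0000' | fix_year)
out_same=$(printf '%s\n' 'Date: Fri, 21 Apr 2017 09:00:00 +0000' | fix_year)
# A Date: line after the blank line belongs to the body and must stay untouched.
out_body=$(printf 'Subject: x\n\nDate: Mon, 01 Jan 99 00:00:00 +0000\n' | fix_year | tail -n 1)

printf '%s\n' "$out_2017" "$out_1999" "$out_same" "$out_body"
```

The first line printed is date: Thu, 20 Apr 2017 15:45:15 +0000, matching the desired result from the question. Headers that already carry a four-digit year pass through unchanged.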
Pulling everything together, the final script might look something like
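find Maildir -type f -exec sh -c 'for f; do
# Remember the modification time before editing (GNU stat).
timestamp=$(stat -c "%y" "$f")
# Header block only: years 70-99 become 19yy, years 00-19 become 20yy.
sed -i "1,/^\$/!b;s/^\([Dd][Aa][Tt][Ee]: [A-Z][a-z][a-z], [ 0-3][0-9] [A-Z][a-z][a-z] \)\([7-9][0-9] \)/\119\2/;s/^\([Dd][Aa][Tt][Ee]: [A-Z][a-z][a-z], [ 0-3][0-9] [A-Z][a-z][a-z] \)\([01][0-9] \)/\120\2/" "$f"
# Restore the original modification time.
touch -d "$timestamp" "$f"
done' _ {} +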
My prediction is that your final sed script will need to be quite a lot more complex if you need to deal with decades of buggy mail clients from strongholds of intellectual creativity like Lotus, Yahoo!, and Microsoft. The peskiest are perhaps the ones which have been incorrectly localized: you can probably guess that Märtz is March, but good luck with marraskuu or 十一月...