I am working on a Python tool to migrate emails into Gmail while preserving the original Date header. My goal is simply to build a CLI tool that copies email from Gmail account A to Gmail account B, preserving all data and metadata (including date and time).
I am using the Gmail API's users.messages.insert method, as suggested in the Google support documentation. The support page states that setting internalDateSource: 'dateHeader' and deleted: true should make Gmail use the Date header from the email: https://support.google.com/vault/answer/9987957?hl=en
Here is a minimal code example:
from googleapiclient.discovery import build
import base64
# Initialize Gmail API client
service = build('gmail', 'v1', credentials=your_credentials)
# Raw email with a custom Date header
# (a blank line must separate the headers from the body)
raw_email = """\
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
to: recipient@gmail.com
from: sender@gmail.com
subject: Test Email
Date: Tue, 13 Aug 2024 14:00:00 +0000

This is a test email.
"""
# Encode the email
raw_email_b64 = base64.urlsafe_b64encode(raw_email.encode('utf-8')).decode('utf-8')
# Insert the email using the Gmail API
body = {
    'raw': raw_email_b64,
    'internalDateSource': 'dateHeader',
    'deleted': True
}
response = service.users().messages().insert(userId='me', body=body).execute()
# Log the response
print(response)
Problem: Despite setting internalDateSource: 'dateHeader' and deleted: true, the Date header in the inserted email is overridden by the current timestamp. The original Date header is not preserved and the datetime of insert is used instead.
Question: Is this behavior expected, or am I missing something in the implementation? Are there additional steps required to enforce the Date header during email insertion? Any insights or workarounds would be greatly appreciated!
What I tried: Verified that the Date header is correctly set in the raw email before insertion. Used the internalDateSource: 'dateHeader' parameter as per the Google support suggestion. Added the deleted: true parameter to the users.messages.insert method.
Observations: The Gmail API still overrides the Date header with the current timestamp. The X-Original-Date header workaround works, but I would prefer to rely on the Date header directly.
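internalDateSource has to be passed as an argument to insert(), outside body (and it works even without deleted):
body = {
    'raw': raw_email_b64,
    #'labelIds': ['INBOX'],  # Optional: put in INBOX
}
response = service.users().messages().insert(
    userId='me',
    body=body,
    internalDateSource='dateHeader',  # argument of insert(), not a key in body
    #deleted=True
).execute()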
I realized this after checking the help/docstring:
help( service.users().messages().insert )
I also found the documentation online (insert parameters), and its first line shows the signature:
insert(userId, body=None, deleted=None, internalDateSource=None, media_body=None, media_mime_type=None, x__xgafv=None)
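The signature above suggests keeping the message resource (body) apart from the request parameters. A minimal sketch of that split follows. The helper name build_insert_kwargs is made up for illustration and is not part of the client library:

```python
import base64


def build_insert_kwargs(raw_email: str, date_source: str = "dateHeader") -> dict:
    """Return keyword arguments for users().messages().insert().

    Hypothetical helper: only 'raw' goes into body, while
    internalDateSource stays a top-level argument of insert().
    """
    raw_b64 = base64.urlsafe_b64encode(raw_email.encode("utf-8")).decode("ascii")
    return {
        "userId": "me",
        "body": {"raw": raw_b64},           # the Message resource
        "internalDateSource": date_source,  # request parameter, not part of body
    }
```

It could then be used as service.users().messages().insert(**build_insert_kwargs(raw_email)).execute(), assuming service was built as in the question.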
I ran the code with the Date header set to Sun, 17 May 2020 15:30:00 +0000, and Gmail uses this value to sort the message (the displayed time can differ because of my local timezone, +0200).
In Show Source it also appears as Created at and as the Date header in the raw data.
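To check the result without the web UI, one option might be to convert the Date header to epoch milliseconds and compare it with the internalDate field of the stored message (the Gmail API returns it as a string of epoch milliseconds). This is only a sketch, and the helper name date_header_to_epoch_ms is made up for illustration:

```python
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_header_to_epoch_ms(raw_email: str) -> int:
    """Parse the Date header of a raw RFC 2822 message into epoch milliseconds."""
    msg = message_from_string(raw_email)
    sent_at = parsedate_to_datetime(msg["Date"])  # timezone-aware datetime
    return int(sent_at.timestamp() * 1000)


sample = "Date: Sun, 17 May 2020 15:30:00 +0000\n\nThis is a test email.\n"
print(date_header_to_epoch_ms(sample))  # 1589729400000
```

Comparing this number with int(message['internalDate']) from users.messages.get would need a scope that allows reading (for example gmail.readonly), which the gmail.insert scope alone does not provide.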
Full working code used for tests:
import base64
import datetime
import os
import pickle
import sys
from google_auth_oauthlib.flow import InstalledAppFlow
from google.auth.transport.requests import Request
from googleapiclient.discovery import build
from email.mime.text import MIMEText
SCOPES = ['https://www.googleapis.com/auth/gmail.insert']
def get_credentials():
    creds = None
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            #print('refresh')
            creds.refresh(Request())  # refresh silently
        else:
            flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)
    return creds
def create_message_1():
    # A blank line must separate the headers from the body
    raw_email = """\
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
to: recipient@gmail.com
from: sender@gmail.com
subject: Test Email
Date: Tue, 13 Aug 2024 14:00:00 +0000

This is a test email.
"""
    print(raw_email)
    raw_message = base64.urlsafe_b64encode(raw_email.encode('utf-8')).decode('utf-8')
    return raw_message
def create_message_2():
    now = datetime.datetime.now().astimezone()  # timezone-aware, so %z is not empty
    custom_date = datetime.datetime(2020, 5, 17, 15, 30, tzinfo=datetime.timezone.utc)
    message = MIMEText(f"This email has a custom date {custom_date:%a, %d %b %Y %H:%M:%S %z} (Send: {now:%a, %d %b %Y %H:%M:%S %z})")
    message['to'] = "youremail@gmail.com"
    message['from'] = "someone@example.com"
    message['subject'] = "Email with custom date"
    message['date'] = custom_date.strftime('%a, %d %b %Y %H:%M:%S %z')  # With timezone
    print(message.as_bytes().decode())
    # as_bytes() takes no encoding argument; its first parameter is unixfrom
    raw_message = base64.urlsafe_b64encode(message.as_bytes()).decode('utf-8')
    return raw_message
# ------
creds = get_credentials()
service = build('gmail', 'v1', credentials=creds)
#help(service.users().messages().insert) # to see parameters
# https://googleapis.github.io/google-api-python-client/docs/dyn/gmail_v1.users.messages.html#insert
if len(sys.argv) == 1:
    raw_message = create_message_1()
else:
    raw_message = create_message_2()
inserted = service.users().messages().insert(
    userId="me",
    body={
        'raw': raw_message,
        'labelIds': ['INBOX'],  # Optional: put in INBOX
        #'internalDate': int(custom_date.timestamp()),
    },
    internalDateSource='dateHeader',
    #internalDateSource='receivedTime',
    #deleted=True,
).execute()
print(f"Inserted message ID: {inserted['id']}")
print(f"{inserted = }")
Sometimes inserted contains only id, but sometimes it also contains labelIds and threadId.
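As a side note on building the Date header: strftime with %a and %b depends on the current locale, so the standard library's email.utils.format_datetime may be a safer way to produce an RFC 2822 date. A small sketch, where build_dated_message is a made-up helper name and not part of the code above:

```python
import datetime
from email.mime.text import MIMEText
from email.utils import format_datetime


def build_dated_message(when: datetime.datetime, text: str = "This email has a custom date") -> MIMEText:
    """Create a plain-text message whose Date header is set from `when`."""
    message = MIMEText(text)
    message["Date"] = format_datetime(when)  # always English day and month names
    return message


custom_date = datetime.datetime(2020, 5, 17, 15, 30, tzinfo=datetime.timezone.utc)
message = build_dated_message(custom_date)
print(message["Date"])  # Sun, 17 May 2020 15:30:00 +0000
```

The resulting message can be encoded with base64.urlsafe_b64encode(message.as_bytes()) in the same way as in create_message_2.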