Tags: python, email, gmail-api

Why does Gmail API override the Date header even with internalDateSource: 'dateHeader' and deleted: true in users.messages.insert?


I am working on a Python tool to migrate emails into Gmail while preserving the original Date header. The goal is a CLI tool that copies email from Gmail account A to Gmail account B, preserving all data and metadata (including date and time).

I am using the Gmail API's users.messages.insert method, as suggested in the Google support documentation. The support page states that setting internalDateSource: 'dateHeader' and deleted: true should make Gmail use the Date header from the email: https://support.google.com/vault/answer/9987957?hl=en

Here is a minimal code example:

from googleapiclient.discovery import build
import base64

# Initialize Gmail API client
service = build('gmail', 'v1', credentials=your_credentials)

# Raw email with a custom Date header
raw_email = """\
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
to: recipient@gmail.com
from: sender@gmail.com
subject: Test Email
Date: Tue, 13 Aug 2024 14:00:00 +0000

This is a test email.
"""

# Encode the email
raw_email_b64 = base64.urlsafe_b64encode(raw_email.encode('utf-8')).decode('utf-8')

# Insert the email using the Gmail API
body = {
    'raw': raw_email_b64,
    'internalDateSource': 'dateHeader',
    'deleted': True
}
response = service.users().messages().insert(userId='me', body=body).execute()

# Log the response
print(response)

Problem: Despite setting internalDateSource: 'dateHeader' and deleted: true, the Date header in the inserted email is overridden by the current timestamp. The original Date header is not preserved, and the time of insertion is used instead.

Question: Is this behavior expected, or am I missing something in the implementation? Are there additional steps required to enforce the Date header during email insertion? Any insights or workarounds would be greatly appreciated!

What I tried:

  • Verified that the Date header is correctly set in the raw email before insertion.
  • Used the internalDateSource: 'dateHeader' parameter as per the Google support suggestion.
  • Added the deleted: true parameter to the users.messages.insert call.

Observations:

  • The Gmail API still overrides the Date header with the current timestamp.
  • The X-Original-Date header workaround works, but I would prefer to rely on the Date header directly.


Solution

  • internalDateSource has to be passed as an argument of insert(), outside body (and it works even without deleted)

    body = {
        'raw': raw_email_b64,
        #'labelIds': ['INBOX'],  # Optional: put in INBOX
    }
    
    response = service.users().messages().insert(
        userId='me',
        body=body,
        internalDateSource='dateHeader',  # keyword argument, not a key in body
        #deleted=True,
    ).execute()
    

    I realized this after checking the docstring with help():

    help( service.users().messages().insert )
    

    The same text is available online in the documentation of the insert parameters, and its first line shows the signature:

    insert(userId, body=None, deleted=None, internalDateSource=None, media_body=None, media_mime_type=None, x__xgafv=None)
    

    I ran the code with the header Date: Sun, 17 May 2020 15:30:00 +0000, and Gmail uses this value to sort the message (the displayed time can differ because of my local timezone, +0200).
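The timezone difference can be checked with the standard library. This is only a minimal sketch, assuming a viewer at a fixed +02:00 offset (the name local_tz is illustrative, not part of the tested code); it parses the Date header used in the test and converts it to that offset:

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Date header value used in the test message
header_value = 'Sun, 17 May 2020 15:30:00 +0000'
sent_utc = parsedate_to_datetime(header_value)  # timezone-aware datetime in UTC

# Assumption: the viewer's timezone is a fixed +02:00 offset
local_tz = timezone(timedelta(hours=2))
sent_local = sent_utc.astimezone(local_tz)

print(sent_utc.isoformat())    # 2020-05-17T15:30:00+00:00
print(sent_local.isoformat())  # 2020-05-17T17:30:00+02:00
```

Both values describe the same instant; only the displayed wall-clock time differs.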

    In Show Source it is also shown as Created at and as the Date header in the raw data.
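The same check could probably be done without the web UI by comparing the Date header with the internalDate field of the stored message. A minimal sketch, assuming the message is fetched with users.messages.get (which needs a read scope such as gmail.readonly, because gmail.insert alone is not enough) and that internalDate comes back as a string of epoch milliseconds. The helper name is hypothetical:

```python
from email.utils import parsedate_to_datetime

def internal_date_matches_header(date_header, internal_date_ms):
    """Return True if the Date header and Gmail's internalDate are the same instant.

    date_header: RFC 2822 date string with a timezone offset,
                 e.g. 'Sun, 17 May 2020 15:30:00 +0000'
    internal_date_ms: internalDate from the API (epoch milliseconds, str or int)
    """
    header_ms = int(parsedate_to_datetime(date_header).timestamp() * 1000)
    return header_ms == int(internal_date_ms)

# Stand-alone example with a hard-coded value instead of a real API response
print(internal_date_matches_header('Sun, 17 May 2020 15:30:00 +0000', '1589729400000'))  # True
```

With a real response it would be called as internal_date_matches_header(date_header, fetched['internalDate']), where fetched is the result of the get call.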


    Full working code used for tests:

    import base64
    import datetime
    import os
    import pickle
    import sys
    from google_auth_oauthlib.flow import InstalledAppFlow
    from google.auth.transport.requests import Request
    from googleapiclient.discovery import build
    from email.mime.text import MIMEText
    
    SCOPES = ['https://www.googleapis.com/auth/gmail.insert']
    
    def get_credentials():
        creds = None
    
        if os.path.exists('token.pickle'):
            with open('token.pickle', 'rb') as token:
                creds = pickle.load(token)
    
        if not creds or not creds.valid:
            if creds and creds.expired and creds.refresh_token:
                #print('refresh')
                creds.refresh(Request())  # refresh silently
            else:
                flow = InstalledAppFlow.from_client_secrets_file('credentials.json', SCOPES)
                creds = flow.run_local_server(port=0)
    
            with open('token.pickle', 'wb') as token:
                pickle.dump(creds, token)
    
        return creds
    
    def create_message_1():
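        # Build the message from a list of lines so that the indentation of
        # the source code does not end up in front of the headers.
        raw_email = "\n".join([
            'Content-Type: text/plain; charset="us-ascii"',
            'MIME-Version: 1.0',
            'Content-Transfer-Encoding: 7bit',
            'to: recipient@gmail.com',
            'from: sender@gmail.com',
            'subject: Test Email',
            'Date: Tue, 13 Aug 2024 14:00:00 +0000',
            '',
            'This is a test email.',
            '',
        ])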
    
        print(raw_email)
    
        raw_message = base64.urlsafe_b64encode(raw_email.encode('utf-8')).decode('utf-8')
    
        return raw_message
    
    def create_message_2():
        now = datetime.datetime.now(datetime.timezone.utc)  # timezone-aware, so %z is not empty
        custom_date = datetime.datetime(2020, 5, 17, 15, 30, tzinfo=datetime.timezone.utc)  # also works before Python 3.11
    
        message = MIMEText(f"This email has a custom date {custom_date:%a, %d %b %Y %H:%M:%S %z} (Sent: {now:%a, %d %b %Y %H:%M:%S %z})")
        message['to'] = "youremail@gmail.com"
        message['from'] = "someone@example.com"
        message['subject'] = "Email with custom date"
        message['date'] = custom_date.strftime('%a, %d %b %Y %H:%M:%S %z')  # With timezone
    
        print(message.as_bytes().decode())
    
        # as_bytes() takes no encoding argument; its first parameter is unixfrom
        raw_message = base64.urlsafe_b64encode(message.as_bytes()).decode('utf-8')
    
        return raw_message
    
    # ------
    
    creds = get_credentials()
    
    service = build('gmail', 'v1', credentials=creds)
    
    #help(service.users().messages().insert)  # to see parameters
    # https://googleapis.github.io/google-api-python-client/docs/dyn/gmail_v1.users.messages.html#insert
    
    if len(sys.argv) == 1:
        raw_message = create_message_1()
    else:
        raw_message = create_message_2()
    
    inserted = service.users().messages().insert(
        userId="me",
        body={
            'raw': raw_message,
            'labelIds': ['INBOX'],  # Optional: put in INBOX
            #'internalDate': int(custom_date.timestamp()),
        },
        internalDateSource='dateHeader',
        #internalDateSource='receivedTime',
        #deleted=True,
    ).execute()
    
    print(f"Inserted message ID: {inserted['id']}")
    
    print(f"{inserted = }")
    

    Sometimes inserted contains only id, and sometimes it also contains labelIds and threadId.
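For the original goal (copying mail from account A to account B), the same keyword argument can be combined with users.messages.get and format='raw'. This is only a sketch under assumptions, not tested against real accounts: copy_message is a hypothetical helper, src_service and dst_service are two separately authorized service objects, and the source one needs a scope that allows reading messages (for example gmail.readonly).

```python
def copy_message(src_service, dst_service, message_id, label_ids=None):
    """Copy one message between accounts, keeping its original Date header."""
    # format='raw' returns the whole message as a base64url string in 'raw',
    # which is the same encoding that insert() expects.
    source = src_service.users().messages().get(
        userId='me', id=message_id, format='raw'
    ).execute()

    body = {'raw': source['raw']}
    if label_ids:
        # System labels such as INBOX exist in every account; IDs of
        # user-created labels differ between accounts and need mapping.
        body['labelIds'] = label_ids

    # internalDateSource is a keyword argument of insert(), not a key in body
    return dst_service.users().messages().insert(
        userId='me',
        body=body,
        internalDateSource='dateHeader',
    ).execute()
```

It could be called as copy_message(service_a, service_b, message_id, label_ids=['INBOX']) for every ID returned by users.messages.list on the source account.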