email delphi indy mime eml

Reading EML files with Delphi


I want to read EML files and extract the plain text.

So far I have found TIdMessage, with which I can iterate over TIdMessage.MessageParts and check whether each part's PartType is mptText. All of that works quite well.

My problem is reading messages correctly when TIdMessage.Encoding = TIdMessageEncoding.meMIME; I just can't work out the logic of that format. I would like to get the whole text, without tags, from the EML file. Is there always a text/plain part in a mail?

So far I have the following two functions, which return the HTML content of a message.

function GetMultiPartAlternative(aMsg: TIdMessage; aParentIndex, aLastIndex: Integer): String;
var
  Part: TIdMessagePart;
  i: Integer;
begin
  Result := '';

  for i := aLastIndex - 1 downto aParentIndex + 1 do
  begin
    Part := aMsg.MessageParts.Items[i];
    if { (Part.ParentPart = aParentIndex) and } (Part is TIdText) then
    begin
      if Part.ContentType.StartsWith('text/html') then
      begin
        Result := (Part as TIdText).Body.Text;
        Exit;
      end
      else if Part.ContentType.StartsWith('text/plain') then
      begin
        Result := (Part as TIdText).Body.Text;
        Exit;
      end;
    end;
  end;
end;

function GetMultiPartMixed(aMsg: TIdMessage; aParentIndex, aLastIndex: Integer): String;
var
  Part: TIdMessagePart;
  i: Integer;
begin
  Result := '';

  for i := aLastIndex - 1 downto aParentIndex + 1 do
  begin
    Part := aMsg.MessageParts.Items[i];

    if { (Part.ParentPart = aParentIndex) and } (Part is TIdText) then
    begin
      if Part.ContentType.StartsWith('multipart/alternative') then
      begin
        Result := GetMultiPartAlternative(aMsg, aParentIndex, aLastIndex);
        Exit;
      end
      else if Part.ContentType.StartsWith('text/html') then
      begin
        Result := (Part as TIdText).Body.Text;
        Exit;
      end
      else if Part.ContentType.StartsWith('text/plain') then
      begin
        Result := (Part as TIdText).Body.Text;
        Exit;
      end;
      aLastIndex := i;
    end;
  end;
end;

Solution

  • TIdMessage uses the MessageParts collection for MIME emails. Your code is fine for accessing individual MIME parts (and +1 for iterating the parts in the correct order!). Simply ignore HTML parts if you are only interested in PlainText parts.

    Is there always a text/plain-Part in a mail?

    Unfortunately, no. It depends on what formats the sender decides to include. It is customary but not required for an HTML email to include a PlainText alternative for readers that don't understand HTML.

    Please read this article on Indy's blog: HTML Messages (it's geared towards sending emails, but it does describe the TIdMessage layout for common scenarios you can encounter when reading emails, too).

    Having HTML without a PlainText alternative is a real possibility you need to account for. If there is no PlainText provided then you will have to parse out the text from the HTML instead.
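
    As a rough illustration of that fallback, here is a minimal sketch that prefers a text/plain part and only falls back to text/html when no plain text exists. ExtractBodyText and its aIsHtml parameter are hypothetical names, not part of Indy. The sketch assumes the message has already been loaded into a TIdMessage, that the first matching part is the one you want, and that the caller strips the tags itself when aIsHtml comes back True.

    ```delphi
    uses
      IdMessage, IdMessageParts, IdText, IdGlobalProtocols;

    // Hypothetical helper (not part of Indy): returns the first text/plain
    // body, or the first text/html body if no plain-text part exists.
    // aIsHtml tells the caller whether the result still contains HTML tags.
    function ExtractBodyText(aMsg: TIdMessage; out aIsHtml: Boolean): string;
    var
      i: Integer;
      Part: TIdMessagePart;
      Html: string;
      HasHtml: Boolean;
    begin
      Result := '';
      aIsHtml := False;
      Html := '';
      HasHtml := False;

      // A message without MIME parts keeps its content in TIdMessage.Body.
      if aMsg.MessageParts.Count = 0 then
      begin
        Result := aMsg.Body.Text;
        aIsHtml := IsHeaderMediaType(aMsg.ContentType, 'text/html');
        Exit;
      end;

      for i := 0 to aMsg.MessageParts.Count - 1 do
      begin
        Part := aMsg.MessageParts.Items[i];
        if not (Part is TIdText) then
          Continue; // skip attachments and other non-text parts

        if IsHeaderMediaType(Part.ContentType, 'text/plain') then
        begin
          Result := TIdText(Part).Body.Text;
          Exit; // plain text found, no fallback needed
        end;

        // Remember the first HTML part in case no plain text turns up.
        if (not HasHtml) and IsHeaderMediaType(Part.ContentType, 'text/html') then
        begin
          Html := TIdText(Part).Body.Text;
          HasHtml := True;
        end;
      end;

      if HasHtml then
      begin
        Result := Html;
        aIsHtml := True;
      end;
    end;
    ```

    If the function returns an empty string with aIsHtml = False, the message had MIME parts but none of them was a text/plain or text/html TIdText part.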


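    On a side note: you should not use ContentType.StartsWith('...'), as a plain prefix match is not an accurate way to compare media types (for one thing, it is case-sensitive by default, while media types are not). Use IsHeaderMediaType(ContentType, '...') instead, e.g.: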

    if IsHeaderMediaType(Part.ContentType, 'text/html') then
    

    IsHeaderMediaType() is declared in the IdGlobalProtocols unit.