rfor-loopoutlookrdcomclient

In R: Search all emails by subject line, pull comma-separate values from body, then save values in a dataframe


Each day, I get an email with the quantities of fruit sold on a particular day. The structure of the email is as below:

Date of report:,04-JAN-2022
Time report produced:,5-JAN-2022 02:04
Apples,6
Pears,1
Lemons,4
Oranges,2
Grapes,7
Grapefruit,2

I'm trying to build some code in R that will search through my emails, find all emails with a particular subject, iterate through each email to find the variables I'm looking for, take the values and place them in a dataframe with the "Date of report" put in a date column.

With the assistance of people in the community, I was able to achieve the desired result in Python. However as my project has developed, I need to now achieve the same result in R if at all possible.

Unfortunately, I'm quite new to R and therefore if anyone has any advice on how to take this forward I would greatly appreciate it.

For those interested, my Python code is below:

#PREP THE STUFF
Fruit_1 = "Apples"
Fruit_2 = "Pears"
searchf = [
    Fruit_1, 
    Fruit_2
]

#DEF THE STUFF
def get_report_vals(report, searches):
    dct = {}
    for line in report:
        term, *value = line
        if term.casefold().startswith('date'):
            dct['date'] = pd.to_datetime(value[0])
        elif term in searches:
            dct[term] = float(value[0])
    if len(dct.keys()) != len(searches):
    dct.update({x: None for x in searches if x not in dct})
return dct


#DO THE STUFF
outlook = win32com.client.Dispatch("Outlook.Application").GetNamespace("MAPI")
inbox = outlook.GetDefaultFolder(6) 
messages = inbox.Items
messages.Sort("[ReceivedTime]", True)

results = []

for message in messages:
    if message.subject == 'FRUIT QUANTITIES':
        if Fruit_1 in message.body and Fruit_2 in message.body:
            data = [line.strip().split(",") for line in message.body.split('\n')]
            results.append(get_report_vals(data, searchf))
        else:
            pass

fruit_vals = pd.DataFrame(results)
fruit_vals.columns = map(str.upper, fruit_vals.columns)

I'm probably going about this the wrong way, but I'm trying to use the steps I took in Python to achieve the same result in R. So for example I create some variables to hold the fruit sales I'm searching for, then I create a vector to store the searchables, and then when I create an equivalent 'get_vals' function, I create an empty vector.

library(RDCOMClient)

Fruit_1 <- "Apples"
Fruit_2 <- "Pears"
##Create vector to store searchables
searchf <- c(Fruit_1, Fruit_2)
## create object for outlook
OutApp <- COMCreate("Outlook.Application")
outlookNameSpace = OutApp$GetNameSpace("MAPI")

search <- OutApp$AdvancedSearch("Inbox", "urn:schemas:httpmail:subject = 'FRUIT QUANTITIES'")
inbox <- outlookNameSpace$Folders(6)$Folders("Inbox")

vec <- c()

for (x in emails)
{
  subject <- emails(i)$Subject(1)
  if (grepl(search, subject)[1])
  {
    text <- emails(i)$Body()
    print(text)
    break
  }
}

Solution

  • The code now functions as intended. Function was written quite a bit differently from those recommended:

    get_vals <- function(email) {
      body <- email$body()
      date <- str_extract(body, "\\d{2}-[:alpha:]{3}-\\d{4}") %>% 
        as.character()
      data <- read.table(text = body, sep = ",", skip = 9, strip.white = T) %>% 
        row_to_names(1) %>% 
        mutate("Date" = date)
    
      return(data)
    }
    

    In addition I've written this to bind the rows together:

    info <- sapply(results, get_vals, simplify = F) %>% 
      bind_rows()