rofficer

R: Automate generating word documents based on dataframe


I am trying to find a way to automate generating word documents based on a data frame containing addresses that I would like inserted into separate word documents:

`DF <- data.frame(Company=c('Alpha','Beta', 'Gamma'),
             Name= c('John Doe', 'Jane Doe', 'Jack Smith'),
             Address=c('123 Main St', '234 Maple St', '456 30th St' ),
             Address_Line_2=c('Suite A', '', 'Suite B'),
             City_State_Zip=c('Los Angeles, CA 12345', 'New York City, NY 23456', 'Chicago, IL 3456'))`

I have a word doc template I use and I use "Find and Replace" function code for generating word documents:

enter image description here:

document <- read_docx("C:\\Mock_Path\\Letter.docx")
document <- document %>%
officer::body_replace_all_text(., old_value = "Today’s Date", new_value = "July 24, 2023", only_at_cursor = FALSE, ignore.case = TRUE) %>%
officer::body_replace_all_text(., old_value = "Company Name",new_value = "Alpha", only_at_cursor = FALSE, ignore.case = TRUE) %>%
officer::body_replace_all_text(., old_value = "First Name/Last Name", new_value = "John Doe", only_at_cursor = FALSE, ignore.case =TRUE) %>% 
officer::body_replace_all_text(., old_value = "Company Address", new_value = "123 Main St", only_at_cursor = FALSE, ignore.case =TRUE)%>%
officer::body_replace_all_text(., old_value = "Address Line 2", new_value = "Suite A", only_at_cursor = FALSE, ignore.case = TRUE)%>%
officer::body_replace_all_text(document, old_value = "City, State Zip", new_value = "Los Angeles, CA 1234", only_at_cursor =FALSE, ignore.case = TRUE)
print(document ,""C:\\Mock_Path\\Addressed_Letter.docx")`

My method for using a find and replace function from a template works but since my actual data frame is much larger, I was wondering if there is a way to automate the process of printing multiple word documents working down the rows in my data frame. I am hoping to generate something like this: enter image description hereI could really appreciate any help or advice. Thanks!


Solution

  • Having had a closer look at this, I realise that the body_replace_all_text() modifies the document in place. This means that after you've made the replacements the first time, you replace e.g. "Company Name" with "Alpha", and then the text "Company Name" no longer appears. This will lead to all the documents being the same.

    The easiest way around this is to read in the template each time we apply the function. We can also use Reduce() to apply the replacements iteratively and avoid repetition:

    # Function to replace text
    replace_all_text <- function(df, replacement_list, doc_path = "C:\\Mock_Path\\Letter.docx") {
        document <- read_docx(doc_path)
        Reduce(
            \(doc, repl)
            body_replace_all_text(doc, repl["pattern"], df[[repl["repl"]]], ignore.case = TRUE),
            replacement_list,
            init = document
        )
    }
    

    Then define the replacements and split the data frame. The repl value refers to the column of your original data frame which contains the replacement.

    # Create list of replacements
    replacement_list <- list(
        c(pattern = "Today’s Date", repl = "date"),
        c(pattern = "Company Name", repl = "Company"),
        c(pattern = "First Name/Last Name", repl = "Name"),
        c(pattern = "Company Address", repl = "Address"),
        c(pattern = "Address Line 2", repl = "Address_Line_2"),
        c(pattern = "City, State Zip", repl = "City_State_Zip")
    )
    
    # Add date to the data.frame
    DF$date <- format(Sys.Date(), "%B %d, %Y")
    df_split <- split(DF, seq_len(nrow(DF)))
    

    Then simply apply the function to each document.

    # Apply the function to each document
    replaced_docs_list <- lapply(df_split, \(df) replace_all_text(df, replacement_list)) |>
        setNames(DF$Company)
    

    The output will be a named list, with each item in the list being a document for each company in the data frame. This makes it easy to save each document (which bizarrely officer does using the print() function:

    purrr::iwalk(replaced_docs_list, \(doc, company) {
        outfile <- paste0(company, ".docx")
        message("Saving doc: ", outfile)
        print(doc, target = outfile)
    })