r officer

Cursor location for multiple matches (Officer Package in R)


I am trying to identify the exact location at which to insert or replace items using the officer package in R.

For example:

cursor_reach(document, "Adverse Events")

finds only the first paragraph that matches the wording "Adverse Events".

So if a paragraph of body text mentions Adverse Events, the cursor points to that paragraph first. What I really want is to find the heading that contains "Adverse Events", so that I can insert some automated text after this heading.

There does not seem to be a way to move the cursor to a heading if the words "Adverse Events" also appear in a paragraph before that heading. Is there one?

Thanks!

I tried calling it twice in a row:

cursor_reach(document, "Adverse Events")

cursor_reach(document, "Adverse Events")

But this does not work; the cursor still points to the first match.


Solution

  • You can set your cursor location directly with the following:

    # assuming your rdocx object is called document and the 
    # desired cursor location is 3
    
    document$officer_cursor$which <- 3
    

    Here's a wrapper function for figuring out where the desired cursor location should be:

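    library(officer)  # read_docx(), styles_info(), body_add_par()
    library(dplyr)    # %>%, left_join(), filter(), select()
    library(xml2)     # xml_find_all(), xml_text(), xml_attr()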
    
    cursor_reach_list <- function(x, keyword) {
      
      nodes_with_text <- xml_find_all(x$doc_obj$get(), "/w:document/w:body/*|/w:ftr/*|/w:hdr/*")
      if (length(nodes_with_text) < 1) {
        stop("no text found in the document", call. = FALSE)
      }
      text_ <- xml_text(nodes_with_text)
      test_ <- grepl(pattern = keyword, x = text_)
      if (!any(test_)) {
        stop(keyword, " has not been found in the document", 
             call. = FALSE)
      }
      # note: everything above was taken directly from officer's cursor_reach function
    
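      # get the paragraph style associated with each top-level node
      # (only the first w:pStyle is used, so a node that holds several styled
      # paragraphs, such as a table, still yields exactly one value)
      style_ <- vapply(nodes_with_text,
                       function(node) {
                         ss <- xml_find_all(node, ".//w:pStyle")
                         if (length(ss) == 0) return("")
                         xml_attr(ss[[1]], "val", default = "")
                       },
                       character(1))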
    
      # put the results in a table
      result <- data.frame(para = seq_along(text_),
                           keyword.found = test_,
                           style_id = style_,
                           stringsAsFactors = FALSE) %>%
        left_join(styles_info(x) %>%
                    filter(style_type == "paragraph") %>%
                    select(style_id, style_name),
                  by = "style_id") %>%
        select(-style_id)
    
      print(result)
    }
    

    Demonstration with a simple document:

    
    # create simple document for testing #####
    doc <- read_docx()
    doc <- body_add_par(doc, "A paragraph of normal text that contains the keywords Adverse Events, and precedes any heading.")
    doc <- body_add_par(doc, "Some other text.")
    doc <- body_add_par(doc, "Header Adverse Events", style = "heading 1")
    doc <- body_add_par(doc, "Another paragraph after the header, to beef up the document.")
    print(doc, target = "temp_file.docx")
    rm(doc)
    
    # load document & use cursor_reach_list to identify desired location #####
    
    doc <- read_docx("temp_file.docx")
    cursor_reach_list(doc, "Adverse Events")
    
    # Result:
    #  para keyword.found style_name
    #1    1          TRUE     Normal
    #2    2         FALSE     Normal
    #3    3          TRUE  heading 1
    #4    4         FALSE     Normal
    #5    5         FALSE           
    
    # Both paragraphs 1 & 3 contain the keywords, but para 1 uses the Normal
    # style while para 3 uses heading 1, so para 3 is the heading we want.
    
    # move cursor to para 3
    doc$officer_cursor$which <- 3 
    
    # insert text after heading
    doc <- body_add_par(doc, "additional text in next line", pos = "after")
    
    # save result to different location for ease of verification
    print(doc, target = "temp_file1.docx")
    

    I'm not familiar with your actual use case, so moving the cursor and inserting the new text are left as manual steps once you have identified the right location. You can probably automate everything in a wrapper function, based on your needs.
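
    As a rough idea of what such automation could look like, here is a minimal sketch. The helper name cursor_reach_heading and its style_pattern argument are hypothetical (they are not part of officer). Instead of setting officer_cursor directly, it relies on docx_summary(), cursor_begin() and cursor_forward(), and it assumes that the doc_index column of docx_summary() matches the position of each element in the document body, which may not hold for documents with unusual body-level content.

    ```r
    library(officer)

    # Hypothetical helper (not part of officer): move the cursor to the first
    # paragraph whose text matches `keyword` and whose style name matches
    # `style_pattern` (by default any style whose name starts with "heading").
    cursor_reach_heading <- function(x, keyword, style_pattern = "^heading") {
      s <- docx_summary(x)
      is_heading <- s$content_type == "paragraph" &
        grepl(style_pattern, s$style_name, ignore.case = TRUE)
      has_keyword <- grepl(keyword, s$text)
      hits <- unique(s$doc_index[which(is_heading & has_keyword)])
      if (length(hits) < 1) {
        stop("no heading containing ", keyword, " was found", call. = FALSE)
      }
      # Assumption: doc_index is the position of the element in the body,
      # so stepping forward (doc_index - 1) times from the start lands on it.
      x <- cursor_begin(x)
      for (i in seq_len(hits[1] - 1)) {
        x <- cursor_forward(x)
      }
      x
    }

    # Build a small test document, as in the demonstration above
    src <- tempfile(fileext = ".docx")
    doc <- read_docx()
    doc <- body_add_par(doc, "Normal text that mentions Adverse Events.")
    doc <- body_add_par(doc, "Header Adverse Events", style = "heading 1")
    doc <- body_add_par(doc, "A paragraph after the header.")
    print(doc, target = src)

    # Reload it, jump to the heading, and insert text right after it
    doc <- read_docx(src)
    doc <- cursor_reach_heading(doc, "Adverse Events")
    doc <- body_add_par(doc, "additional text in next line", pos = "after")
    out <- tempfile(fileext = ".docx")
    print(doc, target = out)
    ```

    If several headings match, this sketch simply takes the first one; returning all candidate positions and letting the caller choose would be an easy variation.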