r xml dataframe text tei

Creating a text data frame from multiple XML files


I am trying to create a single data frame in R that contains the text from multiple XML files. I have tried to write a function that reads the XML files and extracts their text with xml_text from the xml2 package. This is my code:

read_texts <- function(folder) {
  dir_ls(folder, glob = "*.xml") %>%
  map_dfr(
  read_xml(.) %>%
  xml_text(., trim = TRUE) %>%
  tibble()
)
}

read_texts_n <- Vectorize(read_texts)


read_texts_n("forfatterskab")

When I run this, I get the error:

Error: `x` must be a string of length 1

How do I get the code to load my files? The aim is to make a single data frame that contains all the text. I am not very experienced with XML.


Solution

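  • I don't think you need to Vectorize your function, since map_dfr already iterates over the files. The error most likely comes from the missing ~ in the map_dfr call: without it, the . inside read_xml(.) refers to the whole vector of file paths rather than to one file at a time.

    Try the function below.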

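    library(xml2)
    library(purrr)   # map_dfr() and the re-exported %>% pipe
    library(tibble)  # tibble()
    
    read_texts <- function(folder) {
      # List every .xml file in the folder, with full paths
      list.files(folder, pattern = '\\.xml$', full.names = TRUE) %>%
        # Read each file, extract its text, and row-bind the results
        map_dfr(~ .x %>% read_xml() %>% xml_text(trim = TRUE) %>% tibble())
    }
    
    
    result <- read_texts("forfatterskab")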
    

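    The only doubt I have is the way you pass the folder name. A bare folder name like "forfatterskab" only works if that folder sits inside the current working directory. Usually, I would expect you to pass a complete folder path to the function, something like read_texts('Users/username/folder_name').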
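
If it helps to see which text came from which file, here is a minimal sketch of a variant that names its columns and records the file name. The function name read_texts_named, the column names file and text, and the two throwaway demo files are my own assumptions for illustration, not part of the original question. It also assumes dplyr is installed, since map_dfr relies on it for row-binding.

```r
library(xml2)
library(purrr)
library(tibble)

# Hypothetical variant of read_texts(): one row per XML file, with the
# file name kept next to its text. Column names are illustrative.
read_texts_named <- function(folder) {
  files <- list.files(folder, pattern = "\\.xml$", full.names = TRUE)
  map_dfr(files, function(path) {
    tibble(
      file = basename(path),
      text = xml_text(read_xml(path), trim = TRUE)
    )
  })
}

# Demo on two throwaway files in a temporary folder (made-up content)
demo_dir <- file.path(tempdir(), "xml_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("<doc><p>First text</p></doc>", file.path(demo_dir, "a.xml"))
writeLines("<doc><p>Second text</p></doc>", file.path(demo_dir, "b.xml"))

result <- read_texts_named(demo_dir)
print(result)
```

With your own data you would call read_texts_named("forfatterskab") instead of using the demo folder. Note that xml_text on a whole document concatenates the text of every element, so for TEI files you may later want to select specific nodes first.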