Turning xml file into dataframe

I'm trying to extract data from an xml file. I'm extracting the nodes separately with the following code:

entity_uin <- xml_text(xml_find_all(xml, ".//Entity/EntityUin"))
entity_name <- xml_text(xml_find_all(xml, ".//Entity/EntityName"))
entity_zip_code <- xml_text(xml_find_all(xml, ".//Entity/EntityZipCode"))

This way I'm getting three character vectors. Then, I'm trying to create a tibble from these character vectors with the following code:

xml <- tibble(entity_uin, entity_name, entity_zip_code)

Unfortunately, this doesn't work because the three character vectors are with unequal lengths. Can anyone suggest a solution?

Solution

Assuming(!) that some Entity nodes in your document are not complete and error is raised because some of your column vectors are shorter than others, you could first get a set of parent nodes and extract details from those with xml_find_first(). xml_find_first() output is always the same size as the input, missing matches are filled with NAs, resulting vectors are aligned and can be passed to tibble():

library(xml2)
example_xml <- 
'<?xml version="1.0" encoding="UTF-8"?>
<Entities>
  <Entity>
    <EntityUin>123456</EntityUin>
    <EntityName>ABC Corp</EntityName>
    <EntityZipCode>10001</EntityZipCode>
  </Entity>
  <Entity>
    <EntityUin>789012</EntityUin>
    <EntityName>XYZ Inc</EntityName>
    <!-- Missing EntityZipCode -->
  </Entity>
  <Entity>
    <EntityUin>345678</EntityUin>
    <EntityName>Sample LLC</EntityName>
    <EntityZipCode>90210</EntityZipCode>
  </Entity>
</Entities>'

entities <- 
  read_xml(example_xml) |>
  xml_find_all("/Entities/Entity")

tibble::tibble(
  entity_uin      = xml_find_first(entities, "./EntityUin")     |> xml_text(),
  entity_name     = xml_find_first(entities, "./EntityName")    |> xml_text(), 
  entity_zip_code = xml_find_first(entities, "./EntityZipCode") |> xml_text()
)
#> # A tibble: 3 × 3
#>   entity_uin entity_name entity_zip_code
#>   <chr>      <chr>       <chr>          
#> 1 123456     ABC Corp    10001          
#> 2 789012     XYZ Inc     <NA>           
#> 3 345678     Sample LLC  90210