rvest drake-r-package

Using rvest with drake: external pointer is not valid error


When I first run the code below, everything works. But when I change something in the html_file %>% ... command, for example by commenting out tolower(), I get the following error:

Error: target title failed.
diagnose(title)$error$message:
  external pointer is not valid
diagnose(title)$error$calls:
   1. └─html_file %>% html_nodes("h2") %>% html_text()

Code:

library(rvest)
library(drake)

some_string <- '
  <div class="main">
      <h2>A</h2>
      <div class="route">X</div>
  </div> 
'

html_file <- read_html(some_string)
title <- html_file %>% 
  html_nodes("h2") %>% 
  html_text()

plan <- drake_plan(
  html_file = read_html(some_string),
  title = html_file %>% 
    html_nodes("h2") %>% 
    html_text() %>% 
    tolower()
)

make(plan)

I found two possible solutions but I'm not enthusiastic about them.
1. Join both steps in drake_plan into one.
2. Use xml2::write_html() and xml2::read_html() as suggested here.
Is there a better way to solve it? P.S. This issue has already been discussed here, on the RStudio forum, and on GitHub.


Solution

  • By default, drake saves targets as RDS files (other options here). So https://github.com/tidyverse/rvest/issues/181#issuecomment-395064636, which you brought up, is exactly the problem: the object returned by read_html() holds an external pointer, and that pointer is no longer valid once the object has been saved to RDS and loaded back. I prefer (1) because the resulting text is compatible with RDS. More broadly, it is up to the user to choose targets that are compatible with drake's data storage system. See https://books.ropensci.org/drake/plans.html#how-to-choose-good-targets for a discussion and links to similar issues. But if you want to go with (2), you could write the HTML to a file inside a dynamic file target and return the file path from it.
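
A minimal sketch of both options, reusing some_string from the question. It assumes a drake version that supports dynamic files (format = "file", available since 7.11.0 as far as I know); the file name page.html and the target name html_path are made up for illustration.

```r
library(rvest)
library(drake)

some_string <- '
  <div class="main">
      <h2>A</h2>
      <div class="route">X</div>
  </div>
'

# Option (1): parse and extract inside a single target, so only plain
# text (which RDS handles fine) is ever stored in the cache.
plan_single <- drake_plan(
  title = read_html(some_string) %>%
    html_nodes("h2") %>%
    html_text() %>%
    tolower()
)

# Option (2): store the HTML on disk and track it as a dynamic file.
# The target's value is the file path, not the xml_document itself.
plan_file <- drake_plan(
  html_path = target({
    path <- "page.html"  # hypothetical file name
    xml2::write_html(read_html(some_string), path)
    path                 # a dynamic file target must return its path(s)
  }, format = "file"),
  # Downstream targets receive the path and re-read the file, so no
  # external pointer ever passes through the cache.
  title = read_html(html_path) %>%
    html_nodes("h2") %>%
    html_text() %>%
    tolower()
)

make(plan_single)
make(plan_file)
readd(title)
```

With either plan, editing the title command (for example removing tolower()) should only rebuild title, because nothing stored in the cache depends on a live pointer.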