
Web scraping with R selenider on Linux: error about --user-data-dir

I'm attempting to web-scrape using the following R code (obtained from this thread: link to other question):


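library(selenider)
library(rvest)

session <- selenider_session("selenium", browser = "chrome")

# Navigate to the target page first, e.g. with open_url() (URL not shown here)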


elements <- session |> get_page_source() |> html_elements(".item_teams__cKXQT")

res <- data.frame(
  home_team_name = elements |> 
    html_elements(".item_team__evhUQ:nth-child(1) .item_teamName__NSnfH") |> 
    html_text(trim = TRUE),
  home_team_odds = elements |> 
    html_elements(".item_team__evhUQ:nth-child(1) .item_odd__Lm2Wl") |> 
    html_text(trim = TRUE),
  away_team_name = elements |> 
    html_elements(".item_team__evhUQ:nth-child(3) .item_teamName__NSnfH") |> 
    html_text(trim = TRUE),
  away_team_odds = elements |> 
    html_elements(".item_team__evhUQ:nth-child(3) .item_odd__Lm2Wl") |> 
    html_text(trim = TRUE),
  match_date = elements |> 
    html_elements(".item_scores__Vi7YX .item_date__g4cq_") |> 
    html_text(trim = TRUE),
  match_time = elements |> 
    html_elements(".item_scores__Vi7YX .item_time__xBia_") |> 
    html_text(trim = TRUE),
  match_type = elements |> 
    html_elements(".item_scores__Vi7YX .item_bo__u2C9Q") |> 
    html_text(trim = TRUE)
)

This code works fine when I run it locally on Windows 10. However, I have a Linux server that I'd like this script to run on, and when I run it there I get the following error:

Error in `create_selenium_client_internal()`:
! A Selenium session could not be started
Caused by error in `httr2::req_perform()`:
! HTTP 500 Internal Server Error.
✖ Session not created.
✖ Could not start a new session. Error while creating session with the driver service. Stopping driver service: Could not start a new session. Response code 500. Message: probably user data directory is already in use, please specify a unique value for --user-data-dir argument, or don't use --user-data-dir 
  Host info: host: 'Unknown', ip: 'Unknown'
  Build info: version: '4.29.0', revision: '18ae989'
  System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.15.0-134-generic', java.version: '11.0.26'
  Driver info: driver.version: unknown
  Build info: version: '4.29.0', revision: '18ae989'
  System info: os.name: 'Linux', os.arch: 'amd64', os.version: '5.15.0-134-generic', java.version: '11.0.26'
  Driver info: driver.version: unknown

I've also tried creating a directory and setting it manually:

server_options = selenium_options(server_options = selenium_server_options(extra_args = c("--user-data-dir=/tmp/testing")))

session <- selenider_session(
  browser = "chrome",
  options = server_options
)

This only produces the same error. I've also tried killing all running Chrome processes, but that doesn't help. Is there a way to fix this?

Another important note: I have other Python Selenium scripts that work fine on the server, and none of them set --user-data-dir manually. I'm trying to move my code to R because I'm much more proficient in R than in Python.
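A possible explanation, offered as an assumption rather than a confirmed diagnosis: `selenium_server_options(extra_args = ...)` appends arguments to the command that starts the Selenium server (the Java process), so `--user-data-dir` may never reach Chrome itself. Browser flags are normally sent as W3C capabilities under `goog:chromeOptions`. The sketch below assumes that `selenium_client_options()` accepts a `capabilities` list that selenider forwards unchanged to the selenium package. The helper `chrome_capabilities()` is hypothetical, and the extra flags (`--headless=new`, `--no-sandbox`, `--disable-dev-shm-usage`) are ones commonly needed on Linux servers without a display, not something verified for this particular setup.

```r
# Hypothetical helper: build W3C capabilities that pass flags to Chrome itself
# (not to the Selenium server). Assumption: selenider forwards `capabilities`
# from selenium_client_options() to the selenium package as-is.
chrome_capabilities <- function(profile_dir = tempfile("chrome-profile-"),
                                headless = TRUE) {
  # Create a fresh, unique profile directory so no other Chrome can hold it
  dir.create(profile_dir, recursive = TRUE, showWarnings = FALSE)
  flags <- c(
    paste0("--user-data-dir=", profile_dir),
    "--no-sandbox",            # often required when Chrome runs as root
    "--disable-dev-shm-usage"  # avoids problems with a small /dev/shm
  )
  if (headless) flags <- c("--headless=new", flags)
  # as.list() keeps the flags serialised as a JSON array even if only one
  list("goog:chromeOptions" = list(args = as.list(flags)))
}

caps <- chrome_capabilities()

# Only start a browser in an interactive session with selenider installed
if (interactive() && requireNamespace("selenider", quietly = TRUE)) {
  opts <- selenider::selenium_options(
    client_options = selenider::selenium_client_options(capabilities = caps)
  )
  session <- selenider::selenider_session(
    "selenium",
    browser = "chrome",
    options = opts
  )
}
```

Using a fresh `tempfile()` directory per session keeps concurrent runs from colliding on the same profile, which is what the error message complains about.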


  • With help from TimG's method, I've found a working solution. It is a more manual approach: launch Chrome through chromote, then parse the page with rvest. Here is my working code; instead of reading many elements, it simply grabs some team names from the page.

    library(rvest)

    b <- chromote::ChromoteSession$new()
    # if we don't set some headers, the JavaScript on the page will not load
    # due to Cloudflare blocking
    b$Emulation$setDeviceMetricsOverride(
      width = 1280,
      height = 800,
      deviceScaleFactor = 1,
      mobile = FALSE
    )
    user_agent <- "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36"
    b$Emulation$setUserAgentOverride(userAgent = user_agent)
    # navigate to the target page and wait for it to load here (URL not shown)
    html <- b$Runtime$evaluate("document.documentElement.outerHTML")$result$value
    parsed_html = read_html(html)
    teams = parsed_html %>%
      rvest::html_elements(".item_teamName__NSnfH") %>%