r ftp rcurl

RCurl::getURL: how to list only the URLs of files inside a folder


I want to list the files on a remote SFTP server, so I did this:

    url <- "sftp://remoteserver.com/dir/"
    credentials <- "myusrname:mypwd"  # libcurl expects "user:password"

    file_list <- tryCatch({
        RCurl::getURL(
          url,
          userpwd = credentials,
          ftp.use.epsv = FALSE,
          dirlistonly = TRUE,
          forbid.reuse = TRUE,
          .encoding = "UTF-8"
        )
      }, error = function(e) {
        # Fall back to an empty character vector if the request fails
        character(0)
      })

However, besides the URLs of the files in that folder, file_list also contains some extra entries that I don't need:

    # at the beginning of the vector
    [1] "sftp://remoteserver.com/dir/."
    [2] "sftp://remoteserver.com/dir/.."

    # at the end of the vector
    [67] "sftp://remoteserver.com/dir/"

Is there a way to avoid these entries? Is it safe to use the following code to just delete them?

    file_list <- file_list[c(-1, -2)]
    file_list <- file_list[-length(file_list)]

Solution

  • I wouldn't rely on that, because it breaks if the entries don't always come back in that order. If you want every file that ends in .logs, I'd filter by name instead, something like this:

    library(dplyr)
    library(stringr)
    
    file_list <- c(
      "sftp://remoteserver.com/dir/.",
      "sftp://remoteserver.com/dir/..",
      "sftp://remoteserver.com/dir/names.logs",
      "sftp://remoteserver.com/dir/"
    )
    
    as_tibble(file_list) %>% # because it's just easier for me to think of things as dataframes 
      filter(str_detect(value, "\\.logs$")) %>% # keep only entries ending in ".logs"
      pull()
    
    
    [1] "sftp://remoteserver.com/dir/names.logs"
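
If the goal is just to drop the directory entries rather than to keep one particular extension, a variant without extra packages is to filter on the last path component instead of on position. This is only a minimal sketch in base R, assuming the listing is already a character vector of URLs as shown in the question; the `other.csv` entry and the names `entry_names` and `files_only` are made up for illustration.

```r
# Sample listing mirroring the question (file names are hypothetical).
file_list <- c(
  "sftp://remoteserver.com/dir/.",
  "sftp://remoteserver.com/dir/..",
  "sftp://remoteserver.com/dir/names.logs",
  "sftp://remoteserver.com/dir/other.csv",
  "sftp://remoteserver.com/dir/"
)

# Strip everything up to and including the last "/" to get the entry name.
entry_names <- sub("^.*/", "", file_list)

# Drop ".", ".." and the empty name left over from the bare directory URL.
files_only <- file_list[!entry_names %in% c(".", "..", "")]

print(files_only)
```

Because the filter looks at the names themselves, it should keep working if the server returns the entries in a different order or omits some of them.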