r hpc

How to output messages (errors, warnings, etc.) from R running on HPC as a batch task


I am starting with just a small chunk of a bigger R script; this chunk does not require HPC, but the steps following it do. The code behaves as expected when I run it interactively. When I run it as a batch job, Slurm reports it as 'completed' and the .out file is empty, but the expected output, a .csv file, is not created.

I want to learn how to get R to save its error messages, warnings, etc. to a text file that I can read after the code has run as a batch job, so I can diagnose what is going wrong, both in this instance and as I run more complex jobs.

Following this solution to a similar query, but about running R from the command line, I have tried to use sink(). My job now ends in an error:

"rror: unexpected input in "msg <- file("/niguanak/nigSp_1.Rout", open = "wt") Execution halted

The current code, which ends in that error, is below. The previous version, which 'completed' but did not write the expected .csv file, was identical except that it lacked the top two and bottom two lines relating to sink(). niguanak is the folder I run the job from, and it is where I created a blank file, nigSp_1.Rout, which remains blank after running the code below.

msg  <- file("/niguanak/nigSp_1.Rout", open = "wt")
sink(msg, type = "message")

library(tidyverse)

#combine sp download and Dryas integrifolia missed from sp download
Ni.sp.dl <- read.csv("/hpcfs/users/a1233466/niguanak/data/0039174-240321170329656.csv")
DI.dl <- read.csv("/hpcfs/users/a1233466/niguanak/data/0045407-240321170329656.csv")
Ni.sp.dl <- rbind(Ni.sp.dl, DI.dl, make.row.names = FALSE)
rm(DI.dl)

#get rid of occurrenceStatus == Absent
Ni.sp.dl <- Ni.sp.dl %>%
  filter(occurrenceStatus == "PRESENT")

#get rid of any less than 15º for invasive
Ni.sp.dl <- Ni.sp.dl %>%
  filter(decimalLatitude > 15)

#Checked this with coordinates cleaner on laptop. None seemed problematic for this analysis.

###reshape and save
Ni.sp.dl_df <- as.data.frame(cbind(Ni.sp.dl$gbifID, Ni.sp.dl$species,
                              Ni.sp.dl$decimalLatitude, Ni.sp.dl$decimalLongitude))

colnames(Ni.sp.dl_df) <- c("ind_id", "tax", "lat", "lon")

#Make sure the numeric columns are numeric!
Ni.sp.dl_df$lat <- as.numeric(Ni.sp.dl_df$lat)
Ni.sp.dl_df$lon <- as.numeric(Ni.sp.dl_df$lon)
Ni.sp.dl_df$ind_id <- as.numeric(Ni.sp.dl_df$ind_id)

#write out
write.csv(Ni.sp.dl_df, file = "/hpcfs/users/a1233466/niguanak/Ni.sp.dl_df_b.csv")

sink(type="message")
close(msg)

I have read the examples and help files for sink() and file() but could not work out what I am doing wrong from those either.

Additionally, the complete code above, including the sink() calls, also seems to work as expected when run interactively.

EDIT:

In trying out various (desperate) solutions I ended up with the following code.

#capture output
sink("nigSp_1.out", type =  "messages")

#load required packages
library(tidyverse)

#combine download and Dryas integrifolia missed from sp download
Ni.sp.dl <- read.csv("/hpcfs/users/a1233466/niguanak/data/0039174-240321170329656.csv")
DI.dl <- read.csv("/hpcfs/users/a1233466/niguanak/data/0045407-240321170329656.csv")
Ni.sp.dl <- rbind(Ni.sp.dl, DI.dl, make.row.names = FALSE)
rm(DI.dl)

#get rid of occurrenceStatus == Absent
Ni.sp.dl <- Ni.sp.dl %>%
  filter(occurrenceStatus == "PRESENT")

#get rid of any less than 15 for invasive
Ni.sp.dl <- Ni.sp.dl %>%
  filter(decimalLatitude > 15)

#Checked this with coordinates cleaner on laptop. None seemed problematic for this analysis.

###reshape and save
Ni.sp.dl_df <- as.data.frame(cbind(Ni.sp.dl$gbifID, Ni.sp.dl$species,
                              Ni.sp.dl$decimalLatitude, Ni.sp.dl$decimalLongitude))

colnames(Ni.sp.dl_df) <- c("ind_id", "tax", "lat", "lon")

#Make sure the numeric columns are numeric!
Ni.sp.dl_df$lat <- as.numeric(Ni.sp.dl_df$lat)
Ni.sp.dl_df$lon <- as.numeric(Ni.sp.dl_df$lon)
Ni.sp.dl_df$ind_id <- as.numeric(Ni.sp.dl_df$ind_id)

#write out in case of issues
write.csv(Ni.sp.dl_df, file = "/hpcfs/users/a1233466/niguanak/Ni.sp.dl_df.csv")

#return output to console
sink()

This 'completes' in 1 second, but there are no errors in the .err file, no output in the .out file, and neither "nigSp_1.out" nor "Ni.sp.dl_df.csv" is created.


Solution

  • In answer to my original question, adding the following to my submission script does write the errors and console output to files (so long as your R script is not just one big comment!):

    #SBATCH -o /hpcfs/users/username/folder/%j.out
    #SBATCH -e /hpcfs/users/username/folder/%j.err
    

    The underlying reason this didn't appear to work at first was that nano was hiding a formatting issue. My code was written on my own computer and pasted into nano; I was using a short example while getting used to a new HPC system. It looked fine in nano, but when my colleague opened it in vim, it turned out to be using Windows/DOS line endings (CRLF) rather than POSIX ones (LF). Because of that, the whole script was 'seen' as a single comment, hence no errors or output. Having corrected that, it now behaves and writes output as expected.

    I also found that you can control the line endings in RStudio under Tools > Global Options > Code > Saving > Line ending conversion. I switched it to Posix (LF), and I can now cut and paste from code I have working on my PC.
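
For anyone who hits the same symptom, here is a minimal shell sketch of how the line-ending problem described above could be checked for and fixed directly on the cluster. It is only an illustration: the file it works on is a throwaway temporary file created by the snippet (not my real script), and the variable names are made up for the example. It counts the lines that contain a carriage return, strips the carriage returns with tr, and counts again.

```shell
#!/bin/sh
# Throwaway stand-in for an R script pasted from a Windows machine
# (hypothetical file, created only for this demonstration).
script=$(mktemp)
printf 'x <- 1\r\nprint(x)\r\n' > "$script"

# A literal carriage return character to search for.
cr=$(printf '\r')

# Count the lines that contain a carriage return (CRLF line endings).
before=$(grep -c "$cr" "$script")
echo "lines with CR before: $before"

# Remove every carriage return, leaving POSIX (LF) line endings.
tr -d '\r' < "$script" > "$script.tmp" && mv "$script.tmp" "$script"

# grep -c exits non-zero when nothing matches, so guard the call.
after=$(grep -c "$cr" "$script" || true)
echo "lines with CR after: $after"
```

To apply the same idea to a real script, point the grep and tr commands at that file instead of the temporary one. Where it is installed, the file utility will also report CRLF line terminators, and dos2unix does the same conversion in one step, but I would not assume either is available on every system.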