rcsvxlsxdata-conversion

Converting a .csv with data in one column to an .xlsx with multiple columns


I am working with .csv files automatically generated by a microscope. When I attempt to open the files in R, I find that all my data is grouped under a single column instead of under multiple columns as it should be. This formatting makes it impossible for me to analyze the data as-is in R.

Here is an example of what one of my files looks like:

"Title","Auto area measurement results data"
"Date saved","2/26/2025","4:02 PM"
"Comment",
"[ Statistics ]"
,"Area","Unit","Perimeter","Unit","Max diameter","Unit","Min diameter","Unit"
"Average",42158.27,"µm²",784.71,"µm",257.90,"µm",209.34,"µm"
"Standard Deviation",1667.74,"µm²",15.05,"µm",10.15,"µm",3.41,"µm"
"Max",43826.01,"µm²",799.76,"µm",268.05,"µm",212.76,"µm"
"Min",40490.53,"µm²",769.65,"µm",247.75,"µm",205.93,"µm"
"Total",84316.53,"µm²",1569.41,"µm",515.79,"µm",418.69,"µm"
"[ Main ]"
"No.","Area","Unit","Perimeter","Unit","Max diameter","Unit","Min diameter","Unit"
"1",43826.01,"µm²",799.76,"µm",268.05,"µm",212.76,"µm"
"2",40490.53,"µm²",769.65,"µm",247.75,"µm",205.93,"µm"
,"Area","Unit"
"Total area",84316.53,"µm²"
"Count",2,"pcs"
"Area ratio",2.55,"%"
"Total region area",3312540.76,"µm²"

Here is what it looks like in Excel with proper column organization: Excel spreadsheet

When I open each file individually and "save as" an Excel workbook, I can then import the file into R with all expected columns working properly. However, I have hundreds of files, so it is not feasible for me to individually save each one. I tried converting the CSVs using the write.xlsx function from the openxlsx package, but this yields a .xlsx file with only one column (once again, not useful for my analyses). I am looking for a way to convert each file that can be applied efficiently to hundreds of files at once (e.g., using a for loop).


Solution

  • 1) This worked when I tried it:

    library(openxlsx)
    library(tools)
    
    infiles <- Sys.glob("*.csv")
    
    for(infile in infiles) {
      outfile <- infile |> file_path_sans_ext() |> paste0(".xlsx")
      cat("converting", infile, "to", outfile, "\n")
      dat <- read.csv(infile, header = FALSE)
      write.xlsx(dat, outfile)
    }
    
    

    2) however, if for some reason your files do not work with that here is an approach that uses the gocsv and csvtk command line utilities instead of R packages. R is only used to construct and invoke the commands.

    library(tools)
    
    infiles <- Sys.glob("*.csv")
    
    for(infile in infiles) {
      outfile <- infile |> file_path_sans_ext() |> paste0(".xlsx")
      cat("converting", infile, "to", outfile, "\n")
      cmd <- sprintf("gocsv clean %s | csvtk csv2xlsx -o %s", infile, outfile)
      shell(cmd)
    }