Tags: r, read.table, read.csv

Extra commas at end of lines causing error with read.csv and read.table


I am trying to read this .csv file into R. When I use read.csv, I either get errors related to row.names, or the column names are offset from their original columns. Based on this post I believe the problem is related to having an extra comma at the end of each line. What I can't find in the response to the previous question is how to get rid of the line ending commas.

My workaround is the following:

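# row.names = NULL forces row numbering, so the first column is kept as data
pmr <- read.csv("pubmed_result.csv", header = TRUE, row.names = NULL)
# Shift the column names one position to the left; the last column is the empty one
colnames(pmr) <- c(colnames(pmr)[2:ncol(pmr)], "blank")
# Drop the trailing empty column (parentheses matter: 1:n-1 means (1:n)-1)
pmr <- pmr[1:(ncol(pmr) - 1)]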

This provides the desired result, but seems a bit inelegant. Is there a way to get read.csv or read.table to ignore the last comma? Or is there a way to use gsub to fix the csv?


Solution

  • You are correct in your assessment that the trailing "," is causing the issues. To be precise, it's the fact that you have a trailing "," in the data lines but not in the line where the column names are declared.

    If you don't want to fix the column names manually, as your code above does, you could use readr::read_csv instead:

    library(tidyverse)
    df <- read_csv("pubmed_result.csv")
    df
    ## A tibble: 375 x 11
    #   Title   URL    Description  Details  ShortDetails Resource Type  Identifiers
    #   <chr>   <chr>  <chr>        <chr>    <chr>        <chr>    <chr> <chr>
    # 1 Myoedi… /pubm… Zhang Y, Lo… Physiol… Physiol Rev… PubMed   cita… PMID:29717…
    # 2 Cullin… /pubm… Papizan JB,… J Biol … J Biol Chem… PubMed   cita… PMID:29653…
    # 3 Fusoge… /pubm… Bi P, McAna… Proc Na… Proc Natl A… PubMed   cita… PMID:29581…
    # 4 Correc… /pubm… Long C, Li … Sci Adv… Sci Adv.  2… PubMed   cita… PMID:29404…
    # 5 Single… /pubm… Amoasii L, … Sci Tra… Sci Transl … PubMed   cita… PMID:29187…
    # 6 Requir… /pubm… Shi J, Bi P… Proc Na… Proc Natl A… PubMed   cita… PMID:29078…
    # 7 Consid… /pubm… Carroll KJ,… Circ Re… Circ Res.  … PubMed   cita… PMID:29074…
    # 8 ZNF281… /pubm… Zhou H, Mor… Genes D… Genes Dev. … PubMed   cita… PMID:28982…
    # 9 Functi… /pubm… Kyrychenko … JCI Ins… JCI Insight… PubMed   cita… PMID:28931…
    #10 Defici… /pubm… Papizan JB,… J Clin … J Clin Inve… PubMed   cita… PMID:28872…
    ## ... with 365 more rows, and 3 more variables: Db <chr>, EntrezUID <int>,
    ##   Properties <chr>
    

    This throws a number of parsing warnings, all caused by the trailing "," that appears on the data lines but not on the header line; you can ignore them in this case. Note that the column names are assigned correctly.
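On the gsub idea from the question: the trailing commas can also be stripped before base R parses the file. The following is a minimal sketch, not a tested recipe for the actual pubmed_result.csv. The three-line sample is hypothetical and stands in for the real file, and the approach assumes that no quoted field contains a line break ending in a comma. It first reproduces the column shift, then removes one trailing comma per line with sub() and parses the cleaned text.

```r
# Hypothetical miniature of the file: the header has 3 fields,
# but every data line ends with an extra ",".
lines <- c(
  "Title,URL,Db",
  "Paper A,/pubmed/1,pubmed,",
  "Paper B,/pubmed/2,pubmed,"
)
# For the real file, the lines would come from:
# lines <- readLines("pubmed_result.csv")

# The problem: the header has one fewer field than the data lines,
# so read.csv uses the first column as row names and shifts the rest.
shifted <- read.csv(text = lines, header = TRUE)
rownames(shifted)

# The fix: drop one trailing comma (and any trailing whitespace,
# such as a stray carriage return) from the end of each line.
cleaned <- sub(",[[:space:]]*$", "", lines)

# Parse the cleaned lines; text = lets read.csv read a character vector.
pmr <- read.csv(text = cleaned, header = TRUE, stringsAsFactors = FALSE)
colnames(pmr)
```

sub() is enough here because only one match per line is needed; gsub() with the same anchored pattern would behave identically. The header line has no trailing comma, so it passes through unchanged.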