I am trying to read this .csv file into R. When I use read.csv, I either get errors related to row.names, or the column names are offset from their original columns. Based on this post, I believe the problem is caused by an extra comma at the end of each line. What I can't find in the responses to that question is how to get rid of the line-ending commas.
My workaround is the following:
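# row.names = NULL stops read.csv from using the first column as row names
pmr <- read.csv("pubmed_result.csv", header = TRUE, row.names = NULL)
# Shift the column names one position to the left; the extra last column becomes "blank"
colnames(pmr) <- c(colnames(pmr)[2:ncol(pmr)], "blank")
# Drop the empty last column (parentheses are needed: 1:n-1 is parsed as (1:n)-1)
pmr <- pmr[1:(ncol(pmr) - 1)]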
This provides the desired result, but seems a bit inelegant. Is there a way to get read.csv or read.table to ignore the last comma? Or is there a way to use gsub to fix the csv?
You are correct in your assessment that the trailing "," is causing the issues. To be precise, the problem is that the data lines end with a trailing "," while the line declaring the column names does not, so every data line has one more field than the header.
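To make the mismatch concrete, here is a minimal sketch with a made-up two-row file (the contents of csv_text are hypothetical, not taken from pubmed_result.csv). The header has three fields, while each data line has four because of the trailing ",":

```r
# Hypothetical miniature of the problem: 3 header fields, 4 fields per data line
csv_text <- "Title,URL,Db\nPaper A,/pubmed/1,pubmed,\nPaper B,/pubmed/2,pubmed,"

# When the header has exactly one field fewer than the data lines, read.csv
# uses the first data column as row names, so the remaining values end up
# under the wrong column names.
shifted <- read.csv(text = csv_text, header = TRUE)

print(rownames(shifted))  # the titles have become row names
print(shifted)            # "Title" now holds the URL values
```

If the first column contains duplicate values, the same mechanism fails outright, because duplicate row names are not allowed; that is a likely source of the row.names errors.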
If you don't want to fix the issue manually, as you do in your code above, you could use readr::read_csv:
library(tidyverse)
df <- read_csv("pubmed_result.csv")
df
## A tibble: 375 x 11
# Title URL Description Details ShortDetails Resource Type Identifiers
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 Myoedi… /pubm… Zhang Y, Lo… Physiol… Physiol Rev… PubMed cita… PMID:29717…
# 2 Cullin… /pubm… Papizan JB,… J Biol … J Biol Chem… PubMed cita… PMID:29653…
# 3 Fusoge… /pubm… Bi P, McAna… Proc Na… Proc Natl A… PubMed cita… PMID:29581…
# 4 Correc… /pubm… Long C, Li … Sci Adv… Sci Adv. 2… PubMed cita… PMID:29404…
# 5 Single… /pubm… Amoasii L, … Sci Tra… Sci Transl … PubMed cita… PMID:29187…
# 6 Requir… /pubm… Shi J, Bi P… Proc Na… Proc Natl A… PubMed cita… PMID:29078…
# 7 Consid… /pubm… Carroll KJ,… Circ Re… Circ Res. … PubMed cita… PMID:29074…
# 8 ZNF281… /pubm… Zhou H, Mor… Genes D… Genes Dev. … PubMed cita… PMID:28982…
# 9 Functi… /pubm… Kyrychenko … JCI Ins… JCI Insight… PubMed cita… PMID:28931…
#10 Defici… /pubm… Papizan JB,… J Clin … J Clin Inve… PubMed cita… PMID:28872…
## ... with 365 more rows, and 3 more variables: Db <chr>, EntrezUID <int>,
## Properties <chr>
This will throw a number of warnings, which stem from the data lines having a trailing "," that the header line lacks; you can ignore them in this case. Note that the column names are correctly assigned.
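If you would rather stay in base R, the gsub idea from the question can be sketched as follows. This is only a rough sketch: strip_trailing_comma and read_csv_trimmed are hypothetical helper names, and it assumes that no quoted field spans several lines. Since only one comma per line needs removing, sub is enough.

```r
# Remove a single trailing comma from each line (lines without one are unchanged)
strip_trailing_comma <- function(lines) sub(",$", "", lines)

# Read the raw lines, clean them, then parse the cleaned text with read.csv
read_csv_trimmed <- function(path, ...) {
  lines <- readLines(path)
  read.csv(text = strip_trailing_comma(lines), ...)
}

# Self-contained demonstration on a temporary file with made-up contents
tmp <- tempfile(fileext = ".csv")
writeLines(c("Title,URL,Db",
             "Paper A,/pubmed/1,pubmed,",
             "Paper B,/pubmed/2,pubmed,"), tmp)
fixed <- read_csv_trimmed(tmp, header = TRUE)
print(names(fixed))
print(fixed)
```

For the file in the question, the call would be read_csv_trimmed("pubmed_result.csv", header = TRUE). Be aware that a row whose genuine last field is empty would also lose its final comma; read.csv pads such short rows with NA by default.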