
Problems with # (hash) in string columns when importing CSV in R


Some of the string fields in my CSV file contain a hash character (#), and R seems to have trouble reading them.

csv = "A;B;C
n;# 9;0
n;1;0"

read.table(text=csv, header=TRUE, sep=";", encoding="UTF-8")

Results in

Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  : 
  line 1 did not have 3 elements

The CSV file is generated by Python using the csv.QUOTE_MINIMAL style. That means string fields are only enclosed in quotes when necessary (e.g. when the string itself contains a quote char). I cannot change how the file is generated, so I have to deal with the # on the R side.


Solution

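  • read.table treats # as the start of a comment by default (comment.char = "#"), so everything after it on the line is dropped. Set comment.char to a character that does not occur in your data, or to "" to turn off comment handling altogether.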

    read.table(text=csv, header=TRUE, sep=";", encoding="UTF-8", comment.char = '@')
    
    #  A   B C
    #1 n # 9 0
    #2 n   1 0
    

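    This is also why read.csv() is usually a better choice than read.table() for CSV files. It is a wrapper around read.table() with defaults better suited to CSV, including comment.char = "". For a semicolon-separated file you still need to pass sep = ";".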
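
A minimal sketch of the read.csv() route, reusing the sample data from the question. The variable names (df, df2, n_default, n_raw) are only illustrative. The count.fields() calls are an optional diagnostic showing that the default comment character shortens the second line, which is what triggers the "did not have 3 elements" error.

```r
csv <- "A;B;C
n;# 9;0
n;1;0"

# read.csv() defaults to comment.char = "", so '#' is kept as data.
# sep must still be set because this file uses ';' rather than ','.
df <- read.csv(text = csv, sep = ";")

# Same idea with read.table(): switch comment handling off explicitly.
df2 <- read.table(text = csv, header = TRUE, sep = ";", comment.char = "")

# Diagnostic: number of fields per line with and without comment handling.
con <- textConnection(csv)
n_default <- count.fields(con, sep = ";")                  # comment.char = "#"
close(con)

con <- textConnection(csv)
n_raw <- count.fields(con, sep = ";", comment.char = "")   # comments disabled
close(con)

print(df)
```

With comments disabled every line reports three fields, while the default setting reports fewer for the line containing the #. If the file used a comma as the decimal mark, read.csv2() might be the closer fit, but that is an assumption about data not shown here.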