Scopus_ReadCSV {CITAN} not working with csv file exported from Scopus


I am using RStudio with R 3.3.1 on Windows 7, and I have installed the CITAN package. I am trying to import bibliography entries from a CSV file that I exported from Scopus (untouched, exactly as exported), choosing to export all available information.

This is the error that I get:

example <- Scopus_ReadCSV("scopus.csv")

Error in Scopus_ReadCSV("scopus.csv") : Column not found: `Source'. In addition: Warning messages:

1: In read.table(file = file, header = header, sep = sep, quote = quote, : invalid input found on input connection 'scopus.csv'

2: In read.table(file = file, header = header, sep = sep, quote = quote, : incomplete final line found by readTableHeader on 'scopus.csv'

Column `Source' is there when I open the file, so I do not know why it says 'not found'.


Solution

  • Eventually I came to the following conclusions:

    1. The CSV file exported from Scopus was encoded as UTF-8-BOM, which R does not seem to handle when using Scopus_ReadCSV("file.csv") or read.table("file.csv", header = TRUE, sep = ",", fileEncoding = "UTF-8").

    2. Although the file from Scopus uses a declared encoding, it can still contain some "strange" non-English characters that the read functions in R cannot process. (I found this problem mainly in names with special characters.)
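The first conclusion can be checked from R itself. The following is only a minimal sketch, not part of CITAN: `has_utf8_bom` is a hypothetical helper name, and the three-column demo file is made up for illustration. It looks for the three bytes EF BB BF at the start of the file, then reads the file with the "UTF-8-BOM" encoding name that base R accepts (since R 3.0.0) for stripping the mark.

```r
# Hypothetical helper (not part of CITAN): TRUE if the file starts with
# the UTF-8 byte order mark, i.e. the bytes EF BB BF.
has_utf8_bom <- function(path) {
  first_bytes <- readBin(path, what = "raw", n = 3L)
  identical(first_bytes, as.raw(c(0xEF, 0xBB, 0xBF)))
}

# Made-up demo file that begins with a BOM, standing in for a Scopus export.
demo_file <- tempfile(fileext = ".csv")
writeBin(
  c(as.raw(c(0xEF, 0xBB, 0xBF)),
    charToRaw("Authors,Title,Source\nSmith,Paper,Scopus\n")),
  demo_file
)

has_utf8_bom(demo_file)

# With fileEncoding = "UTF-8-BOM", base R removes the mark while reading,
# so the first column name comes out clean.
demo <- read.csv(demo_file, fileEncoding = "UTF-8-BOM",
                 stringsAsFactors = FALSE)
names(demo)
```

Whether Scopus_ReadCSV passes a `fileEncoding` argument through to the underlying read function is an assumption to verify in its help page before relying on it.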

    Solutions for those issues:

    1. Open the CSV file in a text editor such as Notepad++ and save it with plain UTF-8 encoding (rather than UTF-8-BOM), so that R can read it as UTF-8.

    2. When you run the read function in R, you will notice that it stops reading partway through the file (e.g. at the 40th of 200 records). Check exactly where it stopped: that record contains the special character. Open the CSV in the text editor, find the character, and delete or replace it so that R no longer fails at that point.
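Finding the offending record by hand can be slow for a large export, so the search in step 2 could also be done in R. This is a rough sketch under stated assumptions: `find_invalid_utf8` is a hypothetical helper, the demo file is invented, and `validUTF8()` requires R 3.3.0 or later. It only catches bytes that are not valid UTF-8 (for example a stray Latin-1 character); a character that is valid UTF-8 but still causes trouble would not be reported.

```r
# Hypothetical helper: list the lines of a file that contain bytes which
# are not valid UTF-8. Invalid bytes are shown as <xx> hex escapes.
find_invalid_utf8 <- function(path) {
  lines <- readLines(path, warn = FALSE)  # raw text, no re-encoding
  bad <- which(!validUTF8(lines))
  data.frame(
    line = bad,
    text = iconv(lines[bad], from = "UTF-8", to = "UTF-8", sub = "byte"),
    stringsAsFactors = FALSE
  )
}

# Made-up demo file: line 3 contains the single byte E9 (Latin-1 e-acute),
# which is not valid on its own in UTF-8.
demo_file <- tempfile(fileext = ".csv")
writeBin(
  c(charToRaw("Authors,Title\nSmith,Ok\nJos"),
    as.raw(0xE9),
    charToRaw(",Bad\n")),
  demo_file
)

bad <- find_invalid_utf8(demo_file)
bad$line   # line numbers to inspect in the text editor
bad$text   # the same lines with the invalid bytes made visible
```

The reported line numbers can then be looked up directly in the editor instead of guessing from where the import stopped.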