
Merging and Removing Duplicates from Multiple CSV Files (Scopus Literature Review)


I am conducting a literature review using the Scopus database. I performed multiple keyword searches and saved the results as CSV files. Each file has the same structure but different filenames. I want to merge all the CSV files into a single file while removing duplicates.

The CSV files have the following columns: Authors, Author full names, Author(s) ID, Title, Year, Source title, Volume, Issue, Art. No., Page start, Page end, Page count, Cited by, DOI, Link, Document Type, Publication Stage, Open Access, Source, EID

How can I efficiently merge these files and remove duplicates using R? I appreciate any guidance or sample code!


Solution

  • In my experience, data.table's fread() is one of the fastest ways to read a set of CSV files, especially when the individual files are large:

    library(data.table)

    # Collect all CSV files in the working directory
    files <- list.files(pattern = "\\.csv$")

    # Read each file with fread() and stack them into a single data.table
    data <- rbindlist(lapply(files, fread))

    # Drop rows that are exact duplicates across all columns
    unique_data <- unique(data)
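
    A quick sanity check is to compare row counts before and after deduplication; the difference is the number of duplicate rows that were dropped:

    # Number of exact-duplicate rows removed by unique()
    nrow(data) - nrow(unique_data)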
    

    Note: if the files are large, make sure you are running the 64-bit (x64) build of R; it can address far more memory than the 32-bit build, which matters once the merged table gets big.
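
  • One caveat: unique(data) only removes rows that match in every column. If the same paper shows up in exports taken at different times, volatile fields such as Cited by may differ between copies, and those rows will survive. Since Scopus assigns each record a unique EID (one of the columns in your files), deduplicating on that column is usually more reliable. A minimal sketch building on the code above (the output filename merged_scopus.csv is just a placeholder):

    # Keep one row per Scopus record, keyed on EID, so that rows
    # differing only in volatile fields (e.g. Cited by) still collapse
    unique_data <- unique(data, by = "EID")

    # Write the merged, deduplicated result to a single CSV file
    fwrite(unique_data, "merged_scopus.csv")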