How to use Commons CSV to remove duplicate columns in a CSV file using Java?

I have a CSV file that contains several duplicate columns, and I am trying to remove these duplicates using Java. I found the Apache Commons CSV library, which some people use to remove duplicate rows. How can I use it to remove or skip duplicate columns?

For example, my CSV header is:

ID Name Email Email

So far my code is:

Reader reader = Files.newBufferedReader(Paths.get("user.csv"));

// read csv file
Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreHeaderCase()
        .withTrim()
        .parse(reader);

for (CSVRecord record : records) {
    System.out.println("Record #: " + record.getRecordNumber());
    System.out.println("ID: " + record.get("ID"));
    System.out.println("Name: " + record.get("Name"));
    System.out.println("Email: " + record.get("Email"));
}

// close the reader
reader.close();

Solution

Your code is close to what you need - you just need to use CSVPrinter to write out your data to a new file.

import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;

public class App {

    public static void main(String[] args) throws IOException {

        try (final Reader reader = Files.newBufferedReader(Paths.get("source.csv"),
                StandardCharsets.UTF_8)) {

            final Writer writer = Files.newBufferedWriter(Paths.get("target.csv"),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING); // create the file, or empty it if it already exists

            try (final CSVPrinter printer = CSVFormat.DEFAULT
                    .withHeader("ID", "Name", "Email")
                    .print(writer)) {
                
                // read each input file record:
                Iterable<CSVRecord> records = CSVFormat.DEFAULT
                        .withFirstRecordAsHeader()
                        .withIgnoreHeaderCase()
                        .withTrim()
                        .parse(reader);
                
                // write each output file record
                for (CSVRecord record : records) {
                    printer.print(record.get("ID"));
                    printer.print(record.get("Name"));
                    printer.print(record.get("Email"));
                    printer.println();
                }
            }
        }
    }
}

This transforms the following source file:

ID,Name,Email,Email
1,Albert,foo@bar.com,foo@bar.com
2,Brian,baz@bat.com,baz@bat.com

To this target file:

ID,Name,Email
1,Albert,foo@bar.com
2,Brian,baz@bat.com
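
The code above names the three wanted columns explicitly. If the duplicate column names are not known in advance, the columns to keep could be worked out from the header row instead. Below is a minimal sketch of that idea in plain Java, without Commons CSV. The class and method names (DuplicateColumns, firstOccurrenceIndexes, project) are hypothetical, and comparing header names case-insensitively after trimming is an assumption chosen to mirror withIgnoreHeaderCase() and withTrim():

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class DuplicateColumns {

    // Returns the index of the first occurrence of each header name, in file order.
    // Assumption: names are compared case-insensitively after trimming.
    public static List<Integer> firstOccurrenceIndexes(List<String> headers) {
        Set<String> seen = new HashSet<>();
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < headers.size(); i++) {
            String key = headers.get(i).trim().toLowerCase(Locale.ROOT);
            if (seen.add(key)) { // add() returns false if the name was already seen
                keep.add(i);
            }
        }
        return keep;
    }

    // Copies only the kept column positions from one row.
    public static List<String> project(List<String> row, List<Integer> keep) {
        List<String> out = new ArrayList<>();
        for (int index : keep) {
            out.add(row.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> headers = Arrays.asList("ID", "Name", "Email", "Email");
        List<Integer> keep = firstOccurrenceIndexes(headers);
        System.out.println(keep); // prints [0, 1, 2]

        List<String> row = Arrays.asList("1", "Albert", "foo@bar.com", "foo@bar.com");
        System.out.println(project(row, keep)); // prints [1, Albert, foo@bar.com]
    }
}
```

One possible way to combine this with Commons CSV (untested here) is to parse the file without a header setting, treat the first record as the header row, and read each kept value by position with record.get(int) before passing it to printer.print(...).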

Some points to note:

I was wrong in my comment. You do not need to use column indexes - you can use headings (as I do above) in your specific case.
Whenever reading and writing a file, it is recommended to provide the character encoding explicitly. In my case, I use UTF-8. (This assumes the original file was created as a UTF-8 file, of course, or is compatible with that encoding.)
When opening the reader and the printer I use "try-with-resources" statements. These mean I do not have to explicitly close the file resources - Java takes care of that for me. Closing the printer also closes the writer it wraps.
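
One more point on writing the output file: when options are passed to Files.newBufferedWriter, StandardOpenOption.CREATE on its own does not empty a file that already exists. The new content is written over the start of the old content, and any leftover bytes stay in place. The sketch below shows the difference using a temporary file. The class name OpenOptionsDemo and the helper write are made up for this illustration:

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class OpenOptionsDemo {

    // Writes text to the path using the given options, then returns the full file content.
    public static String write(Path path, String text, StandardOpenOption... options)
            throws IOException {
        // try-with-resources closes (and flushes) the writer automatically
        try (Writer writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8, options)) {
            writer.write(text);
        }
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path path = Files.createTempFile("target", ".csv");
        try {
            // start with a longer file
            write(path, "a long first version\n",
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);

            // CREATE only: the old tail survives after the new text
            String createOnly = write(path, "short\n", StandardOpenOption.CREATE);
            System.out.println(createOnly.equals("short\n")); // prints false

            // CREATE + TRUNCATE_EXISTING: the file holds only the new text
            String truncated = write(path, "short\n",
                    StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING);
            System.out.println(truncated.equals("short\n")); // prints true
        } finally {
            Files.deleteIfExists(path);
        }
    }
}
```

Calling Files.newBufferedWriter with no options at all behaves as if CREATE, TRUNCATE_EXISTING and WRITE were given, so leaving the options out is another way to get a clean output file.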