I have a CSV file that contains several duplicate columns, and I am trying to remove them using Java. I found the Apache Commons CSV library, which some people use to remove duplicate rows. How can I use it to remove or skip duplicate columns?
For example, my CSV header is:
ID Name Email Email
So far my code is:
Reader reader = Files.newBufferedReader(Paths.get("user.csv"));
// read csv file
Iterable<CSVRecord> records = CSVFormat.DEFAULT.withFirstRecordAsHeader()
        .withIgnoreHeaderCase()
        .withTrim()
        .parse(reader);
for (CSVRecord record : records) {
    System.out.println("Record #: " + record.getRecordNumber());
    System.out.println("ID: " + record.get("ID"));
    System.out.println("Name: " + record.get("Name"));
    System.out.println("Email: " + record.get("Email"));
}
// close the reader
reader.close();
Your code is close to what you need - you just need to use CSVPrinter
to write out your data to a new file.
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVPrinter;
import org.apache.commons.csv.CSVRecord;
public class App {

    public static void main(String[] args) throws IOException {
        try (final Reader reader = Files.newBufferedReader(Paths.get("source.csv"),
                StandardCharsets.UTF_8)) {
            // CREATE alone does not truncate, so TRUNCATE_EXISTING is needed
            // to fully overwrite an existing (possibly longer) output file.
            final Writer writer = Files.newBufferedWriter(Paths.get("target.csv"),
                    StandardCharsets.UTF_8,
                    StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING);
            // closing the printer also closes the underlying writer
            try (final CSVPrinter printer = CSVFormat.DEFAULT
                    .withHeader("ID", "Name", "Email")
                    .print(writer)) {
                // read each input file record:
                Iterable<CSVRecord> records = CSVFormat.DEFAULT
                        .withFirstRecordAsHeader()
                        .withIgnoreHeaderCase()
                        .withTrim()
                        .parse(reader);
                // write each output file record, one value per kept column
                for (CSVRecord record : records) {
                    printer.print(record.get("ID"));
                    printer.print(record.get("Name"));
                    printer.print(record.get("Email"));
                    printer.println();
                }
            }
        }
    }
}
This transforms the following source file:
ID,Name,Email,Email
1,Albert,foo@bar.com,foo@bar.com
2,Brian,baz@bat.com,baz@bat.com
To this target file:
ID,Name,Email
1,Albert,foo@bar.com
2,Brian,baz@bat.com
Some points to note:
I was wrong in my comment. You do not need to use column indexes - you can use headings (as I do above) in your specific case.
Whenever reading and writing a file, it is recommended to provide the character encoding. In my case, I use UTF-8. (This assumes the original file was created as a UTF-8 file, of course - or is compatible with that encoding.)
When opening the reader and the writer I use "try-with-resources" statements. These mean I do not have to explicitly close the file resources - Java takes care of that for me.
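If the column names are not known in advance, the headings cannot be hard-coded, and working with column positions becomes useful after all. The following is only a sketch of that idea, not part of the code above: the class ColumnDedup and its methods keepIndexes and project are hypothetical names, and it operates on plain lists of strings so that it runs without any library. It assumes that two columns count as duplicates when their headings match after trimming and ignoring case, and that the first occurrence is the one to keep. It compares headings only, not the values in the columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: finds which column positions to keep, then applies them to each row.
final class ColumnDedup {

    // Returns the positions of the first occurrence of each heading.
    // Assumption: headings are compared after trimming and lower-casing.
    static List<Integer> keepIndexes(List<String> header) {
        Set<String> seen = new HashSet<>();
        List<Integer> keep = new ArrayList<>();
        for (int i = 0; i < header.size(); i++) {
            String key = header.get(i).trim().toLowerCase(Locale.ROOT);
            if (seen.add(key)) { // add() returns false if the heading was already seen
                keep.add(i);
            }
        }
        return keep;
    }

    // Copies only the kept positions from a row, preserving their order.
    static List<String> project(List<String> row, List<Integer> keep) {
        List<String> out = new ArrayList<>(keep.size());
        for (int index : keep) {
            out.add(row.get(index));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> header = Arrays.asList("ID", "Name", "Email", "Email");
        List<Integer> keep = keepIndexes(header);
        System.out.println(project(header, keep)); // prints [ID, Name, Email]
        System.out.println(project(
                Arrays.asList("1", "Albert", "foo@bar.com", "foo@bar.com"), keep));
        // prints [1, Albert, foo@bar.com]
    }
}
```

With Commons CSV, the same positions could be used to read each value by index with record.get(int) instead of by heading, and the projected header row could be written as the first line of the output.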