java file collections bufferedreader

Java: remove duplicate lines from a file, comparing only the first field (split(":")[0])


I have a long text file.

I want to remove duplicates from the file. The problem is that the search key is only the first word of each line, where the words are separated by ":".

For example:

The file lines:

11234567:229283:29833204:2394803
11234567:4577546765:655776:564456456
43523:455543:54335434:53445
11234567:43455:544354:5443

The result I want is:

11234567:229283:29833204:2394803
43523:455543:54335434:53445

I need to keep the first line of each group of duplicates; the others should be ignored.

I tried this:

Set<String> lines11;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new HashSet<>(10000); // maybe should be bigger
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        lines11.add(line11);
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}

This works, but it only removes a line when the complete line is duplicated.

How can I change it so that it compares only the first word of every line? If that word has not been seen before, the complete line should be saved; if it is a duplicate, the line should be ignored.


Solution

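  • You need to maintain a Set<String> that holds only the first word of each line, and collect the lines you keep in a List so that their original order is preserved (a HashSet of lines does not guarantee any order).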

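    // Needs java.io.{BufferedReader, BufferedWriter, FileReader, FileWriter}
    // and java.util.{ArrayList, HashSet, List, Set}
    List<String> lines11 = new ArrayList<>(); // lines to keep, in file order
    Set<String> dups = new HashSet<>();       // first words seen so far
    try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
        String line11;
        while ((line11 = reader11.readLine()) != null) {
            // assuming your separator is : (the limit 2 keeps a line such as ":" from yielding an empty array)
            String first = line11.split(":", 2)[0];
            // add() returns false when the first word was already in the set
            if (dups.add(first)) {
                lines11.add(line11);
            }
        }
    }
    // the reader is closed by now, so the same file can be overwritten
    try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
        for (String unique : lines11) {
            writer11.write(unique);
            writer11.newLine();
        }
    }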
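
If you would like to try the idea outside your program, here is a minimal self-contained sketch of the same approach. The class name FirstFieldDedup, the helper dedupByFirstField, and the use of java.nio.file.Files with UTF-8 are my own assumptions and not part of the original code; like the snippet above, it reads the whole file into memory, so it only suits files that fit there.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FirstFieldDedup {

    // Hypothetical helper: keeps the first line seen for each key, in input order.
    // The key is everything before the first separator, or the whole line if
    // the separator does not occur.
    static List<String> dedupByFirstField(List<String> lines, char separator) {
        Set<String> seenKeys = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int idx = line.indexOf(separator);
            String key = idx < 0 ? line : line.substring(0, idx);
            // Set.add returns true only the first time a key is inserted
            if (seenKeys.add(key)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: the file is UTF-8 text; "test.txt" is the name used in the question
        Path file = Paths.get(args.length > 0 ? args[0] : "test.txt");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        // The file has been read completely before it is overwritten
        Files.write(file, dedupByFirstField(lines, ':'), StandardCharsets.UTF_8);
    }
}
```

Keeping the filtering in its own method makes it easy to check against the four sample lines from the question without touching a file. Using indexOf instead of split also avoids building an array for every line, which may matter for a long file.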