I have a long text file.
I want to remove duplicates from the file. The problem is that the key to compare is only the first word of each line, where the words are separated by ":".
For example, the file contains these lines:
11234567:229283:29833204:2394803
11234567:4577546765:655776:564456456
43523:455543:54335434:53445
11234567:43455:544354:5443
Afterwards I want to end up with this:
11234567:229283:29833204:2394803
43523:455543:54335434:53445
For each group of duplicates I need to keep the first line; the others should be ignored.
I tried this:
Set<String> lines11;
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new HashSet<>(10000); // maybe should be bigger
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        lines11.add(line11);
    }
}
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}
That works, but it only removes a line when the complete line is duplicated.
How can I change it so that it looks only at the first word of every line when checking for duplicates? If the first word has not been seen before, the complete line should be saved; if it is a duplicate, the line should be ignored.
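You need to maintain a Set<String> that holds only the first word of each line, and collect the lines you keep in a separate List<String> so that their original order is preserved: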
List<String> lines11; // lines to keep, in file order
Set<String> dups;     // first words seen so far
try (BufferedReader reader11 = new BufferedReader(new FileReader("test.txt"))) {
    lines11 = new ArrayList<>();
    dups = new HashSet<>();
    String line11;
    while ((line11 = reader11.readLine()) != null) {
        String first = line11.split(":")[0]; // assuming your separator is :
        // keep the line only if its first word has not been seen yet
        if (!dups.contains(first)) {
            lines11.add(line11);
            dups.add(first);
        }
    }
}
// the reader is closed at this point, so the same file can be overwritten
try (BufferedWriter writer11 = new BufferedWriter(new FileWriter("test.txt"))) {
    for (String unique : lines11) {
        writer11.write(unique);
        writer11.newLine();
    }
}
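If you want to check the logic without touching a file, the same idea can be pulled out into a small method and run on the sample lines from the question. This is only a sketch: the class name FirstFieldDedup and the method dedupByFirstField are made up for illustration and are not part of any library. It also uses indexOf instead of split, on the assumption that the file might contain a line made up only of ":" characters, for which split(":") returns an empty array and [0] would throw.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; the names are illustrative only.
class FirstFieldDedup {

    // Returns the lines in their original order, keeping only the
    // first line seen for each distinct first field (text before ':').
    static List<String> dedupByFirstField(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            int sep = line.indexOf(':');
            // a line without any ':' is treated as its own key
            String key = (sep < 0) ? line : line.substring(0, sep);
            // Set.add returns false when the key is already present
            if (seen.add(key)) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // sample lines taken from the question
        List<String> input = Arrays.asList(
                "11234567:229283:29833204:2394803",
                "11234567:4577546765:655776:564456456",
                "43523:455543:54335434:53445",
                "11234567:43455:544354:5443");
        for (String line : dedupByFirstField(input)) {
            System.out.println(line);
        }
    }
}
```

Running main prints the two lines the question asks for, 11234567:229283:29833204:2394803 followed by 43523:455543:54335434:53445. Because Set.add already reports whether the key was new, the separate contains check is not needed.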