I want to delete the lines in 1.txt that also appear in 2.txt and save the result to 3.txt. I am using this bash command:
comm -23 1.txt 2.txt > 3.txt
When I check 3.txt, I find that some lines common to 1.txt and 2.txt are still there; take the word "registry" as an example. What is the problem?
You can download the two files below:
file 1.txt : https://ufile.io/n7vn6
file 2.txt : https://ufile.io/p4s58
I'm not sure how you generated your text files, but the problem is that some lines in 1.txt and 2.txt don't have consistent line terminations. Some end with a CR character (ctrl-M, shown as ^M) before the line feed, instead of the lone line feed Linux expects for text files. For example, one of them contains registry^M, which doesn't match registry. (Linux programs that examine text see ^M as just another character or white space, not as part of a line termination to be ignored.) Some text editors don't display the ^M, so registry appears to be the same in both places, but it isn't.
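To see the effect for yourself, here is a minimal sketch that builds two tiny sample files (a.txt and b.txt are hypothetical stand-ins, not your downloaded files) and shows that comm treats registry^M and registry as different lines. It assumes bash and GNU coreutils (cat -A is a GNU option), and it sets LC_ALL=C so the comparison is plain byte order.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C        # byte-order comparison, so the result doesn't depend on locale
cd "$(mktemp -d)"      # scratch directory for the sample files

# a.txt: "registry" ends with CR LF (DOS style); b.txt: plain LF (Unix style).
printf 'alpha\nregistry\r\nzulu\n' > a.txt
printf 'registry\n' > b.txt

# comm compares the lines byte by byte, so "registry\r" survives in column 1.
# cat -A makes the CR visible as ^M and marks each line end with $.
comm -23 a.txt b.txt | cat -A
# alpha$
# registry^M$
# zulu$

# Count the lines containing a CR in each file (|| true: grep exits 1 if none match).
grep -c $'\r' a.txt b.txt || true
# a.txt:1
# b.txt:0
```

Running the same grep -c $'\r' on your own 1.txt and 2.txt is a quick way to confirm whether they contain CR characters.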
You could try:
dos2unix 1.txt 2.txt
comm -23 <(sort 1.txt) <(sort 2.txt) > 3.txt
dos2unix will make all of the line terminations consistent (assuming the stray characters are DOS-style CRs). Removing the CRs can change the sort order slightly, so I'm also re-sorting both files. You can try it without re-sorting; if the order is off, comm will report that one of the files isn't sorted.
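If dos2unix isn't installed, or you'd rather not convert 1.txt and 2.txt in place, a possible alternative (not the approach above, just a sketch) is to strip the CRs on the fly with tr and sort inside the process substitutions. The helper name strip_and_sort and the sample file contents are hypothetical; the sample files only stand in for your real 1.txt and 2.txt. Setting LC_ALL=C is an assumption meant to give sort and comm the same collation.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C   # sort and comm must agree on ordering; byte order is the simplest

# Hypothetical helper: delete every CR character, then sort the lines.
strip_and_sort() {
  tr -d '\r' < "$1" | sort
}

# Sample input in a scratch directory (replace with your own 1.txt and 2.txt).
cd "$(mktemp -d)"
printf 'zulu\r\nregistry\r\nalpha\n' > 1.txt
printf 'registry\n' > 2.txt

# Lines only in 1.txt, after normalising line endings and sorting both inputs.
comm -23 <(strip_and_sort 1.txt) <(strip_and_sort 2.txt) > 3.txt
cat 3.txt
# alpha
# zulu
```

Note that tr -d '\r' removes every CR, not only those at the ends of lines; for ordinary text files that is usually what you want, and the original files are left untouched.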