
Why could GNU sort hang for 10+ hours on this specific file?


I'm trying to merge and de-duplicate several versions of the same kind of plain-text file using the GNU sort that ships with Ubuntu 18.04 LTS. I use sort almost daily and have sorted files of 1 GB+ with no issues.

However, the following command (around 600 MB of data in total) still had not completed after I left it running in the background for 10 hours:

find backups -type f \( -iname 'file0.txt' -o -iname 'file1.txt' -o -iname 'file2.txt' -o -iname 'file3.txt' \) -exec sort -u {} + > "combined.txt"
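A minimal sketch of the merge step, assuming GNU find and GNU sort. It groups the -o alternatives in \( ... \): without the parentheses, find's implicit -a binds tighter than -o, so -type f applies only to the first name and the action runs only for the last one. It also passes the NUL-separated file names to a single sort process via sort's --files0-from=- option, so duplicates are removed across all files even if find would otherwise split them into several -exec batches. The helper name merge_backups and the sample files are hypothetical, used only for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# merge_backups DIR OUT (hypothetical helper): merge and de-duplicate every
# file0.txt..file3.txt found under DIR into OUT using one sort process.
merge_backups() {
  local dir=$1 out=$2
  # \( ... \) makes -type f and -print0 apply to all four names.
  # -print0 emits NUL-terminated names; --files0-from=- reads them from stdin.
  find "$dir" -type f \( -iname 'file0.txt' -o -iname 'file1.txt' \
      -o -iname 'file2.txt' -o -iname 'file3.txt' \) -print0 |
    sort -u --files0-from=- > "$out"
}

# Demo on throwaway sample data in a temporary directory.
demo=$(mktemp -d)
mkdir -p "$demo/backups/v1" "$demo/backups/v2"
printf 'b\na\n' > "$demo/backups/v1/file0.txt"
printf 'a\nc\n' > "$demo/backups/v2/FILE1.txt"   # -iname matches any case
merge_backups "$demo/backups" "$demo/combined.txt"
cat "$demo/combined.txt"
```

The duplicate line a appears in both sample files but only once in combined.txt.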

The sort step is what causes the problem; from my testing, the rest of the command is irrelevant. I concatenated all the files with cat into a single ~600 MB file, and running sort -u on that file still hangs forever, even when I set the memory buffer to 80% with around 6 GB of free RAM. Disk space is not an issue either.
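A sketch of that single-file test on small sample data, assuming GNU sort, where the memory buffer is set with -S and a trailing % means a percentage of physical memory. It also prints the LC_COLLATE setting that sort will use: any value other than C or POSIX means lines are compared with the locale's collation rules rather than as raw bytes. The part*.txt and all.txt names are hypothetical stand-ins for the real input.

```shell
#!/usr/bin/env bash
set -euo pipefail

work=$(mktemp -d)
printf 'x\ny\n' > "$work/part1.txt"
printf 'y\nz\n' > "$work/part2.txt"

# Concatenate everything into one file, as in the question.
cat "$work"/part*.txt > "$work/all.txt"

# Show the collation locale in effect (ignored if `locale` is unavailable).
locale | grep '^LC_COLLATE=' || true

# -S 80%: ask GNU sort for a main-memory buffer of up to 80% of physical RAM.
sort -u -S 80% "$work/all.txt" > "$work/sorted.txt"
cat "$work/sorted.txt"
```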

While it was still running, I dragged in an unsorted 3 GB text file and ran sort -u on it successfully. I'm doing this in a virtual machine, if that matters.

What could cause this behavior?


Solution

  • Setting LC_ALL=C before running the sort command (or adding export LC_ALL=C just below the shebang at the beginning of a script) solved it. I'm not sure why the text added to the files in their last update made the command get stuck forever without LC_ALL=C, though.
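A minimal sketch of both forms of the fix. One plausible explanation, not confirmed by the post, is that in a UTF-8 locale GNU sort compares lines using the locale's collation rules, which is considerably more expensive than the plain byte comparison used in the C locale, and the newly added text may have hit a slow path there. Note that switching to LC_ALL=C also changes the output order (uppercase ASCII sorts before lowercase) and may change which lines sort -u treats as duplicates. The file name combined_input.txt is hypothetical.

```shell
#!/usr/bin/env bash
# Script-wide form: export LC_ALL=C right below the shebang so every sort
# in this script compares lines as raw bytes.
export LC_ALL=C
set -euo pipefail

# Per-command form (affects only that one command):
#   LC_ALL=C sort -u combined_input.txt > combined.txt

# In the C locale, uppercase ASCII (0x41-0x5A) sorts before lowercase
# (0x61-0x7A), and -u drops only byte-for-byte identical lines.
result=$(printf 'b\nB\na\nA\nb\n' | sort -u)
printf '%s\n' "$result"
```

This prints A, B, a and b on separate lines, with the repeated b removed.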