Given the following list of presidents, do a top-ten word count using the smallest program possible:
INPUT FILE
Washington Washington Adams Jefferson Jefferson Madison Madison Monroe Monroe John Quincy Adams Jackson Jackson Van Buren Harrison DIES Tyler Polk Taylor DIES Fillmore Pierce Buchanan Lincoln Lincoln DIES Johnson Grant Grant Hayes Garfield DIES Arthur Cleveland Harrison Cleveland McKinley McKinley DIES Teddy Roosevelt Teddy Roosevelt Taft Wilson Wilson Harding Coolidge Hoover FDR FDR FDR FDR Dies Truman Truman Eisenhower Eisenhower Kennedy DIES Johnson Johnson Nixon Nixon ABDICATES Ford Carter Reagan Reagan Bush Clinton Clinton Bush Bush Obama
To start it off, here is a bash version in 97 characters:
cat input.txt | tr " " "\n" | tr -d "\t " | sed 's/^$//g' | sort | uniq -c | sort -n | tail -n 10
Output:
2 Nixon
2 Reagan
2 Roosevelt
2 Truman
2 Washington
2 Wilson
3 Bush
3 Johnson
4 FDR
7 DIES
Break ties as you see fit! Happy Fourth!
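One possible tie-breaking convention, offered as an assumption rather than the required answer, is to sort by count descending and then by word in ascending order. The sketch below splits words with xargs -n1 and relies on sort's -k key options; top_words_ties is a hypothetical helper name that reads the text on standard input.

```shell
#!/bin/sh
# Hypothetical helper: top ten words, ties broken by the word itself.
top_words_ties() {
  xargs -n1 |            # one word per line
  sort |                 # group identical words together
  uniq -c |              # collapse each group into "count word"
  sort -k1,1nr -k2,2 |   # count descending, then word ascending
  head                   # first 10 lines
}
```

Usage: top_words_ties < input.txt. Equal counts then come out in collation order, so 3 Bush is listed before 3 Johnson.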
For those who care, more information on presidents can be found here.
A shorter shell version:
xargs -n1 < input.txt | sort | uniq -c | sort -nr | head
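For readability, the same pipeline can be spread over several lines with a comment on each stage. This sketch wraps it in a hypothetical shell function, top_words, that reads its text from standard input; it should behave like the one-liner.

```shell
#!/bin/sh
# Hypothetical helper: the shorter pipeline, one stage per line.
top_words() {
  xargs -n1 |  # echo each blank-separated word on its own line
  sort |       # bring identical words next to each other
  uniq -c |    # collapse each run of identical lines into "count word"
  sort -nr |   # order by count, highest first
  head         # keep the first 10 lines (head's default)
}
```

Run it as top_words < input.txt.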
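If you want a case-insensitive ranking, change uniq -c into uniq -ci, and the first sort into sort -f: uniq only compares adjacent lines, and a case-sensitive sort does not guarantee that spellings differing only in case end up next to each other.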
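A sketch of the case-insensitive variant, assuming a uniq that supports -i (GNU and BSD uniq do). The first sort gets -f so that spellings differing only in case are adjacent before uniq -ci merges them; top_words_nocase is a hypothetical helper name that reads standard input.

```shell
#!/bin/sh
# Hypothetical helper: case-insensitive top ten.
# sort -f folds case while sorting, so e.g. "DIES" and "Dies" become neighbours;
# uniq -ci then ignores case when comparing those neighbours.
top_words_nocase() {
  xargs -n1 | sort -f | uniq -ci | sort -nr | head
}
```

Usage: top_words_nocase < input.txt. On the input above, the single Dies is then counted together with the DIES entries. Which spelling uniq prints for the merged group depends on the sort order, so it may vary by locale.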
Slightly shorter still, if you're happy with the ranking reversed and readability impaired by the lack of spaces, this clocks in at 45 characters:
xargs -n1<input.txt|sort|uniq -c|sort -n|tail
(You could strip this down to 37 if you were allowed to rename the input file to simply "i" first.)
Observing that, in this special case, no word occurs more than 9 times, we can shave off 3 more characters by dropping the '-n' argument from the final sort:
xargs -n1<input.txt|sort|uniq -c|sort|tail
That takes this solution down to 42 characters without renaming the input file (or 34 if you do).
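Using xargs -n1 to split the file into one word per line is preferable to the tr " " "\n" solution. tr turns every single space into a newline, so each extra space between two words produces an empty line, and the sed 's/^$//g' step only replaces an empty line with an empty line, so it does not remove them. This means the tr solution is not correct: a blank string shows up 256 times and takes one of the ten slots, so Nixon is missed out. A blank string is not a "word".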
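A small demonstration of the blank-line problem, using a made-up one-line sample with two words separated by two spaces. It compares the tr split, the tr split followed by the sed 's/^$//g' used in the question, tr followed by sed '/^$/d' (one way to actually delete empty lines), and xargs -n1. All function names are hypothetical.

```shell
#!/bin/sh
# Made-up sample: two words separated by a run of two spaces.
sample='Nixon  Nixon'

# tr turns each space into a newline, so the second space yields an empty line.
split_tr() { printf '%s\n' "$sample" | tr ' ' '\n'; }

# s/^$//g replaces an empty line with an empty line, so the empty line survives.
split_tr_subst() { split_tr | sed 's/^$//g'; }

# /^$/d deletes empty lines outright.
split_tr_delete() { split_tr | sed '/^$/d'; }

# xargs -n1 treats any run of blanks as a single separator.
split_xargs() { printf '%s\n' "$sample" | xargs -n1; }

# Report how many empty lines each approach leaves behind.
for f in split_tr split_tr_subst split_tr_delete split_xargs; do
  printf '%s: %s empty line(s)\n' "$f" "$("$f" | grep -c '^$')"
done
```

The first two approaches report one empty line each; the last two report none.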