bash awk sed duplicates line-count

BASH: Count identical lines


I have a file that contains:

VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoicemailButtonTest
VoiceMailConfig60CharsTest
VoicemailDefaultTypeTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoiceMailIconSelectableTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsFromMessageModeScreenTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest
VoicemailSettingsTest

How do I replace the duplicate lines with counts:

VoicemailButtonTest (5)
VoiceMailConfig60CharsTest (1)
VoicemailDefaultTypeTest (1)
VoiceMailIconSelectableTest (5)
VoicemailSettingsFromMessageModeScreenTest (2)
VoicemailSettingsTest (7)

I tried placing the pairs into an associative array. I tried using 'read' inside a 'while' loop, but the array gets lost. Here's my attempt:

unset line
tests=$(cat file.log)
echo "$tests" | 
    while read l; do 
        if [ "$l" == "${line}" ]; then
            let cnt++;
        else
            echo "${line} (${cnt})"
            line=${l}
            cnt=1
        fi
        export run_suites
    done

Solution

  • Assuming the output format doesn't have to match this exactly:

    VoicemailButtonTest (5)
    VoiceMailConfig60CharsTest (1)
    VoicemailDefaultTypeTest (1)
    VoiceMailIconSelectableTest (5)
    VoicemailSettingsFromMessageModeScreenTest (2)
    VoicemailSettingsTest (7)
    

    you can just use

    sort <input_file> | uniq -c
    

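    If you need the output format to match what you posted exactly, you can use the following (the order in which awk's for-in loop visits the array is unspecified, so pipe the result through sort if the order matters):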

    awk '{duplicates[$1]++} END{for (ind in duplicates) {print ind,"("duplicates[ind]")"}}' <input_file>
    

    Edit: Posted just after anubhava's answer... but leaving (unless people suggest I delete) because of the addition of the sort command.
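
For completeness, here is a rough pure-bash sketch of the associative-array approach the question was aiming for. The variables in the original loop most likely get lost because each part of a pipeline runs in a subshell; redirecting the file into the loop keeps it in the current shell. This assumes bash 4 or newer, and count_lines is a hypothetical helper name, not something from the question.

```shell
#!/usr/bin/env bash
# Sketch only: count identical lines and print them in first-seen order.
# Assumes bash 4+ (associative arrays). count_lines is a hypothetical name.

count_lines() {
    local file=$1 line
    local -A counts=()   # line text -> number of occurrences
    local -a order=()    # distinct lines, in the order first seen

    # Redirect the file into the loop instead of piping into it:
    # a piped loop runs in a subshell, so its variables vanish afterwards.
    while IFS= read -r line || [[ -n $line ]]; do
        [[ -z $line ]] && continue            # empty keys are not allowed
        if [[ -z ${counts[$line]+set} ]]; then
            order+=("$line")                  # first time this line appears
        fi
        counts[$line]=$(( ${counts[$line]:-0} + 1 ))
    done < "$file"

    # Print each distinct line followed by its count in parentheses.
    for line in "${order[@]}"; do
        printf '%s (%d)\n' "$line" "${counts[$line]}"
    done
}
```

Calling it as count_lines file.log prints one "name (count)" line per distinct input line. Unlike the awk version, the order follows the input, at the cost of being slower on large files.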