python csv counter defaultdict

This counter code is counting cells twice


I'm using this code to print the occurrences of non-numeric cells. However, it is doubling the count: it prints 6 where I expect 3.

Sample data:

pID,sID,dID,nID,ID
ABCD-02-01,ABCD-02-01-0002-UNK,2,123,ABCD
ABCD-02-01,ABCD-02-01-0004-UNK,3,1234,ABCD
ABCD-02-01,ABCD-02-01-0007-UNK,7,3455,ABCD

Code:

#!/usr/bin/env python
from collections import Counter, defaultdict
import csv

header_counter = defaultdict(Counter)

with open('trial.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    headers = next(r)
    for row in r:
        row_val = sum([w.isdigit() for w in row])
        for header, val in zip(headers, row):
            if not any(map(str.isdigit, val)):
                header_counter[header].update({val: row_val})

for k, v in header_counter.iteritems():
    print k,v

Current output:

ID Counter({'ABCD': 6})

Desired output:

ID Counter({'ABCD': 3})


Solution

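  • So:

    Line 11 of your script: row_val = sum([w.isdigit() for w in row])

    This returns the number of cells in each row that consist only of digits. In your case that is two, because the dID and nID columns are numeric.

    So row_val is the integer 2 for every row this triggers on.

    Line 14 of your script: header_counter[header].update({val: row_val})

    This adds row_val (2) to the count every time, instead of 1. With three rows that gives 3 × 2 = 6 rather than 3.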
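A minimal sketch of one possible fix, assuming you want each non-numeric cell to count exactly once: drop row_val and increment by 1. It is written for Python 3 (print() and .items() rather than the print statement and iteritems() used in the question), and it reads the sample data from an in-memory string instead of trial.csv so that it runs on its own. The names SAMPLE and count_non_numeric are made up for illustration.

```python
from collections import Counter, defaultdict
import csv
import io

# Sample data from the question, inlined so the sketch runs without trial.csv.
SAMPLE = """pID,sID,dID,nID,ID
ABCD-02-01,ABCD-02-01-0002-UNK,2,123,ABCD
ABCD-02-01,ABCD-02-01-0004-UNK,3,1234,ABCD
ABCD-02-01,ABCD-02-01-0007-UNK,7,3455,ABCD
"""


def count_non_numeric(lines):
    """Count, per column, how often each digit-free cell value occurs."""
    header_counter = defaultdict(Counter)
    reader = csv.reader(lines, delimiter=',')
    headers = next(reader)
    for row in reader:
        for header, val in zip(headers, row):
            # Same test as in the question: skip any cell containing a digit.
            if not any(map(str.isdigit, val)):
                # Count this occurrence once, not row_val times.
                header_counter[header][val] += 1
    return header_counter


result = count_non_numeric(io.StringIO(SAMPLE))
for header, counts in result.items():
    print(header, counts)
```

With the sample data this prints ID Counter({'ABCD': 3}). If you would rather keep the update() call, header_counter[header].update({val: 1}) should behave the same way.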