I'm using this code to print the number of occurrences of non-numeric cells. However, it is doubling the count: it's printing 6 instead of 3.
Sample data:
pID,sID,dID,nID,ID
ABCD-02-01,ABCD-02-01-0002-UNK,2,123,ABCD
ABCD-02-01,ABCD-02-01-0004-UNK,3,1234,ABCD
ABCD-02-01,ABCD-02-01-0007-UNK,7,3455,ABCD
Code:
#!/usr/bin/env python
from collections import Counter, defaultdict
import csv

header_counter = defaultdict(Counter)

with open('trial.csv') as input_file:
    r = csv.reader(input_file, delimiter=',')
    headers = next(r)
    for row in r:
        row_val = sum([w.isdigit() for w in row])
        for header, val in zip(headers, row):
            if not any(map(str.isdigit, val)):
                header_counter[header].update({val: row_val})
for k, v in header_counter.iteritems():
    print k,v
Current output: ID Counter({'ABCD': 6})
Desired output: ID Counter({'ABCD': 3})
So:
ln 11
sum([w.isdigit() for w in row])
This returns the number of cells in each row that consist entirely of digits. In your case that is two: the dID and nID columns.
So row_val is the integer 2 for every row this triggers on.
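To see where the 2 comes from, here is a small check using the first data row from the question (written for Python 3; the variable names row and flags are only for illustration):

```python
# First data row from the sample in the question.
row = ['ABCD-02-01', 'ABCD-02-01-0002-UNK', '2', '123', 'ABCD']

# str.isdigit() is True only when every character in the cell is a digit.
flags = [w.isdigit() for w in row]
print(flags)       # [False, False, True, True, False]

# Summing booleans counts the True values, i.e. the all-digit cells.
print(sum(flags))  # 2
```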
ln 14
header_counter[header].update({val: row_val})
This adds row_val (2) to the count every time, so the three matching rows give 3 x 2 = 6 instead of 3.
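One possible fix is to add 1 per matching cell instead of row_val. The following is a minimal sketch in Python 3 (the question's code is Python 2). It assumes the sample data is inlined through io.StringIO so it runs without trial.csv, and count_non_numeric is a hypothetical helper name, not something from the question:

```python
from collections import Counter, defaultdict
import csv
import io

# Sample data from the question, inlined so the sketch runs without trial.csv.
SAMPLE = """pID,sID,dID,nID,ID
ABCD-02-01,ABCD-02-01-0002-UNK,2,123,ABCD
ABCD-02-01,ABCD-02-01-0004-UNK,3,1234,ABCD
ABCD-02-01,ABCD-02-01-0007-UNK,7,3455,ABCD
"""


def count_non_numeric(input_file):
    """Count, per header, the cells that contain no digit characters."""
    header_counter = defaultdict(Counter)
    r = csv.reader(input_file, delimiter=',')
    headers = next(r)
    for row in r:
        for header, val in zip(headers, row):
            # Same test as in the question: no character in the cell is a digit.
            if not any(map(str.isdigit, val)):
                # Add exactly one per occurrence instead of row_val.
                header_counter[header][val] += 1
    return header_counter


result = count_non_numeric(io.StringIO(SAMPLE))
for k, v in result.items():
    print(k, v)  # ID Counter({'ABCD': 3})
```

With a real file, you could pass the object returned by open('trial.csv') to the function instead of the io.StringIO wrapper.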