I have the following csv I want to read into Python (Spyder) and count the amount of blank values in column 2:
column 1 | column 2 |
---|---|
A | N/A |
B | N/A |
C | N/A |
D | |
E | N/A |
F | N/A |
G | |
H | N/A |
In this case, there are two blank values, the rest are default values.
The standard code is:
LoadFile=pd.read_csv(FileName)
Which reads in the data as:
column 1 | column 2 |
---|---|
A | NaN |
B | NaN |
C | NaN |
D | NaN |
E | NaN |
F | NaN |
G | NaN |
H | NaN |
So the empty count is 8, not 2
missings =LoadFile['column 2'].isnull().sum()
Then I tried to read it in as:
LoadFile=pd.read_csv(FileName,na_values='', keep_default_na=False)
Which changes the table into:
column 1 | column 2 |
---|---|
A | N/A |
B | N/A |
C | N/A |
D | N/A |
E | N/A |
F | N/A |
G | N/A |
H | N/A |
So the empty count is zero.
How do I read in my csv file so the empty count is 2 and it does not alter the empty values.
This works fine for me.
CSV = '''column 1,column 2
A,N/A
B,N/A
C,N/A
D,
E,N/A
F,N/A
G,
H,N/A'''
df = pd.read_csv(io.StringIO(CSV), na_values='', keep_default_na=False)
df['column 2'].isnull().sum()
# 2
Alternatively, don't define na_values
and count the blanks:
df = pd.read_csv(FileName, keep_default_na=False)
missings = df['column 2'].eq('').sum()
Output: 2