pythonpandasread-csv

Read csv without filling up empty values


I have the following csv I want to read into Python (Spyder) and count the amount of blank values in column 2:

column 1 column 2
A N/A
B N/A
C N/A
D
E N/A
F N/A
G
H N/A

In this case, there are two blank values, the rest are default values.

The standard code is:

LoadFile=pd.read_csv(FileName)

Which reads in the data as:

column 1 column 2
A NaN
B NaN
C NaN
D NaN
E NaN
F NaN
G NaN
H NaN

So the empty count is 8, not 2

missings =LoadFile['column 2'].isnull().sum()

Then I tried to read it in as:

LoadFile=pd.read_csv(FileName,na_values='', keep_default_na=False)

Which changes the table into:

column 1 column 2
A N/A
B N/A
C N/A
D N/A
E N/A
F N/A
G N/A
H N/A

So the empty count is zero.

How do I read in my csv file so the empty count is 2 and it does not alter the empty values.


Solution

  • This works fine for me.

    CSV = '''column 1,column 2
    A,N/A
    B,N/A
    C,N/A
    D,
    E,N/A
    F,N/A
    G,
    H,N/A'''
    
    df = pd.read_csv(io.StringIO(CSV), na_values='', keep_default_na=False)
    df['column 2'].isnull().sum()
    # 2
    

    Alternatively, don't define na_values and count the blanks:

    df = pd.read_csv(FileName, keep_default_na=False)
    missings = df['column 2'].eq('').sum()
    

    Output: 2