pythonpandasdataframe

Discard rows with a string value from a dataframe


I have a dataframe like this:

C1 C2 C3 C4
1 foo asd 23
foo foo asd 43
3 foo asd 1
4 foo asd bar

I'm trying to filter (and discard) all rows that have strings in C1 or C4 columns, my final dataframe must be:

C1 C2 C3 C4
1 foo asd 23
3 foo asd 1

I'm trying to do this using "isNaN" but I'm not sure how I should use it.

This is my code:

df = pd.read_csv(
    path_file,
    sep=",",
    usecols=columns,
    skiprows=0,
    skipfooter=0,
    engine="python",
    encoding="utf-8",
    skipinitialspace=True,
    on_bad_lines='warn',
    names=columns)

df_new = df[df["C1"].notna()]
df_new_2 = df[df["C4"].notna()]

Any idea about how I can achieve this?


Solution

  • You can try something like this:

    df.loc[df[['C1', 'C4']].apply(pd.to_numeric, errors='coerce').dropna(how='any').index]
    

    Output:

      C1   C2   C3  C4
    0  1  foo  asd  23
    2  3  foo  asd   1
    

    Use pd.to_numeric with errors='coerce' to give NaN that can be used to with dropna.