I have a dataframe like this:
C1 | C2 | C3 | C4 |
---|---|---|---|
1 | foo | asd | 23 |
foo | foo | asd | 43 |
3 | foo | asd | 1 |
4 | foo | asd | bar |
I'm trying to filter out (and discard) all rows that have strings in the C1 or C4 columns, so my final dataframe should be:
C1 | C2 | C3 | C4 |
---|---|---|---|
1 | foo | asd | 23 |
3 | foo | asd | 1 |
I'm trying to do this using isna/notna, but I'm not sure how I should use it.
This is my code:
df = pd.read_csv(
    path_file,
    sep=",",
    usecols=columns,
    skiprows=0,
    skipfooter=0,
    engine="python",
    encoding="utf-8",
    skipinitialspace=True,
    on_bad_lines="warn",
    names=columns,
)
df_new = df[df["C1"].notna()]
df_new_2 = df_new[df_new["C4"].notna()]
Any idea about how I can achieve this?
You can try something like this:
df.loc[df[['C1', 'C4']].apply(pd.to_numeric, errors='coerce').dropna(how='any').index]
Output:
C1 C2 C3 C4
0 1 foo asd 23
2 3 foo asd 1
Use pd.to_numeric with errors='coerce' to convert non-numeric values to NaN, which dropna can then remove.
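Putting it together, here is a minimal runnable sketch (with the sample data built inline instead of read from a CSV):

```python
import pandas as pd

# Sample data mirroring the question's table; every column is read as strings,
# which is what happens when a CSV column contains mixed values.
df = pd.DataFrame({
    "C1": ["1", "foo", "3", "4"],
    "C2": ["foo", "foo", "foo", "foo"],
    "C3": ["asd", "asd", "asd", "asd"],
    "C4": ["23", "43", "1", "bar"],
})

# Coerce C1 and C4 to numeric: non-numeric entries become NaN.
numeric = df[["C1", "C4"]].apply(pd.to_numeric, errors="coerce")

# Keep only the rows where both columns converted cleanly (no NaN).
result = df.loc[numeric.dropna(how="any").index]
print(result)
```

Note that result still holds the original (string) values; the coerced frame is used only to pick which row indices to keep.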