python pandas dataframe data-science fuzzy-logic

Merging dataframes by country and year when the countries are not named the same (for example, US vs. United States)


Hello, I am trying to drop rows whose value in a specific column is a string that is not a year. For example, in the last rows here the year values contain decimal points or '-'.

I tried converting the year column to strings and then dropping those rows with the code below, but it only removes the row with 2011-21; the rows with decimal points remain.

df.level_1=df.level_1.astype(str)

df.loc[
    (~df.level_1.str.contains("."))
    |~(df.level_1.str.contains("-")),
    :]

Is there a way to fix this?
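A likely explanation for the behaviour described above: `str.contains` treats its pattern as a regular expression by default, and `.` in a regex matches any character, so `~df.level_1.str.contains(".")` is False for every non-empty string. The `|` then reduces the mask to `~contains("-")`, which removes only the `2011-21` row. Even with a literal match, `|` keeps a row that lacks either character; `&` is needed to keep rows that contain neither. The sketch below assumes made-up sample values (the real data is not shown) to reproduce this and to show a corrected mask using `regex=False`:

```python
import pandas as pd

# Hypothetical sample data; the real level_1 values are not shown in the question.
df = pd.DataFrame({"level_1": ["2009", "2010", "2011-21", "2012.5", "2013.0"]})
df.level_1 = df.level_1.astype(str)

# Original mask: "." is a regex wildcard, so contains(".") is True for every
# non-empty string and only the "-" test has any effect.
original = df.loc[
    (~df.level_1.str.contains("."))
    | ~(df.level_1.str.contains("-")),
    :]

# Corrected mask: match "." and "-" literally, and keep rows containing NEITHER.
fixed = df.loc[
    ~df.level_1.str.contains(".", regex=False)
    & ~df.level_1.str.contains("-", regex=False),
    :]

print(original.level_1.tolist())  # ['2009', '2010', '2012.5', '2013.0']
print(fixed.level_1.tolist())     # ['2009', '2010']
```

Escaping the dot (`r"\."`) instead of passing `regex=False` would work equally well.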


Solution

  • You can filter out all rows where level_1 contains non-digit characters:

    df = df[~df.level_1.str.contains(r'\D')]
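A small runnable sketch of the filter above, on hypothetical sample values (not the question's real data). `\D` matches any non-digit character, and writing it as a raw string (`r"\D"`) avoids Python's invalid-escape warning. One caveat: an empty string contains no non-digit characters, so it survives this filter. If only four-digit years should remain, `str.fullmatch` (available in pandas 1.1 and later) is a stricter alternative; the `\d{4}` pattern is an assumption about what a valid year looks like:

```python
import pandas as pd

# Hypothetical sample data standing in for the question's level_1 column.
df = pd.DataFrame({
    "level_1": ["2009", "2010", "2011-21", "2012.5", "2013.0", ""],
    "value": [1, 2, 3, 4, 5, 6],
})
df["level_1"] = df["level_1"].astype(str)

# Drop any row whose level_1 contains at least one non-digit character.
digits_only = df[~df["level_1"].str.contains(r"\D")]

# Stricter alternative: keep only values that are exactly four digits.
# The empty string passes the \D test above but fails this one.
four_digit_years = df[df["level_1"].str.fullmatch(r"\d{4}")]

print(digits_only["level_1"].tolist())       # ['2009', '2010', '']
print(four_digit_years["level_1"].tolist())  # ['2009', '2010']
```

After filtering, `four_digit_years["level_1"].astype(int)` gives an integer year column, which is convenient when merging on year.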