
Subset rows in df where first two string values are the same - python


I'm aiming to subset a df to the rows where the first two characters of the strings in two separate columns are the same and match a value in a list. For example, the list first_2 contains the values I'm interested in returning. When one of these values is found in both Letters and Value on the same row, I want to keep that row.

However, I don't want rows where AB and DA appear together (e.g. row 2, where Letters is AB and Value is DA). I'm only after an identical match.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Letters': ('AB', 'BD', 'AB', 'DA', 'EG', 'FA'),
    'Value': ('AB', 'BC', 'DA', 'DA', 'EH', 'FA'),
    'Position': (1, np.nan, 3, 4, np.nan, 6),
})

first_2 = ['AB', 'DA']

# My attempt: this also returns row 2 (Letters AB, Value DA), which I don't want
df1 = df[(~df['Letters'].str[0:1].isin(first_2)) & (df['Value'].isin(first_2))]
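To see why this attempt also keeps row 2 (AB/DA), it can help to look at each mask on its own. The sketch below rebuilds the same df; the names first_char_mask and value_mask are just illustrative labels for the two halves of the original expression.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Letters': ('AB', 'BD', 'AB', 'DA', 'EG', 'FA'),
    'Value': ('AB', 'BC', 'DA', 'DA', 'EH', 'FA'),
    'Position': (1, np.nan, 3, 4, np.nan, 6),
})
first_2 = ['AB', 'DA']

# str[0:1] keeps only the FIRST character ('A', 'B', ...), which never equals
# a two-letter code, so after negation this mask is True on every row
first_char_mask = ~df['Letters'].str[0:1].isin(first_2)

# This mask only checks Value; it never compares Value against Letters
value_mask = df['Value'].isin(first_2)

attempt = df[first_char_mask & value_mask]
print(first_char_mask.tolist())  # [True, True, True, True, True, True]
print(attempt.index.tolist())    # [0, 2, 3]
```

Nothing in the expression compares Letters with Value, so any row whose Value is in first_2 survives, including the AB/DA row.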

Intended output:

  Letters Value  Position
0      AB    AB       1.0
3      DA    DA       4.0

Solution

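# first two characters of each Letters value
s = df['Letters'].str[:2]
# keep rows where that prefix is in first_2 AND is identical to Value
out = df[s.isin(first_2) & s.eq(df['Value'])]

print(out)

  Letters Value  Position
0      AB    AB       1.0
3      DA    DA       4.0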
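The solution above compares the first two characters of Letters against the whole Value string. If, as an assumption beyond the example data, Value can also hold longer strings and only its first two characters should count, one option is to slice both columns before comparing. This is a sketch only; match_first_two is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def match_first_two(frame, codes, left='Letters', right='Value'):
    """Hypothetical helper: keep rows whose first two characters in `left`
    and `right` are identical and are one of `codes`."""
    left_prefix = frame[left].str[:2]
    right_prefix = frame[right].str[:2]
    mask = left_prefix.isin(codes) & left_prefix.eq(right_prefix)
    return frame[mask]


df = pd.DataFrame({
    'Letters': ('AB', 'BD', 'AB', 'DA', 'EG', 'FA'),
    'Value': ('AB', 'BC', 'DA', 'DA', 'EH', 'FA'),
    'Position': (1, np.nan, 3, 4, np.nan, 6),
})
out = match_first_two(df, ['AB', 'DA'])
print(out)  # rows 0 and 3, same as the solution on this data

# Longer strings: only the two-character prefixes are compared
df2 = pd.DataFrame({'Letters': ['ABX', 'DAY'], 'Value': ['ABZ', 'DB']})
print(match_first_two(df2, ['AB', 'DA']))  # keeps only row 0
```

On the question's data both versions return the same rows; they only differ when Value is longer than two characters.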