python pandas string-matching

Unable to find a match for a substring in a column of my dataframe


def process_and_predict(folder_path):
    image_files = os.listdir(folder_path)
    results_df = pd.DataFrame(columns=['name', 'prediction', 'actual'])
    
    for image_file in image_files:
        # some preprocessing
        # Convert the name to a string. Just a precaution, not really necessary, since I have confirmed it is the same.
        str1 = str(image_file)
        str1 = str1.strip()
        st.write("string ",str1)
        actual = df.loc[df['Image_filename'].str.contains(str1), 'BIRADS'].values[0]

I have a dataframe that holds the file paths in the 'Image_filename' column. I am iterating through some test images, trying to find the row that matches image_file and extract its 'BIRADS' value.

Example: "inst/BIRADS 2/birads - 2 (11).bmp" is a value in my df['Image_filename'].

While iterating, image_file (converted to str1) takes the value 'birads - 2 (11).bmp'.

Ideally the code above should give me a match, but it does not. This is the message I am getting:

UserWarning: This pattern is interpreted as a regular expression, and has match groups. To actually get the groups, use str.extract.
  new_info['Image_filename'].str.contains(x)

This is odd, because when str1 is something like 'case001.png' the same code finds a match with no issues.

The matching entry in 'Image_filename' is 'BrEaST-Lesions_USG-images_and_masks/case001.png'.


Solution

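  • Try: new_info['Image_filename'].str.contains(x, regex=False)

    By default, x is interpreted as a regex, in which (...) denotes a capture group rather than literal parentheses. The '(11)' in the pattern therefore matches the text '11' and never the parentheses in the file name. With regex=False the string is matched literally. See the pandas documentation for Series.str.contains.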
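    To make the difference concrete, here is a minimal sketch. The two-row dataframe and its BIRADS values are made up for illustration (only the file paths come from the question), and re.escape is shown as one possible alternative if you want to keep regex matching:

```python
import re
import warnings

import pandas as pd

# Hypothetical stand-in for the asker's dataframe; the BIRADS values are invented.
df = pd.DataFrame({
    "Image_filename": [
        "inst/BIRADS 2/birads - 2 (11).bmp",
        "BrEaST-Lesions_USG-images_and_masks/case001.png",
    ],
    "BIRADS": [2, 3],
})

name = "birads - 2 (11).bmp"

# Default regex=True: "(11)" is a capture group that matches the text "11",
# so the literal parentheses in the path are never matched.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", UserWarning)  # silence the match-groups warning
    as_regex = df["Image_filename"].str.contains(name)

# Literal substring match: no regex interpretation at all.
as_literal = df["Image_filename"].str.contains(name, regex=False)

# Alternative: keep regex matching but escape the special characters first.
as_escaped = df["Image_filename"].str.contains(re.escape(name))

# "case001.png" has no parentheses; its "." matches any character,
# including a literal dot, which is why that name worked in the question.
plain_name = df["Image_filename"].str.contains("case001.png")

actual = df.loc[as_literal, "BIRADS"].values[0]
```

    Here as_regex is False for both rows, while as_literal and as_escaped both pick out the first row. regex=False is probably the simpler choice when the search string is always a plain file name.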