pythonpandasimport-csv

Python - Pandas Search and update multiple values - is this possible or should I use Import CSV?


I have CSV with 10k lines. I want to first of all search for the rows with required info and then edit those lines.

example below


jobs = [
  'X01_TEST1_C', 
  'P01_TEST3_B'
  ]


headers = ['job', 'name', 'date', 'extrainfo']
data = [
  ['X01_TEST1_C', 'NAME', 'DATE', 'EXTRADATA'],
  ['P01_TEST3_C', 'NAME', 'DATE', 'EXTRADATA'],
  ['X01_TEST1002_C', 'NAME', 'DATE', 'EXTRADATA'],
  ['X01_TEST4231_C', 'NAME', 'DATE', 'EXTRAP01_TEST3_BDATA']
]

I can load this into PANDAS and then search for single items using below.

df= pd.read_csv("filename",sep=",", encoding='cp1252')
df1 = df[(df['job'].str.contains("X01_TEST1_C", na=False))]
print(df1)

which would print

['X01_TEST1_C', 'NAME', 'DATE', 'EXTRADATA']

How can I search for multiple values at once via pandas

I want something like

df1 = df[(df['job'].str.contains(jobs, na=False))]

But I get error TypeError: first argument must be string or compiled pattern

Once I get passed this part I want to update some jobs from X01_TEST1_C to X01_NEW_TEST1_C - adding this bit of info in incase easier to do whole thing at once.

Is Pandas good for this or do I need to try via different method like import csv?

Thanks for any help.


Solution

  • try:

    jobs = [
      'X01_TEST1_C', 
      'P01_TEST3_B'
      ]
    
    df1 = df[df['job'].str.contains('|'.join(jobs), na=False)] #the default is regex=True so no need to add it
    
    #this is similar to:
    df1 = df[df['job'].str.contains('X01_TEST1_C|P01_TEST3_B', na=False)]