python pdf pdf-parsing pdf-extraction

How to use page.filter(test_function) in the pdfplumber library?


I am trying to remove tables from a PDF page, and I want to use the page.filter() function for that. I have the table bbox coordinates and want to check whether each object's coordinates fall inside the table's coordinates. However, I was unable to find any sample usage of the filter function.

Here is the documentation link.

I tried it this way:

def filter_func(obj):
    # some logic to check whether the coordinates are inside the boundary
    ...

new_page = page.filter(lambda x: x if filter_func(x) else '')

Unfortunately, this usage does not work. How should the page.filter function be used?


Solution

  • Passing the function directly works:

    def filter_func(obj):
        # Return True to keep obj, False to drop it (e.g. when it lies inside the table bbox).
        ...
    
    new_page = page.filter(filter_func)
    

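    page.filter is evaluated lazily: it returns a new, filtered page, and filter_func only runs when you access the objects on new_page (for example, by calling new_page.extract_text()).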
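
As a minimal sketch of what the "some logic" placeholder could look like for the table case, the test function below drops every object that lies entirely inside a table bbox. The helper names (inside_bbox, make_outside_filter, remove_tables) are made up for illustration. The sketch assumes each object is a dict with x0, top, x1 and bottom keys, and that a bbox is an (x0, top, x1, bottom) tuple, as with the bbox attribute of the tables returned by page.find_tables().

```python
def inside_bbox(obj, bbox):
    """Return True if the object's box lies entirely within bbox."""
    x0, top, x1, bottom = bbox
    return (
        obj["x0"] >= x0
        and obj["x1"] <= x1
        and obj["top"] >= top
        and obj["bottom"] <= bottom
    )


def make_outside_filter(table_bboxes):
    """Build a test function for page.filter that keeps non-table objects."""
    def keep(obj):
        # Keep the object only if it is inside none of the table boxes.
        return not any(inside_bbox(obj, bbox) for bbox in table_bboxes)
    return keep


def remove_tables(page):
    """Return a filtered copy of a pdfplumber page without table contents."""
    table_bboxes = [table.bbox for table in page.find_tables()]
    return page.filter(make_outside_filter(table_bboxes))
```

With a page opened through pdfplumber, remove_tables(page).extract_text() should then return the text without the table contents. Objects that only partly overlap a table are kept in this sketch; whether that is the right rule depends on the document.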