I have this PDF and I'm trying to extract its very first table.
The issue happens when the name of the employer (EMPREGADOR) spans two lines.
I'm using the following command to try to extract the data correctly:
import camelot
tables = camelot.read_pdf('tipo1/t1_3.pdf', pages='1', flavor='stream', edge_tol=500, strip_text='\n')
df = tables[0].df
print(df)
But the result is the following:
   0                     1                             2
0  EMPREGADOR            DATA DE ADMISSÃO              PIS/PASEP
1  ABC ABC ABC
2                        07/01/2008                    123123123
3  LTDA
4  CARTEIRA DE TRABALHO  INSCRIÇÃO DO EMPREGADOR       NÚMERO DA CONTA
5  123123                123123                        1231231231
6  DATA DE OPÇÃO         DATA E CÓDIGO DE AFASTAMENTO  CATEGORIA
7  07/01/2008            30/09/2011 - N2               1
8  TIPO DE CONTA         TAXA DE JUROS                 VALOR PARA FINS RECISÓRIOS
9  OPTANTE               3.0% a.a                      R$ 0,00
I read the docs but didn't find anything that would help me get the employer's (EMPREGADOR) data as a single value (in this case, ABC ABC ABC LTDA).
This is a problem because the employer's name may span 1, 2, 3 or even more lines, which scatters the values across several rows of the DataFrame and makes it hard to write code against.
Any suggestions?
As mentioned by Stefano Fiorucci in the comments, Camelot does not currently support this. The solution was to post-process the extracted data manually.
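For anyone landing here, this is a minimal sketch of what that manual step could look like. It is not a Camelot feature, and it rests on an assumption about this particular layout: the table alternates between a label row and a value row, and the first-column labels are known in advance. The helper name merge_wrapped_rows and the HEADER_LABELS set are hypothetical. Any rows that sit between two label rows are merged column by column, so a name wrapped over several lines ends up in one cell.

```python
# Hypothetical helper, not part of Camelot. It assumes the first-column
# labels of this particular form are known in advance.
HEADER_LABELS = {
    "EMPREGADOR",
    "CARTEIRA DE TRABALHO",
    "DATA DE OPÇÃO",
    "TIPO DE CONTA",
}


def merge_wrapped_rows(rows, header_labels=HEADER_LABELS):
    """Merge consecutive value rows that belong to the same label row."""
    merged = []
    in_value_block = False
    for row in rows:
        cells = [str(cell).strip() for cell in row]
        if cells[0] in header_labels:
            # A label row always starts a new block.
            merged.append(cells)
            in_value_block = False
        elif in_value_block:
            # Continuation line: join the non-empty fragments of each column.
            merged[-1] = [
                " ".join(part for part in (old, new) if part)
                for old, new in zip(merged[-1], cells)
            ]
        else:
            # First value row after a label row.
            merged.append(cells)
            in_value_block = True
    return merged


# Sample rows copied from the output in the question.
rows = [
    ["EMPREGADOR", "DATA DE ADMISSÃO", "PIS/PASEP"],
    ["ABC ABC ABC", "", ""],
    ["", "07/01/2008", "123123123"],
    ["LTDA", "", ""],
    ["CARTEIRA DE TRABALHO", "INSCRIÇÃO DO EMPREGADOR", "NÚMERO DA CONTA"],
    ["123123", "123123", "1231231231"],
]

result = merge_wrapped_rows(rows)
for line in result:
    print(line)
```

With the real data, the input rows can be taken from the DataFrame with tables[0].df.values.tolist(), and the merged list can be passed back to pandas.DataFrame. If the labels differ between PDFs, the label set has to be adjusted, since the whole approach depends on recognising the label rows.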