regex ocr uipath uipath-studio pdf-extraction

How to get only the first match of a RegEx (UiPath Studio RegEx Based Extractor)

I have the following text that I extracted from a PDF using UiPath Studio's OCR. The same block of text is repeated 3 times because the PDF contains the original, duplicate and triplicate of the same page.

Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 * Data/Hora início de transporte: 2020-04-16 às 11:52

Total Líquido               500,00
Total de Descontos 500,00         
Desconto Documento                
Total de IVA                115,00
Total do Documento (EUR)    615,00

IVA      Incidência   Valor do IVA
Isento                            
6%                                
13%                               
23%      500,00       115,00      

b5El-Processado por programa certificado n.º75/AT.

Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 * Data/Hora início de transporte: 2020-04-16 às 11:52

Total Líquido               500,00
Total de Descontos 500,00         
Desconto Documento                
Total de IVA                115,00
Total do Documento (EUR)    615,00

IVA      Incidência   Valor do IVA
Isento                            
6%                                
13%                               
23%      500,00       115,00      

b5El-Processado por programa certificado n.º75/AT.

Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 * Data/Hora início de transporte: 2020-04-16 às 11:52

Total Líquido               500,00
Total de Descontos 500,00         
Desconto Documento                
Total de IVA                115,00
Total do Documento (EUR)    615,00

IVA      Incidência   Valor do IVA
Isento                            
6%                                
13%                               
23%      500,00       115,00      

b5El-Processado por programa certificado n.º75/AT.

I need to extract the 4-character code in front of "-Processado por programa" (b5El here), but I only want one match, i.e. the first one.

I have already tried [^*]+(?=-Processado\spor\sprograma) and (.*?)(?=-Processado\spor\sprograma), but they give me 3 matches.

It works when I remove the /g flag, but I'm using UiPath Studio's RegEx Based Extractor and I don't know how to remove that flag in that program.
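The behaviour can be reproduced with a small Python sketch. It assumes Python's `re` module as a stand-in for the .NET engine behind UiPath (the constructs involved should behave the same in both), uses `re.findall` to play the role of a global search and `re.search` for a non-global one, and shortens the sample page to a few of its lines. It also shows that the first pattern's match starts right after the `*` and spans several lines, not just the 4-character code.

```python
import re

# One copy of the OCR'd page, shortened to a few of the lines shown above.
PAGE = (
    "Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 "
    "* Data/Hora início de transporte: 2020-04-16 às 11:52\n"
    "\n"
    "Total Líquido               500,00\n"
    "Total do Documento (EUR)    615,00\n"
    "\n"
    "b5El-Processado por programa certificado n.º75/AT.\n"
    "\n"
)
# Original, duplicate and triplicate: the same page three times.
TEXT = PAGE * 3

# The first pattern tried in the question.
PATTERN = r"[^*]+(?=-Processado\spor\sprograma)"

# A global search (like the /g flag) finds one match per copy of the page.
all_matches = re.findall(PATTERN, TEXT)

# A non-global search stops after the first match.
first = re.search(PATTERN, TEXT)

print(len(all_matches))                        # 3
print(first.group().startswith(" Data/Hora"))  # True
print(first.group()[-4:])                      # b5El
```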

Solution

You could use a negative lookahead to match every line that does not start with 4 word characters followed by -Processado por programa.

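When you reach the line that does, capture its first 4 word characters in group 1. Because \A anchors the pattern to the start of the string, it can succeed at most once, so you get a single match even when the extractor applies the global flag.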

\A.*(?:\r?\n(?!\w{4}-Processado\spor\sprograma\b).*)*\r?\n(\w{4})
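A quick check of the pattern in Python, again assuming `re` as a stand-in for the .NET engine and using a shortened copy of the sample page. `re.findall` keeps scanning the whole string like a global search, yet only one match comes back, and the optional `\r` lets the same pattern work on Windows-style line endings.

```python
import re

# One copy of the OCR'd page, shortened to a few of the lines shown above.
PAGE = (
    "Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16 "
    "* Data/Hora início de transporte: 2020-04-16 às 11:52\n"
    "\n"
    "Total Líquido               500,00\n"
    "Total do Documento (EUR)    615,00\n"
    "\n"
    "b5El-Processado por programa certificado n.º75/AT.\n"
    "\n"
)
TEXT = PAGE * 3  # original, duplicate and triplicate

PATTERN = r"\A.*(?:\r?\n(?!\w{4}-Processado\spor\sprograma\b).*)*\r?\n(\w{4})"

# findall returns group 1 of every match; \A can only succeed at position 0.
codes = re.findall(PATTERN, TEXT)
print(codes)  # ['b5El']

# Same result with \r\n line endings.
print(re.findall(PATTERN, TEXT.replace("\n", "\r\n")))  # ['b5El']
```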

Explanation

\A.* Assert the position at the start of the string, then match any char except a newline 0+ times (the first line)
(?: Non-capturing group
- \r?\n Match a newline
- (?!\w{4}-Processado\spor\sprograma\b) Negative lookahead, assert that 4 word characters followed by -Processado por programa are not directly to the right
- .* Match the rest of the line
)* Close the non-capturing group and repeat it 0+ times to match all the lines before the -Processado por programa line
\r?\n(\w{4}) Match a newline and capture 4 word characters in group 1
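One caveat, sketched in Python under the same assumption that `re` stands in for the .NET engine: the final \r?\n(\w{4}) does not itself require -Processado por programa to follow. If the marker line is missing, the greedy group backtracks and the capture settles on an ordinary line. `STRICT` below is a hypothetical hardening, not part of the answer above: it appends a lookahead so that the pattern finds no match in that case instead of a wrong code.

```python
import re

ANSWER = r"\A.*(?:\r?\n(?!\w{4}-Processado\spor\sprograma\b).*)*\r?\n(\w{4})"
# Hypothetical variant: also require the marker right after the captured code.
STRICT = ANSWER + r"(?=-Processado\spor\sprograma\b)"

WITH_MARKER = (
    "Total Líquido               500,00\n"
    "\n"
    "b5El-Processado por programa certificado n.º75/AT.\n"
) * 3

NO_MARKER = (
    "Os bens/serviços foram colocados à disposição do adquirente em 2020-04-16\n"
    "\n"
    "Total Líquido               500,00\n"
    "Total do Documento (EUR)    615,00\n"
)

print(re.findall(ANSWER, WITH_MARKER))  # ['b5El']
print(re.findall(STRICT, WITH_MARKER))  # ['b5El']

# Without the marker, backtracking lands on "Total do Documento".
print(re.findall(ANSWER, NO_MARKER))    # ['Tota']
print(re.findall(STRICT, NO_MARKER))    # []
```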
