I want to exclude the text after "(trading as". My regular expression so far is shown below. I tried a negative lookahead, (?!\s\(trading as), but it isn't working as expected. Any help is appreciated.
import re

def extract_company_name(title):
    match = re.findall(r'\b[A-Z0-9-](?:[A-Z0-9 \t&.-](?:\s*\(\d+\))?)*(?:\b|(?<=\)))', title)
    return ','.join(match) if match else None

text = """TEST LIMITED (trading as FOO Limited) (in relation), TEST (2005) LTD, WINDING LIMITED (in liquidation)"""
print(extract_company_name(text))
Text: TEST LIMITED (trading as FOO Limited) (in relation), TEST (2005) LTD, WINDING LIMITED (in liquidation)
Expected output: TEST LIMITED, TEST (2005) LTD, WINDING LIMITED
To exclude the text after "(trading as", you can use the usual regex trick: match what you do not need and capture what you need.
However, you also need to adapt your code for that trick to work the way you want.
So, the code will look like this:
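import re

def extract_company_name(title):
    # First alternative matches (and discards) "(trading as" up to the end of the line;
    # second alternative captures the company name in group 1.
    match = re.findall(r'\(trading as.*|\b([A-Z0-9-](?:[A-Z0-9 \t&.-](?:\s*\(\d+\))?)*)(?:\b|(?<=\)))', title)
    # Discarded matches show up as empty strings, so drop them before joining.
    return ','.join(x for x in match if x) if match else None

text = """TEST LIMITED (trading as FOO Limited) (in relation)"""
print(extract_company_name(text))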
See the online demo
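To make the effect of the added alternative concrete, here is a small sketch that prints what re.findall returns before and after filtering. The names PATTERN, raw and filtered are illustrative only and are not part of the code above.

```python
import re

# Same pattern as in the answer: discard "(trading as ..." to the end of the
# line, capture company names in group 1.
PATTERN = r'\(trading as.*|\b([A-Z0-9-](?:[A-Z0-9 \t&.-](?:\s*\(\d+\))?)*)(?:\b|(?<=\)))'

raw = re.findall(PATTERN, "TEST LIMITED (trading as FOO Limited) (in relation)")
print(raw)       # ['TEST LIMITED', '']

# The empty string is the discarded "(trading as ..." match; filter it out.
filtered = [x for x in raw if x]
print(filtered)  # ['TEST LIMITED']
```

With a single capturing group in the pattern, re.findall returns the group's content rather than the whole match, which is why the discarded alternative contributes an empty string.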
Changes:
- The \(trading as.*| alternative is added before your pattern to match "(trading as" and the rest of the string up to its end. Add re.S or re.DOTALL to your re.findall call if your string contains line breaks, and add \b after "as" if it must be a whole word.
- Part of your pattern is wrapped in a capturing group, (...), so that re.findall returns only these captures.
- Since an empty string is returned whenever "(trading as" is matched, you need to filter the matches before joining them, hence ','.join(x for x in match if x).

To find matches outside of "(trading as" and the next ")", you simply need to adjust the first alternative and use
\(trading as[^()]*\)|\b([A-Z0-9-](?:[A-Z0-9 \t&.-](?:\s*\(\d+\))?)*)(?:\b|(?<=\)))
See the regex demo.
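As a sketch of how the adjusted pattern behaves on the full input from the question, the snippet below wraps it in a function. The names COMPANY_RE and extract_company_names are hypothetical, and joining with ', ' instead of ',' is an assumption made only to match the expected output shown in the question.

```python
import re

# Hypothetical compiled pattern: same regex as above, split over two raw strings.
COMPANY_RE = re.compile(
    r'\(trading as[^()]*\)'  # discard a whole "(trading as ...)" parenthetical
    r'|\b([A-Z0-9-](?:[A-Z0-9 \t&.-](?:\s*\(\d+\))?)*)(?:\b|(?<=\)))'  # capture a name
)

def extract_company_names(title):
    # findall yields group 1 per match; discarded matches are empty strings.
    names = [name for name in COMPANY_RE.findall(title) if name]
    return ', '.join(names) if names else None

text = ("TEST LIMITED (trading as FOO Limited) (in relation), "
        "TEST (2005) LTD, WINDING LIMITED (in liquidation)")
print(extract_company_names(text))
# TEST LIMITED, TEST (2005) LTD, WINDING LIMITED
```

Because [^()]* stops at the closing parenthesis, only the "(trading as ...)" part is skipped, and the names that follow it are still found.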