I want to remove specific words, with or without a trailing dot (Pvt., Ltd., Pvt, Ltd, Pte., Pte, Co., Co, Private Limited, Inc., Incorporated), from the string and keep the rest of the data.
I have tried using
"\(|\)|-|\.|Pvt|Ltd|Incorporated|Pte|Inc|Co|Private|\s"
but it's not working.
Example text:
0.5Bn FinHealth Pvt. Ltd.Inc. Pte.Co.Private Limited Incorporated,
0.5Bn FinHealth Ltd.,
1MG Technologies Pvt. Ltd.,
I need help improving the regex.
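Running the attempted pattern over the first example line shows why it falls short. This is a quick check in Python; the question doesn't say which regex engine is used, so Python's re module is an assumption. Each alternative is deleted independently wherever it occurs. As a result, the dot in 0.5Bn and every whitespace character disappear too. Limited survives because it isn't one of the alternatives.

```python
import re

# Pattern from the question: every alternative is removed on its own,
# including any "." and any whitespace anywhere in the string.
ATTEMPT = r"\(|\)|-|\.|Pvt|Ltd|Incorporated|Pte|Inc|Co|Private|\s"

text = "0.5Bn FinHealth Pvt. Ltd.Inc. Pte.Co.Private Limited Incorporated,"
result = re.sub(ATTEMPT, "", text)
print(result)  # → 05BnFinHealthLimited,
```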
Maybe give the following pattern a try:
(?:\s*\b(?:(?:Pvt|Ltd|Pte|Co)\.?|Inc\.|Incorporated|Private Limited))+
See an online demo
(?: - Open a 1st non-capture group;
\s* - 0+ (greedy) whitespace characters;
\b - A word boundary;
(?: - Open a nested 2nd non-capture group;
(?:Pvt|Ltd|Pte|Co) - A nested 3rd non-capture group with the alternatives that may be followed by a dot;
\.? - An optional literal dot;
| - Or;
Inc\. - Literally match 'Inc.';
| - Or;
Incorporated - Literally match 'Incorporated';
| - Or;
Private Limited - Literally match 'Private Limited';
))+ - Close the non-capture groups and match the 1st one 1+ times.
Replace matches with an empty string.
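Here is a minimal sketch of that replacement in Python. Using Python's re module is an assumption, since the question doesn't name a language. It applies the pattern unchanged to the three example lines. It also shows a hypothetical variant, TIGHTER, which is not part of the answer above. The variant adds \b after the short alternatives. Without it, Co also matches at the start of a longer word. "Cooper Co." is a made-up input chosen to show this.

```python
import re

# Pattern from the answer: one or more company suffixes, each optionally
# preceded by whitespace and starting at a word boundary.
SUFFIXES = re.compile(
    r"(?:\s*\b(?:(?:Pvt|Ltd|Pte|Co)\.?|Inc\.|Incorporated|Private Limited))+"
)

lines = [
    "0.5Bn FinHealth Pvt. Ltd.Inc. Pte.Co.Private Limited Incorporated,",
    "0.5Bn FinHealth Ltd.,",
    "1MG Technologies Pvt. Ltd.,",
]

cleaned = [SUFFIXES.sub("", line) for line in lines]
for line in cleaned:
    print(line)
# 0.5Bn FinHealth,
# 0.5Bn FinHealth,
# 1MG Technologies,

# Hypothetical variant (not from the answer): the extra \b after the short
# alternatives stops "Co" from matching the first letters of "Cooper".
TIGHTER = re.compile(
    r"(?:\s*\b(?:(?:Pvt|Ltd|Pte|Co)\b\.?|Inc\.|Incorporated|Private Limited))+"
)
print(SUFFIXES.sub("", "Cooper Co."))  # → oper
print(TIGHTER.sub("", "Cooper Co."))   # → Cooper
```

On the three example lines, both patterns give the same result. The difference only appears when a company name itself begins with Co, Pvt, Ltd or Pte.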
Note: I was unsure what you meant to do with \(|\)|-|\. but my guess is that you want to remove certain stand-alone characters. If so, you can add a character class such as [().-]+ as another alternative in the pattern.
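The sketch below is one reading of the note above, again in Python (an assumption). COMBINED is a hypothetical name for the suffix pattern with \s*[().-]+ added as an outer alternative. The input with parentheses is made up for illustration.

```python
import re

# Hypothetical combined pattern: the answer's suffix group, plus a
# character-class alternative for parentheses, dots and hyphens.
COMBINED = re.compile(
    r"(?:\s*\b(?:(?:Pvt|Ltd|Pte|Co)\.?|Inc\.|Incorporated|Private Limited)"
    r"|\s*[().-]+)+"
)

print(COMBINED.sub("", "1MG Technologies (Pvt. Ltd.)"))  # → 1MG Technologies
print(COMBINED.sub("", "0.5Bn FinHealth Ltd.,"))         # → 05Bn FinHealth,
```

The second call exposes a side effect: the dot inside 0.5Bn is removed as well. If only stand-alone characters should go, the class probably needs narrowing. Changing it to [()-] would still handle the parenthesised example and would leave 0.5Bn intact.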