I want to extract only words. A word should not contain any number or special character attached to it, e.g. (64-bit), WebView2, x86_64. My current regex correctly ignores WebView2 and x86_64, but for (64-bit) it returns bit, which I don't want. I want to exclude it because it is attached to digits and to the characters -, ( and ).
I have this input data:
Brave
Google Chrome
Microsoft Edge WebView2 Runtime
Robo 3T 1.4.4
WinRAR 7.01 (64-bit)
Python 3.12.3 Core Interpreter (64-bit)
and this regex:
\b[a-zA-Z]+\b
For the last line of the input, the regex above returns this result:
['Python', 'Core', 'Interpreter', 'bit']
instead of the expected:
['Python', 'Core', 'Interpreter']
IIUC, you don't need a regex: you can split the string on whitespace and keep only the tokens for which isalpha is true:
txt = 'Python 3.12.3 Core Interpreter (64-bit)'
out = [s for s in txt.split() if s.isalpha()]
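As a rough sketch, the same split-and-filter idea can be applied to every line of the input data from the question. The helper name extract_words and the list name names are made up for this illustration; they do not come from the original post.

```python
def extract_words(line):
    """Return the whitespace-separated tokens that consist only of letters."""
    # Hypothetical helper: split on whitespace, keep purely alphabetic tokens.
    return [tok for tok in line.split() if tok.isalpha()]


# Input data copied from the question.
names = [
    "Brave",
    "Google Chrome",
    "Microsoft Edge WebView2 Runtime",
    "Robo 3T 1.4.4",
    "WinRAR 7.01 (64-bit)",
    "Python 3.12.3 Core Interpreter (64-bit)",
]

results = [extract_words(name) for name in names]

for name, words in zip(names, results):
    print(f"{name!r} -> {words}")
```

Note that str.isalpha also accepts non-ASCII letters (for example é), whereas [a-zA-Z] does not. If only ASCII letters should count, one option is to filter on tok.isascii() and tok.isalpha() instead.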
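If you really want to use a regex, be aware that \b matches at any transition between a word character and a non-word character. Since - is a non-word character, there is a word boundary between - and bit, which is why bit is returned. To avoid this, require whitespace or the start/end of the string on both sides of the word: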
import re
out = re.findall(r'(?:^|\s)([a-zA-Z]+)(?=\s|$)',
'Python 3.12.3 Core Interpreter (64-bit)')
Output:
['Python', 'Core', 'Interpreter']
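A possible variant, shown here only as a sketch, replaces the capture group with a negative lookbehind and a negative lookahead, so that a word is accepted only when it does not touch any non-whitespace character. The name WORD_RE is an assumption for this example. The snippet also runs the original \b pattern on the same text for comparison.

```python
import re

# Hypothetical name. (?<!\S) means "not preceded by a non-whitespace character",
# (?!\S) means "not followed by one", so the word must be delimited by
# whitespace or by the start/end of the string.
WORD_RE = re.compile(r'(?<!\S)[a-zA-Z]+(?!\S)')

text = 'Python 3.12.3 Core Interpreter (64-bit)'

# Original pattern from the question: \b sees a boundary between '-' and 'b'.
naive = re.findall(r'\b[a-zA-Z]+\b', text)

# Lookaround pattern: 'bit' is rejected because it is preceded by '-'.
strict = WORD_RE.findall(text)

print(naive)
print(strict)
```

Because the pattern has no capture group, findall returns the whole matches directly, and the lookarounds consume no characters.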