Python3:
import re
k = "X"
s = "X测试一Q测试二XQ测试三"
print(re.split((r"\b" + k + r"\b"), s))
Output:
['X测试一Q测试二XQ测试三']
Expected:
['', '测试一Q测试二XQ测试三']
The 测 is a letter belonging to the \p{Lo} class, and there is no word boundary between X and 测.
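To check this claim, here is a minimal sketch that looks up the Unicode category of 测 and tests whether \w matches it with and without re.ASCII. The variable names are only illustrative.

```python
import re
import unicodedata

ch = "测"

# General category of the character: "Lo" means "Letter, other".
category = unicodedata.category(ch)

# In a str pattern, \w is Unicode-aware by default, so it matches 测.
unicode_word = bool(re.fullmatch(r"\w", ch))

# With re.ASCII, \w is limited to [a-zA-Z0-9_], so 测 is not a word character.
ascii_word = bool(re.fullmatch(r"\w", ch, flags=re.ASCII))

print(category, unicode_word, ascii_word)
```

Since both X and 测 are word characters in the default mode, \b cannot match between them.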
The \b word boundary construct is Unicode-aware by default in Python 3.x re patterns, so you can switch this behavior off with the re.ASCII / re.A option, or with the inline (?a) flag:
import re
k = "X"
# (?a) limits \w and \b to ASCII, so 测 counts as a non-word character
print(re.split(fr"(?a)\b{k}\b", "X测试一Q测试二XQ测试三"))
# ['', '测试一Q测试二XQ测试三']
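The same thing can be written with the re.ASCII flag instead of the inline (?a). The sketch below assumes the key may contain regex metacharacters and therefore escapes it with re.escape; the helper name split_on_ascii_word is hypothetical, not part of the re module.

```python
import re

def split_on_ascii_word(key, text):
    """Split text on key where key is delimited by ASCII-only word boundaries."""
    # re.escape guards against metacharacters in key (an added precaution).
    pattern = re.compile(r"\b" + re.escape(key) + r"\b", flags=re.ASCII)
    return pattern.split(text)

parts = split_on_ascii_word("X", "X测试一Q测试二XQ测试三")
print(parts)  # ['', '测试一Q测试二XQ测试三']
```

The second X is not split on because it is followed by Q, an ASCII word character, so there is still no boundary after it.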
If you only need to make sure there is no ASCII letter immediately before or after X, use (?<![a-zA-Z])X(?![a-zA-Z]). To rule out ASCII digits as well, use (?<![a-zA-Z0-9])X(?![a-zA-Z0-9]).
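A rough sketch of the lookaround variant follows. The helper split_on_isolated and its include_digits parameter are hypothetical names used for illustration, and re.escape is again an added precaution.

```python
import re

def split_on_isolated(key, text, include_digits=False):
    """Split on key when it is not adjacent to ASCII letters (optionally digits)."""
    # Character class that must NOT appear directly before or after key.
    cls = "[a-zA-Z0-9]" if include_digits else "[a-zA-Z]"
    pattern = rf"(?<!{cls}){re.escape(key)}(?!{cls})"
    return re.split(pattern, text)

cjk = split_on_isolated("X", "X测试一Q测试二XQ测试三")
letters_only = split_on_isolated("X", "1X2")
with_digits = split_on_isolated("X", "1X2", include_digits=True)
print(cjk)           # ['', '测试一Q测试二XQ测试三']
print(letters_only)  # ['1', '2']
print(with_digits)   # ['1X2']
```

Unlike \b in ASCII mode, these lookarounds do not treat the underscore as a word character, so X in "_X_" would still be split on.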