I’m trying (and failing) to write a regular expression (PCRE2) that replaces every space with a dash (-) after the first instance of a particular word (namely •VAN•, •VON• or •DE•). The word itself must be surrounded by spaces.
For example:
HENRIETTA VON DER GRAAF
CAROLINE VAN OOSTEN DE WINKEL
MARC DE VRIES VAN JONG
ANNEKA VANHOVEN BAKKER
JOHN WILKINSON SMITH
would translate to:
HENRIETTA VON-DER-GRAAF
CAROLINE VAN-OOSTEN-DE-WINKEL
MARC DE-VRIES-VAN-JONG
ANNEKA VANHOVEN BAKKER (NB: Does not match VAN as not surrounded by spaces)
JOHN WILKINSON SMITH (NB: No substitution here as pattern not matched)
This is as far as I’ve got, but it’s not substituting all of the spaces following the match:
\b( VON| VAN| DE)+\s
https://regex101.com/r/s6BC1y/1
Any advice most appreciated!
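You can do your transformation without a regex substitution: use PRXMATCH only to locate the first matching word, then use TRANSLATE to turn the spaces that follow it into dashes.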
data have;
input text $CHAR50.;
datalines;
HENRIETTA VON DER GRAAF
CAROLINE VAN OOSTEN DE WINKEL
MARC DE VRIES VAN JONG
ANNEKA VANHOVEN BAKKER
JOHN WILKINSON SMITH
;
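data want;
  set have;
  /* p = position of the space before the first VAN, VON or DE that has a space on both sides (0 if none) */
  p = prxmatch('/ (VAN|VON|DE) /', text);
  /* Replace every space after that leading space with a dash; TRIM keeps the trailing padding out of the translation */
  if p > 0 then
    substr(text, p+1) = translate(substr(trim(text), p+1), '-', ' ');
run;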
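For readers who want to check the logic outside SAS, the same two-step idea (find the first space-delimited particle, then convert only the spaces after it) can be sketched in Python. This is a minimal illustration rather than a port of the data step: the function name `hyphenate_after_particle` is hypothetical, and it assumes the particle must have a literal space on both sides, as the question requires.

```python
import re

# The particle list comes from the question; the spaces on both sides are
# part of the pattern so that VANHOVEN or a leading DE does not match.
_PARTICLE = re.compile(r" (?:VAN|VON|DE) ")


def hyphenate_after_particle(name: str) -> str:
    """Replace each space after the first space-delimited VAN/VON/DE with '-'."""
    name = name.rstrip()  # ignore trailing padding, like TRIM in the data step
    match = _PARTICLE.search(name)
    if match is None:
        return name  # no particle found: leave the name unchanged
    start = match.start() + 1  # index of the particle's first letter
    # Keep everything up to and including the space before the particle,
    # then turn every remaining space into a dash.
    return name[:start] + name[start:].replace(" ", "-")


if __name__ == "__main__":
    for sample in (
        "HENRIETTA VON DER GRAAF",
        "CAROLINE VAN OOSTEN DE WINKEL",
        "MARC DE VRIES VAN JONG",
        "ANNEKA VANHOVEN BAKKER",
        "JOHN WILKINSON SMITH",
    ):
        print(hyphenate_after_particle(sample))
```

Only the first match matters here: once its position is known, later particles such as the DE in VAN OOSTEN DE WINKEL are handled by the plain space-to-dash replacement.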