regex gedcom

Regular Expression to Extract Text Bounded by '/'


I need a regular expression to extract names from a GEDCOM file. The format is:

Fred Joseph /Smith/

Here the text bounded by the / characters is the surname, and Fred Joseph are the forenames. The complication is that the surname could appear anywhere in the text or may not be there at all. I need something that will extract the surname and capture everything else as the forenames.

This is as far as I have got. I have tried making groups optional with the ? quantifier, but to no avail:

What I have so far

As you can see it has several problems: If the surname is missing nothing gets captured, the forename(s) sometimes have leading and trailing spaces, and I have 3 capture groups when I'd really like 2. Even better would be if the capture group for the surname didn't include the '/' characters.

Any help would be much appreciated.


Solution

  • Regarding your last point, I'm not sure there is a way to merge group 1 and group 3 into a single group.

    Here is my proposed solution. It doesn't capture spaces around forenames.

    ^(?:\h*([a-z\h]+\b)\h*)?(?:\/([a-z\h]+)\/)?(?:\h*([a-z\h]+\b)\h*)?$
    

    To match the names correctly, take care to use the case-insensitive flag, and if you test all lines at once, use the multiline flag as well.
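As a rough illustration of how the pattern and the case-insensitive flag could be used from code, here is a minimal Python sketch. It rests on a few assumptions that are not part of the answer above: Python's `re` module does not support `\h`, so `[ \t]` is substituted for it; the helper name `parse_name` is hypothetical; and joining groups 1 and 3 in code is just one possible way to end up with two values instead of three. Each line is matched on its own, so the multiline flag is not needed here.

```python
import re

# Python's `re` has no \h shorthand, so [ \t] stands in for it (an assumption
# of this sketch; the original pattern targets a PCRE-style engine).
NAME_RE = re.compile(
    r"^(?:[ \t]*([a-z \t]+\b)[ \t]*)?"   # group 1: forenames before the surname
    r"(?:/([a-z \t]+)/)?"                # group 2: surname, without the slashes
    r"(?:[ \t]*([a-z \t]+\b)[ \t]*)?$",  # group 3: forenames after the surname
    re.IGNORECASE,
)


def parse_name(line):
    """Hypothetical helper: return (forenames, surname); either may be None."""
    m = NAME_RE.match(line)
    if m is None:
        return None  # line contains characters the pattern does not allow
    before, surname, after = m.groups()
    # Join the two forename groups in code, since the regex keeps them apart.
    forenames = " ".join(part for part in (before, after) if part) or None
    return forenames, surname


print(parse_name("Fred Joseph /Smith/"))   # ('Fred Joseph', 'Smith')
print(parse_name("Fred /Smith/ Joseph"))   # ('Fred Joseph', 'Smith')
print(parse_name("Fred Joseph"))           # ('Fred Joseph', None)
```

Note that the character class only allows letters and horizontal whitespace, so names containing hyphens or apostrophes would need the class to be widened.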

    See the demo

    Explanation