regex regex-lookarounds

How can I exclude a single character from a list only when followed by certain other characters?


I want to parse certain character combinations which represent musical pitches from the beginning of a string. For that I am using regular expressions (NSRegularExpression in Swift, if that matters). Here is the regex I have been using so far:

^[A-G][b#]?

Possible matches are A, Bb, D#, but not b alone.

Now I want to make it possible to detect the pitch names from lowercase letters as well. And here the problem begins.

So the new regex so far is:

^[A-Ga-g][b#]?

Now a single lowercase b will match, which is desired, as b is a musical pitch.

Furthermore I also want to recognise pitches described another way, i.e. as musical scale degrees in roman numeral notation. These look like this: V, #II, bVII.

My code first tries to get a match for an absolute pitch with the regex given before, and if that gives no result tries another regex for the roman numeral notation.

The problem is that the optional characters b and # are used in both notations but in different places: AFTER the letter in absolute pitch notation (like Gb) but BEFORE the roman numeral (like bV).

By allowing the absolute pitch letter name to be lowercase, my first regex returns b as a match even when it is not the lowercase note letter b, but part of the roman numeral bV.

So what I want to do is to accept b as a match only if it is not followed by V or I. With the help of several tutorials I found out that a negative lookahead should be the solution, but I am struggling with the exact syntax for the regex.

ChatGPT is suggesting the following:

^[A-Ga-g](?!b[V|I])[b#]?

but this does not give the desired result. From bV I still get b as a match.

Another suggestion, also provided by ChatGPT, is

^(?!.*b[V|I])[A-Ga-g][b#x]?

with the explanation that the (?!.*b[V|I]) ensures that bV or bI occurs nowhere in the string. This regex does indeed give the desired result, but I am not sure about the explanation. I only want to test for these occurrences at the beginning of the string.

My own idea would be:

^[A-Gab(?![V|I])c-g][b#x]?

but this does not pass the online regex tester.

Is it possible to write a regex in which a single character b from a character list [a-g] is accepted only when not followed by a character from another list [VI]?

EDIT:

I'll provide a little bit more context:

Ultimately, I am not looking for simple musical pitches but for chord symbols. They consist of multiple elements, some of which are optional, and the order of those elements can vary. Therefore I parse the text string in several phases, each time with a different regex, and when a match is found, I remove it from the original string.

Examples of such chord symbols are:

Cmaj7           -> C
C#∆7            -> C#
B13             -> B
Bb7alt          -> Bb
Fm7b5/Ab        -> F
Ab13b9#11omit5  -> Ab
V9sus4          -> V
bVIIo7          -> VII

One thing I can be sure of is that the chord symbol starts with a pitch, and for convenience I allow absolute as well as relative pitches.

I first look for absolute pitches at the beginning (c, f#, b, bb). If I get no result, I look again with another regex, this time for the roman numerals. The result of either of these searches will then be sent to another function which will analyze it and return a struct with semantic meaning (so that my app actually "knows" about the pitch). As I must use different functions for the two sorts of results, I have to apply these searches sequentially.

If neither search gives a result, it is not a valid chord symbol and the rest of the examination will be cancelled.
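
To make the problem concrete, here is a minimal sketch of the two-phase lookup described above. It is written in Python for brevity rather than Swift. The function name leading_pitch and the roman numeral pattern ROMAN are assumptions made for illustration only, since the real second regex is not shown here; ABSOLUTE is the lowercase-enabled regex from the question.

```python
import re

# Absolute pitch pattern from the question (lowercase letters allowed).
ABSOLUTE = re.compile(r"^[A-Ga-g][b#]?")
# Assumed pattern for scale degrees; the real one is not shown in the question.
ROMAN = re.compile(r"^[b#]?(?:VII|VI|V|IV|III|II|I)")

def leading_pitch(symbol):
    """Return (kind, text) for the pitch at the start of symbol, or None."""
    match = ABSOLUTE.match(symbol)
    if match:
        return ("absolute", match.group())
    # Second phase: only reached when no absolute pitch was found.
    match = ROMAN.match(symbol)
    if match:
        return ("roman", match.group())
    return None  # neither search matched: not a valid chord symbol

for symbol in ["Cmaj7", "Bb7alt", "V9sus4", "bVIIo7"]:
    print(symbol, leading_pitch(symbol))
```

With these patterns, bVIIo7 is reported as ('absolute', 'b') because the first phase already succeeds on the leading b, so the roman numeral phase is never reached.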


Solution

  • Your second regex gives the desired result, as you said. And yes, you only need to test the condition at the start of the string. You can also remove the vertical bar from the character class: inside [...] it is a literal | character, not an alternation.

    ^(?!b[VI])[A-Ga-g][#bx]?
    

    Here is the demo at regex101

    The .* is removed from the lookahead. A lookahead is a zero-length assertion, so placed directly after ^ it checks the condition right at the beginning of the string (or of a line in multiline mode). The pattern will not match if the string starts with bV or bI, but it will match any other string that fits the pattern, including BI or b.
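
The position of the lookahead is what makes the difference. In the first ChatGPT suggestion the lookahead comes after the first letter has already been consumed, so for bV it is tested against the remaining V and passes. A small sketch in Python's re module compares the two placements; the names LATE, EARLY and first_match are made up for this illustration, and the constructs used (^, character classes, negative lookahead) should behave the same way in NSRegularExpression.

```python
import re

# Lookahead placed after the first letter (the first ChatGPT suggestion).
LATE = re.compile(r"^[A-Ga-g](?!b[V|I])[b#]?")
# Lookahead placed directly after ^ (the pattern from this answer).
EARLY = re.compile(r"^(?!b[VI])[A-Ga-g][#bx]?")

def first_match(pattern, text):
    """Return the matched prefix, or None if the pattern does not match."""
    match = pattern.match(text)
    return match.group() if match else None

for text in ["b", "bb", "Bb7alt", "BI", "bV", "bI", "bVIIo7"]:
    late = first_match(LATE, text)
    early = first_match(EARLY, text)
    print(f"{text!r}: late={late!r} early={early!r}")
```

For bV, bI and bVIIo7 the LATE pattern still returns b, while EARLY returns no match, which lets the roman numeral search take over. For b, bb, Bb7alt and BI both patterns return the same leading pitch.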