
Regex to match array of strings and optional array of strings


I am making a syntax highlighting service for guitar chord sheets. I want to highlight the guitar chords and not the lyrics. It gets complicated because a chord name can consist of a base chord plus extensions.

For example,

God Is So Good
(capo 1 for Eb)

[Verse 1]
D          Em     A7         D
God is so good,  God is so good;
D         G     Em       D   A7 D
God is so good, He’s so good to me.

I need a regex that captures not only "D" and "E" but also "Dm", "Em7", "Dmaj7", "D/F#", etc.

I have two arrays: the first holds the chords and the second holds the optional extensions.

Array1 = {"A", "Bb", "A#", "B", "C", "C#", "D", "D#", "Eb", "E", "F", "F#", "G", "G#"}

Array2 = {"", "/", "m", "-", "1", "2", "3", "4", "5", "6", "7", "8", "9", "sus", "maj"}

How do I go about writing the regex to contain strings in Array 1, followed by optional strings in Array 2?

My initial take on this was to create a long regex that captures all possible chord expressions, but I want to know if there is a better way.

Edit (new example): revo, that regex didn't work with this example. Something like D/F# should be matched as well.

 G                     D/F#      
 How great is our God, sing with me,
 Em7                   D/F#      
 How great is our God, all will see,

Edit: \b(?:[BE]b?|[ACDFG]#?)(?:sus|maj|[-1-9/m])*(?!.[a-z]|[A-Z]) works for me at the moment.

Chord Editor (work in progress)


Solution

  • The regex doesn't have to be very long. You don't need to write out every possibility like this:

     A|A#|B|Bb|C|C#...
    

    You can shorten the first part to this:

    [BE]b?|[ACDFG]#?
    

    Shortening the second part:

    sus|maj|[-1-9\/m]
    

    And you just combine the two, repeating the second group with * so that chained extensions such as "m7" or "maj7" are covered:

    \b(?:[BE]b?|[ACDFG]#?)(?:sus|maj|[-1-9\/m])*(?!\w)
    

    Note the \b at the start and the (?!\w) at the end. Together they ensure that substrings that are part of a word are not matched. Hence, things like the G in "God" will not be matched.

    Obviously, if your array contents are unknown at compile time, you can't use such "tricks" and have to write out every possibility.
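
If the arrays are only known at run time, the alternation can be generated instead of typed out. The following is a minimal Python sketch of that idea, not a definitive implementation. The names `ROOTS`, `EXTENSIONS`, `alternation`, `build_chord_regex` and `highlight` are made up for illustration, the empty string from Array2 is dropped because the repetition already covers "no extension", and the `*` after the extension group is this sketch's choice so that chained extensions such as "m7" can match.

```python
import re

# Arrays from the question. The empty string in Array2 is omitted here
# because the "*" quantifier below already allows zero extensions.
ROOTS = ["A", "Bb", "A#", "B", "C", "C#", "D", "D#", "Eb", "E", "F", "F#", "G", "G#"]
EXTENSIONS = ["/", "m", "-", "1", "2", "3", "4", "5", "6", "7", "8", "9", "sus", "maj"]


def alternation(items):
    """Return a non-capturing group matching any one of the literal strings."""
    # Longest first, so "Bb" is tried before "B"; re.escape protects
    # characters that are special in a regex, such as "-" and "#".
    ordered = sorted(items, key=len, reverse=True)
    return "(?:" + "|".join(re.escape(s) for s in ordered) + ")"


def build_chord_regex(roots, extensions):
    """Hypothetical helper: one root followed by zero or more extensions."""
    # \b and (?!\w) stop a match from starting or ending inside a word.
    return re.compile(r"\b" + alternation(roots) + alternation(extensions) + r"*(?!\w)")


CHORD_RE = build_chord_regex(ROOTS, EXTENSIONS)


def highlight(line):
    """Wrap every chord in square brackets (a stand-in for real markup)."""
    return CHORD_RE.sub(lambda m: "[" + m.group() + "]", line)


print(CHORD_RE.findall("D          Em     A7         D"))     # ['D', 'Em', 'A7', 'D']
print(CHORD_RE.findall("God is so good,  God is so good;"))   # []
print(CHORD_RE.findall("Em7                   D/F#"))         # ['Em7', 'D', 'F#']
```

With these arrays, "D/F#" comes back as two matches, "D" and "F#", and the slash itself is left unmatched, because a root letter and "#" are not in the extension list. If the slash should be highlighted too, the extension list would need an entry for a bass note.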