javascript, regex

What is the regex to match this string?


Consider these sentences:

apple is 2kg
apple banana mango is 2kg
apple apple apple is 6kg
banana banana banana is 6kg

Given that "apple", "banana", and "mango" are the only fruits, what regex would extract the fruit name(s) that appear at the start of each sentence?

I wrote this regex (https://regex101.com/r/fY8bK1/1):

^(apple|mango|banana) is (\d+)kg$  

but this only matches if a single fruit is in the sentence.
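To see the failure concretely, here is a minimal sketch in JavaScript (the question's tag) that runs the pattern above over the four sentences. The `g` and `m` flags and the variable names are assumptions added for the demonstration. Only the first sentence matches, because after one fruit name the pattern requires ` is` immediately.

```javascript
// The four sample sentences from the question, one per line.
const text = [
  "apple is 2kg",
  "apple banana mango is 2kg",
  "apple apple apple is 6kg",
  "banana banana banana is 6kg",
].join("\n");

// The question's pattern. The g and m flags are added here (an assumption)
// so that ^ and $ apply to each line and every match is returned.
const singleFruit = /^(apple|mango|banana) is (\d+)kg$/gm;

// Collect the full text of each match.
const matched = [...text.matchAll(singleFruit)].map((m) => m[0]);
console.log(matched); // only "apple is 2kg" matches
```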

How do I extract all the fruit names?

The expected output, for all 4 sentences, should be:

apple, 2
apple banana mango, 2
apple apple apple, 6
banana banana banana, 6


Solution

  • ^((?:apple|mango|banana| )+) is (\d+)kg\s?$
    

    DEMO

    https://regex101.com/r/dO1rR7/1


    Explanation

    ^((?:apple|mango|banana| )+) is (\d+)kg\s?$
    
    ^ assert position at start of a line
    1st Capturing group ((?:apple|mango|banana| )+)
        (?:apple|mango|banana| )+ Non-capturing group
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
            1st Alternative: apple
                apple matches the characters "apple" literally
            2nd Alternative: mango
                mango matches the characters "mango" literally
            3rd Alternative: banana
                banana matches the characters "banana" literally
            4th Alternative: " " (a single space)
                " " matches the space character literally
    " is" matches the characters " is" literally (a space followed by "is")
    2nd Capturing group (\d+)
        \d+ match a digit [0-9]
            Quantifier: + Between one and unlimited times, as many times as possible, giving back as needed [greedy]
    kg matches the characters "kg" literally
    \s? match any white space character [\r\n\t\f ]
        Quantifier: ? Between zero and one time, as many times as possible, giving back as needed [greedy]
    $ assert position at end of a line
    g modifier: global. All matches (don't return on first match)
    m modifier: multi-line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
    i modifier: insensitive. Case insensitive match (ignores case of [a-zA-Z])
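A minimal JavaScript sketch of using the pattern with the `g`, `m`, and `i` modifiers listed above: `String.prototype.matchAll` returns one match per line, and capturing groups 1 and 2 are joined to reproduce the expected output from the question. The variable names are illustrative.

```javascript
// The four sample sentences from the question, one per line.
const text = [
  "apple is 2kg",
  "apple banana mango is 2kg",
  "apple apple apple is 6kg",
  "banana banana banana is 6kg",
].join("\n");

// The solution pattern with the g, m and i modifiers from the explanation.
const fruitsThenWeight = /^((?:apple|mango|banana| )+) is (\d+)kg\s?$/gim;

// matchAll yields one match per line: group 1 is the fruit list, group 2 the number.
const lines = [...text.matchAll(fruitsThenWeight)].map(
  (m) => `${m[1]}, ${m[2]}`
);

console.log(lines.join("\n"));
// apple, 2
// apple banana mango, 2
// apple apple apple, 6
// banana banana banana, 6
```

Because the space is simply a fourth alternative inside the repeated group, the pattern also accepts input such as `applebanana is 2kg` or a line with a leading space. If that matters, a stricter variant such as `^((?:apple|mango|banana)(?: (?:apple|mango|banana))*) is (\d+)kg\s?$` requires exactly one space between names. In either case, `m[1].split(" ")` gives the individual fruit names as an array.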