[SOLVED] Different and same Regex in Ruby and JS on the same input


This is the input string: "!!!??"

The first regex works the same in Ruby and JS and does what it needs to do: it splits this input into two strings, one made of the "!" characters and one made of the "?" characters:

Example #1: Ruby behaves like JS, /!+|\?+/g

Ruby's s.scan(/!+|\?+/) works like JS's s.match(/!+|\?+/g). Both return the same two strings: ["!!!", "??"].
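A minimal Ruby sketch of Example #1 (the variable name runs is just illustrative): because the pattern has no capturing groups, scan returns each whole match as a plain String.

```ruby
s = "!!!??"

# No capturing groups in the pattern, so scan returns every whole match as a String
runs = s.scan(/!+|\?+/)
p runs
```

This prints ["!!!", "??"], the same two strings JS's match with the g flag returns.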

Example #2: Ruby differs from JS, /([?!])\1*/

Here Ruby and JS behave differently: Ruby's s.scan(/([?!])\1*/) is not equal to JS's s.match(/([?!])\1*/g). Ruby returns an array of two one-element arrays, [["!"], ["?"]], while JS returns the same two strings as in Example #1, ["!!!", "??"].
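A small Ruby sketch reproducing Example #2 (captures is an illustrative name): with one capturing group, each element of the result is an Array holding that group's text, so only the single captured character survives and the whole run is lost.

```ruby
s = "!!!??"

# One capturing group: scan returns one Array of captures per match,
# holding only the captured character, not the whole run
captures = s.scan(/([?!])\1*/)
p captures
```

This prints [["!"], ["?"]].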

Why does /([?!])\1*/ behave differently in Ruby and JS?

Solution

In JS, String.prototype.match with the g flag ignores capturing groups and returns every whole match. Ruby's scan, however, returns only the captured substrings when the pattern defines capturing groups, so you should wrap the whole pattern in a capturing group and add a little more Ruby code:

s = "!!!??"
# scan yields [whole_run, char] pairs; keep only the whole run
matches = s.scan(/(([?!])\2*)/).inject([]) { |acc, (m, _)| acc << m }
p matches
# => ["!!!", "??"]

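To see why the extra step is needed, here is a short sketch of what scan returns for the wrapped pattern before any post-processing (pairs, whole and _char are illustrative names). With two capturing groups, each match becomes a two-element Array: group 1 is the whole run, group 2 is the repeated character.

```ruby
s = "!!!??"

# Two capturing groups, so each element is [group1, group2]
pairs = s.scan(/(([?!])\2*)/)
p pairs

# Keep group 1 (the whole run) and drop group 2 (the repeated character)
runs = pairs.map { |whole, _char| whole }
p runs
```

The first p prints [["!!!", "!"], ["??", "?"]]; the second prints ["!!!", "??"].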

As @mudasobwa pointed out in a comment, you can even shorten that to:

"!!!??".scan(/(([?!])\2*)/).map(&:first)

Here, (([?!])\2*) matches the same text as /([?!])\1*/, but since the whole pattern is wrapped in capturing parentheses, the group that captures ? or ! is now group 2, so the backreference \1 becomes \2. Inside the block, both captured values are available: m (the whole match) and the repeated character (? or !), which is discarded via _. Only whole matches are collected, so m is appended to the accumulator on each match.
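If you would rather keep the original pattern untouched, one alternative (a sketch, not part of the answer above) is the block form of scan: before each yield, scan sets Regexp.last_match, whose group 0 is always the whole match, regardless of how many capturing groups the pattern has. The helper name whole_matches is hypothetical. Running it on both patterns also illustrates that they match the same text.

```ruby
# Hypothetical helper: collect group 0 (the whole match) for every match,
# no matter how many capturing groups the pattern defines
def whole_matches(str, re)
  found = []
  # In block form, scan sets $~ (Regexp.last_match) to the current match before each yield
  str.scan(re) { found << Regexp.last_match(0) }
  found
end

s = "!!!??"
original = whole_matches(s, /([?!])\1*/)   # backreference to group 1
wrapped  = whole_matches(s, /(([?!])\2*)/) # same character is now group 2
p original
p wrapped
```

Both calls print ["!!!", "??"].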