phpstringvalidationsubstringcharacter

How can I determine if a string contain all characters in character mask as a continuous substring?


Given a haystack string (single word) consisting only of lowercase letters and a character mask containing only unique lowercase letters, how can I determine if all letters in the character mask occur consecutively at any point in the haystack string? Letters in the character mask may be used in any order and may be used more than once to form a qualifying string if necessary.

Test strings and commented expected boolean results:

$tests = [
    ['word' => 'example',     'mask' => 'lmp'],   // true  (mpl)
    ['word' => 'goodness',    'mask' => 'dns'],   // false (dn, ss)
    ['word' => 'slippers',    'mask' => 'eip'],   // true  (ippe)
    ['word' => 'slippers',    'mask' => 'ips'],   // false (s, ipp, s)
    ['word' => 'google',      'mask' => 'go'],    // true  (goog)
    ['word' => 'food',        'mask' => 'go'],    // false (oo)
    ['word' => 'bananas',     'mask' => 'ans'],   // true  (ananas)
    ['word' => 'candle',      'mask' => 'ace'],   // false (ca, e)
    ['word' => 'mississippi', 'mask' => 'i'],     // true  (i)
    ['word' => 'executive',   'mask' => 'ecitx'], // false (exec, ti, e)
];

There are many pre-existing questions on Stack Overflow across a spectrum of languages that have similar requirements, but they do not have the same combination of rules (or aren't in PHP). In this case, the qualifying substring must consist entirely of characters in the mask and all characters in the mask must be used at least once.

This question is a salvage operation after an interesting but incomplete question from another user was closed, abandoned, and deleted by the Roomba.
I have arbitrarily added details to clarify the task, limited the scope, and populated a battery of test cases.


Solution

    1. My first creation uses preg_match_all() to extract consecutive qualifying characters, then remove characters from the character mask using the extracted letters.

      preg_match_all("/[$mask]+/", $word, $m)
      && array_filter($m[0], fn($chars) => !ltrim($mask, $chars))
      
    2. Then I realized that preg_match_all() might be matching substrings that can be eliminated earlier because they have insufficient length to clear the mask characters. I added a minimum quantifier to the regex based on the length of the mask. It may or may not be worth the extra function call versus the decreased pattern readability.

      preg_match_all("/[$mask]{" . strlen($mask) . ",}/", $word, $m)
      && array_filter($m[0], fn($chars) => !ltrim($mask, $chars))
      
    3. Finally, I wanted to see if the task could be done solely with regex and avoid doing needless surplus matching. Using a technique that is typically used to validate password strength, I called preg_replace() to generate a series of lookaheads and built a pattern that preg_match() could use to ensure that every letter from the mask is present in the isolated substring.

      (bool) preg_match('/' . preg_replace('/[a-z]/', "(?=[$mask]*$0)", $mask) . "[$mask]+/", $word)
      

    Opinions will vary about which one is most/least readable/maintainable and I did not perform any benchmarks to see which one performs best. Here is the PHP demo.