apl dyalog

Find locations of substrings within a string


I'm trying to find the locations of substrings within a string. Here, I can find the location of the first length-3 string ('DEF') within the larger string:

      ⍸(3,/'DEFGHI') ⍷ 3,/'ABCDEFGHIJKLMNOP'
4

But how can I find the rest of the matches? In this case the output would be:

4 5 6 7

Using up shoe (∩) gives me DEF EFG FGH GHI, which is close, but then I can't get from there to 4 5 6 7.


Solution

  • I'm not entirely sure I understand what you're after, but

    ⍸ (3,/'DEFGHI') ∊⍨ 3,/'ABCDEFGHIJKLMNOP'
    

    will give you the indices of the substrings that appear, in the order of appearance, hence

    ⍸ 'EFG' 'GHI' 'FGH' 'DEF' 'FGH' ∊⍨ 3,/'ABCDEFGHIJKLMNOP'
    

    gives the same result even though the looked-for strings are in a different order and have duplicates. If you want to preserve order, then a simple

    'EFG' 'GHI' 'FGH' 'DEF' ⍳⍨ 3,/'ABCDEFGHIJKLMNOP'
    

    will do.
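
    As a rough illustration of the two expressions above, here is a session sketch that uses the un-commuted forms (∊ and ⍳ rather than ∊⍨ and ⍳⍨). The variable name hay is only illustrative, and the results assume the default ⎕IO←1 and a Dyalog version that provides ⍸:

    ```apl
          ⍝ 3,/ catenates each window of 3 adjacent characters (14 windows here)
          hay ← 3,/'ABCDEFGHIJKLMNOP'
          ≢hay
    14
          ⍝ Membership: which windows are among the search strings?
          ⍸ hay ∊ 3,/'DEFGHI'
    4 5 6 7
          ⍝ Index-of: where does each search string first occur, in query order?
          hay ⍳ 'EFG' 'GHI' 'FGH' 'DEF'
    5 7 6 4
    ```

    Note that ⍳ only reports the first occurrence of each search string, and returns 1+≢hay (15 here) for a string that does not occur at all.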

    If your data doesn't already consist of triplets, then you can skip that step:

    ⍸ ⊃∨/ (3,/'DEFGHI') ⍷¨ ⊂'ABCDEFGHIJKLMNOP'
    

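    and, like the first expression, this reports the positions in order of appearance in the searched string, regardless of the order or duplication of the search strings.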
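
    To see how the Find-based expression behaves, it may help to split it into steps and give the search strings out of order. This is only a sketch: the names text and masks are illustrative, the last line is a variant that is not part of the expression above, and ⎕IO←1 is assumed:

    ```apl
          text ← 'ABCDEFGHIJKLMNOP'
          ⍝ One Boolean mask per search string, marking where it starts in text
          masks ← 'EFG' 'DEF' ⍷¨ ⊂text
          ⍝ OR the masks together, then list the positions of the 1s
          ⍸ ⊃∨/ masks
    4 5
          ⍝ Variant: take the positions mask by mask, which keeps query order
          ∊ ⍸¨ masks
    5 4
    ```

    Unlike ⍳, the variant reports every occurrence of each search string, not just the first.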