Tags: functional-programming, apl, dyalog, gnu-apl

APL - How can I find the longest word in a string vector?


I want to find the longest word in a string vector. I know that in APL the shape function returns the length of a string, e.g.:

⍴ 'string' ⍝ returns 6

The reduce function lets me apply a dyadic function along a vector, but since shape is monadic this will not work. How can I map the shape function in this case? For example:

If the vector is defined as:

lst ← 'this is a string'

I want to do this:

⍴'this' ⍴'is' ⍴'a' ⍴'string'

Solution

  • The "typical" approach would be to treat it as a segmented (or separated) string: prefix it with the separator (a blank) and pass it to a dfn for further analysis:

    {}' ',lst
    

    The function then looks for the separator and uses it to split the text into a vector of words:

          {(⍵=' ')⊂⍵}' ',lst
    ┌─────┬───┬──┬───────┐
    │ this│ is│ a│ string│
    └─────┴───┴──┴───────┘
    

    Let's remove the blanks:

          {1↓¨(⍵=' ')⊂⍵}' ',lst
    ┌────┬──┬─┬──────┐
    │this│is│a│string│
    └────┴──┴─┴──────┘
    

    And then you "just" need to compute the length of each vector:

          {≢¨1↓¨(⍵=' ')⊂⍵}' ',lst
    4 2 1 6
    

    This is a direct implementation of your request. However, if you're not interested in the substrings themselves but only in the lengths of the "non-blank segments", a more "APLy" solution might be to work with booleans (usually the most efficient route):

          lst=' '
    0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
    

    So the ones are the positions of the separators - where do they occur?

          ⍸lst=' '
    5 8 10
    

    But we need a trailing blank, too - otherwise we're missing the end of text:

          ⍸' '=lst,' '
    5 8 10 17
    

    So each of these positions, minus the position of the preceding blank and minus one for the blank itself, should give the length of a segment:

          {¯1+⍵-0,¯1↓⍵}⍸' '=lst,' '
    4 2 1 6
    

    This is still somewhat naive and can be expressed in a more advanced way - I leave that as an "exercise for the reader" ;-)
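
    To tie the steps back to the original question (the longest word itself), here is a minimal sketch. It assumes Dyalog APL with the default ⎕ML, so that dyadic ⊂ is partitioned enclose, and a version that provides ⍸. The names words and lens are hypothetical and used only for illustration. The last line is just one possible tighter spelling of the difference step, not necessarily the one the answer had in mind:

    ```apl
          lst ← 'this is a string'
          words ← {1↓¨(⍵=' ')⊂⍵}' ',lst    ⍝ nested vector of words (assumed name)
          lens ← ≢¨words                    ⍝ length of each word (assumed name)
          lens
    4 2 1 6
          (lens⍳⌈/lens)⊃words               ⍝ pick the first word of maximal length
    string
          ⌈/{¯1+⍵-0,¯1↓⍵}⍸' '=lst,' '      ⍝ boolean route: longest length only
    6
          ¯1+2-⍨/0,⍸' '=lst,' '             ⍝ pairwise differences via 2-wise reduction
    4 2 1 6
    ```

    Because ⍳ returns the first match, ties resolve to the leftmost longest word. The boolean route never builds the substrings, so it gives only the length, not the word.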