I want to find the longest word in a string vector. Using APL I know that the shape function will return the length of a string e.g.
⍴ 'string' ⍝ returns 6
The reduce operator lets me apply a dyadic function along a vector, but since shape is monadic this will not work. How can I map the shape function in this case? For example:
If the vector is defined as:
lst ← 'this is a string'
I want to do this:
⍴'this' ⍴'is' ⍴'a' ⍴'string'
The "typical" approach would be to treat it as a segmented (or: separated) string and prefix it with the separator (a blank) and pass it to a dfn for further analysis:
{}' ',lst
The function then looks for the separator and uses it to build the vector of words:
{(⍵=' ')⊂⍵}' ',lst
┌─────┬───┬──┬───────┐
│ this│ is│ a│ string│
└─────┴───┴──┴───────┘
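To see why the leading blank matters, it may help to look at the Boolean mask on its own. This is only a sketch, and it assumes Dyalog APL with the default ⎕ML, where dyadic ⊂ starts a new partition at each 1 in the left argument and drops anything before the first 1:

```apl
lst ← 'this is a string'

⍝ The mask: a 1 wherever a separator sits, i.e. where a new partition starts
(' ',lst)=' '
⍝ 1 0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0

⍝ Without the leading blank the mask starts with 0, so the first word is dropped
(lst=' ')⊂lst
```

Under those assumptions the second expression should return only ' is', ' a' and ' string', which is why the separator is prefixed before partitioning.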
Let's remove the blanks:
{1↓¨(⍵=' ')⊂⍵}' ',lst
┌────┬──┬─┬──────┐
│this│is│a│string│
└────┴──┴─┴──────┘
And then you "just" need to compute the length of each vector, by applying tally (≢) to each item with the each operator (¨):
{≢¨1↓¨(⍵=' ')⊂⍵}' ',lst
4 2 1 6
This is a direct implementation of your request. However, if you're not interested in the substrings themselves but only in the lengths of the "non-blank segments", a more "APLy" solution might be to work with booleans, which is usually the most efficient route:
lst=' '
0 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0
So the ones are the positions of the separators - where do they occur?
⍸lst=' '
5 8 10
But we need a trailing blank, too - otherwise we're missing the end of text:
⍸' '=lst,' '
5 8 10 17
So these (minus the positions of the preceding blank) should give the length of the segments:
{¯1+⍵-0,¯1↓⍵}⍸' '=lst,' '
4 2 1 6
This is still somewhat naive and can be expressed in a more advanced way - I leave that as an "exercise for the reader" ;-)
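For what it's worth, here is one possible way to shorten the length computation and to get from the lengths back to the longest word. It is a sketch rather than the definitive answer to the exercise. It assumes Dyalog APL with ⎕IO←1 (as the positions above imply), and the names lens and words are made up for illustration:

```apl
lst ← 'this is a string'

⍝ Pairwise differences of the separator positions (with a leading 0),
⍝ minus 1 for the separator itself
lens ← ¯1+2-⍨/0,⍸' '=lst,' '
lens
⍝ 4 2 1 6

⍝ Split into words as before, then pick the word whose length is largest
words ← 1↓¨{(⍵=' ')⊂⍵}' ',lst
(⊃⍒lens)⊃words
⍝ string
```

Grade down (⍒) lists the indices in order of descending length, so its first item points at the longest word. If several words share the maximum length, this picks the first of them.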