Upon reading this Python question and proposing a solution, I tried to solve the same challenge in Haskell.
I've come up with the code below, which seems to work. However, since I'm pretty new to this language, I'd like some help in understand whether the code is good performancewise.
lswrc :: String -> String
lswrc s = reverse $ fst $ foldl' step ("","") s
where
step ("","") c = ([c],[c])
step (maxSubstr,current) c
| c `elem` current = step (maxSubstr,init current) c
| otherwise = let candidate = (c:current)
longerThan = (>) `on` length
newMaxSubstr = if maxSubstr `longerThan` candidate
then maxSubstr
else candidate
in (newMaxSubstr, candidate)
Some points I think could be better than they are
(c:current)
but I use it only in the else
; I could make a more complicated longerThan
to add 1 to the lenght of its second argument, so that I can apply it to maxSubstr
and current
, and construct (c:current)
in the else
, without even giving it a name.current
when c
is in the current
string, because I'm piling up the strings with :
; I could instead pattern match when checking c
against the string (as in c `elem` current@(a:as)
), but then when adding the new character I should do current ++ [c]
, which I know is not as performant as c:current
.foldl'
(as I know foldl
doesn't really make sense); foldr
could be an alternative, but since I don't see how laziness enters this problem, I can't tell which one would be better.Running elem
on every iteration makes your algorithm Ω(n^2) (for strings with no repeats). Running length
on, in the worst case, every iteration makes your algorithm Ω(n^2) (for strings with no repeats). Running init
a lot makes your algorithm Ω(n*sqrt(n)) (for strings that are sqrt(n) repetitions of a sqrt(n)-long string, with every other one reversed, and assuming an O(1) elem
replacement).
A better way is to pay one O(n) cost up front to copy into a data structure with constant-time indexing, and to keep a set (or similar data structure) of seen elements rather than a flat list. Like this:
import Data.Set (Set)
import Data.Vector (Vector)
import qualified Data.Set as S
import qualified Data.Vector as V
lswrc2 :: String -> String
lswrc2 "" = ""
lswrc2 s_ = go S.empty 0 0 0 0 where
s = V.fromList s_
n = V.length s
at = V.unsafeIndex s
go seen lo hi bestLo bestHi
| hi == n = V.toList (V.slice bestLo (bestHi-bestLo+1) s)
-- it is probably faster (possibly asymptotically so?) to use findIndex
-- to immediately pick the correct next value of lo
| at hi `S.member` seen = go (S.delete (at lo) seen) (lo+1) hi bestLo bestHi
| otherwise = let rec = go (S.insert (at hi) seen) lo (hi+1) in
if hi-lo > bestHi-bestLo then rec lo hi else rec bestLo bestHi
This should have O(n*log(n)) worst-case performance (achieving that worst case on strings with no repeats). There may be ways that are better still; I haven't thought super hard about it.
On my machine, lswrc2
consistently outperforms lswrc
on random strings. On the string ['\0' .. '\100000']
, lswrc
takes about 40s and lswrc2
takes 0.03s. lswrc2
can handle [minBound .. maxBound]
in about 0.4s; I gave up after more than 20 minutes of letting lswrc
chew on that list.