For an online algorithms course I am attempting to write a program which calculates the travelling salesman distance of cities using an approxomation algorithm:
I am trying to write a solution in Haskell, and I have it working on small data sets, but it runs out of memory on a large input (the course has an input of ~33000 cities)
-- Fold data: cities map, distances map, visited map, list of visited cities and each distance,
-- and current city
data TS = TS (M.Map Int City) (M.Map (Int,Int) Double) (M.Map Int Bool) ([(Int,Double)]) (Int)
run :: String -> String
run input = let cm = parseInput input -- cityMap contains cities (index,xPos,yPos)
n = length $ M.keys cm
dm = buildDistMap cm -- distanceMap :: M.Map (Int,Int) Double
-- which is the distance between cities a and b
ts = TS cm dm (M.fromList [(1,True)]) [(1,0.0)] 1
(TS _ _ _ beforeLast _) = foldl' (\ts _ -> exec ts n) ts [2..n]
completed = end beforeLast dm
in show $ floor $ sum $ map (\(_,d) -> d) $ completed
exec :: TS -> Int -> TS
exec (TS cm dm visited ordered curr) n =
let candidateIndexes = [(i)|i<-[1..n],M.member i visited == False]
candidates = map (\i -> let (Just x) = M.lookup (curr,i) dm in (x,i)) candidateIndexes
(dist,best) = head $ sortBy bestCity candidates
visited' = M.insert best True visited
ordered' = (best,dist) : ordered
in TS cm dm visited' ordered' best
end :: [(Int,Double)] -> M.Map (Int,Int) Double -> [(Int,Double)]
end ordering dm = let (latest,_) = head ordering
(Just dist) = M.lookup (latest,1) dm
in (1,dist) : ordering
bestCity :: (Double,Int) -> (Double,Int) -> Ordering
bestCity (d1,i1) (d2,i2) =
if compare d1 d2 == EQ
then compare i1 i2
else compare d1 d2
At first I wrote the function exec
as a recursive function instead of calling it via the foldl'
. I thought changing it to use foldl'
would solve my issue as foldl'
is strict. However it appears to have made no difference in memory usage. I have tried compiling my program using no optimisations and -O2
optimisations.
I know that somehow it must be keeping multiple loops in memory as I can sort 34000 numbers without issue using
> sort $ [34000,33999..1]
What exactly am I doing wrong here?
Just in case it is any use here is the parseInput
and buildDistMap
functions, but they are not the source of my issue
data City = City Int Double Double deriving (Show, Eq)
-- Init
parseInput :: String -> M.Map Int City
parseInput input =
M.fromList
$ zip [1..]
$ map ((\(i:x:y:_) -> City (read i) (read x) (read y)) . words)
$ tail
$ lines input
buildDistMap :: M.Map Int City -> M.Map (Int,Int) Double
buildDistMap cm =
let n = length $ M.keys cm
bm = M.fromList $ zip [(i,i)|i<-[1..n]] (repeat 0) :: M.Map (Int,Int) Double
perms = [(x,y)|x<-[1..n],y<-[1..n],x/=y]
in foldl' (\dm (x,y) -> M.insert (x,y) (getDist cm dm (x,y)) dm) bm perms
getDist :: M.Map Int City -> M.Map (Int,Int) Double -> (Int,Int) -> Double
getDist cm dm (x,y) =
case M.lookup (y,x) dm
of (Just v) -> v
Nothing -> let (Just (City _ x1 y1)) = M.lookup x cm
(Just (City _ x2 y2)) = M.lookup y cm
in eDist (x1,y1) (x2,y2)
eDist :: (Double,Double) -> (Double,Double) -> Double
eDist (x1,y1) (x2,y2) = sqrt $ p2 (x2 - x1) + p2 (y2 - y1)
where p2 x = x ^ 2
And some test input
tc1 = "6\n\
\1 2 1\n\
\2 4 0\n\
\3 2 0\n\
\4 0 0\n\
\5 4 3\n\
\6 0 3"
data TS = TS (M.Map Int City) (M.Map (Int,Int) Double) (M.Map Int Bool) ([(Int,Double)]) (Int)
(TS _ _ _ beforeLast _) = foldl' (\ts _ -> exec ts n) ts [2..n]
exec :: TS -> Int -> TS
exec (TS cm dm visited ordered curr) n =
let ...
in TS cm dm visited' ordered' best
That foldl'
is doing a lot less than you hope. It causes the TS
constructor to be evaluated at every step, but nothing in that evaluation process requires visited'
, ordered'
, or best
to be evaluated. (cm
and dm
aren't modified in the loop, so they can't stack up unevaluated thunks.)
The best way to solve this is to make the evaluation of the TS
constructor returned by exec
depend on evaluating visited'
, ordered'
, and best
sufficiently.
M.Map
is always spine-strict, so evaluating a map at all means the whole structure is evaluated. Whether the values are as well depends on how you imported it, but that turns out to not be relevant here. The value you're inserting is a nullary constructor, so it's already fully evaluated. So evaluating visited'
to WHNF is sufficient.
Int
is not a nested type, so evaluating best
to WHNF is sufficient.
[(Int, Double)]
(the outer parens are redundant, the list brackets do grouping of their contents) is a bit trickier. Lists aren't spine strict, nor are pairs strict. But looking at the construction pattern, this is a prepend-only structure. As such, you don't need to worry about the tail. If the list was evaluated coming in, the output will be evaluated as long as the new head is. Unfortunately, that means you've got to be a bit careful with the pair. Half of it is the same best
value as constructed above, so that's not too bad. If it's evaluated, it's evaluated! (Though this does suggest you don't need to be passing it at every iteration, you could just use the front of ordered
.) The other half of the pair is a Double
, which is also non-nested, so WHNF is sufficient for it.
In this particular case, due to the fact that different approaches are necessary, I'd probably just approach this with seq
.
let ... all the same stuff ...
in visited' `seq` dist `seq` best `seq` TS ... all the same stuff ...
Note that I'm being careful to force the minimal number of values to remove unnecessary nesting of thunks. The (,)
and (:)
constructors don't need to be evaluated, only their arguments - the place where nested thunks might build up. (What's the difference in memory consumption between <thunk <expression> <expression>>
and <constructor <expression> <expression>>
?)