algorithmhaskellhaskell-platform

How to optimise Haskell code to pass HackerRanks timed out test cases (Not for any ongoing competition, just me practicing)


I been learning Haskell for around 4 months now and I have to say, the learning curve is definitely hard(scary also :p).

After solving about 15 easy questions, today I moved to my first medium difficulty problem on HackerRank https://www.hackerrank.com/challenges/climbing-the-leaderboard/problem.

It was 10 test cases and I am able to pass 6 of them, but the rest fail with timeout, now the interesting part is, I can already see a few parts that have potential for performance increase, for example, I am using nub to remove duplicated from a [Int], but still I am not able to build a mental model for algorithmic performance, the main cause of that being unsure about Haskell compiler will change my code and how laziness plays a role here.

import Data.List (nub)

getInputs :: [String] -> [String]
getInputs (_:r:_:p:[]) = [r, p]

findRating :: Int -> Int -> [Int] -> Int
findRating step _ [] = step
findRating step point (x:xs) = if point >= x then step else findRating (step + 1) point xs

solution :: [[Int]] -> [Int]
solution [rankings, points] = map (\p -> findRating 1 p duplicateRemovedRatings) points
                            where duplicateRemovedRatings = nub rankings


main :: IO ()
main = interact (unlines . map show . solution . map (map read . words) . getInputs . lines)

Test Case in GHCI

:l "solution"
let i = "7\n100 100 50 40 40 20 10\n4\n5 25 50 120"
solution i // "6\n4\n2\n1\n"

Specific questions I have:

How can I reason about this and build a mental model for performance.

If this violates community guidelines, please comment and I will delete this. Thank you for the help :)


Solution

  • First, to answer your questions:

    1. Yes, duplicateRemovedRankings is computed only once. No repeated computation.
    2. To debug-trace, you can use trace and its friends (see the docs for examples and explanation). Yes, it can be used even in pure, non-IO code. But obviously, don't use it for "normal" output.
    3. Yes, your understanding of complexity is correct.

    Now, how to pass HackerRank's tricky tests.

    First, yes, you're right that nub is O(N^2). However, in this particular case you don't have to settle for that. You can use the fact that the rankings come pre-sorted to get a linear version of nub. All you have to do is skip elements while they're equal to the next one:

    betterNub (x:y:rest) 
      | x == y = betterNub (y:rest)
      | otherwise = x : betterNub (y:rest)
    betterNub xs = xs
    

    This gives you O(N) for betterNub itself, but it's still not good enough for HackerRank, because the overall solution is still O(N*M) - for each game you are iterating over all rankings. No bueno.

    But here you can get another improvement by observing that the rankings are sorted, and searching in a sorted list doesn't have to be linear. You can use a binary search instead!

    To do this, you'll have to get yourself constant-time indexing, which can be achieved by using Array instead of list.

    Here's my implementation (please don't judge harshly; I realize I probably got edge cases overengineered, but hey, it works!):

    import Data.Array (listArray, bounds, (!))
    
    findIndex arr p
      | arr!end' > p = end' + 1
      | otherwise = go start' end'
      where
        (start', end') = bounds arr
    
        go start end =
          let mid = (start + end) `div` 2
              midValue = arr ! mid
          in
            if midValue == p then mid
            else if mid == start then (if midValue < p then start else end)
            else if midValue < p then go start mid
            else go mid end
    
    solution :: [[Int]] -> [Int]
    solution [rankings, points] = map (\p -> findIndex duplicateRemovedRatings p + 1) points
                                where duplicateRemovedRatings = toArr $ betterNub rankings
    
    toArr l = listArray (0, (length l - 1)) l
    

    With this, you get O(log N) for the search itself, making the overall solution O(M * log N). And this seems to be good enough for HackerRank.

    (note that I'm adding 1 to the result of findIndex - this is because the exercise requires 1-based index)