haskell, version, compatibility, cabal

Can I tell the version of a package from within the Haskell program that imports a module from that package?


tl;dr

Can I tell the version of the package exposing an imported module in my Haskell program?

Clearly, I can constrain the version of a package via the .cabal file, but what if I want to leave the version unconstrained and check it at runtime instead?
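For illustration, here is a minimal sketch of one mechanism that may fit, assuming the program is built with Cabal (or GHC 8.0 or later) and hashable is listed among its dependencies. In that setup the build defines a CPP macro VERSION_hashable that expands to a string literal. Note that this is the version the program was compiled against, baked in at build time, rather than something discovered dynamically. The names hashableVersion and reportIsCompatible are hypothetical helpers, not part of any library.

```haskell
{-# LANGUAGE CPP #-}
module Main (main) where

-- Assumption: hashable is in build-depends, so the build system defines
-- the VERSION_hashable macro as a string literal such as "x.y.z".
hashableVersion :: String
hashableVersion = VERSION_hashable

-- Hypothetical helper: compare a version string stored next to the data
-- with the version this executable was built against.
reportIsCompatible :: String -> Bool
reportIsCompatible storedVersion = storedVersion == hashableVersion

main :: IO ()
main = do
  putStrLn ("Built against hashable " ++ hashableVersion)
  -- Comparing the build-time version with itself is trivially True.
  print (reportIsCompatible hashableVersion)
```

A program could write hashableVersion into its data file and refuse to reuse old hashes when the stored string differs.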


Context

Say that I have two text files containing respectively

As long as I don't update hashable's version (e.g. no updates are available or I have locked the version down via the .cabal file), I can identify each element of the former file with the corresponding element of the latter file, and vice versa (or conclude that some element is missing from either file).

As soon as I update hashable (e.g. I don't have a version constraint for hashable in the .cabal file and I run cabal update before building the program), the lookup breaks, because hash's output changes.

Even more context

I've been writing a program (the repo is here) for a quiz that asks multiple choice questions picked from a local JSON database that looks like this,

[
  { "question": "What's female of the lion?",
    "alternatives": ["Lionale", "Lioness", "Liger", "None of them"],
    "answers": ["Lioness"] },
  ...

and updates a local report that stores success rate for each question (essentially a Map).

For no particular reason, I used the hash (from Data.Hashable) of the 'question' text as the key in the map stored in the report, so the report looks something like this,

(fromList [(-9220494745531298831,(4,1 % 4)),(-9211334016399354391,(2,0 % 1)), -- ...

where the "long" numbers are the outputs of hash for each question (the inner pair has total attempts and success rate).
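To make the shape of the report concrete, here is a rough sketch of how such a map could be typed and updated. The names Report and recordAttempt are hypothetical and are not taken from the repo. The update rule (a running average of successes) is an assumption about how the success rate is maintained.

```haskell
import qualified Data.Map.Strict as Map

-- Key: the output of `hash` on the question text.
-- Value: (total attempts, success rate).
type Report = Map.Map Int (Int, Rational)

-- | Record one more attempt at the question with the given key.
recordAttempt :: Int -> Bool -> Report -> Report
recordAttempt key correct = Map.insertWith merge key (1, gain)
  where
    gain = if correct then 1 else 0
    -- insertWith passes the new value first and the stored value second.
    merge _ (n, rate) =
      (n + 1, (rate * fromIntegral n + gain) / fromIntegral (n + 1))

main :: IO ()
main = print (recordAttempt 42 True (Map.fromList [(42, (3, 0))]))
-- prints: fromList [(42,(4,1 % 4))]
```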

One thing I didn't think of is that the output of hash for a given input can change when the hashable package is updated, which breaks the ability to identify the questions.

Clearly, the easy solution was to edit the .cabal file to enforce a specific version of that package.

That's how I came to ask the question above.


Solution

  • Version updates aren't the only potential problem with this schema. Hash collisions are possible, and you don't currently have a way to handle the scenario in which two questions hash to the same Int. A collision may seem unlikely, because the hash space has 2^64 values and you will have a much smaller number of questions. But the package documentation mentions that it does not try to prevent such collisions:

    Applications that use hash-based data structures to store input from untrusted users can be susceptible to "hash DoS", a class of denial-of-service attack that uses deliberately chosen colliding inputs to force an application into unexpectedly behaving with quadratic time complexity.

    At this time, the string hashing functions used in this library are susceptible to such attacks and users are recommended to either use a Map to store keys derived from untrusted input or to use a hash function (e.g. SipHash) that's resistant to such attacks. A future version of this library might ship with such hash functions.

    It also mentions the exact problem you are asking about, and recommends using a named hash like SHA256.

    Note: the hash is not guaranteed to be stable across library versions, operating systems or architectures. For stable hashing use named hashes: SHA256, CRC32 etc.

    Overall, I would say that mucking about with package versions is just the wrong solution to your problem. If you want a stable, collision-resistant hash, use a hash function designed for such purposes instead of the one from hashable.
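As a sketch of what that could look like, the snippet below derives the map key from a SHA-256 digest of the question text. It assumes the cryptohash-sha256, bytestring, text and containers packages are available; the names questionKey and Report are hypothetical, and other SHA-256 libraries would work equally well.

```haskell
import qualified Crypto.Hash.SHA256 as SHA256 -- assumed dependency: cryptohash-sha256
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Text.Printf (printf)

-- | Hex-encoded SHA-256 digest of the UTF-8 bytes of the question text.
-- The result depends only on the text, not on any library version.
questionKey :: T.Text -> String
questionKey = concatMap (printf "%02x") . BS.unpack . SHA256.hash . encodeUtf8

-- Hypothetical report type: same payload as before, keyed by the digest.
type Report = Map.Map String (Int, Rational)

main :: IO ()
main = do
  let q = T.pack "What's female of the lion?"
      report = Map.singleton (questionKey q) (4, 1 / 4) :: Report
  putStrLn (questionKey q)                    -- 64 hex characters
  print (Map.lookup (questionKey q) report)   -- finds the stored entry
```

The key is longer than an Int, but it stays the same across rebuilds, operating systems and architectures, so an existing report remains usable after dependencies are updated.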