rustsethashsetlanguage-design

A hash function that maintains mathematical equality (especially for sets)


I'm working on a little math-based programming language (written in Rust) that uses sets instead of types. In it, a variable or value belongs to a set of values (eg. x : {1, 2, 3} or msg : Str). The user can create finite sets (eg. x = {1, 2, 3, false, "hi"}), which store the values internally in a HashSet, with the values being hashed. However, the language also has a notion of infinite sets, like Nat, Int, Real, Complex, Str, etc. My problem is that I do not know how to optimally represent these infinite sets when inside another set (ie. through hashing).

My current idea is to set the hash of eg. Nat to be 0, Int to be 1, and so on, but that just feels wrong. Additionally, an expression that evaluates to an equivalent set should have the same hash and one that evaluates to a different set would have to have a different hash, so my current implementation wouldn't work:

// Reals except those that are Ints
A = Real \ Int
// hash of A should vary from the hash of Real and Int (what would its hash be though?)
B = Real \ A
// hash of B should be the same as hash of Int

In this example, it could be possible to somehow simplify, though in more complex expressions I don't know if they could always be simplified into something with an equal hash. Also, for A, using my method, I have no idea how the hash would be calculated.

My question is: is there some hash function that I should use that has this functionality, or is there some other algorithm or datastructure that I should be using to implement this, or am I completely off-track here and should be implementing the whole thing in a different way?


Solution

  • Frame Challenge: this is not about hashing, this is about canonicalization.

    Hashing (or interning) takes an input and assign an integer value to it. If you want guaranteed equal hashes/interning, then you need to provide guaranteed equal inputs in the first place.

    Your problem, therefore, is to create an algorithm that will produce a canonical form for any input. In the example given:

    A = Real \ Int => Real \ Int
    B = Real \ A => Int
    

    Then, hashing or interning the canonical form will lead to an integer equal to all "canonically equal" sets -- no matter the hash/interning algorithm used.