lean

In Lean is there automatic detection of nearly identical definitions?


I'm all new to Lean. I believe many definitions have already been implemented. But is there an automatic mechanism that forbids one to define something already done (up to minor variations) ?

I'd imagine this is a crucial thing to have in order to avoid problems like sub-communities proving the same things, exponential growth of the library and illegible obfuscation. If not implemented, why is it the case? (Planned, or too difficult, or too restrictive, or not enough coders...)

For example, if I were to define a nat, but whose definition is actually identical to that of a natural number https://github.com/leanprover-community/lean4-samples/blob/main/NaturalNumbers/MyNat/Definition.lean possibly up to some fairly easy rearrangements/spellings or ways to deal with 0, would it go though a merge on github?


Solution

  • Certainly it wouldn't be merged in Mathlib or Std, because PRs are carefully reviewed by humans who know the libraries!

    Duplicates or near duplicates do occasionally find their way into the library, although surprisingly rarely, and typically they are eradicated shortly thereafter.

    So mostly the answer here is that there hasn't been too much need for this.

    But there's some need, and it would be great to see a proposed implementation.

    We already have the tactic exact? which is great for finding exact matches in the library (it finds more general statements, too). There's continuously more work on improvement exact?, and further contributions would be welcome.

    Writing tools that automate looking for duplicates would also be appreciated. It's the lemmas that are worth looking for; finding identical inductive definitions is interesting theoretically, but not of much practical relevance.

    Finally, the "Is there code for X" stream on the Lean zulip is an extremely powerful tool for finding duplicates, powered by experts with ridiculously comprehensive knowledge of the library!