I'm trying to work out how move semantics affect referential transparency.
Referential transparency (RT) allows us to replace any expression with its result without changing the meaning of the program (paraphrased from Functional Programming in Scala). For example, I can replace 1 + 1
anywhere in my program with 2
, and nothing should change. This Python program is referentially transparent:
@dataclass
class Bucket:
things: List[str]
leaves = ["leaves"]
def bucket_with_sand(things: List[str]) -> Bucket:
return Bucket(things + ["sand"])
bucket_with_sand(leaves) # can be replaced with Bucket(["leaves", "sand"]) with no change to the program
whereas this function mutates its argument in place
def bucket_with_sand(things: List[str]) -> Bucket:
things += ["sand"]
return Bucket(things)
so replacing the function call with its result changes the meaning. It's no longer referentially transparent. In a language with move semantics like Rust's, we can avoid this problem by moving leaves
(and relying on the fact that Vec
is non-Copy
):
struct Bucket {
things: Vec<&str>,
}
let leaves = vec!["leaves"];
fn bucket_with_sand(things: Vec<&str>) -> Bucket {
things.push("sand");
Bucket { things }
}
bucket_with_sand(leaves); // mutates `things`
// doesn't matter that `leaves` has been mutated here as it's now out of scope
This appears to be referentially transparent again. Is this correct? Do such moves relax conventional constraints on RT design? Or are moves not referentially transparent? I'm particularly interested to know if there are there broader implications on RT that I've not seen.
The concept of referential transparency is a bit blurry in almost all languages that execute on real computers, especially in languages with imperative state, and Rust is no exception to this fact. A call can have a side effect -- anything from performing IO to running out of memory to merely mutating a mutable variable -- and depending on whether you include those in your sense of "nothing changing" you might well consider functions to be non-referentially-transparent. They are not pure mathematical functions, but procedures that do change the state of the world when called.
That said: the so-called "ownership" system of Rust -- its combination of "affine" or "move" types with its multi-reader/single-writer borrowing system -- serves to dramatically reduce the set of possible side effects in a program. In particular it (mostly*) eliminates the most pervasive and pernicious side effect in most other imperative languages: mutable aliasing. That is, in Rust you will (mostly*) never have two or more references to the same memory location, where one reference in one function mutates the memory location as a side effect of running, and the other reference in some other function just sees the value in the memory location "suddenly change". This means that any time a value is going to be mutated, it will be mutated through its sole current reference -- either a &mut
or an owning variable -- and this means that, as you asked here, to a certain degree assumptions about referential transparency have a greater chance of being true in Rust than they do in most other imperative languages.
The "(mostly*)" asterisk above calls out a further relatively large exception: unsafe code can violate this rule, and does in several library functions. For example the portion of Rust's standard library that provides so-called "interior mutability" furnishes an unsafe cell type as well as wrapper types that enforce the prohibition on mutable aliasing dynamically, in a temporal fashion: one such mutable access can happen at a given time but they are allowed to happen sequentially from different shared references in sequence.
The same sort of caveat applies to almost every real language, no matter how "pure" it markets itself: the ML family has ref
cells, Haskell has its unsafe library functions, Lisps have set!
and so forth. These are all concessions to the fact that sometimes there's an overwhelming performance advantage to being able to reach through the mathematical abstraction (of pure values in functional languages, or of affine values in Rust's case) to the underlying machine with its unrestricted mutable aliasing.