haskellstatelessreferential-transparency

Statelessness implies referential transparency?


I have this code

data Slist a = Empty | Scons (Sexp a) (Slist a) 
data Sexp a = AnAtom a | AnSlist (Slist a)
data Fruit = Peach | Apple | Pear | Lemon | Fig deriving (Show,Eq)

sxOccurs oatm sxp =
  let slOC Empty = 0
      slOC (Scons se sls) = (seOC se) + (slOC sls)
      seOC (AnAtom atm) = if (atm == oatm) then 1 else 0
      seOC (AnSlist sla) = slOC sla
  in seOC sxp

As you can see in sxOccurs I've got two helper functions inside the let which are "mutually self-referential" as my The Little MLer calls them: slOC and seOC. So in SML you must use the keyword and to let them know about each other and "cross-reference" each other. BTW, sxOccurs counts the number of a specific AnAtom objects in an s-list, in my example atoms are Fruit variables.

My question is, is this an example of referential transparency? Similarly, in Davie he gives this example

let s0 = emptyStack
    s1 = push 12.2 s0
    s2 = push 7.1 s1
    s3 = push 6.7 s2
    s4 = divStack s3
    s5 = push 4.3 s4
    s6 = subtStack s5
    s7 = multStack s6
    s8 = push 2.2 s7
    s9 = addStack s8
in popStack s9

noting that a stack in Imperative-land is constantly mutating the stack, while Haskell is creating a new si variable for each stack operation. Then he says that each of these lines could be shuffled into a different order and the outcome would be unchanged. AFAICT this is the same basic idea as my sxOccurs when it doesn't care what order I present the sub-functions. So, again, is this the deeper meaning of referential transparency? If not, what is this I'm showing here?


Solution

  • Referential transparency means this, and only this: you may replace a variable by its definition without changing the meaning of the program. This is called "referential transparency" because you can "look through" a reference to its definition.

    For example, you write:

    slOC Empty = 0
    slOC (Scons se sls) = (seOC se) + (slOC sls)
    seOC (AnAtom atm) = if (atm == oatm) then 1 else 0
    seOC (AnSlist sla) = slOC sla
    

    Here's a couple transformations you could make, thanks to referential transparency:

    -- replace slOC by its definition
    seOC (AnSlist sla) = (\v -> case v of Empty -> 0; SCons se sls -> seOC se + slOC sls) sla
    -- replace slOC by its definition *again*, starting from the previous line
    seOC (AnSlist sla) = (\v -> case v of
        Empty -> 0
        SCons se sls -> seOC se + (\v -> case v of
            Empty -> 0
            SCons se sls -> seOC se + slOC sls
            ) sls
        ) sla
    -- replace slOC by its definition in another equation
    slOC (Scons se sls) = seOC se + (\v -> case v of Empty -> 0; SCons se sls -> seOC se + slOC sls) sls
    -- or, you could replace seOC by its definition instead
    slOC (SCons se sls) = (\v -> case v of
        AnAtom atm -> if atm == oatm then 1 else 0
        AnSlist sla -> sLOC sla
        ) se + slOC sls
    -- or you could do both, of course
    

    Well, of course, right? By now you may be thinking, "But Daniel, how could this property ever fail?". I will turn briefly to another language to illustrated: C.

    int *x = malloc(sizeof(*x));
    x[0] = 42;
    printf("%d\n", x[0]);
    

    In case you don't read C well, this creates a new variable named x, allocates some space for it, writes 42 into that space, then prints out the value stored in that space. (We should probably expect it to print 42!) But I've defined x = malloc(sizeof(*x)) in the first line; can I replace x by this definition elsewhere?

    No! This is a very different program:

    int *x = malloc(sizeof(*x));
    malloc(sizeof(*x))[0] = 42;
    printf("%d\n", x[0]);
    

    It's still a syntactically valid program, but now x[0] has not been initialized at the moment we reach the line that prints it -- because we allocated a second, independent chunk of space, and initialized that other space instead.

    This turns out to be the main way that other languages violate referential transparency: when you have variables whose value can change, it is not safe to replace references to them by their defined value, either because it may have changed since then or because this will cause it not to change in the way the rest of the program expected it to. Haskell eschews this ability; variables, once assigned a value, are never modified.