I have been using Flux.jl and have been confused by the difference between running code inside a `let` block and outside one. The following example runs without error:
```julia
using Flux

p = rand(2)
function f(x)
    f, b = p
    x*f + b
end
data = reduce(hcat, [[x, f(x)] for x in 0:0.1:1.0])

p = rand(2)
θ = params(p)
loss(y) = sum((y .- f.(data[1,:])).^2)
opt = ADAM()  # create the optimiser once so its moment estimates persist across steps
for n in 1:1000
    grads = Flux.gradient(θ) do
        loss(data[2,:])
    end
    Flux.Optimise.update!(opt, θ, grads)
end
```
However, wrapping the same code in a `let` block does not work as I expect:

```julia
using Flux
let
    ...
end
```
It produces the following stacktrace:

```
MethodError: objects of type Float64 are not callable
Maybe you forgot to use an operator such as *, ^, %, / etc. ?
Stacktrace:
[1] macro expansion
@ ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0 [inlined]
[2] _pullback(ctx::Zygote.Context, f::Float64, args::Float64)
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:9
[3] (::Zygote.var"#1100#1104"{Zygote.Context, Float64})(x::Float64)
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/lib/broadcast.jl:186
[4] _broadcast_getindex_evalf
@ ./broadcast.jl:670 [inlined]
[5] _broadcast_getindex
@ ./broadcast.jl:643 [inlined]
[6] getindex
@ ./broadcast.jl:597 [inlined]
[7] copy
@ ./broadcast.jl:899 [inlined]
[8] materialize
@ ./broadcast.jl:860 [inlined]
[9] _broadcast
@ ~/.julia/packages/Zygote/bJn8I/src/lib/broadcast.jl:163 [inlined]
[10] adjoint
@ ~/.julia/packages/Zygote/bJn8I/src/lib/broadcast.jl:186 [inlined]
[11] _pullback
@ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:65 [inlined]
[12] _apply
@ ./boot.jl:814 [inlined]
[13] adjoint
@ ~/.julia/packages/Zygote/bJn8I/src/lib/lib.jl:200 [inlined]
[14] _pullback
@ ~/.julia/packages/ZygoteRules/AIbCs/src/adjoint.jl:65 [inlined]
[15] _pullback
@ ./broadcast.jl:1297 [inlined]
[16] _pullback(::Zygote.Context, ::typeof(Base.Broadcast.broadcasted), ::Float64, ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
[17] _pullback
@ ./In[198]:32 [inlined]
[18] _pullback(::Zygote.Context, ::var"#loss#155", ::Vector{Float64}, ::Vector{Float64})
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
[19] _pullback
@ ./In[198]:37 [inlined]
[20] _pullback(::Zygote.Context, ::var"#152#156"{var"#loss#155", Matrix{Float64}})
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface2.jl:0
[21] pullback(f::Function, ps::Zygote.Params)
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface.jl:351
[22] gradient(f::Function, args::Zygote.Params)
@ Zygote ~/.julia/packages/Zygote/bJn8I/src/compiler/interface.jl:75
[23] top-level scope
@ In[198]:36
[24] eval
@ ./boot.jl:373 [inlined]
[25] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
@ Base ./loading.jl:1196
```
I had expected both versions to behave identically (at least in isolation). I can't tell much from the stacktrace myself, since I have no experience with the internals of Flux.jl or Zygote.jl. But the problem seems to have something to do with the definition of the function `f`, since changing the definition of `f` to:

```julia
function f(x)
    a, b = p
    x*a + b
end
```

allows both the `let` and the `let`-less versions to work. Of course, I could fix it like this and call it a day, but I am curious whether anyone knows why the two versions behave differently.
Note:

```
(@v1.7) pkg> status Flux
Status `~/.julia/environments/v1.7/Project.toml`
[587475ba] Flux v0.12.8
```
This is very weird. The main difference is that the first version runs in the global scope, and the second version runs in a local scope (the `let` block). To get a local scope we can run repeatedly, we can put the code in a function, or rather just enough of the code to show the problem:
```julia
function g()
    p = [1.0, 10.0] # for repeatability
    function f(x)
        f, b = p
        x*f + b
    end
    println(f)
    println(f(1))
    println(f)
    println(f(2))
end
```
The results:

```
julia> g()
f
11.0
1.0
ERROR: MethodError: objects of type Float64 are not callable
```
So `f` starts off as a function, and its first call does its job (`11.0`). But that call also rebinds `f` to the value of `p[1]`, so the second call tries to call `f`, which is now `1.0`, and fails.
I think this is rooted in Julia's scoping rules.
In the global-scope version, the function name `f` is global, and the function body just happens to use a local variable that is also named `f`. The full rules are in the "Scope of Variables" section of the Julia manual and are worth a read, but rest assured: assigning a local variable in a particular local scope does not affect anything in the global scope or in other isolated local scopes.
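A minimal check of the global-scope case without Flux, as a sketch (the values in `p` are arbitrary, chosen only for repeatability; run it as a script or at top level): because the `f` assigned inside the function body is a fresh local, the global function survives repeated calls.

```julia
p = [1.0, 10.0]  # arbitrary values for repeatability

# Defined at global scope: the name `f` is a global binding to the function.
function f(x)
    f, b = p   # a function body is a local scope, so this `f` is a new local
    x*f + b
end

r1 = f(1)  # 11.0
r2 = f(2)  # 12.0: the global `f` is still the function
```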
In the `let` block version, everything is nested in one local scope. The function name `f` and the variable `f` inside the function body are both local to the `let` block. When you assign to a name that already exists as a local variable in the current or an enclosing local scope, that existing variable is reassigned. Defining `f(x)` did not run `f, b = p` immediately, but calling `f(1)` ran it and reassigned the `let` block's local `f`.
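The same code wrapped in `let` reproduces the failure without Flux or Zygote. This is a sketch with arbitrary values in `p`; the `try`/`catch` is only there so the second call's error can be inspected instead of aborting.

```julia
result = let
    p = [1.0, 10.0]  # arbitrary values for repeatability

    # Inside `let`, the name `f` is a local variable of the let block.
    function f(x)
        f, b = p   # reassigns the enclosing local `f`, since it already exists
        x*f + b
    end

    r1 = f(1)      # returns 11.0, but rebinds the let block's `f` to 1.0
    r2 = try
        f(2)       # `f` is now a Float64, so this call throws a MethodError
    catch err
        err
    end
    (r1, r2, f)
end
```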
Before reviewing the scoping rules, I didn't expect reassigning a function name to be possible at all, let alone by the function's own call. I suppose the implicit `const` of method definitions only applies at global scope. Local function names are no different from any other local variables, which are routinely reassigned from nested local scopes such as `for` loops, comprehensions, and function bodies.
As you've demonstrated, changing the variable name gets around this local reuse, but you could also explicitly create a new local variable in a nested local scope, separate from any that may exist in enclosing local scopes:
```julia
function f(x)
    local f   # a new `f` for this function body; any enclosing local `f` is untouched
    f, b = p
    x*f + b
end
```
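To check the `local f` fix in the failing setting, here is a sketch that puts it inside a `let` block (again with arbitrary values in `p`): both calls now succeed and the let block's `f` is still a function afterwards.

```julia
result = let
    p = [1.0, 10.0]  # arbitrary values for repeatability

    function f(x)
        local f    # new variable scoped to this function body
        f, b = p   # assigns the body's own `f`, not the let block's `f`
        x*f + b
    end

    (f(1), f(2), f isa Function)
end
```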
P.S. An attempt at explaining the scope rules in an order that's easier to follow than the docs, though still intimidatingly long.
A global scope contains global variables and exists at the level of a file or a `module` block. A global scope can enclose multiple isolated local scopes, which are created by any other block except `if` and `begin`. Any local scope contains its own local variables and can enclose other local scopes.
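A small sketch of which blocks create scopes (the function name `scope_demo` is hypothetical, chosen for illustration): variables assigned inside `if` and `begin` belong to the enclosing function, while one assigned only inside `let` does not exist after the block.

```julia
function scope_demo()
    if true
        a = 1      # `if` does not create a scope: `a` is a local of scope_demo
    end
    begin
        b = 2      # neither does `begin`
    end
    let
        c = 3      # `let` does: this `c` exists only inside the let block
    end
    return (a, b, @isdefined(c))
end

scope_demo()
```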
If scope A encloses scope B, A is an enclosing/outer scope relative to B as the nested/inner scope. If scope B encloses a scope C, scope A also encloses scope C.
If a nested scope does not assign a variable of a particular name, it can use an existing variable of the same name from any enclosing scope.
If a nested scope assigns a variable, and no enclosing scope has a variable with the same name, then a new variable is created for that nested scope.
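The two rules above can be sketched in a few lines (the function name `rules_demo` is hypothetical): a nested `let` that only reads `x` sees the enclosing `x`, and a nested `let` that assigns a brand-new name keeps it to itself.

```julia
function rules_demo()
    x = 10
    read_only = let
        x + 1      # no assignment to `x` here, so the enclosing `x` is used
    end
    let
        y = 5      # no enclosing scope has a `y`, so a new local is created here
    end
    return (read_only, @isdefined(y))
end

rules_demo()
```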
If a nested scope assigns a variable, and an enclosing local scope has a preexisting variable with the same name, then that enclosing local scope's variable is reassigned by default. A simple example, inside a function or `let` block, is `my_sum = 0; for i in 1:3 my_sum += i end`. To create a new variable for that nested scope instead, use the `local <name>` statement in the nested local scope.
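Inside a function, the `my_sum` example and its `local` counterpart look like this (a sketch; `sum_demo` and `other` are hypothetical names). The first loop reassigns the function's `my_sum`; the second loop's `local` declaration gives each iteration its own `other`, so the outer one is left alone.

```julia
function sum_demo()
    my_sum = 0
    for i in 1:3
        my_sum += i       # reassigns the enclosing local `my_sum`
    end

    other = 0
    for i in 1:3
        local other = i   # a new iteration-local `other`; the outer one stays 0
    end

    return (my_sum, other)
end

sum_demo()
```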
If a nested scope assigns a variable, and only the global scope has a preexisting variable with the same name, then a new local variable is created by default. This difference exists because, unlike a local scope, a global scope can be scattered across multiple `include`d files. Nobody wants to pore over scattered files to keep one file's code from accidentally reassigning another file's global variable. To reassign a global variable instead, use the `global <name>` statement in the nested local scope.
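A sketch of the global case, run at top level (`counter`, `bump_local`, and `bump_global` are hypothetical names): assigning `counter` in a function body silently creates a local, while the `global counter` declaration opts in to reassigning the global.

```julia
counter = 0   # a global variable

function bump_local()
    counter = 100     # creates a new local; the global `counter` is untouched
    return counter
end

function bump_global()
    global counter    # explicitly refer to the global
    counter += 1
    return counter
end

local_result = bump_local()    # 100, and the global `counter` is still 0
global_result = bump_global()  # 1
```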
Be aware that each iteration of a loop or comprehension gets its own local scope with its own local variables, though the iterations share the enclosing scopes' variables. The docs give an example of anonymous functions capturing iteration-local variables: `for j = 1:2 Fs[j] = ()->j end`. If the rule were different, so that `j` were the same variable across iterations (as in Python), then all the anonymous functions would return the same `j` value, which is not very useful.
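A runnable version of that example (the `Fs` preallocation and the `vals` name are additions for the sketch): each closure captures the `j` of its own iteration, so calling them returns different values.

```julia
Fs = Vector{Any}(undef, 2)
for j = 1:2
    Fs[j] = () -> j   # each closure captures its own iteration's `j`
end

vals = [F() for F in Fs]   # [1, 2]
```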
At one point, for v1.0, they tried to make the interactive and non-interactive scope rules consistent, but people complained because they could no longer paste local-scope code (such as loops from a function body) into the global scope of the REPL or a notebook and have it behave the same way. So in v1.5, the interactive behavior of "soft" local scopes (`for`, `while`, and `try` blocks) was changed back: in interactive contexts, soft scopes reassign global variables by default. In non-interactive contexts (`.jl` files, `eval()`), soft scopes act like hard scopes but print a warning when the assignment is ambiguous because a global of the same name exists. The hard/soft distinction really complicates things, so try to understand the other rules before applying this angle.
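The soft/hard difference can be observed without a REPL, as a sketch. It assumes Julia 1.5 or later, where `include_string` accepts a `mapexpr` function, and it uses `REPL.softscope`, the same transform that appears in the question's stacktrace (frame 25), to mimic interactive evaluation. The throwaway modules `m1` and `m2` and the code string are hypothetical.

```julia
using REPL  # provides REPL.softscope, the transform IJulia/REPL apply to input

code = "for i in 1:3 x += i end"

# Interactive-style evaluation: the soft-scope loop reassigns the existing global `x`.
m1 = Module()
Core.eval(m1, :(x = 0))
include_string(REPL.softscope, m1, code)
soft_result = Core.eval(m1, :x)   # 6

# File-style evaluation: `x` in the loop becomes a new local (with a warning),
# so reading it in `x += i` throws, wrapped in a LoadError by include_string.
m2 = Module()
Core.eval(m2, :(x = 0))
hard_failed = try
    include_string(m2, code)
    false
catch
    true
end
```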