Specifically: I am trying to use the readtable() function from Julia's DataFrames package with the names option, but that requires a vector of symbols.
So far I have found only a handful of references to the word symbol in the Julia language. It seems that symbols are represented by ":var", but it is far from clear to me what they are.
Aside: I can run
df = readtable( "table.txt", names = [symbol("var1"), symbol("var2")] )
My two bulleted questions still stand.
Symbols in Julia are the same as in Lisp, Scheme or Ruby. However, the answers to those related questions are not really satisfactory, in my opinion. If you read those answers, it seems that the reason a symbol is different from a string is that strings are mutable while symbols are immutable, and symbols are also "interned" – whatever that means. Strings do happen to be mutable in Ruby and Lisp, but they aren't in Julia, and that difference is actually a red herring. The fact that symbols are interned – i.e. hashed by the language implementation for fast equality comparisons – is also an irrelevant implementation detail. You could have an implementation that doesn't intern symbols and the language would be exactly the same.
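For orientation, here is a minimal sketch of how symbols behave on the surface. The variable names are illustrative only, and Symbol("...") is the constructor spelling in current Julia (the lowercase symbol(...) is the older form). It shows that two symbols with the same name are the same object, and that a symbol is never equal to a string with the same characters.

```julia
s = :foo               # literal syntax for a symbol
t = Symbol("foo")      # the same symbol, constructed from a string

same_object = (s === t)         # symbols with equal names are identical
equals_string = (s == "foo")    # a symbol is not a string, so this is false
round_trip = String(s)          # convert back to a string when needed

println(typeof(s))
println(same_object)
println(equals_string)
println(round_trip)
```

None of this explains what symbols are for; it only shows that they are a distinct type from strings.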
So what is a symbol, really? The answer lies in something that Julia and Lisp have in common – the ability to represent the language's code as a data structure in the language itself. Some people call this "homoiconicity" (Wikipedia), but others don't seem to think that alone is sufficient for a language to be homoiconic. But the terminology doesn't really matter. The point is that when a language can represent its own code, it needs a way to represent things like assignments, function calls, things that can be written as literal values, etc. It also needs a way to represent its own variables. I.e., you need a way to represent – as data – the foo on the left-hand side of this:
foo == "foo"
Now we're getting to the heart of the matter: the difference between a symbol and a string is the difference between foo on the left-hand side of that comparison and "foo" on the right-hand side. On the left, foo is an identifier that evaluates to the value bound to the variable foo in the current scope. On the right, "foo" is a string literal and it evaluates to the string value "foo". A symbol in both Lisp and Julia is how you represent a variable as data. A string represents itself. You can see the difference by applying eval to them:
julia> eval(:foo)
ERROR: foo not defined
julia> foo = "hello"
"hello"
julia> eval(:foo)
"hello"
julia> eval("foo")
"foo"
What the symbol :foo evaluates to depends on what – if anything – the variable foo is bound to, whereas "foo" always just evaluates to "foo". If you want to construct expressions in Julia that use variables, then you're using symbols (whether you know it or not). For example:
julia> ex = :(foo = "bar")
:(foo = "bar")
julia> dump(ex)
Expr
  head: Symbol =
  args: Array{Any}((2,))
    1: Symbol foo
    2: String "bar"
  typ: Any
What that dumped-out stuff shows, among other things, is that there's a :foo symbol object inside of the expression object you get by quoting the code foo = "bar". Here's another example, constructing an expression with the symbol :foo stored in the variable sym:
julia> sym = :foo
:foo
julia> eval(sym)
"hello"
julia> ex = :($sym = "bar"; 1 + 2)
:(begin
foo = "bar"
1 + 2
end)
julia> eval(ex)
3
julia> foo
"bar"
If you try to do this when sym is bound to the string "foo", it won't work:
julia> sym = "foo"
"foo"
julia> ex = :($sym = "bar"; 1 + 2)
:(begin
"foo" = "bar"
1 + 2
end)
julia> eval(ex)
ERROR: syntax: invalid assignment location ""foo""
It's pretty clear to see why this won't work – if you tried to assign "foo" = "bar" by hand, it also wouldn't work.
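If the variable name only exists as a string, one way around this is to convert it to a symbol before interpolating it. The following is a small sketch, assuming current Julia, where the constructor is spelled Symbol(...); the names name, target and ex are illustrative only.

```julia
name = "foo"                     # a variable name held as a string
target = Symbol(name)            # convert the string to a symbol

# With a symbol on the left-hand side, the assignment is valid code.
ex = :($target = "bar"; 1 + 2)

result = eval(ex)                # runs the assignment, then evaluates 1 + 2
println(result)
println(eval(:foo))              # the variable foo now exists in this module
```

The conversion is the whole fix: the expression now contains an identifier on the left-hand side rather than a string literal.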
This is the essence of a symbol: a symbol is used to represent a variable in metaprogramming. Once you have symbols as a data type, it becomes tempting to use them for other things, like hash keys. But that's an incidental, opportunistic usage of a data type that has another primary purpose.
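As an illustration of that incidental usage, here is a short sketch of symbols used as dictionary keys. The data is made up for the example.

```julia
# Hypothetical data, purely for illustration.
ages = Dict(:alice => 31, :bob => 27)

alice_age = ages[:alice]            # look up a value by symbol key
has_carol = haskey(ages, :carol)    # no such key yet

ages[Symbol("carol")] = 45          # keys can also be built from strings
println(ages[:carol])
```

Nothing here involves code as data; the symbols are just convenient, cheap-to-compare names.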
Note that I stopped talking about Ruby a while back. That's because Ruby isn't homoiconic: Ruby doesn't represent its expressions as Ruby objects. So Ruby's symbol type is kind of a vestigial organ – a leftover adaptation, inherited from Lisp, but no longer used for its original purpose. Ruby symbols have been co-opted for other purposes – as hash keys, to pull methods out of method tables – but symbols in Ruby are not used to represent variables.
As to why symbols are used in DataFrames rather than strings, it's because you typically bind column values to variables inside of user-provided expressions. So it's natural for column names to be symbols, since symbols are exactly what you use to represent variables as data. Currently, you have to write df[:foo] to access the foo column, but in the future, you may be able to access it as df.foo instead. When that becomes possible, only columns whose names are valid identifiers will be accessible with this convenient syntax.
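To make "binding column values to variables inside of user-provided expressions" concrete, here is a rough sketch that does not use DataFrames at all. The columns dictionary stands in for a table, and bind_columns is a hypothetical helper written for this example, not a DataFrames function. It walks an expression and replaces every symbol that names a column with that column's data.

```julia
# Hypothetical stand-in for a table: column names (symbols) mapped to data.
columns = Dict(:x => [1, 2, 3], :y => [10, 20, 30])

# Replace every symbol that names a column with that column's data.
bind_columns(other, cols) = other    # numbers, strings, etc. pass through
bind_columns(s::Symbol, cols) = haskey(cols, s) ? cols[s] : s
bind_columns(ex::Expr, cols) =
    Expr(ex.head, [bind_columns(a, cols) for a in ex.args]...)

user_expr = :(sum(x) + maximum(y))   # x and y appear as symbols in the Expr
bound_expr = bind_columns(user_expr, columns)
result = eval(bound_expr)            # sum of column x plus maximum of column y
println(result)
```

Because the column names and the variables in the expression are both symbols, matching them is a plain dictionary lookup. With string column names, every variable in the expression would need converting first.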