dataframe julia juliadb

Get column names from DataFrame or JuliaDB table


How do I get the column names from a DataFrame object or JuliaDB IndexedTable object? Is this possible?


Reproducible Code:

using JuliaDB
import DataFrames
DF = DataFrames

# CREATES AN EXAMPLE TABLE WITH JULIADB

colnames = [:samples, :A, :B, :C, :D]
primary_key = [:samples]
coltypes = [Int[], Float64[],Float64[],Float64[],Float64[]]
sample_sizes = [100,200,300]
example_values = (1, 0.4, 0.3, 0.2, 0.1)

mytable = table(coltypes..., names=colnames, pkey=primary_key) # initialize empty table

# add some data to table
for i in sample_sizes
    example_values = (i, 0.4, 0.3, 0.2, 0.1)
    table_params = [(col=>val) for (col,val) in zip(colnames, example_values)]

    push!(rows(mytable), (; table_params...)) # add row
    mytable = table(mytable, pkey = primary_key, copy = false) # sort rows by primary key
end
mytable = table(unique(mytable), pkey=primary_key) # remove duplicate rows (none exist in this example)

# MAKES A DATAFRAME FROM JULIADB TABLE

df = DF.DataFrame(mytable)

For instance, given the above code, how would you check with a conditional whether there's a column :E in either mytable or df (in order to add such a column if it doesn't exist yet)?

Ultimately, I'm looking for the Julia equivalent of the following Python code:

if 'E' in df.columns:
     # ...
else:
     # ...


Solution

  • If df is a data frame you can write:

    if "E" in names(df)
    ...
    

    (for a JuliaDB.jl table, use JuliaDB.colnames instead of names)

    A faster option at run time, available for data frames, is:

    if hasproperty(df, :E)
    ...
    

    A slightly slower option, useful in other cases, is columnindex. It also works for JuliaDB.jl, but there you first have to load Tables.jl and write Tables.columnindex instead:

    if columnindex(df, :E) != 0
    ...
    

    The last example, columnindex, is probably the most complex. The way it works is described in its documentation:

    help?> columnindex
    search: columnindex
    
      Tables.columnindex(table, name::Symbol)
    
      Return the column index (1-based) of a column by name in a table with a
      known schema; returns 0 if name doesn't exist in table
    
      ────────────────────────────────────────────────────────────────────────────
    
      given names and a Symbol name, compute the index (1-based) of the name in
      names
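
Putting the three checks together, here is a minimal sketch of the "add column :E if it is missing" pattern from the question. It assumes a recent DataFrames.jl (1.x), where names(df) returns strings and columnindex is exported; the small data frame and the zero-filled default for :E are illustrative stand-ins, not the question's actual data.

```julia
using DataFrames

# Small stand-in for the table in the question (values are illustrative).
df = DataFrame(samples = [100, 200, 300], A = [0.4, 0.4, 0.4])

# Three ways to test for a column; all are false before :E is added.
by_name     = "E" in names(df)          # names(df) is a Vector{String}
by_property = hasproperty(df, :E)       # lookup by Symbol
by_index    = columnindex(df, :E) != 0  # 0 means "no such column"

# Add the column only if it is missing (zeros are an arbitrary default).
if !hasproperty(df, :E)
    df.E = zeros(nrow(df))
end

has_E_after = hasproperty(df, :E)
```

With `import DataFrames` and `DF = DataFrames`, as in the question, columnindex and nrow would need the `DF.` prefix, since `import` does not bring exported names into scope.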