What's the canonical way of finding a row in a DataFrame in DataFrames.jl?
For instance, given this DataFrame:
│ Row │ uuid                                 │ name         │
│     │ String                               │ String       │
├─────┼──────────────────────────────────────┼──────────────┤
│ 1   │ 0efae8bf-39e6-5d65-b05d-c8947f4cee2a │ COSMA_jll    │
│ 2   │ 17ccb2e5-db19-44b3-b354-4fd16d92c74e │ CitableImage │
Given the name "CitableImage", what's the best way to retrieve the uuid?
I would typically use:
filter(:name => ==("CitableImage"), df)
which produces a data frame, since more than one row can match.
If you are sure that only one row will match then you can also write:
df[only(findall(==("CitableImage"), df.name)), :]
(here only is the Base function that checks that exactly one row matched)
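To get from the matching row to the uuid itself, here is a minimal sketch. It rebuilds the two-row table from the question by hand (the df construction is my assumption about how the data is stored) and reads the uuid field off the selected row:

```julia
using DataFrames

# Hand-built copy of the table from the question (assumed layout).
df = DataFrame(
    uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
            "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
    name = ["COSMA_jll", "CitableImage"],
)

# Index of the single matching row; `only` throws if zero or several rows match.
idx = only(findall(==("CitableImage"), df.name))

row = df[idx, :]   # a DataFrameRow, because idx is a single integer
uuid = row.uuid    # the uuid stored in that row
```

Indexing with a single integer gives a DataFrameRow rather than a data frame, so row.uuid is a plain String.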
If you want to get a data frame using indexing you can write:
df[df.name .== "CitableImage", :]
or
df[findall(==("CitableImage"), df.name), :]
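If only the uuid column is needed, the same row selectors can be combined with a column name. This is a small sketch on a hand-built copy of the table from the question (the df construction is an assumption); both forms return a vector because several rows could match:

```julia
using DataFrames

# Hand-built copy of the table from the question (assumed layout).
df = DataFrame(
    uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
            "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
    name = ["COSMA_jll", "CitableImage"],
)

# Boolean mask as the row selector, :uuid as the column selector.
uuids_mask = df[df.name .== "CitableImage", :uuid]

# Same lookup with integer row indices from findall.
uuids_idx = df[findall(==("CitableImage"), df.name), :uuid]

# `only` turns the one-element vector into a scalar and throws otherwise.
uuid = only(uuids_mask)
```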
Finally, we also provide the subset function, but its normal use case is a bit different, so here it is more verbose than filter:
subset(df, :name => ByRow(==("CitableImage")))
If you want to do many lookups and want them to be efficient then it is better to do the following:
gdf = groupby(df, :name)
and then do:
gdf[("CitableImage",)]
which will be much faster if you do many such lookups.
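Put together, the grouped lookup could look like the sketch below. The df construction is a hand-built copy of the table from the question, and "NoSuchPackage" is a made-up name used only to show the membership check:

```julia
using DataFrames

# Hand-built copy of the table from the question (assumed layout).
df = DataFrame(
    uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
            "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
    name = ["COSMA_jll", "CitableImage"],
)

# Group once; later key lookups reuse the grouping.
gdf = groupby(df, :name)

# The key is a tuple with one value per grouping column.
sub = gdf[("CitableImage",)]   # a SubDataFrame holding all rows with that name
uuid = only(sub.uuid)          # throws unless exactly one row is in the group

# Indexing with a key that is not present throws, so check first when unsure.
found = haskey(gdf, ("NoSuchPackage",))   # hypothetical name, not in the table
```

The cost of building gdf is paid once, so this only pays off when the same data frame is queried repeatedly.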