
Find a row in a Julia DataFrame


What's the canonical way of finding a row in a DataFrame in DataFrames.jl?

For instance, given this DataFrame:

│ Row  │ uuid                                 │ name         │
│      │ String                               │ String       │
├──────┼──────────────────────────────────────┼──────────────┤
│ 1    │ 0efae8bf-39e6-5d65-b05d-c8947f4cee2a │ COSMA_jll    │
│ 2    │ 17ccb2e5-db19-44b3-b354-4fd16d92c74e │ CitableImage │

Given the name "CitableImage", what's the best way to retrieve the uuid?


Solution

  • I would typically use:

    filter(:name => ==("CitableImage"), df)
    

    which returns a data frame, because more than one row can match.
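    A minimal sketch of the filter approach, assuming DataFrames.jl is installed and rebuilding only the uuid and name columns from the question's printout. Because the result is a data frame, the uuids come back as a vector with one entry per matching row:

    ```julia
    using DataFrames

    # The two-row example from the question (only the uuid and name columns)
    df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                           "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
                   name = ["COSMA_jll", "CitableImage"])

    # filter returns a DataFrame holding every matching row (zero, one, or several)
    matches = filter(:name => ==("CitableImage"), df)

    # The uuid column of the result is a vector with one entry per match
    uuids = matches.uuid
    println(uuids)
    ```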

    If you are sure that exactly one row will match, you can also write:

    df[only(findall(==("CitableImage"), df.name)), :]
    

    (the only function throws an error unless exactly one row matches)
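    To answer the question directly, the row index from only(findall(...)) can be combined with a column name to get the uuid itself rather than a whole row. The sketch below assumes DataFrames.jl and the same two-row data frame; "NoSuchPackage" is a made-up name used only to show what happens when nothing matches:

    ```julia
    using DataFrames

    # The two-row example from the question (only the uuid and name columns)
    df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                           "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
                   name = ["COSMA_jll", "CitableImage"])

    # findall gives the indices of matching rows; only insists on exactly one
    row = only(findall(==("CitableImage"), df.name))

    # Indexing with a row number and a column name returns the value itself
    uuid = df[row, :uuid]
    println(uuid)

    # With no match, only throws an ArgumentError ("NoSuchPackage" is hypothetical)
    try
        only(findall(==("NoSuchPackage"), df.name))
    catch err
        println("lookup failed: ", typeof(err))
    end
    ```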

    If you want to get a data frame using indexing, you can write:

    df[df.name .== "CitableImage", :]
    

    or

    df[findall(==("CitableImage"), df.name), :]
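    A quick check, assuming DataFrames.jl and the same example data, that the Bool-mask form and the findall form select the same rows. It also shows that putting a column name in place of : returns only the matching uuids, as a vector:

    ```julia
    using DataFrames

    # The two-row example from the question (only the uuid and name columns)
    df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                           "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
                   name = ["COSMA_jll", "CitableImage"])

    # A Bool vector keeps the rows where the comparison is true
    by_mask = df[df.name .== "CitableImage", :]

    # findall turns the same condition into a vector of row indices
    by_index = df[findall(==("CitableImage"), df.name), :]

    # Both are DataFrames containing the same rows
    println(by_mask == by_index)

    # Selecting a single column instead of : gives just the matching uuids
    println(df[df.name .== "CitableImage", :uuid])
    ```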
    

    Finally, we also provide the subset function, but its typical use case is a bit different, so here it is more verbose than filter:

    subset(df, :name => ByRow(==("CitableImage")))
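    One reading of the remark about subset, offered as an interpretation rather than a statement from the answer, is that subset conditions receive the whole column by default, so a per-element predicate has to be wrapped in ByRow. The sketch below, assuming DataFrames.jl and the same example data, compares the ByRow form, a whole-column form, and filter:

    ```julia
    using DataFrames

    # The two-row example from the question (only the uuid and name columns)
    df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                           "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
                   name = ["COSMA_jll", "CitableImage"])

    # ByRow applies the scalar predicate to each element of :name
    via_byrow = subset(df, :name => ByRow(==("CitableImage")))

    # Without ByRow, the function gets the whole column and must return a Bool vector
    via_column = subset(df, :name => col -> col .== "CitableImage")

    # filter already applies its predicate element by element
    via_filter = filter(:name => ==("CitableImage"), df)

    println(via_byrow == via_filter && via_column == via_filter)
    ```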
    

    If you need to do many lookups and want them to be efficient, it is better to group the data frame first:

    gdf = groupby(df, :name)
    

    and then do:

    gdf[("CitableImage",)]
    

    which will be much faster over many such lookups, because the grouping work is done once up front instead of rescanning the name column on every lookup.
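    A minimal sketch of the grouped approach, assuming DataFrames.jl and the same example data. The data frame is grouped once, and each key lookup then returns a SubDataFrame for that name. The loop over names is only an illustration of repeated lookups:

    ```julia
    using DataFrames

    # The two-row example from the question (only the uuid and name columns)
    df = DataFrame(uuid = ["0efae8bf-39e6-5d65-b05d-c8947f4cee2a",
                           "17ccb2e5-db19-44b3-b354-4fd16d92c74e"],
                   name = ["COSMA_jll", "CitableImage"])

    # Group once by :name; this cost is paid a single time
    gdf = groupby(df, :name)

    # Each lookup by a key tuple returns a SubDataFrame with that name's rows
    for name in ["CitableImage", "COSMA_jll"]
        sdf = gdf[(name,)]
        println(name, " => ", only(sdf.uuid))
    end

    # A NamedTuple key also works and names the grouping column explicitly
    println(gdf[(name = "CitableImage",)].uuid)
    ```

    Looking up a name that is not in the data frame throws a KeyError, so unknown names need to be handled separately.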