r dplyr versions

Error: could not find function "distinct" when using dplyr library for R on Windows 7


I'd like to get the unique values from a column in a dataframe. With the R package dplyr, it should be possible.


distinct(select(dataframe, column)) works great on my Mac. In RStudio on Windows 7, however, I get Error: could not find function "distinct"

when I run this R code:

library(dplyr)
df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))


unique_values <- distinct(select(df, X1))


EDIT

Please check if dplyr::distinct(select(df, X1)) works? – akrun

Of course - here is the console output:

[screenshot: console output]

EDIT

I've not used distinct, but perhaps unique would work for you? unique(df$X1) – NPE

It does work, and it's concise too! I would still like to understand this dplyr error...
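
For reference, here is a minimal base-R sketch of the unique() approach from the comment above. The small data frame is made up for illustration. unique(df$X1) returns a plain vector, while unique(df["X1"]) keeps the one-column data frame shape that distinct(select(df, X1)) would return.

```r
# Hypothetical example data (not from the original post)
df <- data.frame(X1 = c(0, 1, 1, 0, 1), X2 = c(1, 1, 0, 0, 1))

# Unique values as a plain vector, in order of first appearance
unique_vector <- unique(df$X1)

# Unique rows of the single column, kept as a data frame
unique_frame <- unique(df["X1"])

print(unique_vector)
print(unique_frame)
```

Neither call needs dplyr, so this works even on an installation where distinct() is missing.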


EDIT

Please add the output of sessionInfo() instead. – Roland

[screenshot: sessionInfo() output]
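
Besides sessionInfo(), a more targeted check is to ask R which version of a package is installed and whether it exports the function in question. The following is a rough sketch, not a definitive diagnostic; describe_export is a hypothetical helper name introduced here for illustration.

```r
# Report whether a package is installed, which version it is, and whether
# it exports a given function. `describe_export` is a hypothetical helper.
describe_export <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    return(list(installed = FALSE, version = NA_character_, exported = FALSE))
  }
  list(
    installed = TRUE,
    version   = as.character(utils::packageVersion(pkg)),
    exported  = fn %in% getNamespaceExports(pkg)
  )
}

# For the situation in the question one would call:
#   describe_export("dplyr", "distinct")
# A base package is used here so the sketch runs without dplyr installed.
str(describe_export("stats", "median"))
```

If exported comes back FALSE for an installed package, the installed version simply does not provide that function.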

EDIT

Some comments note that dplyr_0.2 is an old version. On this machine, install.packages("dplyr") still fetches that old package from CRAN. Now to figure out how to manually install dplyr_0.3.0.2.
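
One possible way to request a specific release is to point install.packages() at the source tarball in the CRAN archive. This is only a sketch under assumptions: cran_archive_url is a hypothetical helper, the default mirror is the main CRAN site, and the install call is left commented out because it needs network access and a build toolchain. It may also still fail if the installed R is too old for that release.

```r
# Build the URL of an archived source tarball on CRAN.
# `cran_archive_url` is a hypothetical helper; old releases live under
# src/contrib/Archive/<pkg>/<pkg>_<version>.tar.gz on CRAN mirrors.
cran_archive_url <- function(pkg, version,
                             mirror = "https://cran.r-project.org") {
  sprintf("%s/src/contrib/Archive/%s/%s_%s.tar.gz", mirror, pkg, pkg, version)
}

url <- cran_archive_url("dplyr", "0.3.0.2")
print(url)

# Not run here: installing from source needs network access and, on Windows,
# a compiler toolchain (Rtools), because dplyr contains compiled code.
# install.packages(url, repos = NULL, type = "source")
```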



Solution

  • Figured it out! Old R means old dplyr means no distinct() function.

    To fix this, install the latest version of R:

    1. go to http://www.r-project.org
    2. click on 'CRAN'
    3. then choose the CRAN site that you like. I like Kansas: http://rweb.quant.ku.edu/cran/
    4. click on 'Download R for X' [where X is your operating system]
    5. follow the installation procedure for your operating system
    6. restart RStudio
    7. rejoice

    source: this very nice answer

    Then run the command install.packages("dplyr") in the RStudio Console.

    Now you can create a dataframe and use the distinct() function to get the unique values from one of its columns:

    library(dplyr)
    
    # create a dataframe with some values
    df <- data.frame(replicate(4, sample(0:1, 10, replace = TRUE)))
    df
    
    # select a column from that dataframe and get its unique values (as a one-column dataframe)
    unique_values <- distinct(select(df, X1))
    unique_values
    

    In the console you should see:

    [screenshot: console output]

    Thanks to David Arenburg and Richard Scriven for pointing out that dplyr-0.2 is old and lacks the distinct() function. This line of thinking led to the answer.
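
To confirm that the R upgrade took effect, a quick check of the running R version can help. A minimal sketch follows; the helper name r_is_at_least and the "3.1.0" threshold are assumptions for illustration, so check the Depends field of the dplyr release you want for its actual minimum.

```r
# Compare the running R version against a minimum version string.
# `r_is_at_least` is a hypothetical helper name.
r_is_at_least <- function(min_version) {
  getRversion() >= package_version(min_version)
}

# Human-readable version of the R session that is actually running
print(R.version.string)

# "3.1.0" is an assumed threshold for illustration only
print(r_is_at_least("3.1.0"))
```

If RStudio still reports the old version after restarting, it may be pointing at the previous R installation.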