r, ggplot2

What transformation does stat = 'bin' apply in geom_point()?


What transformation does using stat = 'bin' apply in geom_point()? For example:

library(ggplot2)

ggplot(mpg, aes(x = displ)) +
  geom_point(color = 'red', stat = 'bin') +
  geom_text(stat = 'bin', aes(label = after_stat(count)))

What are these values?

If I use stat = 'count', I understand the result, but I don't understand what happens when I use stat = 'bin'.


Solution

  • stat_bin groups the x values into bins (30 by default) and counts the observations in each bin; stat_count counts how often each distinct x value occurs.

    stat_bin effectively gives you the top of each bar in a histogram: it bins the data and then counts the values that fall in each bin.
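
    To look at the binned values directly, one option is ggplot2's layer_data(), which returns the data frame a layer computed before drawing. A minimal sketch; the object names p_bin and bins are only illustrative:

    ```r
    library(ggplot2)

    # Same layer as in the question, stored so it can be inspected
    p_bin <- ggplot(mpg, aes(x = displ)) +
      geom_point(color = 'red', stat = 'bin')

    # layer_data() returns the data computed for a layer; with stat = 'bin'
    # each row is one bin: x is its centre, xmin/xmax are its edges, and
    # count is the number of cars whose displ falls inside it
    bins <- layer_data(p_bin, 1)
    head(bins[, c("x", "xmin", "xmax", "count")])

    # Every car lands in exactly one bin, so the counts add up to nrow(mpg)
    sum(bins$count) == nrow(mpg)
    ```

    The y position of each point is the same number as count, which is why the labels sit exactly on the points.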

    Compare the two plots:

    library(ggplot2)

    ggplot(mpg, aes(x = displ)) + geom_histogram(color = 'red')
    ggplot(mpg, aes(x = displ)) +
      geom_point(color = 'red', stat = 'bin') +
      geom_text(stat = 'bin', aes(label = after_stat(count)))
    

    [Figure: two ggplot plots, one a histogram of displ, the other with points marking the top of each histogram bar]

    Notice how each value in your stat = 'bin' plot lines up with the top of a histogram bar. They match because geom_histogram() also uses stat = 'bin', with the same default of 30 bins.
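
    One way to confirm that both plots are built from the same bins is to compare the data each layer computed, again via layer_data(). A sketch with illustrative object names:

    ```r
    library(ggplot2)

    p_hist  <- ggplot(mpg, aes(x = displ)) + geom_histogram(color = 'red')
    p_point <- ggplot(mpg, aes(x = displ)) + geom_point(color = 'red', stat = 'bin')

    h  <- layer_data(p_hist, 1)
    pt <- layer_data(p_point, 1)

    # Both layers use stat_bin with the default 30 bins, so the bin
    # centres and the counts agree row for row
    all.equal(h$x, pt$x)
    all.equal(h$count, pt$count)
    ```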

    Conversely, stat='count' is effectively just table(mpg$displ):

    table(mpg$displ)
    # 1.6 1.8 1.9   2 2.2 2.4 2.5 2.7 2.8   3 3.1 3.3 3.4 3.5 3.6 3.7 3.8 3.9   4 4.2 4.4 4.6 4.7   5 5.2 5.3 5.4 5.6 5.7 5.9   6 6.1 6.2 6.5   7 
    #   5  14   3  21   6  13  20   8  10   8   6   9   4   5   2   3   8   3  15   4   1  11  17   2   5   6   8   1   8   2   1   1   2   1   1 
    ggplot(mpg, aes(x = displ)) +
      geom_point(color = 'red', stat = 'count') +
      geom_text(stat = 'count', aes(label = after_stat(count)))
    

    [Figure: a single ggplot plot using stat = 'count']

    Notice that the labels are exactly the counts that table(mpg$displ) reports.
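
    The same check can be done in code: layer_data() on a stat = 'count' layer should give one row per distinct displ value, matching table(mpg$displ). A minimal sketch; p_count, cnt and tab are illustrative names:

    ```r
    library(ggplot2)

    p_count <- ggplot(mpg, aes(x = displ)) +
      geom_point(color = 'red', stat = 'count')

    cnt <- layer_data(p_count, 1)
    tab <- table(mpg$displ)

    # One row per distinct displ value, in increasing order
    all.equal(cnt$x, as.numeric(names(tab)))
    # ...with the same counts that table() reports
    all.equal(cnt$count, as.vector(tab))
    ```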

    Bottom line: stat = 'count' counts the raw values, while stat = 'bin' counts the values after grouping them into bins. That is the whole difference.
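
    One way to see that binning is the only difference: displ is recorded to one decimal place (see the table() output above), so bins of width 0.1 should each hold at most one distinct value, and the non-empty bins should then reproduce the stat = 'count' numbers. This is a sketch under assumptions: binwidth = 0.1 is chosen to suit this particular column, not as a general recipe, and it relies on ggplot2's default bin placement (bins centred on multiples of the width when neither center nor boundary is given).

    ```r
    library(ggplot2)

    # Assumption: binwidth = 0.1 suits this column because displ has one
    # decimal place; by default the bins are centred on multiples of 0.1
    p_fine <- ggplot(mpg, aes(x = displ)) +
      geom_point(stat = 'bin', binwidth = 0.1)

    fine <- layer_data(p_fine, 1)

    # Drop empty bins; what remains should be the per-value counts
    nonempty <- fine$count[fine$count > 0]
    all.equal(nonempty, as.vector(table(mpg$displ)))
    ```

    With wider bins (such as the default 30 bins), several distinct displ values share a bin, and their counts are added together.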