google-sheetslinechart

Binning for line charts in Google Sheets


I have data recording ancient olympic events with year data. I'd like to create a line chart where the x-axis is binned years (let's say decades) and then each of the lines is the sum of events within the bin.

The data looks like

Row 01| -476 | Stadion
Row 02| -476 | Boxing
Row 03| -476 | Pankration
...
Row 10| -434 | Stadion
Row 11| -433 | Boxing
Row 12| -433 | Stadion

The resulting graph would show a line for each event (the last column) with a sum of the count of the number of that event within a configurable bin. Here's a mockup of what I'm imagining: enter image description here

Is that possible in Google Sheets?


Solution

  • You will need to write your own histogram components, which are:

    compute the intervals

    In your examples, you have manually entered select intervals.

    If you want to automatically assign intervals based on the number of "bins" and automatically cover the whole series, you can use sequence in conjunction arrayformula. Suppose K4 is the number of "bins" you manually define. The following formula outputs a column array representing the left boundaries of the intervals.

    =arrayformula(floor(K2+(max(C2:C)-min(C2:C))/K4*sequence(K4,1,0,1)))
    

    enter image description here

    aggregate counts

    Next is to aggregate counts for each intervals. There are many ways to accomplish this. The simplest is to use counta and filter and "dragging", as shown below.

    To make the formula as simple as possible, we also append 1e99 to the end of the intervals using inline array via {...;...} notation.

    enter image description here

    You can further centralize the formula with map and offset. (And with this particular implementation you do not need to append 1e99 on the interval column.)

    =map(L2:L,lambda(c,if(isblank(c),,
           counta(filter(C2:C,D2:D=D$2,
                              C2:C>=c,
                              C2:C<if(isblank(offset(c,1,0)),1E+99,offset(c,1,0)))))))
    

    enter image description here

    plotting

    To connect the histogram data to Google Sheet's line chart, you want to provide the series of x-axis labels corresponding to each "bin". The label part can be done, for example, by

    =arrayformula(if(isblank(L2:L),,if(isblank(L3:L),L2:L&" to "&K3,L2:L&" to "&L3:L)))
    

    (This implementation assumes no appended 1e99.)

    You can then select the range with the histogram values and the x-axis labels, and with the range highlighted, Insert -> Chart. (In the example below, you need to have N2:O4 highlighted.) And then select line graph. You see see all the options having been set by Google Sheet to use the data. If not, you can toggle them yourself as shown below

    enter image description here