sql postgresql group-by aggregate-functions bucket

Inexact GROUP BY aggregation

I'm trying to figure out a way to reason about a fairly simple problem:

A table foo has two columns, id with unique identifiers, and value with a random numeric value. Given a constant threshold value, find the groups of values where their value is within that threshold - the count of that group, and the group's average value.

Let's set the stage - here is the foo table (in PostgreSQL):

CREATE TABLE foo (
    id serial PRIMARY KEY,
    value numeric NOT NULL
);
INSERT INTO foo(value) VALUES (1),(2),(2),(3),(6),(7),(7),(8);

Which gives the following data:

id	value
1	1
2	2
3	2
4	3
5	6
6	7
7	7
8	8

The query should return counts and averages - entries cannot be double-counted, so I'd like the rows to be "cut into groups" based on some threshold "near-ness" value - you can think of the threshold value as a radius (for my example, a radius of 1.5 would be what I want):

SELECT AVG(foo.value), COUNT(foo.id)
  FROM foo
  GROUP BY foo.value; -- where I'm stuck

The query above will only apply the aggregate functions to rows in which foo.value has exact duplicates - the following is the result of this query:

AVG(foo.value)	COUNT(foo.id)
1	1
2	2
3	1
6	1
7	2
8	1

what I'd like is something like the following pseudo-SQL:

SELECT AVG(foo.value), COUNT(foo.id)
  FROM foo
  GROUP BY
   (real(row_being_grouped.value) <= real(foo.value) + 1.5)
   AND
   (real(row_being_grouped.value) >= real(foo.value) - 1.5);

Where the result would look like the following:

AVG(foo.value)	COUNT(foo.id)
2	4
7	4

I'm not sure if this makes any sense. I am wondering if I can get away with this without doing a sub-query - maybe "bucket" the rows somehow before finding the average within the buckets?

EDIT:

Before writing this question, I was unaware of window functions, and see how critical they are to this question - window functions could be used to process this data in a sub-query.

Solution

If "threshold" can be understood as the maximum gap allowed between values of the same group, then it's a well defined gaps-and-islands problem.
Here is a solution:

SELECT grp, count(*) AS grp_count, round(avg(value), 2) AS grp_avg
FROM  (
   SELECT count(gap) OVER (ORDER BY value) AS grp, *
   FROM  (
      SELECT value
           , value - lag(value) OVER (ORDER BY value) > 150 OR null AS gap
      FROM   foo
      ) sub1
   ) sub2
GROUP  BY grp
ORDER  BY grp;

fiddle (with step-by-step demo)

Explanation and links to more:

How to group timestamps into islands (based on arbitrary gap)?

For convenience and short code I use Boolean logic (true OR null → true, false OR null → null), and the fact that count() ignores null values. See:

For absolute performance, is SUM faster or COUNT?

If this is more of a raster / cluster / granulation / grid problem, you need to define exactly which rows to pick as "focal points" or, alternatively, the exact nature of the independent raster / grid.