PostgreSQL solution to work easier on a specific row with multiple columns

I have a table where I want to do some statistics on each rows different columns. For example:

id	NW2020	NW2021	NW2022	NW2023	NW2024	CA2020	...
1	-299	4000	21	-325	2544	55
2	54	244	-5	-54	325	874

How can I have a table with the frequency of negative values of each Net Worth? Is there a way that I can ask for "NW%" columns? (a kind of GROUP BY but by columns and not by rows)

The only (UGLY) solution I had in mind was this one :

SELECT *, 
((CASE NW2020 < 0 THEN 1 ELSE 0 END) + (CASE WHEN NW2021 < 0 THEN 1 ELSE 0 END) + (CASE WHEN NW2022 < 0 THEN 1 ELSE 0 END) + ... )::double precision/
    (CASE WHEN NW2020 is not null THEN 1 ELSE 0 END) + (CASE WHEN NW2021 is not null THEN 1 ELSE 0 END) + (CASE WHEN NW2022  is not null THEN 1 ELSE 0 END) + ... )as freq_negative
FROM a

(I know coalesce but it doesn't short it enough for the amount of columns I have.

I had the same issue calculating the average and the standard deviation.

Is my DB and table organization bad?

Thanks in advance

Solution

If improving the database structure is not a viable solution for some reason and you are absolutely forced to do this (don't do it if you are not forced by some requirement or obstacle), then you can create a stored function, let's fall it f which:

takes in a parameter
returns the CASE WHEN expression for the input received

Then your expression would simplify to

SELECT *, 
((f(NW2020)) + (f(NW2021)) + (f(NW2022)) + ... )::double precision/
    (g(NW2020)) + (g(NW2021)) + (g(NW2022)) + ... )as freq_negative
FROM a

where g is another stored function created similarly. It would look much better, but still not ideal because the database is not planned well.

Here's how you can create a stored function in PostgreSQL.

But, the solution above is only a bandage and the solution would be to fix your database schema. NW2020, NW2021, NW2022, ... already breaks 1NF, particularly the principle of

There are no repeating groups or arrays in any row.

Because NW is something in essence and having a field for each year makes it redundant and the structure volatile. You've encountered this issue from the selection perspective, but as years pass if you leave this unchanged, you will always need to alter the tables add fields and adjusts all selections that want all years.

Instead, create an NW table like:

NW(y, ...)

where y represents the year. And one for CA too

CA(y, ...)

and whatever your a is, don't store repetitive values inside of it. Because this is a many-to-many relation, that is, an a has multiple references to MW and CA, whilst an MW and a CA is being references by multiple a records. So you need mapping tables, like

a_MW(id, aid, y, value)

and

a_CA(id, aid, y, value)

and then you can do this:

SELECT SUM(f(a_MW.value)) AS first_field, SUM(g(a.MW.value)) AS second_field
FROM a
LEFT JOIN a_MW
ON a.id = a_MW.aid
LEFT JOIN MW
ON a_MW.y = MW.y
GROUP BY a.id

The SUM function is an aggregation and since these are similar values, you can aggregate them as you please.