arrayspostgresqlsparse-matrixunnest

Can I avoid unnest() skipping unset array elements?


I had expected unnest() to return a row for index 1 through array_length() of an array, but apparently that is not the case. Consider this example:

create table T (
    id int,
    list int[]
);

insert into T values
    (1, array[null, 42]),   -- filled from the start
    (2, array[]::int[]);    -- we'll fill this later as we get values

update T set list[2] = 42 where id = 2;  -- we got a value for id 2 now

select x.* from T, unnest(T.list) with ordinality as x (val, idx) where id = 1;
select x.* from T, unnest(T.list) with ordinality as x (val, idx) where id = 2;

The output of the two selects is:

 val | idx 
-----+-----
     |   1
  42 |   2

 val | idx 
-----+-----
  42 |   1

This is unfortunate, because in my case the position of the elements is significant. The second query showing an index of 1 for the value 42 breaks what I'm trying to do.

I'm actually glad that PostgreSQL doesn't just blindly fill unset array indices with NULL, because that can potentially save a lot of storage space (the second array is stored as [2:2]={42} instead of {NULL,42}). But in this instance, it's inconvenient.

What would be the best way to work around that?

Background: we need to store billions of float8 values. They arrive in an EAV format (entity-attribute-value), and we aggregate them into arrays by entity, with the attribute providing the array index. This is hardly relationally sound anymore, but the sheer amount of data is hard to manage otherwise.


Edit: I don't know why StackOverflow keeps changing the "sparse-array" tag to "sparse-matrix". This has nothing to do with multidimensional arrays.


Solution

  • Not sure why it is documented separately and not with the array functions, but you can use the generate_subscripts function for this. Unlike WITH ORDINALITY, it does not produce an ordering but the actual subscripts that you can use to index the array:

    SELECT list[x.idx] AS val, x.*
    FROM T, generate_subscripts(T.list, 1) WITH ORDINALITY AS x (idx, ord);
    

    (online demo)

    However, notice that Postgres arrays are not actually sparse - they just have arbitrary (lower and upper) bounds. I'm not sure what the purpose of this is - maybe it's got something to do with how array slices are represented, maybe it was meant to support 0-based indexing (although this causes more confusion than value), but in either case

    [When] assigning to elements not already present[, ] any positions between those previously present and the newly assigned elements will be filled with nulls.

    If you really need something sparse where arbitrary keys can be inserted and nothing in between takes space, look at hstore or jsonb.