I have a function that returns integer values to integer input. The output values are relatively sparse; the function only returns around 2^14 unique outputs for input values 1....2^16. I want to create a dataset that lets me quickly find the inputs that produce any given output.
At present, I'm storing my dataset in a Map of Lists, with each output value serving as the key for a List of input values. This seems slow and appears to use a whole of stack space. Is there a more efficient way to create/store/access my dataset?
Added: It turns out the time taken by my sparesearray() function varies hugely on the ratio of output values (i.e., keys) to input values (values stored in the lists). Here's the time taken for a function that requires many lists, each with only a few values:
? sparsearray(2^16,x->x\7);
time = 126 ms.
Here's the time taken for a function that requires only a few lists, each with many values:
? sparsearray(2^12,x->x%7);
time = 218 ms.
? sparsearray(2^13,x->x%7);
time = 892 ms.
? sparsearray(2^14,x->x%7);
time = 3,609 ms.
As you can see, the time increases exponentially!
Here's my code:
\\ sparsearray takes two arguments, an integer "n" and a closure "myfun",
\\ and returns a Map() in which each key a number, and each key is associated
\\ with a List() of the input numbers for which the closure produces that output.
\\ E.g.:
\\ ? sparsearray(10,x->x%3)
\\ %1 = Map([0, List([3, 6, 9]); 1, List([1, 4, 7, 10]); 2, List([2, 5, 8])])
sparsearray(n,myfun=(x)->x)=
{
my(m=Map(),output,oldvalue=List());
for(loop=1,n,
output=myfun(loop);
if(!mapisdefined(m,output),
/* then */
oldvalue=List(),
/* else */
oldvalue=mapget(m,output));
listput(oldvalue,loop);
mapput(m,output,oldvalue));
m
}
To some extent, the behavior you are seeing is to be expected. PARI appears to pass lists and maps by value rather than reference except to the special inbuilt functions for manipulating them. This can be seen by creating a wrapper function like mylistput(list,item)=listput(list,item);
. When you try to use this function you will discover that it doesn't work because it is operating on a copy of the list. Arguably, this is a bug in PARI, but perhaps they have their reasons. The upshot of this behavior is each time you add an element to one of the lists stored in the map, the entire list is being copied, possibly twice.
The following is a solution that avoids this issue.
sparsearray(n,myfun=(x)->x)=
{
my(vi=vector(n, i, i)); \\ input values
my(vo=vector(n, i, myfun(vi[i]))); \\ output values
my(perm=vecsort(vo,,1)); \\ obtain order of output values as a permutation
my(list=List(), bucket=List(), key);
for(loop=1, #perm,
if(loop==1||vo[perm[loop]]<>key,
if(#bucket, listput(list,[key,Vec(bucket)]);bucket=List()); key=vo[perm[loop]]);
listput(bucket,vi[perm[loop]])
);
if(#bucket, listput(list,[key,Vec(bucket)]));
Mat(Col(list))
}
The output is a matrix in the same format as a map - if you would rather a map then it can be converted with Map(...)
, but you probably want a matrix for processing since there is no built in function on a map to get the list of keys.
I did a little bit of reworking of the above to try and make something more akin to GroupBy in C#. (a function that could have utility for many things)
VecGroupBy(v, f)={
my(g=vector(#v, i, f(v[i]))); \\ groups
my(perm=vecsort(g,,1));
my(list=List(), bucket=List(), key);
for(loop=1, #perm,
if(loop==1||g[perm[loop]]<>key,
if(#bucket, listput(list,[key,Vec(bucket)]);bucket=List()); key=g[perm[loop]]);
listput(bucket, v[perm[loop]])
);
if(#bucket, listput(list,[key,Vec(bucket)]));
Mat(Col(list))
}
You would use this like VecGroupBy([1..300],i->i%7)
.