
What are the penalties of generic functions versus function pointer arrays in C?


I'm writing a matrix library (part of SciRuby) with multiple storage types ('stypes') and multiple data types ('dtypes'). For example, a matrix's stype may currently be dense, yale (AKA 'csr'), or list-of-lists; and its dtype may be int8, int16, int32, int64, float32, float64, complex64, etc.

It's super easy to write a template processor in Ruby or sed which takes a basic function (like sparse matrix multiplication) and creates a custom version for each possible dtype. I could even write such a template to handle two different dtypes, say if we wanted to multiply an int32 by a float64.
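For illustration, here's a hedged sketch (using C macros rather than a Ruby/sed preprocessor, with hypothetical names) of the kind of per-dtype expansion such a template produces:

#include <stddef.h>
#include <stdint.h>

// One template, expanded once per dtype; each expansion yields a
// concrete, fully typed kernel.
#define DEF_EWISE_ADD(name, T)                                 \
  static void name(T* dst, const T* a, const T* b, size_t n) { \
    for (size_t i = 0; i < n; ++i) dst[i] = a[i] + b[i];       \
  }

DEF_EWISE_ADD(i32_add, int32_t)  // int32 version
DEF_EWISE_ADD(f64_add, double)   // float64 version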

The same can be done in certain cases for different stypes. Eventually, though, you could end up with a very large set of functions, many of which most people would never use.

It's also easy to use function pointer arrays to enable access to these functions -- and imagining even a 3-dimensional function pointer array is not too hard:

MultFuncs[lhs->stype][lhs->dtype][rhs->dtype](lhs->shape[0], rhs->shape[1], lhs->data, rhs->data, result->data);
// This might point to some function like this:
// i32_f64_dense_mult(size_t, size_t, int32_t*, float64*, float64*);
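For concreteness, such a table could be declared and filled in with C99 designated initializers. The enum bounds, the common MultFunc signature, and the wrapper name below are assumptions for the sketch, not the library's actual API:

#include <stddef.h>

enum { STYPE_DENSE, STYPE_SPARSE, STYPE_LIST, NUM_STYPES };  // hypothetical
enum { DTYPE_INT32, DTYPE_FLOAT64, NUM_DTYPES };             // hypothetical, truncated

typedef void (*MultFunc)(size_t, size_t, const void*, const void*, void*);

// Each typed kernel is wrapped (or cast) to the common void* signature.
static void i32_f64_dense_mult_wrapper(size_t, size_t, const void*, const void*, void*);

static const MultFunc MultFuncs[NUM_STYPES][NUM_DTYPES][NUM_DTYPES] = {
  // one entry per (stype, lhs dtype, rhs dtype) combination
  [STYPE_DENSE][DTYPE_INT32][DTYPE_FLOAT64] = i32_f64_dense_mult_wrapper,
  // ...
};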

The extreme alternative to function pointer arrays, of course, which would be incredibly complicated to code and maintain, is hierarchical switch or if/else statements:

switch(lhs->stype) {
case STYPE_SPARSE:
  switch(lhs->dtype) {
  case DTYPE_INT32:
    switch(rhs->dtype) {
    case DTYPE_FLOAT64:
      i32_f64_mult(lhs->shape[0], rhs->shape[1], lhs->ija, rhs->ija, lhs->a, rhs->a, result->data);
      break;
    // ... and so on for every other rhs dtype ...
    }
    break;
  // ... and so on for every other lhs dtype ...
  }
  break;
// ... and so on for every other stype ...
}

It also seems that this would be O(s·d²) cases per operation, where s = the number of stypes and d = the number of dtypes (e.g., with s = 3 and d = 8, that's 192 cases for each operation), whereas the function pointer array lookup would be O(r), where r = the number of dimensions in the array.

But there's also a third option: use function pointer arrays for common operations (e.g., copying from one unknown type to another):

SetFuncs[lhs->dtype][rhs->dtype](5,                        // copy five consecutive items
                                 &to,                      // destination
                                 dtype_sizeof[lhs->dtype], // dtype_sizeof is a const size_t array giving sizeof(int8_t), sizeof(int16_t), etc.
                                 &from,                    // source
                                 dtype_sizeof[rhs->dtype]);
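As a hedged sketch of what a single SetFuncs entry might look like (treating the two size arguments as per-element strides in bytes; the name and signature are illustrative, not the library's):

#include <stddef.h>
#include <stdint.h>

// Hypothetical SetFuncs[DTYPE_FLOAT64][DTYPE_INT32] entry: copy n items
// from an int32 source into a float64 destination, converting as it goes.
static void f64_i32_set(size_t n, void* to, size_t to_size,
                        const void* from, size_t from_size) {
  char* t = (char*)to;
  const char* f = (const char*)from;
  for (size_t i = 0; i < n; ++i)
    *(double*)(t + i * to_size) = (double)*(const int32_t*)(f + i * from_size);
}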

And then you'd call that from a generic sparse matrix multiplication function, which might be declared like this:

void generic_sparse_multiply(size_t* ija, size_t* ijb, void* a, void* b, int dtype_a, int dtype_b);

And that would use SetFuncs[dtype_a][dtype_b] to reference the correct assignment function, for example. The downside, then, is that you might have to implement a whole bunch of these -- IncrementFuncs, DecrementFuncs, MultFuncs, AddFuncs, SubFuncs, etc. -- because you'd never know what types to expect.
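A hedged sketch of how such a generic routine might use those tables (every name here is illustrative, and the actual traversal of ija/ijb is elided):

#include <stddef.h>

#define NUM_DTYPES 8  // hypothetical count

typedef void (*SetFunc)(size_t n, void* to, size_t to_size,
                        const void* from, size_t from_size);

extern const SetFunc SetFuncs[NUM_DTYPES][NUM_DTYPES];
extern const size_t  dtype_sizeof[NUM_DTYPES];

void generic_sparse_multiply(size_t* ija, size_t* ijb, void* a, void* b,
                             int dtype_a, int dtype_b) {
  SetFunc set  = SetFuncs[dtype_a][dtype_b];  // hoist the table lookup out of the loops
  size_t  sz_a = dtype_sizeof[dtype_a];
  size_t  sz_b = dtype_sizeof[dtype_b];

  // ... walk ija/ijb in the usual Yale/CSR fashion; wherever a typed
  // assignment would appear, call set() on the raw void* data instead,
  // advancing by sz_a / sz_b bytes per element ...
}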

So, finally, my questions:

  1. What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.
  2. Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?
  3. If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?

I realize this is an incredibly complicated array of questions.

If you can simply refer me to a good resource and prefer not to answer directly, that's perfectly fine. I used the Google extensively before posting this, but wasn't quite sure what search terms to use.


Solution

  • First of all, try to reduce the complexity of the function(s). You should be able to have a declaration like

    Result_t function (Param_t*);
    

    where Param_t is a struct containing all the things you currently pass around. To handle generic types, include in the struct an enum telling which type is used, and a void* pointing to data of that type.
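
    A minimal sketch of that suggestion (field names are illustrative, not prescribed):

    typedef enum { DTYPE_INT32, DTYPE_FLOAT64 /* ... */ } Dtype;

    typedef struct {
        Dtype  dtype;      /* which concrete type `data` points to */
        void*  data;       /* the actual operand */
        size_t rows, cols; /* whatever else you currently pass around */
    } Param_t;

    typedef enum { RESULT_OK, RESULT_BAD_DTYPE } Result_t;

    Result_t scale_in_place(Param_t* p);  /* matches Result_t function(Param_t*) */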


    1. What is the cost, if any, of having enormous multi-dimensional const arrays of function pointers? Large library or executable? Slow load time? etc.

    Definitely a larger executable. Load time depends on what system the code is for. If it is for a RAM-based system (a PC, etc.), then load time might increase, but it shouldn't have any major impact on performance. Though of course it depends on how large "enormous" is :)

    2. Does use of generics like IncrementFuncs, SetFuncs, etc. (which all probably depend on memcpy or typecasts) present barriers to compile-time optimization?

    Probably not; there's only so much the compiler can optimize in the first place. When dealing with generic data types in C, it often boils down to memcpy() in the end, which itself is hopefully implemented to be as fast as copying gets.

    3. If one were to use switch statements as described above, would these be optimized out by modern compilers? Or would they be evaluated every single time?

    Ironically, the compiler would probably optimize it into something like an array of function pointers (a jump table). However, the compiler likely cannot predict the nature of the data, especially if it is only set at runtime.