c macos lapack apple-m1 accelerate-framework

Accelerate framework uses only one core on Mac M1


The following C program (dgesv_ex.c)

#include <stdlib.h>
#include <stdio.h>

/* DGESV prototype */
extern void dgesv( int* n, int* nrhs, double* a, int* lda, int* ipiv,
                double* b, int* ldb, int* info );

/* Main program */
int main() {
        /* Locals */
        int n = 10000, info;
        /* Local arrays */
        /* Initialization */
        double *a = malloc(n*n*sizeof(double));
        double *b = malloc(n*n*sizeof(double));
        int *ipiv = malloc(n*sizeof(int));
        for (int i = 0; i < n*n; i++ )
        {
                a[i] = ((double) rand()) / ((double) RAND_MAX) - 0.5;
        }
        for(int i=0;i<n*n;i++)
        {
            b[i] = ((double) rand()) / ((double) RAND_MAX) - 0.5;
        }

        /* Solve the equations A*X = B */
        dgesv( &n, &n, a, &n, ipiv, b, &n, &info );
        free(a);
        free(b);
        free(ipiv);
        exit( 0 );
} /* End of DGESV Example */

compiled on a Mac mini M1 with the command

clang -o dgesv_ex dgesv_ex.c -framework accelerate

uses only one core of the processor (as Activity Monitor also shows):

me@macmini-M1 ~ % time ./dgesv_ex 
./dgesv_ex  35,54s user 0,27s system 100% cpu 35,758 total

I checked that the binary is of the right type:

me@macmini-M1 ~  % lipo -info dgesv
Non-fat file: dgesv is architecture: arm64

For comparison, on my Intel MacBook Pro I get the following output:

me@macbook-intel ~ % time ./dgesv_ex
./dgesv_ex  142.69s user 0,51s system 718% cpu 19.925 total
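The "cpu" percentage in these outputs is the CPU time (user plus system) divided by the elapsed wall-clock time, so it can be read as the average number of busy cores. A minimal sketch of that arithmetic follows; `busy_cores` is a hypothetical helper name used only for illustration, not part of Accelerate or of any system API.

```c
#include <assert.h>

/* Hypothetical helper (illustration only): estimate how many cores were
 * busy on average during a run, from the three numbers printed by the
 * shell's `time`: user CPU seconds, system CPU seconds, and elapsed
 * wall-clock seconds. */
double busy_cores(double user_s, double sys_s, double wall_s)
{
    /* CPU times cannot be negative. */
    assert(user_s >= 0.0 && sys_s >= 0.0);

    /* Avoid dividing by zero for a degenerate measurement. */
    if (wall_s <= 0.0)
        return 0.0;

    /* Total CPU seconds consumed per second of wall-clock time. */
    return (user_s + sys_s) / wall_s;
}
```

With the numbers above, `busy_cores(35.54, 0.27, 35.758)` is about 1.0 for the M1 run and `busy_cores(142.69, 0.51, 19.925)` is about 7.2 for the Intel run, which matches the reported 100% and 718%.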

Is this a known problem? Is there a compilation flag or something else that I am missing?


Solution

  • The original poster and the commenter are both somewhat unclear on exactly how AMX operates. That's OK, it's not obvious! For pre-A15 designs the setup is:

    (a) Each cluster (P or E) has ONE AMX unit. You can think of it as more an attachment of the L2 than of any particular core.

    (b) This unit has four sets of registers, one for each core.

    (c) An AMX unit gets its instructions from the CPU (sent down the Load/Store pipeline, but converted at some point to a transaction that is sent to the L2 and from there to the AMX unit).

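    The consequence that matters for this question is that the cores of a cluster share its one AMX unit, so spreading AMX-bound work such as dgesv over several cores of the same cluster would not be expected to make it faster. Accelerate keeping the work on one core is therefore consistent with this design rather than a sign of a misconfigured build.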

    More details can be found here: https://gist.github.com/dougallj/7a75a3be1ec69ca550e7c36dc75e0d6f

    It is certainly possible that Apple could change various aspects of this at any time, for example adding two AMX units to the P-cluster. Presumably when this happens, Accelerate will be updated appropriately.