I am using princomp in R to perform PCA. My data matrix is huge (10K x 10K, with each value given to up to 4 decimal places). It takes ~3.5 hours and ~6.5 GB of physical memory on a Xeon 2.27 GHz processor.
Since I only want the first two components, is there a faster way to do this?
Update:
In addition to speed, is there a memory-efficient way to do this?
It takes ~2 hours and ~6.3 GB of physical memory to calculate the first two components using svd(,2,).
You sometimes get access to so-called 'economical' decompositions, which allow you to cap the number of eigenvalues / eigenvectors. It looks like eigen() and prcomp() do not offer this, but svd() allows you to specify the maximum number to compute.
On small matrices, the gains seem modest:
R> set.seed(42); N <- 10; M <- matrix(rnorm(N*N), N, N)
R> library(rbenchmark)
R> benchmark(eigen(M), svd(M,2,0), prcomp(M), princomp(M), order="relative")
          test replications elapsed relative user.self sys.self user.child
2 svd(M, 2, 0)          100   0.021  1.00000      0.02        0          0
3    prcomp(M)          100   0.043  2.04762      0.04        0          0
1     eigen(M)          100   0.050  2.38095      0.05        0          0
4  princomp(M)          100   0.065  3.09524      0.06        0          0
R>
but the factor of three relative to princomp() may make it worth your while to reconstruct princomp() from svd(), as svd() lets you request only the first two singular vectors.
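As a rough sketch of what that reconstruction could look like: the helper below (pca_svd is a hypothetical name, not part of base R) centres the columns, calls svd() with nu and nv capped at k, and rebuilds the standard deviations, loadings and scores. It assumes the prcomp() convention of dividing by n - 1 (princomp() divides by n), and the 200 x 10 test matrix is made up for illustration. Note that base svd() still returns all singular values in d; nu and nv only limit how many singular vectors come back.

```r
# Sketch: first k principal components rebuilt from svd().
# pca_svd is a hypothetical helper, not a base R function.
pca_svd <- function(X, k = 2) {
  # Centre each column, as prcomp() does by default (no rescaling).
  Xc <- scale(X, center = TRUE, scale = FALSE)
  # Return only k left and k right singular vectors.
  s <- svd(Xc, nu = k, nv = k)
  d <- s$d[seq_len(k)]            # s$d holds all singular values; keep the first k
  list(
    sdev     = d / sqrt(nrow(X) - 1),   # prcomp() convention (divisor n - 1)
    rotation = s$v,                     # loadings, p x k
    scores   = s$u %*% diag(d, k)       # projected data, n x k
  )
}

set.seed(42)
M   <- matrix(rnorm(200 * 10), 200, 10)   # illustrative data only
fit <- pca_svd(M, 2)
ref <- prcomp(M)

# The sign of each component is arbitrary, so compare absolute values.
all.equal(abs(fit$scores), abs(unname(ref$x[, 1:2])))
all.equal(fit$sdev, ref$sdev[1:2])
```

If only the scores are needed, nv = 0 (as in the benchmark call above) avoids returning the loadings at all.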