I'm thinking about the L2 inner product. I am specifically interested in performing these calculations using numpy/scipy. The best I have come up with is performing an array-based integral such as numpy.trapz.
import numpy as np

n = 100000
h = 1. / n
X = np.linspace(-np.pi, np.pi, n)

def L2_inner_product(f, g):
    # Trapezoidal approximation of (1/pi) * integral of f*g over [-pi, pi].
    return np.trapz(f * g, dx=2 * np.pi * h) / np.pi

print(L2_inner_product(np.sin(X), np.sin(X)))
print(L2_inner_product(np.cos(2 * X), np.cos(2 * X)))
print(L2_inner_product(np.sin(X), np.cos(X)))
print(L2_inner_product(np.sin(X), np.cos(3 * X)))
print(L2_inner_product(np.ones(n), np.ones(n)))
0.99999
0.99999
-3.86525742539e-18
1.6565388966e-18
1.99998
To be clear, I am not interested in using Mathematica, Sage, or Sympy. I am specifically interested in numpy/scipy, in which I am exploring the numpy "array space" as a finite subspace of Hilbert space. Within these parameters, have others implemented an L2 inner product, perhaps using numpy.inner or numpy.linalg.norm?
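For concreteness, here is a rough sketch of the kind of implementation I have in mind with numpy.inner and numpy.linalg.norm (the function names are just illustrative):

import numpy as np

n = 100000
X = np.linspace(-np.pi, np.pi, n)
dx = 2 * np.pi / (n - 1)  # actual spacing of the linspace grid

def l2_inner(f, g):
    # Riemann-sum approximation of (1/pi) * integral of f*g over [-pi, pi].
    return np.inner(f, g) * dx / np.pi

def l2_norm(f):
    # ||f|| = sqrt(<f, f>); np.linalg.norm(f) computes sqrt(sum(f**2)).
    return np.linalg.norm(f) * np.sqrt(dx / np.pi)

print(l2_inner(np.sin(X), np.sin(X)))  # ~1.0
print(l2_norm(np.sin(X)))              # ~1.0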
With respect to speed, numpy.inner is probably the best choice for fixed n. numpy.trapz should be converging faster, though. Either way, if you are worried about speed, you should also take into account that evaluating the functions themselves takes some time.
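For intuition on where the difference comes from: on a uniform grid, np.trapz is the same dot product but with the two endpoint samples weighted by 1/2, which is exactly what buys its better convergence. A quick sketch to verify this:

import numpy as np

x = np.linspace(-np.pi, np.pi, 1001)
dx = x[1] - x[0]
y = np.exp(x)

# Trapezoidal rule == plain Riemann sum with halved endpoint weights.
trapz = np.trapz(y, dx=dx)
manual = dx * (np.sum(y) - 0.5 * y[0] - 0.5 * y[-1])
print(np.allclose(trapz, manual))  # True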
Below are some simple benchmarks I ran using different inner product implementations.
The plot below shows the runtime for the computation of only the integral, i.e. not the function evaluation. While numpy.trapz is a constant factor slower, numpy.inner is as fast as calling BLAS directly. As Ophion pointed out, numpy.inner calls BLAS internally, probably with some overhead for input checking.
It is also interesting to look at the time it takes to evaluate the function itself, which of course has to be done to compute the inner product. Below is a plot that shows the evaluation time for the standard transcendental functions numpy.sin, numpy.sqrt, and numpy.exp. The scaling is of course the same as for the summing of the products, and the overall time required is comparable.
Finally, one should also consider the accuracy of the different methods, and this is where it actually gets interesting. Below is a plot of the convergence of the different implementations for the computation of $\langle \exp(x), \exp(x) \rangle$. Here we can see that numpy.trapz actually scales much better than the other two implementations, which don't even reach machine precision before I run out of memory.
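For reference, the exact value used in this convergence test follows directly from the 1/π normalization used throughout (it appears as the constant l2 in the code below):

$$\langle e^x, e^x \rangle = \frac{1}{\pi} \int_{-\pi}^{\pi} e^{2x} \, dx = \frac{1}{2\pi} \left( e^{2\pi} - e^{-2\pi} \right)$$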
Considering the bad convergence properties of numpy.inner, I would go for numpy.trapz. But even then a lot of integration nodes are required to get satisfactory accuracy. Since your integration domain is fixed, you might even try going for higher order quadratures; a Gauss-Legendre sketch follows the benchmark code below.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.linalg.blas import ddot
import timeit
## Define inner product.
def l2_inner_blas( f, g, dx ):
    # Dot product via the BLAS ddot routine, scaled to the L2 inner product.
    return ddot( f, g )*dx / np.pi

def l2_inner( f, g, dx ):
    # Dot product via numpy.inner, scaled to the L2 inner product.
    return np.inner( f, g )*dx / np.pi

def l2_inner_trapz( f, g, dx ):
    # Trapezoidal rule applied to the pointwise product.
    return np.trapz(f*g,dx=dx) / np.pi
sin1 = lambda x: np.sin( x )
sin2 = lambda x: np.sin( 2.0 * x)
## Timing setups.
setup1 = "import numpy as np; from __main__ import l2_inner,"
setup1 += "l2_inner_trapz, l2_inner_blas, sin1, sin2;"
setup1 += "n=%d; x=np.linspace(-np.pi,np.pi,n); dx=2.0*np.pi/(n-1);"
setup1 += "f=sin1(x); g=sin2(x);"
def time( n ):
    setupstr = setup1 % n
    time1 = timeit.timeit( 'l2_inner( f, g, dx)', setupstr, number=10 )
    time2 = timeit.timeit( 'l2_inner_blas( f, g, dx)', setupstr, number=10 )
    time3 = timeit.timeit( 'l2_inner_trapz( f, g, dx)', setupstr, number=10 )
    return (time1, time2, time3)
setup2 = "import numpy as np; x = np.linspace(-np.pi,np.pi,%d);"
def time_eval( n ):
    setupstr = setup2 % n
    time_sin = timeit.timeit( 'np.sin(x)', setupstr, number=10 )
    time_sqrt = timeit.timeit( 'np.sqrt(x)', setupstr, number=10 )
    time_exp = timeit.timeit( 'np.exp(x)', setupstr, number=10 )
    return (time_sin, time_sqrt, time_exp)
## Perform timing for vector product.
times = np.zeros( (7,3) )
for i in range(7):
    times[i,:] = time( 10**(i+1) )
x = 10**np.arange(1,8,1)
f, ax = plt.subplots()
ax.set( xscale='log', yscale='log', title='Inner vs. BLAS vs. trapz', \
ylabel='time [s]', xlabel='n')
ax.plot( x, times[:,0], label='numpy.inner' )
ax.plot( x, times[:,1], label='scipy.linalg.blas.ddot')
ax.plot( x, times[:,2], label='numpy.trapz')
plt.legend()
## Perform timing for function evaluation.
times_eval = np.zeros( (7,3) )
for i in range(7):
    times_eval[i,:] = time_eval( 10**(i+1) )
x = 10**np.arange(1,8,1)
f, ax = plt.subplots()
ax.set( xscale='log', yscale='log', title='sin vs. sqrt vs. exp', \
ylabel='time [s]', xlabel='n')
ax.plot( x, times_eval[:,0], label='numpy.sin' )
ax.plot( x, times_eval[:,1], label='numpy.sqrt')
ax.plot( x, times_eval[:,2], label='numpy.exp' )
plt.legend()
## Test convergence.
def error( n ):
    x = np.linspace( -np.pi, np.pi, n )
    dx = 2.0 * np.pi / (n-1)
    f = np.exp( x )
    # Exact value of <exp, exp> = (1/pi) * integral of exp(2x) over [-pi, pi].
    l2 = 0.5/np.pi*(np.exp(2.0*np.pi) - np.exp(-2.0*np.pi))
    err1 = np.abs( (l2 - l2_inner( f, f, dx )) / l2)
    err2 = np.abs( (l2 - l2_inner_blas( f, f, dx )) / l2)
    err3 = np.abs( (l2 - l2_inner_trapz( f, f, dx )) / l2)
    return (err1, err2, err3)
acc = np.zeros( (7,3) )
for i in range(7):
    acc[i,:] = error( 10**(i+1) )
x = 10**np.arange(1,8,1)
f, ax = plt.subplots()
ax.plot( x, acc[:,0], label='numpy.inner' )
ax.plot( x, acc[:,1], label='scipy.linalg.blas.ddot')
ax.plot( x, acc[:,2], label='numpy.trapz')
ax.set( xscale='log', yscale='log', title=r'$\langle \exp(x), \exp(x) \rangle$', \
ylabel='Relative Error', xlabel='n')
plt.legend()
plt.show()
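As for the higher order quadratures mentioned above, here is a minimal sketch using a Gauss-Legendre rule via numpy.polynomial.legendre.leggauss. The helper name l2_inner_gauss is mine, and this approach assumes the functions can be evaluated at arbitrary nodes rather than on a fixed uniform grid:

import numpy as np
from numpy.polynomial.legendre import leggauss

def l2_inner_gauss(f, g, deg):
    # Gauss-Legendre quadrature of (1/pi) * integral of f*g over [-pi, pi].
    # leggauss returns nodes/weights for [-1, 1]; rescale them to [-pi, pi].
    nodes, weights = leggauss(deg)
    x = np.pi * nodes      # map [-1, 1] -> [-pi, pi]
    w = np.pi * weights    # scale weights by the Jacobian of the map
    return np.sum(w * f(x) * g(x)) / np.pi

# For a smooth integrand, a modest number of nodes should already reach
# machine precision, in contrast to the plot above.
exact = 0.5 / np.pi * (np.exp(2.0 * np.pi) - np.exp(-2.0 * np.pi))
approx = l2_inner_gauss(np.exp, np.exp, 50)
print(abs(approx - exact) / exact)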