
Integrating using scipy.integrate.simps


I'm trying to learn about the scipy package and I came across something that I just cannot understand.

from scipy.integrate import simps
import numpy as np

def f1(x):
    return x**2

x = np.array([1, 3, 4])
y1 = f1(x)
# Simpson's rule on the samples y1 taken at the points x
I1 = simps(y1, x)
print(I1)
21.0

This corresponds exactly to

$\int_1^4 x^2 \, dx = 21.$

What I don't get is the x = np.array([1, 3, 4]) line. Why do we need the 3 here? 1 and 4 are the limits of the integral, so what is the 3? Can someone explain that to me please?


Solution

  • The documentation of scipy.integrate.simps says:

    y : array_like
    
        Array to be integrated.
    
    x : array_like, optional
    
        If given, the points at which y is sampled.
    

    These are the points at which the function was sampled. Because you do not pass the integrand itself to the algorithm, you have to provide sample points: the second array gives the x-locations of the y-values y1 that you calculated in the previous line. Some implementations of numerical integration methods take the integrand function directly, but they still evaluate it at sample points internally, much like the ones you provide here.

    So the array x is not the integration interval, although its max and min give the interval.

    In general, for a numerical integration algorithm, more sample points distributed over the integration interval improve the accuracy of the result, and for an arbitrary integrand only 3 points will usually give a very poor approximation.
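To illustrate how accuracy depends on the number of sample points, here is a minimal sketch of the composite Simpson's rule on equally spaced points. The helper composite_simpson and the test integrand sin(x) on [0, pi] (exact value 2) are assumptions chosen for illustration; they are not part of the question and this is not SciPy's implementation.

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson's rule on n equal subintervals (n must be even)."""
    if n < 2 or n % 2:
        raise ValueError("n must be a positive even integer")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and 2 (even i).
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

exact = 2.0  # integral of sin(x) over [0, pi]
errors = {n: abs(composite_simpson(math.sin, 0.0, math.pi, n) - exact)
          for n in (2, 4, 8, 16)}
for n, err in errors.items():
    # n subintervals correspond to n + 1 sample points
    print(f"{n + 1:2d} sample points: error = {err:.2e}")
```

With only 3 sample points the error is close to 0.1, and it shrinks quickly as points are added.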

    However, in your example the integrand is a simple polynomial of degree 2. Such polynomials are easy to integrate (analytically as well as numerically). The algorithm behind scipy.integrate.simps is Simpson's rule, which approximates the integrand by a second-degree polynomial through the sample points. Therefore this method is able to solve your sample integral exactly.

    To fully define a second-degree polynomial you need to specify 3 coefficients, and to derive these the algorithm needs to know at least 3 points on the polynomial. An additional fourth point, however, would not give any more information, because the curve is already fully specified by three points. This is why 3 points are sufficient to give the exact result in this example.
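The three-point argument can be checked by hand. The sketch below fits the unique parabola through three samples (using Newton divided differences, which is my choice here and not necessarily how SciPy does it) and integrates that parabola exactly. The helper name integrate_parabola is hypothetical.

```python
def integrate_parabola(xs, ys):
    """Integrate the parabola through three points from xs[0] to xs[2]."""
    x0, x1, x2 = xs
    y0, y1, y2 = ys
    # Newton form: p(x) = y0 + d1*(x - x0) + d2*(x - x0)*(x - x1)
    d1 = (y1 - y0) / (x1 - x0)
    d2 = ((y2 - y1) / (x2 - x1) - d1) / (x2 - x0)
    h = x2 - x0
    # Exact integrals of the three terms over [x0, x2]:
    #   constant term:        y0 * h
    #   (x - x0):             h**2 / 2
    #   (x - x0) * (x - x1):  h**3 / 3 - (x1 - x0) * h**2 / 2
    return y0 * h + d1 * h**2 / 2 + d2 * (h**3 / 3 - (x1 - x0) * h**2 / 2)

xs = [1.0, 3.0, 4.0]
ys = [x**2 for x in xs]
print(integrate_parabola(xs, ys))
```

For the samples of x**2 at 1, 3 and 4 the fitted parabola is x**2 itself, so the value printed agrees with the analytic result 21.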

    If you do not provide the array x with the sample locations, the result will in general be wrong, because the spacing between the individual y-values in y1 is then assumed to be 1 (see the documentation referenced above).
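A short sketch of the difference between passing and omitting the sample locations. It assumes NumPy and SciPy are installed; recent SciPy releases expose the function as simpson, so the fallback import of simps is an assumption about which version you have.

```python
import numpy as np

try:
    from scipy.integrate import simpson  # name used by recent SciPy releases
except ImportError:
    from scipy.integrate import simps as simpson  # older releases

x = np.array([1.0, 3.0, 4.0])
y = x**2

with_x = simpson(y, x=x)  # uses the true sample locations 1, 3, 4
without_x = simpson(y)    # no x given: spacing dx=1 is assumed between samples

print(with_x)
print(without_x)
```

Without x, the three samples are treated as equally spaced with spacing 1, so the rule computes (1 + 4*9 + 16) / 3 = 53/3, which is not the integral from 1 to 4.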

    Also, as a side note: using Python 2.7 with NumPy 1.6 and SciPy 0.10, the result of your code above is 20.75, probably because x is an integer array, so integer division is applied to it. Explicitly making the values floats with

    x = np.array([1.0,3.0,4.0])
    

    resolved this issue, and the result is then always exactly 21.0. You can also see that the actual middle value doesn't matter as long as it lies strictly between 1.0 and 4.0.
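The claim about the middle value can be tried out directly. This is a small sketch assuming NumPy and SciPy are available; the particular middle values are arbitrary choices, and the import fallback is again an assumption about the installed SciPy version.

```python
import numpy as np

try:
    from scipy.integrate import simpson  # name used by recent SciPy releases
except ImportError:
    from scipy.integrate import simps as simpson  # older releases

results = {}
for middle in (1.5, 2.0, 2.5, 3.0, 3.9):
    # Float sample points: the two limits plus one interior point
    x = np.array([1.0, middle, 4.0])
    results[middle] = float(simpson(x**2, x=x))
    print(middle, results[middle])
```

Each result should agree with 21 up to floating-point rounding, since any three distinct samples of x**2 determine the same parabola.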