python numpy math information-theory

Issue computing differential entropy in Python


I am trying to calculate the differential entropy (from information theory) but run into some issues in Python. My attempt is the following:

I have the following differential entropy function:

import numpy as np
from scipy.stats import norm
from scipy import integrate

def diff_entropy(nu, constant):

  def pdf_gaus_mixture(input):
    return (1-nu)*norm.pdf(input, loc=0, scale=1) + nu*norm.pdf(input, loc=constant, scale=1)

  def func(input):
    return pdf_gaus_mixture(input) * np.log(1 / pdf_gaus_mixture(input))

  return integrate.quad(func, -np.inf, np.inf)[0]

I would like to compute the following:

nu=0.1
beta=0.01
delta=0.1
sigma=0.01
diff_entropy(nu, np.sqrt(1/((beta/delta)+(sigma**2))))

But Python is giving me the following errors:

<ipython-input-22-6267f1f9e56a>:7: RuntimeWarning: divide by zero encountered in double_scalars
  return pdf_gaus_mixture(input) * np.log(1 / pdf_gaus_mixture(input))
<ipython-input-22-6267f1f9e56a>:7: RuntimeWarning: invalid value encountered in double_scalars
  return pdf_gaus_mixture(input) * np.log(1 / pdf_gaus_mixture(input))
<ipython-input-22-6267f1f9e56a>:7: RuntimeWarning: overflow encountered in double_scalars
  return pdf_gaus_mixture(input) * np.log(1 / pdf_gaus_mixture(input))
<ipython-input-22-6267f1f9e56a>:9: IntegrationWarning: The occurrence of roundoff error is detected, which prevents 
  the requested tolerance from being achieved.  The error may be 
  underestimated.
  return integrate.quad(func, -np.inf, np.inf)[0]
nan

Issue: What am I doing wrong? I suspect the issue is that the endpoints of the integral are negative and positive infinity. I could change them to something small like plus/minus 10, but I am worried about the loss of accuracy in the approximation. Is there a smarter way to overcome this? Thanks.


Solution

  • At various points, your nested function func multiplies 0.0 by np.inf, which is undefined (in floating point the result is nan). I discovered this by modifying your functions like this:

    def diff_entropy(nu, constant):

      def pdf_gaus_mixture(input):
        return (1-nu)*norm.pdf(input, loc=0, scale=1) + nu*norm.pdf(input, loc=constant, scale=1)

      def func(input):
        expr1 = pdf_gaus_mixture(input)               # underflows to 0.0 far out in the tails
        expr2 = np.log(1 / pdf_gaus_mixture(input))   # becomes np.inf whenever expr1 is 0.0

        if expr1 == 0:
            print(input, expr1, expr2)                # prints the points where 0.0 * np.inf occurs
        return expr1 * expr2

      return integrate.quad(func, -np.inf, np.inf)[0]
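
    For example, far out in the tails the mixture pdf underflows to exactly 0.0 in double precision, so the integrand evaluates 0.0 * np.inf, which is nan. A minimal illustration of the effect (40.0 is just an arbitrary point far out in the tail, reusing the imports from your snippet):

    p = norm.pdf(40.0)         # the true value is below the smallest representable double, so p is exactly 0.0
    print(p * np.log(1 / p))   # 1/p is np.inf (divide-by-zero warning), 0.0 * np.inf is nan (invalid-value warning)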
    

    Technically, you can loop over the calculation, widening the lower and upper integration boundaries step by step until Python again multiplies 0 by np.inf, that is to say until it can no longer give you a more accurate result. I used the code below to achieve this. Let me know if this was useful.

    import numpy as np
    from scipy.stats import norm
    from scipy import integrate
    
    
    def diff_entropy(nu, constant, lower_int_boundary, upper_int_boundary):
    
        def pdf_gaus_mixture(input):
    
            return (1-nu)*norm.pdf(input, loc=0, scale=1) + nu*norm.pdf(input, loc=constant, scale=1)
    
        def func(input):
            return pdf_gaus_mixture(input) * np.log(1 / pdf_gaus_mixture(input))
    
        return integrate.quad(func, lower_int_boundary, upper_int_boundary)[0]
    
    
    nu=0.1
    beta=0.01
    delta=0.1
    sigma=0.01
    constant = np.sqrt(1/((beta/delta)+(sigma**2)))
    
    lower_int_boundary = 0
    upper_int_boundary = 0
    step_size = 0.25
    entropy_results = list()
    boundaries = list()
    
    while True:
        # widen the integration window symmetrically on every pass
        lower_int_boundary -= step_size
        upper_int_boundary += step_size

        entropy = diff_entropy(nu, constant, lower_int_boundary, upper_int_boundary)

        # once the tails underflow, 0.0 * np.inf makes quad return nan; keep the last finite result
        if np.isnan(entropy):
            break

        entropy_results.append(entropy)
        boundaries.append([lower_int_boundary, upper_int_boundary])
    
    print(f"Most accurate entropy calculated: {entropy_results[-1]}")  # 1.6664093342815425
    print(f"Boundaries used: {boundaries[-1]}")  # [-37.5, 37.5]