python pandas numpy

Why do I keep getting a NumPy RuntimeWarning?


Here is some sample data. Even though it contains no negative values or np.nan, the code below still shows a warning:

Data:

   gvkey  sale  ebit
4   1000  44.8  16.8
5   1000  53.2  11.5
6   1000  42.9   6.2
7   1000  42.4   0.9
8   1000  44.2   5.3
9   1000  51.9   9.7

Function:

def calculate_ln_values(df):
    conditions_ebit = [
        df['ebit'] >= 0.0,
        df['ebit'] <  0.0
    ]
    choices_ebit = [
        np.log(1 + df['ebit']),
        np.log(1 - df['ebit']) * -1
    ]
    df['lnebit'] = np.select(conditions_ebit, choices_ebit, default=np.nan)
    
    conditions_sale = [
        df['sale'] >= 0.0,
        df['sale'] <  0.0
    ]
    choices_sale = [
        np.log(1 + df['sale']),
        np.log(1 - df['sale']) * -1
    ]
    df['lnsale'] = np.select(conditions_sale, choices_sale, default=np.nan)
    return df

Run

calculate_ln_values(data)

Warning:

C:\Users\quoc\anaconda3\envs\uhart\Lib\site-packages\pandas\core\arraylike.py:399: RuntimeWarning: invalid value encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)
C:\Users\quoc\anaconda3\envs\uhart\Lib\site-packages\pandas\core\arraylike.py:399: RuntimeWarning: invalid value encountered in log
  result = getattr(ufunc, method)(*inputs, **kwargs)

I would really appreciate it if someone could help me with this issue.

---- Edit: reply to the answers of @Emi OB and @Quang Hoang ----

The formula as in the paper is:

ln(1+EBIT) if EBIT ≥ 0

-ln(1-EBIT) if EBIT < 0

so my code:

np.log(1 + df['ebit']),
np.log(1 - df['ebit']) * -1

follows the paper.

The argument of np.log(1 - df['ebit']) cannot be negative, since that branch only applies under the condition ebit < 0.


Solution

  • The problem is in this block of code:

        choices_ebit = [
            np.log(1 + df['ebit']),
            np.log(1 - df['ebit']) * -1
        ]
    

    Here, you are calculating both formulas for every row, the one for non-negative ebit and the one for negative ebit, and storing the results in choices_ebit before np.select ever picks between them. However, when ebit >= 1, the second one gives you the runtime warning (invalid value for ebit > 1, divide by zero at exactly 1), and when ebit <= -1, the first one gives you the runtime warning.
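
To make the cause concrete, here is a minimal sketch that reproduces the warning with the ebit values from the question. It is an illustration under assumptions, not the original code: it uses a plain NumPy array instead of the DataFrame column, and the names positive_branch, negative_branch and caught are made up for this example.

```python
import warnings

import numpy as np

# ebit values copied from the question's sample data (all positive).
ebit = np.array([16.8, 11.5, 6.2, 0.9, 5.3, 9.7])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Both branches are computed for every element, before np.select runs.
    positive_branch = np.log(1 + ebit)
    negative_branch = -np.log(1 - ebit)  # 1 - ebit is negative wherever ebit > 1

messages = [str(w.message) for w in caught]
print(messages)

# np.select only picks values afterwards; the NaNs from the unused branch are dropped.
result = np.select([ebit >= 0.0, ebit < 0.0],
                   [positive_branch, negative_branch],
                   default=np.nan)
print(np.isnan(negative_branch))  # NaN wherever ebit > 1
print(np.isnan(result).any())     # the selected result itself contains no NaN
```

So the final column is correct; the warning comes only from the branch that np.select later discards.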

    In order to avoid calculating both formulas, you can factor them out into one with abs() on the one hand, and np.sign() on the other:

        df['lnebit'] = np.log(1 + df['ebit'].abs()) * np.sign(df['ebit'])
    

    This meets your requirements: