python pandas yahoo-finance

ValueError: Array conditional must be same shape as self


I am super noob in pandas and I am following a tutorial that is obviously outdated.

I have this simple script, and when I run it I get this error:

ValueError: Array conditional must be same shape as self

# load the data module from the pandas_datareader package
import pandas as pd
from pandas_datareader import data
import matplotlib.pyplot as plt

# Adj Close:
# The closing price of the stock that adjusts the price of the stock for corporate actions.
# This price takes into account the stock splits and dividends.
# The adjusted close is the price we will use for this example.
# Indeed, since it takes into account splits and dividends, we will not need to adjust the price manually.

# First day
start_date = '2014-01-01'
# Last day
end_date = '2018-01-01'
# Call the DataReader function from the data module
goog_data = data.DataReader('GOOG', 'yahoo', start_date, end_date)

goog_data_signal = pd.DataFrame(index=goog_data.index)
goog_data_signal['price'] = goog_data['Adj Close']
goog_data_signal['daily_difference'] = goog_data_signal['price'].diff()

goog_data_signal['signal'] = 0.0
# this line produces the error
goog_data_signal['signal'] = pd.DataFrame.where(goog_data_signal['daily_difference'] > 0, 1.0, 0.0)
goog_data_signal['positions'] = goog_data_signal['signal'].diff()
print(goog_data_signal.head())

I am trying to understand the theory, the libraries and the methodology through practicing so bear with me if it is too obvious... :]


Solution

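  • where is a method, so it is meant to be called on a DataFrame or Series object. Calling it on the class as pd.DataFrame.where(condition, 1.0, 0.0) treats the boolean Series as self and 1.0 as the condition, which is why pandas complains that the condition does not have the same shape as self. Here you only need to check the condition on a single Series, so I found 2 ways to solve this problem: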

    1. The pandas where method doesn't support setting a value for the rows where the condition is true (1.0 in your case); those rows keep their original values. It does support setting a value for the false rows (the other parameter in the docs). So you can call where on the Series and set the 1.0's manually later as follows:
    goog_data_signal['signal'] = goog_data_signal['daily_difference'].where(goog_data_signal['daily_difference'] > 0, other=0.0)
    # the true rows retain their values (the positive daily differences) and you can set them to 1.0 as needed.
    
    2. Or you can check the condition directly as follows:
    goog_data_signal['signal'] = (goog_data_signal['daily_difference'] > 0).astype(int)
    

    The second method produces this output for me:

                     price  daily_difference  signal  positions
    Date                                                       
    2014-01-02  554.481689               NaN       0        NaN
    2014-01-03  550.436829         -4.044861       0        0.0
    2014-01-06  556.573853          6.137024       1        1.0
    2014-01-07  567.303589         10.729736       1        0.0
    2014-01-08  568.484192          1.180603       1        0.0
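
As a rough, self-contained sketch of the options discussed above (not the tutorial's exact code): the made-up prices below stand in for goog_data['Adj Close'] so nothing is downloaded, and the names signal_df and rising are only for illustration. numpy.where is shown as one possible reading of what the outdated tutorial line was meant to do; that is an assumption, not something the tutorial confirms.

```python
import numpy as np
import pandas as pd

# Made-up prices standing in for goog_data['Adj Close'] (illustrative values only).
dates = pd.date_range("2014-01-02", periods=5, freq="D")
signal_df = pd.DataFrame({"price": [554.0, 550.0, 556.0, 567.0, 568.0]}, index=dates)
signal_df["daily_difference"] = signal_df["price"].diff()

# Boolean Series; the first row is NaN, and NaN > 0 evaluates to False.
rising = signal_df["daily_difference"] > 0

# Option A: numpy.where picks 1.0 where the condition holds and 0.0 elsewhere.
signal_np = pd.Series(np.where(rising, 1.0, 0.0), index=signal_df.index)

# Option B: Series.where keeps values where the condition is True and
# substitutes `other` elsewhere, so start from a Series of ones.
signal_where = pd.Series(1.0, index=signal_df.index).where(rising, other=0.0)

# Option C: cast the boolean condition directly.
signal_cast = rising.astype(float)

signal_df["signal"] = signal_np
signal_df["positions"] = signal_df["signal"].diff()
print(signal_df)
```

All three options give the same 0/1 signal on this sample. Option C is the shortest, while option A is closest in shape to the original where(condition, 1.0, 0.0) call.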