I have working code that uses test data, shown below. The sample data matches the type and format of the inputs I will use this script with later, including dates stored as strings with a trailing 'Z'.
from sklearn import linear_model
import pandas as pd
import numpy as np

d = {'Start Sunday of FW': ['2022-03-02Z', '2022-03-03Z', '2022-03-04Z', '2022-03-05Z', '2022-03-06Z', '2022-03-01Z',
                            '2022-03-02Z', '2022-03-03Z', '2022-03-04Z', '2022-03-05Z', '2022-03-06Z', '2022-03-01Z'],
     'Store': [1111, 1111, 1111, 1111, 1111, 1111, 2222, 2222, 2222, 2222, 2222, 2222],
     'Sales': [3163, 4298, 2498, 4356, 4056, 3931, 3163, 4298, 2498, 4356, 4056, 1]}
df = pd.DataFrame(data=d)

def get_coef(input):
    def model1(df):
        y = df[['Sales']].values
        x = df[['Start Sunday of FW']].values
        return np.squeeze(linear_model.LinearRegression().fit(x,y).coef_)

    cnames = {'Store': 'Store', 0: 'Coef'}

    def prep_input(df):
        df['Start Sunday of FW'] = df['Start Sunday of FW'].astype('string').str.rstrip('Z').astype('datetime64[ns]')
        return df

    return pd.DataFrame(prep_input(input).groupby('Store').apply(model1)).reset_index().rename(columns=cnames)

print(get_coef(df))
This code runs fine on its own.
I have installed Tableau Prep and TabPy, and set it up correctly per instructions.
When I try to run the version in the code block below from Prep, however, the Prep flow fails and I get the error:
```2022-04-06,13:08:58 [ERROR] (base_handler.py:base_handler:115): Responding with status=500, message="Error processing script", info="TypeError : Object of type ndarray is not JSON serializable"```
If I instead `print()` the expression on the current return line of `get_coef()`, return the input instead, and remove the `get_output_schema` function, the result is printed correctly and the output does flow to Tableau Prep in the live previews, but the error is still raised and the flow still won't work, which is baffling.
from sklearn import linear_model
import pandas as pd
import numpy as np

def get_coef(input):
    def model1(df):
        y = df[['Sales']].values
        x = df[['Start Sunday of FW']].values
        return np.squeeze(linear_model.LinearRegression().fit(x,y).coef_)

    cnames = {'Store': 'Store', 0: 'Coef'}

    def prep_input(df):
        df['Start Sunday of FW'] = df['Start Sunday of FW'].astype('string').str.rstrip('Z').astype('datetime64[ns]')
        return df

    return pd.DataFrame(prep_input(input).groupby('Store').apply(model1)).reset_index().rename(columns=cnames)

def get_output_schema():
    return pd.DataFrame({
        'Store' : prep_string(),
        'Coef' : prep_decimal()
    })
Can someone help me understand the issue? I don't know anything about JSON serialization to begin with, so posts like this are of little help to me; I can't even assess relevance.
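Replace the final return line of get_coef() with:
    result = pd.DataFrame(prep_input(input).groupby('Store').apply(model1)).reset_index().rename(columns=cnames)
    result['Coef'] = result['Coef'].astype('double')
    return result
The Coef column had dtype object, presumably because model1 returns the output of np.squeeze (a NumPy array) rather than a plain number for each store. TabPy requires a double, so the column has to be cast before the DataFrame is returned.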
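To make the error message less mysterious, here is a minimal sketch of what "not JSON serializable" means. It assumes TabPy encodes the returned values with something equivalent to Python's standard json module, which the error text suggests but which I have not verified in the TabPy source. The values 1.5 and -2.0 and the small result frame are made up for illustration; they are not your data.

```python
import json

import numpy as np
import pandas as pd

# A 0-d array, the shape np.squeeze produces from a (1, 1) coef_ array.
coef = np.squeeze(np.array([[1.5]]))

# The json module has no rule for encoding an ndarray, so it raises TypeError.
try:
    json.dumps(coef)
    ndarray_error = None
except TypeError as exc:
    ndarray_error = str(exc)
print(ndarray_error)

# A plain Python float is encoded without trouble.
as_json = json.dumps(float(coef))
print(as_json)

# Hypothetical result frame: once the cells are plain floats the column
# is float64, and its values can be encoded as JSON.
result = pd.DataFrame({"Store": [1111, 2222], "Coef": [float(coef), -2.0]})
result["Coef"] = result["Coef"].astype("double")  # same cast as the fix above
coef_dtype = str(result["Coef"].dtype)
payload = json.dumps(result["Coef"].tolist())
print(coef_dtype, payload)
```

An alternative with the same effect would be to return float(...) instead of np.squeeze(...) from model1, so the arrays never end up in the column in the first place.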