I want to create a calculated column with a Python data function in Spotfire. This is my script:
import pandas as pd

def calculate_passage(df, batch_col, date_col):
    df['RowIndex'] = df.groupby(batch_col).cumcount() + 1
    df['Passage'] = 0
    for batch, batch_df in df.groupby(batch_col):
        passage = 1
        same_date_count = 0
        prev_date = None
        for index, row in batch_df.iterrows():
            if row['RowIndex'] <= 6:
                passage = 1
            elif row['RowIndex'] <= 10:
                passage = 2
            else:
                if row[date_col] == prev_date:
                    same_date_count += 1
                else:
                    same_date_count = 0
                if same_date_count == 1:
                    passage += 1
            df.loc[index, 'Passage'] = passage
            prev_date = row[date_col]
    # Remove the 'RowIndex' column from the output DataFrame
    df.drop(columns=['RowIndex'], inplace=True)
    passage = df[['Passage']]
    # Return a DataFrame with a single column 'Passage'
    return passage
My parameters:
And these are my mapping settings if I click 'Run':
And still, I get the error "spotfire.data_function.DataFunctionError: Output variable 'passage' was not defined"...
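You are only defining a function within the data function. Nothing ever calls it, so passage is never assigned at the top level of the script, which is why Spotfire reports that the output variable was not defined. You need a main statement that calls this internal function (calculate_passage) with the input parameters sent into the data function and assigns the result to passage. In most cases, you won't have to return passage explicitly from the main bit (but obviously, from calculate_passage you do).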
I don't have your data and cannot see your inputs. This simple data function worked for me:
import pandas as pd
import numpy as np

def calculate_passage(df):
    df['Passage'] = np.random.randint(0, 99, df.shape[0])
    passage = df[['Passage']]
    # Return a DataFrame with a single column 'Passage'
    return passage

### Main
passage = calculate_passage(df)
The other potential problem is that you are defining passage as a table, not a column, so it will be created as a new, separate data table. If you output it as a column instead, it should be added as an extra column to df, provided you have not re-shuffled or reduced the number of rows within the calculate_passage function.
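To illustrate the column variant, here is a minimal sketch that runs outside Spotfire. The sample df, the 'Batch' column name and the placeholder row-numbering logic are all made up for illustration, and it assumes the output parameter passage is declared as a Column that accepts a pandas Series. The point is only that passage is assigned at the top level and has one value per input row, in the same order.

```python
import pandas as pd

# Stand-in for the table Spotfire would pass in as the input parameter `df`
# (hypothetical sample data; inside Spotfire this assignment is not needed).
df = pd.DataFrame({
    'Batch': ['A', 'A', 'A', 'B', 'B'],
    'Date': ['2024-01-01', '2024-01-02', '2024-01-02',
             '2024-01-01', '2024-01-03'],
})

def calculate_passage(df, batch_col):
    # Placeholder logic: number the rows within each batch, starting at 1.
    # The result keeps the index and row order of df, so it lines up
    # row by row with the input table.
    return df.groupby(batch_col).cumcount() + 1

### Main
# Top-level assignment: this is the name the data function exposes as output.
# It is a pandas Series (a single column), not a DataFrame (a table).
passage = calculate_passage(df, 'Batch')
```

Inside Spotfire, the column name would more likely come in through an input parameter (as batch_col does in the original script) rather than being hard-coded.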