tableau-apitabpy

Using SCRIPT_REAL in Data Source Tab in Tableau


I want to create a calculated field in my data source table (in Data Source tab of my workbook). However, Tableau is complaining that the field contains an aggregate calculation data and so it depends on the visualization's level of detail. Here is my script (using TabPy):

SCRIPT_REAL("import numpy as np 
return (2*np.array(_arg1)).tolist()",
SUM([Wind Speed]))

In this case, all the rows in this calculated field are shown as null.

However, when I replace the SCRIPT_REAL with the following, it works:

2*SUM([Wind Speed])

Yet, in both cases, I'm using SUM [an aggregate function], so why does it work in this case but not when I'm using SCRIPT_REAL?

All I want to do is use SCRIPT_REAL to calculate values for a new field (row-wise) and add it to my data table. What am I missing?


Solution

  • Without knowing what your data looks like it is hard to say what is going on. The calculation is valid:

    enter image description here

    One thing that can help a lot when debugging these kinds of problems is to print the data before returning it so that you can inspect it in the console.

    SCRIPT_REAL(
    "import numpy as np
    values = (2*np.array(_arg1)).tolist()
    print values
    return values"
    ,
    SUM([Value])
    )
    

    You can then look at TabPy and see:

    INFO:__main__:{"INFO": "Initializing TabPy..."}
    INFO:__main__:{"INFO": "Done initializing TabPy."}
    INFO:__main__:{"INFO": "Web service listening on port 9004"}
    INFO:__main__:{"INFO": "function to evaluate=def _user_script(tabpy, _arg1):\n import numpy as np\n values = (2*np.array(_arg1)).tolist()\n print values\n return values\n"}
    [14, 20, 32, 44, 58, 52, 40, 66]
    INFO:tornado.access:200 POST /evaluate (172.17.0.1) 53.75ms
    

    So that calculation computes the values: [14, 20, 32, 44, 58, 52, 40, 66]

    As Alex Blakemore mentioned, these calculations are "Table Calculations" and will be computed on "Table" by default. That means that Tableau will pass all the values in the view at once to TabPy. Basically, it passes an array and you get an array back (like in the example above). If you change the "compute by" setting it can pass fractions of the data, or even each row individually, but generally "Table" is best since it passes all the data at once to TabPy and then TabPy passes all the results back in an array.

    =========================================

    Update: a "table calculation" is run on the data in the "view" it cannot be pre-computed in the underlying data since there is no way of knowing what you will be visualizing. If you take a look in the data preparation pane you will see a message about this:

    enter image description here

    Basically, a "table calculation" is run on demand and is ephemeral. If you want the values persisted then I recommend running a standard calculation, or pre-computing the values in the data source. TabPy depends on these ephemeral type of calculations so it is good for exploring data, but not for persisting it.