pythonexceldataframe

Dynamic Range from Python DataFrame into Pivot Table


Running Python in excel, how can a dataframe be used as the data for an excel table, to then make pivot charts with the data.

For example, in excel, cell A1 contains a pd.DataFrame: pd.DataFrame(np.random.randint(0,3,size=(3,3)),columns = list('ABC'),index = [1,2,3]). This dataframe is 4x4 (including the index which is output in an excel column), how can this be loaded into a excel table with 4 columns, 1 header row, and 3 rows? Ability to have a variable number of rows and headers would be nice also.

This can easily be viewed as a range in excel by changing the cell output to 'Excel Value' from 'Python Object', but making that range a table fails due to spill errors and so can't go into a pivot chart.

This could avoided easily by plotting using python, but that looses the excel chart familiarity and portability to other MS programs which is needed (unless anyone knows of a way to create excel pivot charts that keep MS program functionality in python?)

Ideally need a solution without VBA.


Solution

  • The issue here only partially has to do with "Python in Excel", namely as far as the shape is concerned (1). Otherwise, the difficulty lies with connecting the PivotTable to a dynamic range (2), setting up the refresh (3), and adding a timestamp for that refresh (4).


    1. Shape dataframe

    With your current use of pd.DataFrame, you'll end up with a blank top-left cell. E.g., with PY() in cell A1:

    A B C
    1 2 0 0
    2 0 2 2
    3 1 2 2

    Left like this, the connected PivotTable will protest: "The PivotTable field name is not valid...". To avoid this (the easy way), you want to fill up A1 with a "header" as well. Either leave out the index altogether (it will not show), or use a trick and add a name for the columns (not the index):

    E.g. for PY() in cell A1 with "Excel Value" selected, we can use something like below to have the data spill into the sheet:

    from string import ascii_uppercase
    rows = np.random.randint(1,11)
    cols = 4
    df = pd.DataFrame(np.random.randint(0,3,size=(rows,cols)),
                      columns=pd.Index(list(ascii_uppercase[:cols]), name='index'),
                      index=range(1, rows+1))
    df
    

    N.B.1 Make sure that the length of your data is at least one row (hence randint(1, ...) above). With 0 rows, you will likely trigger an error with the PivotTable: "One or more field names used in the report are no longer valid", plus the selected "Fields" will reset.

    N.B.2 You can add more columns, and if you relabel columns with "Fields" used by the PivotTable (e.g., from "B" for column C to "B" for column E), this will keep the PivotTable functionality as is. However, if you drop a label, the PivotTable will also drop that "Field" and you will have to reconfigure the "Fields" (go to "PivotTable Analyze" -> "Field List").


    2. Source PivotTable

    To use the spilled range as input for a PivotTable, it is best to create a "Name". Go to "Formulas -> Name Manager", add a "Name" and for "Refers to" use a reference to the cell with =PY() and add a # for the spill. Here, I've used df_data with =Sheet1!$A$1#.

    Add a Name

    Now, go to "Insert -> PivotTable". At "Table/Range" fill in the "Name" df_data (without = at the beginning). For auto-refresh at certain intervals / opening the file, make sure also to tick the box for "Add this data to the Data Model":

    PivotTable from table or range

    Configure the PivotTable, and go to "PivotTable Analyze" to add your PivotChart. N.B. You have to select a cell inside your PivotTable to see "PivotTable Analyze" in the ribbon.

    If you are happy with manual refreshing, you can skip ticking the "Data Model" box, and simply use "Data -> Refresh" or "Refresh all" as required, and you are done.


    3. Configure refresh

    For auto-refreshing, go to "Data -> Queries & Connections" and in the window pane (to the right) select WorksheetConnection_Book1!df_data. Right click to enter "Properties...":

    Queries & Connections

    Now, set the desired refresh in minutes + refresh upon opening the file. N.B. PY() cells do not auto-refresh, also not on opening. You will need to use "reset" in "Formulas -> Python (Preview)" to trigger a re-calculation.

    Connection Properties

    With refresh for your python code:

    PY refresh


    4. Add a timestamp for the refresh

    Somewhat annoyingly, the above implementation lacks a timestamp, so that the user cannot ascertain that there was a refresh. With VBA adding one would be easy. Without the aid of VBA, you can get fairly close by following the instructions provided in this answer by @Wizhi. Here, you do not have to tick the "Add this data to the Data Model" box. After adding the Query, copy the refresh settings that you used for the PivotChart connection, and the timestamp and data refresh will run successively at the interval, with a negligible delta.

    You can end up with something like this:

    PivotTable and Chart with refresh + timestamp