Tags: python, pandas, dataframe, julia, dataframes.jl

How to convert a Python pandas DataFrame into a Julia DataFrame (using PyJulia) and back to a pandas DataFrame with a function?


I want to pass a dataframe from Python to a Julia function, perform some calculations, and then pass a dataframe back to Python.

Following "How to convert a Python pandas into a Julia DataFrame (using PyJulia) and back to Python Pandas", we can do this by setting the variable in the Julia environment, which works:

import pandas as pd
from julia import Main

data = {
  "temperature": [420, 380, 390],
  "sun": [50, 40, 45]
}

df = pd.DataFrame(data)
print(df) #print initial dataframe

Main.eval('include("compute_milp.jl")') #source julia script
def call_julia():
    Main.df = df #set variable in julia env

    return Main.eval('compute_milp()') #execute julia function from julia script

df1 = call_julia() #pass dataframe, then return
print(df1) #print dataframe like initial

In the same folder, we have the dummy script compute_milp.jl

function compute_milp()
  return df
end

The code above works and returns the dataframe. However, I would like something cleaner and more functional, without converting to a NumPy array and without a global variable, like:

Main.eval('include("compute_milp.jl")') #source julia script
def call_julia(df_arg): # ADD df
    #Main.df = df #SUPPRESS THIS SET

    return Main.eval(f'compute_milp({df_arg})') #execute julia function from julia script

df1 = call_julia(df) #pass dataframe, then return, ADD ARGUMENT
print(df1)

With this functional version, I get the following error:

Traceback (most recent call last):
  File "c:\Users\\python_julia\stack.py", line 19, in <module>
    df1 = call_julia(df) #pass dataframe, then return
  File "c:\Users\python_julia\stack.py", line 17, in call_julia
    return Main.eval(f'compute_milp({df_arg})') #execute julia function from julia script
  File "C:\Users\Anaconda3\envs\optim\lib\site-packages\julia\core.py", line 627, in eval
    ans = self._call(src)
  File "C:\Users\Anaconda3\envs\optim\lib\site-packages\julia\core.py", line 555, in _call
    self.check_exception(src)
  File "C:\Users\Anaconda3\envs\optim\lib\site-packages\julia\core.py", line 609, in check_exception
    raise JuliaError(u'Exception \'{}\' occurred while calling julia code:\n{}'
julia.core.JuliaError: Exception 'syntax: missing comma or ) in argument list' occurred while calling julia code:
compute_milp(   temperature  sun
0          420   50
1          380   40
2          390   45)

Any ideas, please?


Solution

  • I find it hard to figure out exactly what you mean by functional, so I am assuming that you want to avoid global variables.

    To start, here is a summary of what is going wrong:

    The last line of the error is "syntax: missing comma or ) in argument list", and right after it you can see exactly how the call to compute_milp was written.

    This happens because the string formatting in Main.eval(f'compute_milp({df_arg})') interpolates the string representation of the dataframe into the argument of eval, i.e. it is as if you were running a Julia script like

    compute_milp( temperature sun
    0          420   50
    1          380   40
    2          390   45)
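
To see the interpolation concretely without starting Julia, the sketch below builds the source string the same way. It uses a plain dict as a stand-in for the dataframe (an assumption made so the snippet needs no third-party packages); a pandas DataFrame goes through the same conversion, except that its text form is the multi-line table shown in the error.

```python
# Stand-in for the dataframe: an f-string turns any Python object into text.
data = {"temperature": [420, 380, 390], "sun": [50, 40, 45]}

# This is what Main.eval() would receive: a string of source code,
# not the Python object itself.
src = f"compute_milp({data})"

print(type(src).__name__)
print(src)
```

Whatever ends up between the parentheses is parsed as Julia source code, which is why the printed table triggers a syntax error rather than being treated as data.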
    

    From what I understand, df has to exist in the Julia environment to be accessible from Julia code. So you would need to keep the assignment to Main.df and settle for a compromise like this:

    def call_julia(JlEnv, df_arg):
        JlEnv.df = df_arg  # pass dataframe to Julia environment
        return JlEnv.eval("compute_milp(df)")  # no string formatting needed
    
    df1 = call_julia(Main, df)
    print(df1)
    

    Since we're trying to be "functional", I would pass the Julia environment explicitly as well.
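
If the goal is only to avoid the global df, another option may be to fetch the Julia function as a Python callable and call it with the dataframe directly. This is a sketch under two assumptions: that compute_milp in compute_milp.jl is rewritten to take one argument (compute_milp(df)), and that PyJulia returns a callable wrapper when a bare function name is evaluated. get_julia_function is a hypothetical helper name, not part of PyJulia.

```python
def get_julia_function(jl_env, name):
    """Return the Julia function called `name` as a Python callable.

    `jl_env` is any object with an `eval(str)` method, e.g. `julia.Main`.
    Assumption: evaluating a bare function name yields a callable wrapper.
    """
    func = jl_env.eval(name)
    if not callable(func):
        raise TypeError(f"{name!r} did not evaluate to a callable")
    return func


def call_julia(jl_env, df_arg):
    # Only the function *name* goes through eval; the dataframe is passed
    # as a real argument, so nothing is interpolated into source code.
    compute_milp = get_julia_function(jl_env, "compute_milp")
    return compute_milp(df_arg)


# Usage with PyJulia (not run here, requires a Julia installation):
#   from julia import Main
#   Main.eval('include("compute_milp.jl")')  # must define compute_milp(df)
#   df1 = call_julia(Main, df)
```

As far as I can tell, the argument arrives on the Julia side as a wrapped Python object rather than a DataFrames.jl DataFrame, so any conversion discussed in the linked question would still have to happen inside compute_milp.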