I want to use PyJulia to speed up part of my code:
import numpy as np
import julia
import pandas as pd
import random
from julia import Base
from julia import Main
from julia import DataFrames
n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)
data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.dfj = df
Main.eval("""
for i = 1:10
    #println(i)
    if dfj.Score[i] >= 10
        println(dfj.Score[i])
    end
end
"""
)
However, I get the following error message:
JuliaError: Exception 'TypeError: non-boolean (PyObject) used in boolean context' occurred while calling julia code:
Moreover, the following command:
Main.eval("""
println(dfj.Score[1])
"""
)
gives the output (which appears not to be a Julia DataFrame):
PyObject 84
Is there a way to convert a pandas DataFrame into a Julia DataFrame?
Edit 1
Thanks to the answer of @PrzemyslawSzufel, the following code now works:
import numpy as np
import julia
import pandas as pd
import random
import copy
from julia import Base
from julia import Main
from julia import DataFrames
from julia import Pandas
#julia.install(DataFrame)
%load_ext julia.magic
n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)
data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;
Main.eval("""
dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;
""")
However, although I put a ; at the end of the line, the contents of dfj are always printed. This output is unwanted, long (100000 rows), and takes around a second. Is there a way to suppress it?
Moreover, if I now modify the DataFrame in Julia (which is much faster than doing it in Python, and is the goal of the whole question) and try to convert it back to a pandas DataFrame, I get an error:
Main.eval("""
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1
    end
end
"""
)
dfjpy = pd.DataFrame(Main.dfj)
dfjpy
RuntimeError: Julia exception: MethodError: no method matching iterate(::DataFrames.DataFrame)
Closest candidates are:
iterate(!Matched::Core.SimpleVector) at essentials.jl:568
iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:568
iterate(!Matched::ExponentialBackOff) at error.jl:199
...
Stacktrace:
[1] jlwrap_iterator(::DataFrames.DataFrame) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:144
[2] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:125
By the way, the command type(dfjpy) gives PyCall.jlwrap as output.
Edit 2
To convert a Julia DataFrame to a Python pandas DataFrame, you first have to convert it to a Pandas.jl DataFrame in Julia. This is the latest working code:
n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)
data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n))
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;
Main.eval("""
dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1
    end
end
dfjp = dfj |> Pandas.DataFrame;
"""
)
dfjpy = Main.dfjp
dfjpy
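As a cross-check on the Julia loop above, the same binning rule (ScoreBin becomes 1 where Score > 50 and stays 0 otherwise) can be written in plain Python and compared with the converted result. This is only a minimal sketch and not part of the original code: bin_scores is a hypothetical helper, and it works on plain lists so that it does not depend on how the columns are pulled out of dfjpy.

```python
import random


def bin_scores(scores, threshold=50):
    """Return 1.0 for every score strictly above `threshold`, else 0.0."""
    return [1.0 if s > threshold else 0.0 for s in scores]


# Small reproducible sample in the same value range as the question's data.
random.seed(0)
scores = [random.randint(1, 100) for _ in range(10)]
expected = bin_scores(scores)

# Print each score next to its bin for a quick visual check.
for s, b in zip(scores, expected):
    print(s, b)
```

Running the same rule over the full Score column and comparing it with the ScoreBin column that comes back from Julia should reveal any mismatch introduced by the round trip.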
You need to have Pandas.jl installed. This library wraps your Python pandas data frame so that Julia can work with it, and then you can convert it to a DataFrames.jl DataFrame.
Here is the Julia code (it assumes that dfj is your Python variable):
import DataFrames
import Pandas
juliandf = dfj |> Pandas.DataFrame |> DataFrames.DataFrame;
Note that the last line can also be written as:
juliandf = DataFrames.DataFrame(Pandas.DataFrame(dfj));
To convert back, Pandas.DataFrame(juliandf) should work.
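Seen from the Python side, the two conversions can be wrapped in small helpers. The following is a sketch under assumptions, not an official PyJulia API: to_julia_df and to_pandas_df are hypothetical names, main is expected to be the Main object obtained with `from julia import Main`, pydf_tmp is an arbitrary temporary Julia global, and Pandas.jl and DataFrames.jl are assumed to be installed. The trailing `nothing` makes the evaluated Julia code return nothing. That should keep a notebook from echoing the whole table, if the echo is simply the displayed return value of Main.eval (an assumption about the cause, not something confirmed in this thread).

```python
def to_julia_df(main, df, name="dfj"):
    """Bind a pandas DataFrame in Julia's Main as a DataFrames.DataFrame called `name`."""
    # Same effect as `Main.pydf_tmp = df`. The temporary name must not start
    # with an underscore, since such names may not be forwarded to Julia.
    setattr(main, "pydf_tmp", df)
    main.eval(
        "import Pandas, DataFrames\n"
        f"{name} = pydf_tmp |> Pandas.DataFrame |> DataFrames.DataFrame\n"
        "nothing"  # last expression, so the evaluated code returns nothing
    )


def to_pandas_df(main, name="dfj"):
    """Convert the Julia DataFrame `name` back via Pandas.DataFrame and return it."""
    return main.eval(f"Pandas.DataFrame({name})")


# Usage (needs a working PyJulia setup, so it is left commented out):
# from julia import Main
# to_julia_df(Main, df)        # df is the pandas DataFrame from the question
# ... modify dfj in Julia with Main.eval(...) ...
# dfjpy = to_pandas_df(Main)
```

Passing Main in as a parameter keeps the helpers importable on a machine without Julia and makes them easy to exercise with a stand-in object.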