julia pycall

Call Python from Julia


I am new to Julia and have a Python function that I want to call from Julia. The function accepts a dataframe (passed as a NumPy ndarray), a filter value, and a list of column indices into the array, and runs a logistic regression using the statsmodels package. (In the example below, the data is generated inside the function.) So far I have tried this:

using PyCall

py"""
import pandas as pd
import numpy as np
import random
import statsmodels.api as sm
import itertools
def reg_frac(state, ind_vars):
    rows = 2000
    total_rows = rows*13
    data = pd.DataFrame({
    'state': ['a', 'b', 'c','d','e','f','g','h','i','j','k','l','m']*rows, \
    'y_var': [random.uniform(0,1) for i in range(total_rows)], \
    'school': [random.uniform(0,10) for i in range(total_rows)], \
    'church': [random.uniform(11,20) for i in range(total_rows)]}).to_numpy()
    try:
        X, y = sm.add_constant(np.array(data[(data[:,0] == state)][:,ind_vars], dtype=float)), np.array(data[(data[:,0] == state), 1], dtype=float)
        model = sm.Logit(y, X).fit(cov_type='HC0', disp=False)      
        rmse = np.sqrt(np.square(np.subtract(y, model.predict(X))).mean())
    except:
        rmse = np.nan
    return [state, ind_vars, rmse] 
"""

reg_frac(state, ind_vars) = (py"reg_frac"(state::Char, ind_vars::Array{Any}))

However, when I run this, the result is NaN, which I do not expect. The call itself seems to work, so I think I am missing something.

reg_frac('b', Any[i for i in 2:3])

  0.000244 seconds (249 allocations: 7.953 KiB)
3-element Array{Any,1}:
    'b'
    [2, 3]
 NaN

Any help is appreciated.


Solution

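  • In your Python code the states are strs, while in your Julia code you pass a Char. These are not the same type: Python has no separate character type, whereas Julia distinguishes 'a' (a Char) from "a" (a String).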

    Python:

    >>> type('a')
    <class 'str'>
    

    Julia:

    julia> typeof('a')
    Char
    

    Hence the comparison data[:,0] == state finds no matching rows. Your wrapper could take a String instead:

    reg_frac(state, ind_vars) = (py"reg_frac"(state::String, ind_vars::Array{Any}))
    

    And now:

    julia> reg_frac("b", Any[i for i in 2:3])
    3-element Array{Any,1}:
      "b"
      [2, 3]
     0.2853707270515166
    

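    However, I recommend passing a concretely typed vector rather than a Vector{Any}: PyCall converts numeric vectors such as Vector{Int} or Vector{Float64} into NumPy arrays in flight. Since ind_vars holds column indices, an integer vector is the natural choice here. So your code could still be improved (depending on what you are actually planning to do).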
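
The failure mode can be reproduced in plain Python, without Julia or statsmodels. The sketch below is only an illustration under assumptions: `data` is a tiny stand-in for the array built in the question, `NotAStr` is a hypothetical placeholder for any argument that reaches Python as something other than a `str`, and `select_rows` / `select_rows_checked` are helper names invented here. It shows that the comparison quietly selects zero rows, which is the kind of problem the bare `except:` in the original function turns into `NaN`, and how an explicit check would surface it instead.

```python
import numpy as np

# Small stand-in for the object array that DataFrame.to_numpy() yields in the question.
data = np.array(
    [["a", 0.2, 1.0, 12.0],
     ["b", 0.7, 3.0, 15.0],
     ["b", 0.4, 5.0, 18.0]],
    dtype=object,
)


class NotAStr:
    """Hypothetical placeholder for an argument that is not a Python str."""


def select_rows(data, state, ind_vars):
    """Select rows whose first column equals `state`, keeping columns `ind_vars`."""
    mask = data[:, 0] == state
    return np.array(data[mask][:, ind_vars], dtype=float)


def select_rows_checked(data, state, ind_vars):
    """Same selection, but fail loudly instead of returning an empty result."""
    if not isinstance(state, str):
        raise TypeError(f"state must be str, got {type(state).__name__}")
    rows = select_rows(data, state, ind_vars)
    if rows.shape[0] == 0:
        raise ValueError(f"no rows found for state {state!r}")
    return rows


matched = select_rows(data, "b", [2, 3])          # two rows match
unmatched = select_rows(data, NotAStr(), [2, 3])  # nothing matches, no error raised
print(matched.shape, unmatched.shape)
```

Replacing the bare `except:` with a narrower one, or adding a check like `select_rows_checked`, would have made the type mismatch visible as an exception on the Julia side rather than as a silent `NaN`.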