python r reticulate

Import Python logistic regression results into R using


Because Python uses multiple cores and is more efficient with memory, I would like to run the logistic regression in Python and then import the summary table into R (the final output is a Quarto document).

My reprex is below. It fails when I try to import the result from Python. Please help. I would like to avoid a solution where I save the Python output as a .csv file and read it back into R.

Reprex:

## libraries
require(tidyverse)
require(broom)
require(gt)
require(reticulate)
require(ISLR)

## creating a .csv example data set
## The data contains 1070 purchases where the customer either purchased Citrus Hill 
## or Minute Maid Orange Juice. A number of characteristics of the customer and product are recorded.
test_data <- ISLR::OJ %>%
        select(Purchase, PriceCH, SpecialCH, Store7)%>% 
        mutate(Purchase=if_else(Purchase=='CH',1,0))|>
        as_tibble()|>
        write_csv('data_test.csv')

## now comes the Python code
py_model <- py_run_string("
import pandas as pd
import statsmodels.api as sm

# Import the dataset
data = pd.read_csv('data_test.csv')

# Set up the independent and dependent variables
y = data['Purchase']
X = data[['PriceCH', 'SpecialCH', 'Store7']]

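# Set up factor variables (dtype=float keeps the dummy column numeric;
# recent pandas versions return booleans by default, which sm.Logit may reject)
X = pd.get_dummies(X, columns=['Store7'], drop_first=True, dtype=float)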

# Add a constant term to the independent variables
X = sm.add_constant(X)

# Fit the logistic regression model
model = sm.Logit(y, X).fit()

# Show summary
print(model.summary())
")


## here I cannot select the model or its summary
## HELP HERE
py_result <- py_model$model.summary()

py_result|>
        gt()

## expected output, similar to this
glm(Purchase~PriceCH+SpecialCH+Store7,
    data=test_data,
    family = binomial())%>%
        tidy()|>
        gt()

Solution

  • You can take the coefficient table from the model summary, convert it to a pandas DataFrame, and then pass that result to gt().

    That is, add the following to your Python string:

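    from io import StringIO

    # tables[1] is the coefficient table of the summary
    result = model.summary().tables[1].as_html()
    # wrap the HTML in StringIO; newer pandas versions deprecate passing a raw HTML string
    result = pd.read_html(StringIO(result), header=0, index_col=0)[0]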
    

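    You can then do this in R (the term names sit in the DataFrame index, which arrives in R as row names, so rownames_to_stub = TRUE keeps them in the table):

    gt(py$result, rownames_to_stub = TRUE)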
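
    If you would rather skip the HTML round trip, the same numbers can be read directly from the fitted result. The following is only a sketch: the helper name tidy_logit and the column names are made up here to resemble the output of broom::tidy(), and it assumes the model was fitted on pandas inputs, so that params, bse, tvalues and pvalues are pandas Series and conf_int() returns a DataFrame with the lower and upper bounds in its first and second columns.

```python
import pandas as pd


def tidy_logit(model):
    """Collect coefficient statistics from a fitted statsmodels result.

    Hypothetical helper: the name and the column names are illustrative only.
    """
    ci = model.conf_int()  # first column: lower bound, second column: upper bound
    return pd.DataFrame({
        "term": list(model.params.index),
        "estimate": model.params.to_numpy(),
        "std_error": model.bse.to_numpy(),
        "statistic": model.tvalues.to_numpy(),  # z statistics for a Logit fit
        "p_value": model.pvalues.to_numpy(),
        "conf_low": ci.iloc[:, 0].to_numpy(),
        "conf_high": ci.iloc[:, 1].to_numpy(),
    })
```

    Adding this function and the line result = tidy_logit(model) to the Python string should let gt(py$result) work without row names, because the term names are an ordinary column here.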