Because Python uses multiple cores and is more memory-efficient, I would like to fit a logistic regression in Python and then import the summary table into R (the result will ultimately be used in a Quarto document).
My reprex is below. It fails when I try to import the result from Python. Please help. I would like to avoid a solution where I save the Python output as a .csv file and read it back into R.
Reprex:
## libraries
require(tidyverse)
require(broom)
require(gt)
require(reticulate)
require(ISLR)
## creating a .csv example data set
## The data contains 1070 purchases where the customer either purchased Citrus Hill
## or Minute Maid Orange Juice. A number of characteristics of the customer and product are recorded.
test_data <- ISLR::OJ %>%
select(Purchase, PriceCH, SpecialCH, Store7)%>%
mutate(Purchase=if_else(Purchase=='CH',1,0))|>
as_tibble()|>
write_csv('data_test.csv')
## now comes the Python code
py_model <- py_run_string("
import pandas as pd
import statsmodels.api as sm
# Import the dataset
data = pd.read_csv('data_test.csv')
# Set up the independent and dependent variables
y = data['Purchase']
X = data[['PriceCH', 'SpecialCH', 'Store7']]
# Set up factor variables
X = pd.get_dummies(X, columns=['Store7'], drop_first=True)
# Add a constant term to the independent variables
X = sm.add_constant(X)
# Fit the logistic regression model
model = sm.Logit(y, X).fit()
# Show summary
print(model.summary())
")
## here I cannot select the model or its summary
## HELP HERE
py_result <- py_model$model.summary()
py_result|>
gt()
## expected outcome, similar to what is needed
glm(Purchase~PriceCH+SpecialCH+Store7,
data=test_data,
family = binomial())%>%
tidy()|>
gt()
You can take the model summary, convert it to a pandas DataFrame, and then feed that result to gt().
That is, add the following to your Python string:
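from io import StringIO
result = model.summary().tables[1].as_html()
# Wrap the HTML in StringIO: recent pandas versions deprecate passing a literal HTML string to read_html
result = pd.read_html(StringIO(result), header=0, index_col=0)[0]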
You can then do this:
gt(py$result)
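If you would rather skip the HTML round-trip, a possible alternative is to build the table directly from the fitted results object. The sketch below assumes the object exposes `params`, `bse`, `tvalues` and `pvalues` as pandas Series sharing one index, which is what statsmodels results return when the model is fit on pandas data. The helper name `tidy_results` and the column names are my own choices (picked to resemble the output of `broom::tidy()`), not part of statsmodels.

```python
import pandas as pd


def tidy_results(fit):
    """Collect per-coefficient statistics of a fitted model in one DataFrame.

    `fit` is assumed to expose `params`, `bse`, `tvalues` and `pvalues`
    as pandas Series indexed by term name (hypothetical helper, not a
    statsmodels function).
    """
    table = pd.DataFrame(
        {
            "estimate": fit.params,
            "std_error": fit.bse,
            "statistic": fit.tvalues,
            "p_value": fit.pvalues,
        }
    )
    # Move the term names from the index into a regular column so that
    # they survive the conversion to an R data frame.
    table.index.name = "term"
    return table.reset_index()
```

Inside the Python string you would then write `result = tidy_results(model)` after the `fit()` call, and `gt(py$result)` on the R side stays the same.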