In my project I analyze the questions of a given exam. Let's say each exam has 10 questions.
For each question I compute some stuff and save it using the constructor of class QuestionData (defined in file question_data.py). Each QuestionData object has a pandas DataFrame, some dicts, some float attributes and a NumPy array.
Next, the exam analysis is done using class ExamData, which also has some simple attributes, some dicts and a dict (keyed by question) of all the QuestionData objects.
Eventually, what I need to do is return the ExamData object as JSON so it can be sent back as a response.
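For context: the standard-library json module only serializes built-in types (dict, list, tuple, str, int, float, bool and None). Any other object raises TypeError unless a fallback is supplied. A minimal illustration, where Report and try_dumps are hypothetical names used only for this sketch:

```python
import json

class Report:
    """Hypothetical stand-in for a plain custom object such as ExamData."""
    def __init__(self):
        self.exposure_num = 150

def try_dumps(obj):
    """Return (json_text, None) on success, or (None, error_message) on TypeError."""
    try:
        return json.dumps(obj), None
    except TypeError as exc:
        return None, str(exc)

text, err = try_dumps(Report())
print(err)  # Object of type Report is not JSON serializable

# Handing json a plain dict of the attributes works:
print(json.dumps(vars(Report())))  # {"exposure_num": 150}
```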
I'm working with conda and Python 3.12.4. I thought it sensible to start by serializing a single QuestionData object. I tried the __dict__ trick explained here, but it failed with:
AttributeError: 'weakref.ReferenceType' object has no attribute '__dict__'. Did you mean: '__dir__'?
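Assuming the __dict__ trick means passing default=lambda o: o.__dict__ to json.dumps, it recurses into every nested object it cannot serialize and breaks as soon as it reaches one without a __dict__, such as a weak reference. pandas objects keep internal helper objects, so a DataFrame attribute plausibly leads to this error, though I have not traced the exact chain. The Owner and Holder classes below are hypothetical stand-ins that reproduce the same kind of failure with the standard library only:

```python
import json
import weakref

class Owner:
    """Hypothetical object that is only referenced weakly."""

class Holder:
    """Hypothetical object whose attributes include a weak reference."""
    def __init__(self, owner):
        self.name = "q1"
        self._owner = weakref.ref(owner)  # weakref objects have no __dict__

owner = Owner()  # keep a strong reference so the weakref stays alive

try:
    # default() turns Holder into its __dict__, then is called again on the weakref
    json.dumps(Holder(owner), default=lambda o: o.__dict__)
    error = None
except AttributeError as exc:
    error = str(exc)

print(error)
```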
Then I tried installing orjson with conda install orjson, but it fails with an SSL error:
>conda install orjson
Collecting package metadata (current_repodata.json): failed
CondaSSLError: OpenSSL appears to be unavailable on this machine. OpenSSL is required to
download and install packages.
Exception: HTTPSConnectionPool(host='repo.anaconda.com', port=443): Max retries exceeded with url: /pkgs/main/win-64/current_repodata.json (Caused by SSLError("Can't connect to HTTPS URL because the SSL module is not available."))
This happened after I let conda update openssl from 3.0.14-h827c3e9_0 to 3.0.15-h827c3e9_0, which the installation required.
…orjson?
I have plenty of experience with various programming languages, with OOP and with JSON, but I'm new to Python, so please tread lightly.
Code:

question_data.py:
import pandas as pd
import numpy as np
import scipy.stats as sps
import string

class QuestionData:
    def __init__(self, data, item: str):
        options_list = ...
        # df for answer analysis
        self._options_data = pd.DataFrame(index=options_list)
        # percent chosen column
        self._options_data["pct"] = ...
        # mean ability for chosen answer
        self._options_data["theta_mean"] = ...
        # ability sd for chosen answer
        self._options_data["theta_sd"] = ...
        # corr of chosen answer with ability
        self._options_data["theta_corr"] = ...
        # item delta
        self._delta = ...
        # biserial of key with theta
        self._key_biserial = ...
        # initial IRT params. To be done later
        self._IRT_params = {"a": 1, "b": 0, "c": 0}
        self._IRT_info = {"theta_MI": 0, "info_theta_MI": 0}
        # response times vector
        self._response_time = data._response_times[str(item)].to_numpy()
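The NumPy parts of QuestionData need converting before json can handle them: the _response_time array for sure, and possibly _delta or _key_biserial if they come out of NumPy/SciPy computations (an assumption, since those lines are elided). ndarray.tolist() and the .item() method of NumPy scalars return plain Python objects. A small sketch with illustrative values; dumps_or_error is a hypothetical helper:

```python
import json
import numpy as np

response_time = np.array([25.5, 41.6, 30.9])  # illustrative values
delta = np.int64(10)                          # e.g. an integer result from NumPy

def dumps_or_error(obj):
    """Try json.dumps; return the TypeError message instead of raising."""
    try:
        return json.dumps(obj)
    except TypeError as exc:
        return f"TypeError: {exc}"

print(dumps_or_error(response_time))  # TypeError: Object of type ndarray is not JSON serializable
print(dumps_or_error(delta))          # TypeError: Object of type int64 is not JSON serializable

# Converted to built-in Python types, both serialize fine.
print(json.dumps({"_delta": delta.item(), "_response_time": response_time.tolist()}))
# {"_delta": 10, "_response_time": [25.5, 41.6, 30.9]}
```

Note that np.float64 subclasses Python's float, so json accepts it directly, while np.int64 does not subclass int and needs the conversion.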
exam_data.py:

from question_data import QuestionData
from datetime import datetime
from dateutil import relativedelta

class ExamData:
    _quantile_list = [5, 25, 50, 75, 95]
    _date_format = '%d/%m/%Y'

    def __init__(self, data):
        fromDate = datetime.strptime(data._details["fromDate"], self._date_format)
        toDate = datetime.strptime(data._details["toDate"], self._date_format)
        delta = relativedelta.relativedelta(toDate, fromDate)
        self._report_duration = {"years": delta.years, "months": delta.months, "days": delta.days}
        self._exposure_num = ...
        self._total_times = data._response_times.sum(axis=1)
        self._time_quantiles = dict(zip(self._quantile_list,
                                        [self._total_times.quantile(q / 100) for q in self._quantile_list]))
        self._q_list = ...
        self._q_data = dict(zip(self._q_list,
                                [QuestionData(data, q) for q in self._q_list]))
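One detail in ExamData that shapes the output: JSON object keys must be strings, so json.dumps converts the integer keys of _time_quantiles (5, 25, 50, ...) to "5", "25", "50". That matches the desired output, but the keys come back as strings when the JSON is parsed again. A quick stdlib check with illustrative values, using the same dict(zip(...)) pattern as ExamData:

```python
import json

quantile_list = [5, 25, 50]
quantiles = [117.89, 167.15, 224.1]                   # illustrative values
time_quantiles = dict(zip(quantile_list, quantiles))  # int keys, as in ExamData

text = json.dumps(time_quantiles)
print(text)  # {"5": 117.89, "25": 167.15, "50": 224.1}

# Parsing it back gives string keys, not the original ints.
restored = json.loads(text)
print(list(restored))  # ['5', '25', '50']
```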
Examples of what I want to get:
QuestionData:
{
    "_options_data": {"pct": {...}, "theta_mean": {...}, ...},  //<pandas df serialization>
    "_delta": 10,
    "_IRT_info": {"theta_MI": 0, "info_theta_MI": 0},
    "_response_time": [25.5, 41.6, 30.9, ...],
    ...
}
ExamData:
{
    "_report_duration": {"years": 0, "months": 0, "days": 17},
    "_exposure_num": 150,
    "_time_quantiles": {"5": 117.89, "25": 167.15, "50": 224.1, ...},
    "_total_times": {"id1": 120.3, "id2": 149.9, ...},  //<pandas series serialization>
    "_q_data": {"Q1": <QuestionData Object>, "Q2": <QuestionData Object>, ...},
    ...
}
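For the two pandas fields in these examples: the column-oriented _options_data shape ({"pct": {...}, "theta_mean": {...}}) is what DataFrame.to_dict() returns with its default orient="dict", while df.T.to_dict() gives the row-oriented {option: {column: value}} shape. For _total_times, Series.to_dict() keeps the index labels (assumed here to be the examinee ids), whereas Series.tolist() keeps only the values. A sketch with made-up labels and values:

```python
import json
import pandas as pd

# Made-up option labels, ids and values, only to show the shapes.
options_data = pd.DataFrame(
    {"pct": [0.5, 0.3], "theta_mean": [1.2, -0.4]},
    index=["A", "B"],
)
total_times = pd.Series([120.3, 149.9], index=["id1", "id2"])

# DataFrame: default orient="dict" is column-oriented.
by_column = options_data.to_dict()   # {column: {option: value}}
by_row = options_data.T.to_dict()    # {option: {column: value}}

# Series: to_dict() keeps the index labels, tolist() keeps only the values.
times_by_id = total_times.to_dict()  # {id: value}
times_only = total_times.tolist()    # [value, ...]

print(json.dumps(by_column))
# {"pct": {"A": 0.5, "B": 0.3}, "theta_mean": {"A": 1.2, "B": -0.4}}
print(json.dumps(by_row))
# {"A": {"pct": 0.5, "theta_mean": 1.2}, "B": {"pct": 0.3, "theta_mean": -0.4}}
print(json.dumps(times_by_id))  # {"id1": 120.3, "id2": 149.9}
print(json.dumps(times_only))   # [120.3, 149.9]
```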
Eventually, the simplest solution was to write my own serializer, a simple extension of the one in this post:
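import json
import numpy as np
import pandas as pd
from question_data import QuestionData
from exam_data import ExamData

# JSON serializer class so we can easily handle numpy+pandas objects
class CustomTypeEncoder(json.JSONEncoder):
    # json calls default() only for objects it cannot serialize by itself
    def default(self, obj):
        if isinstance(obj, np.generic):
            # NumPy scalar (np.int64, np.float64, ...) -> Python scalar
            return obj.item()
        elif isinstance(obj, (np.ndarray, pd.Series)):
            # array or Series -> list of values (a Series' index is dropped)
            return obj.tolist()
        elif isinstance(obj, pd.DataFrame):
            # transposed first, so the result is {row label: {column: value}}
            return obj.T.to_dict()
        elif isinstance(obj, (QuestionData, ExamData)):
            # own classes -> their attribute dict; values json can't handle come back here
            return obj.__dict__
        elif hasattr(obj, 'to_json'):
            # any other object with a to_json method (embedded as a JSON string)
            return obj.to_json(orient='records')
        return super().default(obj)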
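Returning obj.__dict__ from default() is enough for the nested objects because json calls default() again for every value it cannot serialize, so the QuestionData objects inside ExamData._q_data go through the same method. Unlike a bare default=lambda o: o.__dict__, the isinstance checks limit this to the project's own classes. A stdlib-only sketch with hypothetical stand-in classes (Question, Exam, StandInEncoder):

```python
import json

class Question:
    """Hypothetical stand-in for QuestionData."""
    def __init__(self, delta):
        self._delta = delta

class Exam:
    """Hypothetical stand-in for ExamData, holding nested Question objects."""
    def __init__(self):
        self._exposure_num = 150
        self._q_data = {"Q1": Question(10), "Q2": Question(12)}

class StandInEncoder(json.JSONEncoder):
    def default(self, obj):
        # Only our own classes are expanded into their attribute dicts.
        if isinstance(obj, (Question, Exam)):
            return obj.__dict__
        # Everything else falls back to the base class, which raises TypeError.
        return super().default(obj)

print(json.dumps(Exam(), cls=StandInEncoder))
# {"_exposure_num": 150, "_q_data": {"Q1": {"_delta": 10}, "Q2": {"_delta": 12}}}
```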
Then, when needed, I use it as follows:

import json
from question_data import QuestionData
from exam_data import ExamData
from custom_encoder import CustomTypeEncoder  # hypothetical module name: wherever CustomTypeEncoder is defined

data = ...
ed = ExamData(data)
q1d = ed._q_data["q1"]  # QuestionData object
json_str1 = json.dumps(ed, cls=CustomTypeEncoder)  # this works perfectly
json_str2 = json.dumps(q1d, cls=CustomTypeEncoder)  # this too
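If the output should follow the desired examples exactly (ids kept for _total_times, column-oriented _options_data), a variant of the encoder could use to_dict() for both pandas types. This is a hedged sketch, not the encoder above: LabeledEncoder, QuestionStub, ExamStub and all values are hypothetical, and indent=2 only pretty-prints.

```python
import json
import numpy as np
import pandas as pd

class QuestionStub:
    """Hypothetical stand-in for QuestionData."""
    def __init__(self):
        self._options_data = pd.DataFrame({"pct": [0.5, 0.3]}, index=["A", "B"])
        self._delta = np.int64(10)
        self._response_time = np.array([25.5, 41.6])

class ExamStub:
    """Hypothetical stand-in for ExamData."""
    def __init__(self):
        self._total_times = pd.Series([120.3, 149.9], index=["id1", "id2"])
        self._q_data = {"Q1": QuestionStub()}

class LabeledEncoder(json.JSONEncoder):
    """Variant encoder that keeps pandas index labels."""
    def default(self, obj):
        if isinstance(obj, np.generic):
            return obj.item()        # NumPy scalar -> Python scalar
        if isinstance(obj, np.ndarray):
            return obj.tolist()      # array -> (nested) list
        if isinstance(obj, (pd.Series, pd.DataFrame)):
            # Series -> {label: value}; DataFrame -> {column: {label: value}}
            return obj.to_dict()
        if isinstance(obj, (QuestionStub, ExamStub)):
            return obj.__dict__      # own classes -> attribute dict
        return super().default(obj)

json_str = json.dumps(ExamStub(), cls=LabeledEncoder, indent=2)
parsed = json.loads(json_str)
print(parsed["_total_times"])                    # {'id1': 120.3, 'id2': 149.9}
print(parsed["_q_data"]["Q1"]["_options_data"])  # {'pct': {'A': 0.5, 'B': 0.3}}
```

If any statistic can be NaN (for example an SD for an option nobody chose), json.dumps writes the non-standard token NaN by default; passing allow_nan=False makes it raise ValueError instead, which at least surfaces the problem before the response is sent.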