I am having some issues inserting into MongoDB via FastAPI.
The code below works as expected. Notice that the response variable is not passed to response_to_mongo(). The model is an sklearn ElasticNet model.
from typing import List

import pandas as pd
import pymongo
from fastapi import FastAPI

app = FastAPI()

def response_to_mongo(r: dict):
    client = pymongo.MongoClient("mongodb://mongo:27017")
    db = client["models"]
    model_collection = db["example-model"]
    model_collection.insert_one(r)

@app.post("/predict")
async def predict_model(features: List[float]):
    prediction = model.predict(
        pd.DataFrame(
            [features],
            columns=model.feature_names_in_,
        )
    )
    response = {"predictions": prediction.tolist()}
    response_to_mongo(
        {"predictions": prediction.tolist()},
    )
    return response
However, when I write predict_model() like this and pass the response variable to response_to_mongo():
@app.post("/predict")
async def predict_model(features: List[float]):
    prediction = model.predict(
        pd.DataFrame(
            [features],
            columns=model.feature_names_in_,
        )
    )
    response = {"predictions": prediction.tolist()}
    response_to_mongo(
        response,
    )
    return response
I get an error stating that:
TypeError: 'ObjectId' object is not iterable
From my reading, it seems that this is due to BSON/JSON issues between FastAPI and Mongo. However, why does it work in the first case when I do not use a variable? Is this due to the asynchronous nature of FastAPI?
As per the documentation:
When a document is inserted a special key, "_id", is automatically added if the document doesn’t already contain an "_id" key. The value of "_id" must be unique across the collection. insert_one() returns an instance of InsertOneResult. For more information on "_id", see the documentation on _id.
Thus, in the second case of the example you provided, when you pass the response dictionary to insert_one(), PyMongo adds the unique identifier (i.e., an ObjectId) under the "_id" key of that same dictionary, since the dictionary is modified in place. Hence, when the response is returned from the endpoint, the ObjectId fails to serialize. As described in this answer in detail, FastAPI will, by default, convert the return value into JSON-compatible data using the jsonable_encoder (to ensure that objects that are not serializable are converted to a str), and then return a JSONResponse, which uses the standard json library to serialize the data. In the first case, you pass a separate dictionary literal to response_to_mongo(), so response itself never receives an "_id" key; this has nothing to do with the asynchronous nature of FastAPI.
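The in-place mutation can be reproduced without a database. The following is a minimal sketch in which fake_insert_one and FakeObjectId are hypothetical stand-ins (they are not PyMongo APIs) that mimic only the behaviour described above: adding an "_id" key to the dictionary that was passed in.

```python
class FakeObjectId:
    """Hypothetical stand-in for bson.ObjectId; not part of PyMongo."""

    def __init__(self, value: str):
        self.value = value

    def __repr__(self) -> str:
        return f"FakeObjectId({self.value!r})"


def fake_insert_one(document: dict) -> None:
    """Mimic the documented behaviour: add "_id" to the dict if it is missing."""
    if "_id" not in document:
        document["_id"] = FakeObjectId("53ad61aa06998f07cee687c3")


# First case: a fresh dict literal is inserted, so `response_a` is untouched.
response_a = {"predictions": [1.0]}
fake_insert_one({"predictions": [1.0]})

# Second case: `response_b` itself is inserted and gains an "_id" key.
response_b = {"predictions": [1.0]}
fake_insert_one(response_b)

print(response_a)
print(response_b)
```

Only the second dictionary ends up carrying a value that a JSON encoder does not know how to handle, which matches the behaviour seen in the question.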
One option is to use the approach demonstrated here: have ObjectId converted to str by default, so that you can return the response as usual inside your endpoint. Note that pydantic.json.ENCODERS_BY_TYPE is a Pydantic v1 API.
# place these at the top of your .py file
import pydantic
from bson import ObjectId
pydantic.json.ENCODERS_BY_TYPE[ObjectId] = str

# inside your endpoint
return response  # as usual
Alternatively, dump the loaded BSON to a valid JSON string and then reload it as a dict, as described here and here.
from bson import json_util
import json
response = json.loads(json_util.dumps(response))
return response
Another option is to define a custom JSONEncoder, as described here, to convert the ObjectId into str:
import json
from bson import ObjectId

class JSONEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, ObjectId):
            return str(o)
        return json.JSONEncoder.default(self, o)

# encode() returns a JSON string; reload it so that the endpoint returns a dict
response = json.loads(JSONEncoder().encode(response))
return response
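To try the custom-encoder pattern without a MongoDB driver installed, here is a minimal sketch that uses the standard library's uuid.UUID as a stand-in for ObjectId (an assumption made purely for illustration; UUIDEncoder, doc_id and document are hypothetical names). The default() hook works the same way for any type the base encoder cannot serialize.

```python
import json
import uuid


class UUIDEncoder(json.JSONEncoder):
    def default(self, o):
        # default() is called only for objects the base encoder cannot serialize.
        if isinstance(o, uuid.UUID):
            return str(o)
        return super().default(o)


# uuid.UUID stands in for ObjectId here (assumption for illustration only).
doc_id = uuid.UUID("12345678-1234-5678-1234-567812345678")
document = {"_id": doc_id, "predictions": [1.0, 2.5]}

encoded = json.dumps(document, cls=UUIDEncoder)  # a JSON string
decoded = json.loads(encoded)                    # back to a plain dict

print(decoded["_id"])
```

Both json.dumps() and JSONEncoder().encode() produce a string, so json.loads() is needed if a dict is wanted afterwards.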
You can also have a separate output model without the ObjectId (_id) field, as described in the documentation. You can declare the model used for the response with the response_model parameter in the decorator of your endpoint. Example:
from bson import ObjectId
from pydantic import BaseModel

class ResponseBody(BaseModel):
    name: str
    age: int

@app.get('/', response_model=ResponseBody)
def main():
    # response sample
    response = {'_id': ObjectId('53ad61aa06998f07cee687c3'), 'name': 'John', 'age': 25}
    return response
Finally, you can remove the "_id" entry from the response dictionary before returning it (see here on how to remove a key from a dict):
response.pop('_id', None)
return response
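A related way to keep "_id" out of the response is to hand the insert a shallow copy, so that the original dictionary is never mutated. This is a minimal sketch, not a tested PyMongo example: fake_insert_one is a hypothetical stand-in for model_collection.insert_one() and is assumed to add only a top-level "_id" key.

```python
def fake_insert_one(document: dict) -> None:
    """Hypothetical stand-in: adds a top-level "_id" if it is missing."""
    document.setdefault("_id", "53ad61aa06998f07cee687c3")


response = {"predictions": [1.0, 2.5]}

# dict(response) makes a shallow copy; only the copy receives "_id".
fake_insert_one(dict(response))

print(response)
```

In the endpoint from the question, the equivalent call would be response_to_mongo(dict(response)).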