I am trying to add a list of dictionaries (named stanzanerlist) like the following:
stanzanerlist = [{
    "text": "Harry Potter",
    "type": "PER",
    "start_char": 141,
    "end_char": 153
}, {
    "text": "Hogwarts",
    "type": "LOC",
    "start_char": 405,
    "end_char": 413
}, {
    "text": "JK Rowling",
    "type": "PER",
    "start_char": 505,
    "end_char": 515
}]
as a field in a MongoDB document in a collection.
I am inserting the whole document as follows, with stanzanerlist as the last item in mongodocument:
mongodocument = {
    "_id": urlid,
    "source": sourcename,
    "stanzadoc": stanzadoc.to_serialized(),
    "stanzaver": stanzaver,
    # "timestamp": datetime.now(tzinfo),
    "timestamp": datetime.now(
        tz=pytz.timezone(cfgdata["timezone"]["name"])
    ),
    "stanzanerlist": stanzanerlist,
}
try:
    mdbrc = mdbcoll.insert_one(
        mongodocument
    )  # insert fails if URL/_ID already exists
    return mdbrc
except pymongo.errors.DuplicateKeyError:
    # manage the record update
    print(f"Article {urlid} already exists!")
While all the other fields work well, adding stanzanerlist gives the following error:
cannot encode object: {
"text": "Harry Potter",
"type": "PER",
"start_char": 141,
"end_char": 153
}, of type: <class 'stanza.models.common.doc.Span'>
and I'm not able to understand if and how I could achieve that addition.
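pymongo doesn't natively know how to convert <class 'stanza.models.common.doc.Span'> objects to an acceptable BSON data type. The items in stanzanerlist print like dictionaries, but as the error message shows, they are actually Span objects.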
You could "teach" pymongo how to do the proper conversion/encoding using a custom bson.codec_options.TypeEncoder, and then pymongo would automatically perform the type conversion as it does for other types. Or, you could do the conversion/encoding yourself each time before storing a Span in your MongoDB collection.
Fortunately, Stanford NLP Stanza has convenience methods for type conversions. <class 'stanza.models.common.doc.Span'> has a to_dict method that converts the Span to a plain dict, which pymongo does know how to encode.
So, in your code snippet, you could change the mongodocument assignment of "stanzanerlist" to:
"stanzanerlist": [stan.to_dict() for stan in stanzanerlist]
... and then each <class 'stanza.models.common.doc.Span'> will be converted to a dict and pymongo should be able to store it.
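If the list might sometimes already contain plain dictionaries, a slightly more defensive version of that conversion could be sketched as below. The helper name to_mongo_safe and the FakeSpan class are hypothetical; FakeSpan only imitates the to_dict method of a Stanza Span so the example runs without Stanza installed.

```python
class FakeSpan:
    """Stand-in that imitates the to_dict() method of a Stanza Span (assumption)."""

    def __init__(self, text, type, start_char, end_char):
        self._data = {
            "text": text,
            "type": type,
            "start_char": start_char,
            "end_char": end_char,
        }

    def to_dict(self):
        return dict(self._data)


def to_mongo_safe(entities):
    """Return a list of plain dicts, calling to_dict() on items that offer it."""
    safe = []
    for ent in entities:
        if isinstance(ent, dict):
            safe.append(ent)  # already encodable, keep as is
        elif hasattr(ent, "to_dict"):
            safe.append(ent.to_dict())  # Span-like object
        else:
            raise TypeError(f"cannot convert {type(ent).__name__} to dict")
    return safe


mixed = [
    FakeSpan("Harry Potter", "PER", 141, 153),
    {"text": "Hogwarts", "type": "LOC", "start_char": 405, "end_char": 413},
]
converted = to_mongo_safe(mixed)
```

In the question's snippet this would be used as "stanzanerlist": to_mongo_safe(stanzanerlist).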