I'm working on a project that requires me to import a BSON dataset into Pandas, and I'm trying to use the bson.decode_all method to do so.
I have a conda environment named "tf" with pymongo installed. This is my current script in Jupyter, which I open using (tf) PS C:\Users\ashka\Desktop\spring 23\RSRC 4033\cybersecurity tweets\jupyter> jupyter lab:
import os
import pprint
from platform import python_version
import pandas as pd
import bson
import tensorflow as tf
print(python_version())
print(bson.__file__)
print(bson.__all__)
gives:
3.9.16
C:\Users\ashka\anaconda3\envs\tf\lib\site-packages\bson\__init__.py
['loads', 'dumps']
and
# preprocessing
data_file = "../dataset/threat/tweets.bson"
with open(data_file, 'rb') as f:
    dataset_dict = bson.decode_all(f.read())
pprint.pprint(dataset_dict)
gives:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[35], line 4
2 data_file = "../dataset/threat/tweets.bson"
3 with open(data_file, 'rb') as f:
----> 4 dataset_dict = bson.decode_all(f.read())
5 pprint.pprint(dataset_dict)
AttributeError: module 'bson' has no attribute 'decode_all'
As you can see, the only names that bson exports here are "loads" and "dumps". Funnily enough, this isn't a problem outside of Jupyter with the SAME environment. I created a new .py file in the same directory and ran it using (tf) PS C:\Users\ashka\Desktop\spring 23\RSRC 4033\cybersecurity tweets\jupyter> python .\bsontest.py:
from platform import python_version
import bson
print(python_version())
print(bson.__file__)
print(bson.__all__)
and it gave:
3.9.16
C:\Users\ashka\anaconda3\envs\tf\lib\site-packages\bson\__init__.py
['ALL_UUID_SUBTYPES', 'CSHARP_LEGACY', 'JAVA_LEGACY', 'OLD_UUID_SUBTYPE', 'STANDARD', 'UUID_SUBTYPE', 'Binary', 'UuidRepresentation', 'Code', 'DEFAULT_CODEC_OPTIONS', 'CodecOptions', 'DBRef', 'Decimal128', 'InvalidBSON', 'InvalidDocument', 'InvalidStringData', 'Int64', 'MaxKey', 'MinKey', 'ObjectId', 'Regex', 'RE_TYPE', 'SON', 'Timestamp', 'utc', 'EPOCH_AWARE', 'EPOCH_NAIVE', 'BSONNUM', 'BSONSTR', 'BSONOBJ', 'BSONARR', 'BSONBIN', 'BSONUND', 'BSONOID', 'BSONBOO', 'BSONDAT', 'BSONNUL', 'BSONRGX', 'BSONREF', 'BSONCOD', 'BSONSYM', 'BSONCWS', 'BSONINT', 'BSONTIM', 'BSONLON', 'BSONDEC', 'BSONMIN', 'BSONMAX', 'get_data_and_view', 'gen_list_name', 'encode', 'decode', 'decode_all', 'decode_iter', 'decode_file_iter', 'is_valid', 'BSON', 'has_c', 'DatetimeConversion', 'DatetimeMS']
which contains everything that I need!!!
Why does the bson module act differently in Jupyter, and how can I fix it?
Although I am not sure what exactly was causing the issue, I managed to fix it by reinstalling all the packages (making sure to avoid the standalone bson package and use pymongo, which ships its own bson module) and restarting Jupyter Lab.
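For anyone hitting the same thing, here is a rough diagnostic sketch that can be run both in a notebook cell and in a plain script. It lists which installed distributions ship files under a top-level bson/ directory, which is one plausible way to spot a standalone bson package sitting alongside pymongo. The helper name providers_of is made up for this example, and this only checks for a possible conflict; it is not a confirmed explanation of the behaviour above.

```python
import importlib.metadata as md


def providers_of(top_level="bson"):
    """Return sorted names of installed distributions that ship files under `top_level`/.

    `providers_of` is a hypothetical helper written for this sketch.
    """
    names = set()
    for dist in md.distributions():
        # dist.files can be None when the distribution has no file record.
        for path in dist.files or []:
            if path.parts and path.parts[0] == top_level:
                name = dist.metadata["Name"]
                if name:
                    names.add(name)
                break
    return sorted(names)


# More than one name here (for example both a standalone bson and pymongo)
# would suggest two packages are competing for the same import name.
print(providers_of("bson"))
```

If both the notebook and the script report the same single provider, the conflict is probably not the cause, and restarting the Jupyter kernel so it re-imports the module is worth trying first.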