Tags: pymongo, jupyter-lab, bson

pymongo BSON public APIs not imported in Jupyter Lab


I'm working on a project that requires me to import a BSON dataset into pandas, and I'm trying to use the bson.decode_all function to do so.
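For reference, bson.decode_all expects a byte string of concatenated BSON documents, which is how a mongodump-style .bson file is laid out (assuming tweets.bson is such a dump). Per the BSON specification, each document begins with its total size as a little-endian int32 (the size counts those 4 bytes and the trailing NUL byte). The minimal, stdlib-only sketch below checks that framing, so the file can be sanity-checked no matter which bson module happens to import. split_bson_documents is a hypothetical helper. It does not decode any fields.

```python
import struct
from typing import List


def split_bson_documents(data: bytes) -> List[bytes]:
    """Split a buffer of concatenated BSON documents into raw per-document bytes.

    Hypothetical helper: it only validates the length framing from the BSON spec
    (int32 little-endian total size, trailing NUL byte); field contents are not parsed.
    """
    docs = []
    offset = 0
    while offset < len(data):
        # The smallest valid document is 5 bytes: the int32 size plus the terminator.
        if len(data) - offset < 5:
            raise ValueError(f"truncated document header at offset {offset}")
        (size,) = struct.unpack_from("<i", data, offset)
        if size < 5 or offset + size > len(data):
            raise ValueError(f"invalid document size {size} at offset {offset}")
        if data[offset + size - 1] != 0:
            raise ValueError(f"document at offset {offset} is not NUL-terminated")
        docs.append(data[offset:offset + size])
        offset += size
    return docs


if __name__ == "__main__":
    # Two tiny hand-built documents: {} and {"a": 1} (0x10 is the int32 element type).
    sample = b"\x05\x00\x00\x00\x00" + b"\x0c\x00\x00\x00\x10a\x00\x01\x00\x00\x00\x00"
    print(len(split_bson_documents(sample)))  # 2
```

On the real file this would be called on the bytes returned by f.read(). If it raises, the data itself is malformed, which is a separate problem from the import issue.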

I have a conda environment named "tf" with pymongo installed. I start Jupyter Lab from that environment with:

(tf) PS C:\Users\ashka\Desktop\spring 23\RSRC 4033\cybersecurity tweets\jupyter> jupyter lab

Running this cell in the notebook:

import os
import pprint
from platform import python_version

import pandas as pd
import bson
import tensorflow as tf

print(python_version())
print(bson.__file__)
print(bson.__all__)

gives:

3.9.16
C:\Users\ashka\anaconda3\envs\tf\lib\site-packages\bson\__init__.py
['loads', 'dumps']
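When the same import behaves differently in two places, a useful first check is which interpreter the Jupyter kernel is running and which names the imported module actually provides. A minimal stdlib-only sketch; describe_module is a hypothetical helper, and the required names come from the decode_all call in the next cell.

```python
import importlib
import sys
from typing import Dict, Iterable, List


def describe_module(name: str, required: Iterable[str]) -> Dict[str, object]:
    """Report the interpreter, the module's file, and any required names it lacks."""
    module = importlib.import_module(name)
    missing: List[str] = [attr for attr in required if not hasattr(module, attr)]
    return {
        "executable": sys.executable,  # the Python binary running this kernel/script
        "file": getattr(module, "__file__", None),
        "missing": missing,
    }


if __name__ == "__main__":
    # Stdlib example; in the notebook, call describe_module("bson", ["decode_all"]).
    print(describe_module("json", ["loads", "dumps"]))
```

If "executable" points into the tf environment but "missing" still contains decode_all, the kernel is using the right interpreter with a bson module that is not pymongo's.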

The second cell:

# preprocessing
data_file = "../dataset/threat/tweets.bson"
with open(data_file, 'rb') as f:
    dataset_dict = bson.decode_all(f.read())
pprint.pprint(dataset_dict)

gives:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[35], line 4
      2 data_file = "../dataset/threat/tweets.bson"
      3 with open(data_file, 'rb') as f:
----> 4     dataset_dict = bson.decode_all(f.read())
      5 pprint.pprint(dataset_dict)

AttributeError: module 'bson' has no attribute 'decode_all'

As you can see, the only names that bson exports (its __all__) are "loads" and "dumps", and decode_all is missing. Funnily enough, this doesn't happen outside of Jupyter with the SAME environment. I created a new file, bsontest.py, in the same directory and ran it with:

(tf) PS C:\Users\ashka\Desktop\spring 23\RSRC 4033\cybersecurity tweets\jupyter> python .\bsontest.py

where bsontest.py contains:

from platform import python_version

import bson

print(python_version())
print(bson.__file__)
print(bson.__all__)

and it gave:

3.9.16
C:\Users\ashka\anaconda3\envs\tf\lib\site-packages\bson\__init__.py
['ALL_UUID_SUBTYPES', 'CSHARP_LEGACY', 'JAVA_LEGACY', 'OLD_UUID_SUBTYPE', 'STANDARD', 'UUID_SUBTYPE', 'Binary', 'UuidRepresentation', 'Code', 'DEFAULT_CODEC_OPTIONS', 'CodecOptions', 'DBRef', 'Decimal128', 'InvalidBSON', 'InvalidDocument', 'InvalidStringData', 'Int64', 'MaxKey', 'MinKey', 'ObjectId', 'Regex', 'RE_TYPE', 'SON', 'Timestamp', 'utc', 'EPOCH_AWARE', 'EPOCH_NAIVE', 'BSONNUM', 'BSONSTR', 'BSONOBJ', 'BSONARR', 'BSONBIN', 'BSONUND', 'BSONOID', 'BSONBOO', 'BSONDAT', 'BSONNUL', 'BSONRGX', 'BSONREF', 'BSONCOD', 'BSONSYM', 'BSONCWS', 'BSONINT', 'BSONTIM', 'BSONLON', 'BSONDEC', 'BSONMIN', 'BSONMAX', 'get_data_and_view', 'gen_list_name', 'encode', 'decode', 'decode_all', 'decode_iter', 'decode_file_iter', 'is_valid', 'BSON', 'has_c', 'DatetimeConversion', 'DatetimeMS']

which contains everything I need, including decode_all!
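Both runs report the same __file__, so they are probably not loading different files. One plausible explanation (an assumption, not confirmed here) is that the kernel imported bson before the files in site-packages\bson changed, for example when pymongo was reinstalled over the standalone bson package from PyPI, which writes to the same bson directory. Python caches imported modules in sys.modules for the life of the process, so a running kernel keeps its old copy. The sketch below compares the current process's view of a module with a fresh interpreter started from the same sys.executable. The helper names are hypothetical.

```python
import importlib
import json
import subprocess
import sys
from typing import List


def exported_names_here(name: str) -> List[str]:
    """__all__ as seen by the current, possibly long-running, process."""
    module = importlib.import_module(name)
    return sorted(getattr(module, "__all__", []))


def exported_names_fresh(name: str) -> List[str]:
    """__all__ as seen by a brand-new process using the same interpreter."""
    code = (
        "import importlib, json, sys; "
        "m = importlib.import_module(sys.argv[1]); "
        "print(json.dumps(sorted(getattr(m, '__all__', []))))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code, name],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def compare_with_fresh_process(name: str) -> bool:
    """True if this process and a fresh interpreter agree on the module's __all__."""
    return exported_names_here(name) == exported_names_fresh(name)


if __name__ == "__main__":
    # In the notebook, pass "bson"; False suggests the kernel's copy differs from disk.
    print(compare_with_fresh_process("json"))  # True
```

If this returns False for bson inside the notebook, restarting the kernel (Kernel > Restart Kernel... in Jupyter Lab) should make it import the files that are currently on disk.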

Why does the bson module act differently in Jupyter and how can I fix it?


Solution

  • Although I'm not sure exactly what caused the issue, I fixed it by reinstalling all the packages (making sure not to install the standalone bson package from PyPI, which conflicts with the bson module that ships with pymongo) and restarting Jupyter Lab.
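pymongo's documentation warns against installing the bson package from PyPI alongside pymongo, because both provide a top-level bson module. Below is a minimal stdlib sketch for checking the environment after reinstalling. installed_version is a hypothetical helper. The pip commands in the comments are one common way to remove the standalone package and restore pymongo's copy, assuming pymongo was installed with pip. Run them inside the activated tf environment.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Expected after the fix: a version string for pymongo and None for "bson".
    # If "bson" is still installed, one option is:
    #   pip uninstall bson
    #   pip install --force-reinstall pymongo
    # then restart the Jupyter kernel so the new files are imported.
    for dist in ("pymongo", "bson"):
        print(dist, installed_version(dist))
```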