I am trying to extract sentiment for a very large dataset of more than 606,912 instances in a Jupyter notebook, but it takes several days and gets interrupted. This is my code:
import pandas as pd
from camel_tools.sentiment import SentimentAnalyzer

sa = SentimentAnalyzer("CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment")
full_text = dataset['clean_text'].tolist()
sentiments = []  # collect one prediction per text
for e in range(len(full_text)):
    print("Iterate through list:", full_text[e])
    s = sa.predict(full_text[e])
    sentiments.append(s)
    print("Iterate through sentiments list:", sentiments[e])
dataset['sentiments'] = sentiments
Can someone help me solve this issue or speed up the operation?
It is not efficient to process one big source dataset in a single Python instance. My recommendations are:

Version 1. - roll your own parallelization, e.g. with the multiprocessing standard library (see the first sketch below)
Version 2. - use an existing parallelization solution, e.g. joblib, pandarallel, or Dask (see the second sketch below)
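Here is a minimal sketch of Version 1, assuming a CPU machine. Each worker process loads its own copy of the model once (in a pool initializer) and then scores whole chunks of texts; camel_tools' SentimentAnalyzer.predict accepts a list of sentences and batches internally. The helper names and the worker/chunk counts are illustrative, not part of your code or the library.

# Version 1 sketch: hand-rolled parallelization with multiprocessing.
# Note: with the spawn start method (Windows/macOS), define these
# functions in a .py module rather than directly in the notebook.
from multiprocessing import Pool
from camel_tools.sentiment import SentimentAnalyzer

_sa = None  # one model instance per worker process

def _init_worker():
    # Load the model once per process instead of once per text.
    global _sa
    _sa = SentimentAnalyzer("CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment")

def _predict_chunk(chunk):
    # predict() takes a list of sentences and scores it in batches.
    return _sa.predict(chunk)

def parallel_sentiment(texts, n_workers=4, chunk_size=1000):
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    with Pool(n_workers, initializer=_init_worker) as pool:
        results = pool.map(_predict_chunk, chunks)
    # Flatten the per-chunk label lists back into one list.
    return [label for chunk in results for label in chunk]

# Usage: dataset['sentiments'] = parallel_sentiment(dataset['clean_text'].tolist())

Independent of parallelization: if I recall the camel_tools API correctly, predict() already batches a list of sentences internally, so on a GPU machine a single sa.predict(full_text) call may by itself be much faster than the per-row loop.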
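And a minimal sketch of Version 2, using joblib as the existing solution (pandarallel and Dask follow the same idea). The model is loaded lazily on a worker's first call and cached in a module-level global, so each process pays the loading cost only once; the function names and parameters are again just an assumption for illustration.

# Version 2 sketch: parallelization via joblib.
from joblib import Parallel, delayed
from camel_tools.sentiment import SentimentAnalyzer

_sa = None  # cached per worker process

def _score_chunk(chunk):
    global _sa
    if _sa is None:
        # First call in this worker: load the model once, then reuse it.
        _sa = SentimentAnalyzer("CAMeL-Lab/bert-base-arabic-camelbert-da-sentiment")
    return _sa.predict(chunk)

def joblib_sentiment(texts, n_jobs=4, chunk_size=1000):
    chunks = [texts[i:i + chunk_size] for i in range(0, len(texts), chunk_size)]
    results = Parallel(n_jobs=n_jobs)(delayed(_score_chunk)(c) for c in chunks)
    return [label for chunk in results for label in chunk]

# Usage: dataset['sentiments'] = joblib_sentiment(dataset['clean_text'].tolist())

Either way, start with a small slice of the dataset (say 10,000 rows) to measure throughput per worker before committing to a multi-day run.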