[SOLVED] I need advice on sentiment analysis and ML

I'm using snscrape to scrape tweets about EURUSD, and I want to feed the sentiment of those tweets into a machine learning model that predicts whether the EURUSD price will go up or down the following day. My problem is how to plan and structure the code. For example, should I use the individual tweets as features for the model, or should I average the sentiment of all tweets for each day and use that daily average as the feature? I'd appreciate advice from anyone who has worked on a similar project.
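On the feature question: the label is a single up/down value per day, so the tweets for a day have to be reduced to a fixed set of columns before a classifier can use them. If each tweet were its own row, every tweet from the same day would carry the same label and you would still need a rule to combine the per-tweet predictions into one call per day. Averaging is the simplest reduction, but it discards tweet volume and how mixed the opinions were. The following is a minimal plain-Python sketch, assuming each tweet has already been scored to a polarity in [-1, 1] (for example with TextBlob) and that daily closing prices are available from some source. The helper names `daily_features`, `next_day_labels` and `training_rows` and the toy numbers are hypothetical, for illustration only.

```python
from collections import defaultdict
from datetime import date


def daily_features(scored_tweets):
    """Collapse (day, polarity) pairs into one feature row per day.

    scored_tweets: iterable of (datetime.date, float), polarity in [-1, 1].
    """
    by_day = defaultdict(list)
    for day, polarity in scored_tweets:
        by_day[day].append(polarity)
    features = {}
    for day, scores in by_day.items():
        n = len(scores)
        features[day] = {
            "mean": sum(scores) / n,                     # the plain daily average
            "count": n,                                  # tweet volume that day
            "pos_share": sum(s > 0 for s in scores) / n,  # fraction positive
            "neg_share": sum(s < 0 for s in scores) / n,  # fraction negative
        }
    return features


def next_day_labels(closes):
    """closes: list of (day, close) sorted by day.

    Day t gets 1 if close(t+1) > close(t), else 0; the last day gets no label.
    """
    return {
        day: int(next_close > close)
        for (day, close), (_, next_close) in zip(closes, closes[1:])
    }


def training_rows(features, labels):
    """Keep only days that have both tweets and a label, in date order."""
    return [
        (day, features[day], labels[day])
        for day in sorted(features.keys() & labels.keys())
    ]


# Toy data, made up for illustration only (not real EURUSD prices)
tweets = [
    (date(2023, 1, 2), 0.5), (date(2023, 1, 2), -0.2), (date(2023, 1, 2), 0.0),
    (date(2023, 1, 3), -0.4), (date(2023, 1, 3), -0.1),
]
closes = [(date(2023, 1, 2), 1.10), (date(2023, 1, 3), 1.09), (date(2023, 1, 4), 1.11)]

rows = training_rows(daily_features(tweets), next_day_labels(closes))
for day, feats, label in rows:
    print(day, feats, label)
```

Each resulting row has a fixed number of columns regardless of how many tweets were posted that day, which is what a model such as `RandomForestClassifier` expects.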

Solution

Provided you have the necessary tokens, you can try something like this:

import pandas as pd
import snscrape.modules.twitter as sntwitter
from textblob import TextBlob
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score


# Collect up to 1000 tweets mentioning EURUSD (timestamp + text)
tweets = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('EURUSD').get_items()):
    if i >= 1000:
        break
    tweets.append([tweet.date, tweet.content])
df_tweets = pd.DataFrame(tweets, columns=['date', 'text'])
df_tweets.to_csv('tweets.csv', index=False)


# Map TextBlob polarity (-1.0 to 1.0) to a label: 1 positive, -1 negative, 0 neutral
def get_sentiment(text):
    sentiment = TextBlob(text).sentiment.polarity
    if sentiment > 0:
        return 1
    elif sentiment < 0:
        return -1
    else:
        return 0
df_tweets['sentiment'] = df_tweets['text'].apply(get_sentiment)


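# tweet.date is a full timestamp, so group on the calendar day to get one row per day
df_tweets['day'] = pd.to_datetime(df_tweets['date']).dt.date
df_features = df_tweets.groupby('day').agg({'sentiment': 'mean'})
df_features.reset_index(inplace=True)

# The up/down label must come from price data, which snscrape does not provide.
# 'eurusd_daily.csv' with 'date' and 'close' columns is a hypothetical input file.
prices = pd.read_csv('eurusd_daily.csv', parse_dates=['date']).sort_values('date')
prices['day'] = prices['date'].dt.date
# 1 if the next day's close is higher than this day's close, else 0
prices['price_direction'] = (prices['close'].shift(-1) > prices['close']).astype(int)
prices = prices.iloc[:-1]  # the last day has no next-day close, so it has no label
df_features = df_features.merge(prices[['day', 'price_direction']], on='day')

# shuffle=False keeps the split chronological, so the model never trains on days after the test days
X_train, X_test, y_train, y_test = train_test_split(df_features[['sentiment']], df_features['price_direction'], test_size=0.2, shuffle=False)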


rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)


y_pred = rfc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)


# Predict the direction for the day after the most recent day in df_features
last_day_features = df_features[['sentiment']].iloc[[-1]]
next_day_direction = rfc.predict(last_day_features)[0]
print('Next day direction:', next_day_direction)
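The accuracy printed above only means something next to a trivial benchmark: if most days in the test period went up, a model that always answers "up" already scores well. A minimal plain-Python sketch of a majority-class baseline, assuming 0/1 labels like `y_train` and `y_test` from the split (convert them with `list(...)` first). The function name `majority_baseline_accuracy` and the sample labels are hypothetical.

```python
from collections import Counter


def majority_baseline_accuracy(y_train, y_test):
    """Accuracy of always predicting the most common class seen in training."""
    majority = Counter(y_train).most_common(1)[0][0]
    return sum(label == majority for label in y_test) / len(y_test)


# Hypothetical labels (1 = up, 0 = down), for illustration only
y_train = [1, 0, 1, 1, 0, 1]
y_test = [1, 1, 0, 1]
print('Baseline accuracy:', majority_baseline_accuracy(y_train, y_test))
```

scikit-learn's `DummyClassifier(strategy='most_frequent')` gives the same baseline. If the random forest's accuracy is not clearly above this number, the sentiment feature is probably not adding much information.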