python pandas machine-learning sentiment-analysis

I need advice on sentiment analysis and ML


I'm using snscrape to scrape tweets about EURUSD, and I want to combine them with machine learning to predict whether the price of EURUSD will go up or down the following day, based on the sentiment of the scraped tweets. My problem is how to plan and structure the code. For example, should I use the individual tweets as features for the ML model, or should I average the sentiment of the tweets for each day and use that average as the feature? I would appreciate any advice from people who have worked on similar projects.


Solution

  • Provided that you have the necessary tokens, you could try something like this:

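    import pandas as pd
    import snscrape.modules.twitter as sntwitter
    from textblob import TextBlob
    from sklearn.model_selection import train_test_split
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    
    
    # Scrape up to 1000 tweets mentioning EURUSD and keep a copy on disk
    tweets = []
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper('EURUSD').get_items()):
        if i >= 1000:
            break
        tweets.append([tweet.date, tweet.content])
    df_tweets = pd.DataFrame(tweets, columns=['date', 'text'])
    df_tweets.to_csv('tweets.csv', index=False)
    
    
    # Label each tweet as positive (1), negative (-1) or neutral (0)
    def get_sentiment(text):
        sentiment = TextBlob(text).sentiment.polarity
        if sentiment > 0:
            return 1
        elif sentiment < 0:
            return -1
        else:
            return 0
    df_tweets['sentiment'] = df_tweets['text'].apply(get_sentiment)
    
    
    # Average the sentiment per calendar day (UTC). tweet.date is a full timestamp,
    # so grouping on it directly would give almost one group per tweet.
    df_tweets['date'] = pd.to_datetime(df_tweets['date'], utc=True).dt.tz_localize(None).dt.normalize()
    df_features = df_tweets.groupby('date').agg({'sentiment': 'mean'})
    df_features.reset_index(inplace=True)
    
    
    # ASSUMPTION: the target has to come from price data, which the tweets do not contain.
    # 'eurusd_daily.csv' is a hypothetical file with 'date' and 'close' columns.
    prices = pd.read_csv('eurusd_daily.csv', parse_dates=['date']).sort_values('date')
    prices['next_close'] = prices['close'].shift(-1)
    prices = prices.dropna(subset=['next_close'])  # the last row has no following day
    prices['price_direction'] = (prices['next_close'] > prices['close']).astype(int)
    df_model = df_features.merge(prices[['date', 'price_direction']], on='date').sort_values('date')
    
    
    # shuffle=False keeps chronological order, so the test days come after the training days
    X_train, X_test, y_train, y_test = train_test_split(df_model[['sentiment']], df_model['price_direction'], test_size=0.2, shuffle=False)
    
    
    rfc = RandomForestClassifier(n_estimators=100, random_state=42)
    rfc.fit(X_train, y_train)
    
    
    y_pred = rfc.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print('Accuracy:', accuracy)
    
    
    # Predict the direction for the day after the most recent day with tweets
    last_day_sentiment = df_features.iloc[[-1]][['sentiment']]
    next_day_direction = rfc.predict(last_day_sentiment)[0]
    print('Next day direction:', next_day_direction)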
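
On the question of per-tweet features versus a daily average: since the target is one value per day, one row of aggregated features per day is the simpler fit. Below is a minimal sketch of that idea, not a definitive implementation. The tweet scores and closing prices are made-up illustration values, and the helper names `daily_sentiment_features` and `add_next_day_label` are hypothetical. It builds a few daily features (mean polarity, tweet count, share of positive tweets) and attaches the direction of the next close as the label.

```python
import pandas as pd

# Hypothetical per-tweet polarity scores (in practice these could come from TextBlob).
tweets = pd.DataFrame({
    "date": pd.to_datetime([
        "2023-01-02 09:15", "2023-01-02 14:40", "2023-01-02 20:05",
        "2023-01-03 08:30", "2023-01-03 16:10",
        "2023-01-04 11:00",
    ]),
    "polarity": [0.5, -0.2, 0.3, -0.4, -0.1, 0.0],
})

# Hypothetical daily closing prices, one row per trading day.
prices = pd.DataFrame({
    "date": pd.to_datetime(["2023-01-02", "2023-01-03", "2023-01-04", "2023-01-05"]),
    "close": [1.0660, 1.0545, 1.0600, 1.0520],
})


def daily_sentiment_features(tweets):
    """Collapse per-tweet scores into one row of features per calendar day."""
    by_day = tweets.assign(day=tweets["date"].dt.normalize()).groupby("day")["polarity"]
    features = by_day.agg(
        mean_polarity="mean",
        tweet_count="size",
        share_positive=lambda s: (s > 0).mean(),
    )
    return features.reset_index().rename(columns={"day": "date"})


def add_next_day_label(features, prices):
    """Label each day 1 if the next available close is higher than that day's close, else 0."""
    prices = prices.sort_values("date").copy()
    prices["next_close"] = prices["close"].shift(-1)
    prices = prices.dropna(subset=["next_close"])  # the last row has no following close
    prices["price_direction"] = (prices["next_close"] > prices["close"]).astype(int)
    return features.merge(prices[["date", "price_direction"]], on="date", how="inner")


dataset = add_next_day_label(daily_sentiment_features(tweets), prices)
print(dataset)
```

Because the label is built with `shift(-1)` on the price rows, "next day" here means the next row in the price data, so a Friday is labelled with the following Monday's close if weekends are absent from the file. Days with tweets but no price row are dropped by the inner merge.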