I am trying to search Reddit posts in all subreddits for a list of keywords. At the moment my dataframe only shows results for the last entry in keyword_list, which is 'maritime history'. Does anyone know how I can get it to search for all of the keywords?
import pandas as pd
# `reddit` is assumed to be an already-authenticated praw.Reddit instance
# Creating keyword list
keyword_list = ['maritime heritage', 'shipwreck', 'shipwrecks', 'coastal archaeology', 'maritime archaeology',
                'maritime ecologies', 'marine archaeology', 'marine heritage', 'cultural marine heritage', 'marine cultural heritage', 'maritime history']
# Searching in all subreddits
all = reddit.subreddit("all")
df = pd.DataFrame() # creating dataframe for displaying scraped data
# creating lists for storing scraped data
titles=[]
scores=[]
ids=[]
# looping over posts and scraping it
for submission in all.search(keyword_list, limit=None):
    titles.append(submission.title)
    scores.append(submission.score) #upvotes
    ids.append(submission.id)
df['Title'] = titles
df['Id'] = ids
df['Upvotes'] = scores #upvotes
print(df.shape)
df.head(10)
This is a pretty straightforward change. You want to search for multiple terms, but you don't want to make multiple searches (queries), so pass the keywords as one query string rather than as a Python list.
Your keywords should be something like:
# Creating keyword list
keyword_list = '"maritime heritage", "shipwreck", "shipwrecks", "coastal archaeology", "maritime archaeology", "maritime ecologies", "marine archaeology", "marine heritage", "cultural marine heritage", "marine cultural heritage", "maritime history"'
That passes a single query containing all of the keywords and returns the posts you want to process.
As long as your current post-processing works correctly for the ‘maritime history’ responses, you shouldn’t have any issue processing the responses to the new query.
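If you would rather keep the keywords in a Python list and build the query string from it, a minimal sketch could look like the following. The helper name build_query and its sep parameter are hypothetical (they are not part of PRAW); the default separator simply reproduces the comma-separated, double-quoted format shown above, and you may need to adjust it if Reddit's search treats another separator differently.

```python
def build_query(keywords, sep=", "):
    """Join keywords into one search string, wrapping each phrase in double quotes.

    Hypothetical helper: blank entries and case-insensitive duplicates are skipped.
    """
    seen = set()
    phrases = []
    for kw in keywords:
        kw = kw.strip()
        if kw and kw.lower() not in seen:
            seen.add(kw.lower())
            phrases.append(f'"{kw}"')
    return sep.join(phrases)


keyword_list = ["maritime heritage", "shipwreck", "maritime history"]
query = build_query(keyword_list)
print(query)  # "maritime heritage", "shipwreck", "maritime history"

# Assuming an authenticated praw.Reddit instance named `reddit` (not created here),
# the string is then passed as the single query:
# for submission in reddit.subreddit("all").search(query, limit=None):
#     ...
```

The rest of your loop and dataframe code stays the same; only the argument passed to search changes from a list to a string.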