I've searched around for a solution to this problem, but can't seem to find one.
I have pulled tweets from Danish MPs, using the rtweet package to access the Twitter API. I used get_timeline() to pull the data:
get_timeline(c(politikere), n = 100, parse = TRUE, since_id = "1315756184247435264", max_id = "1333904927559725056", type = "recent") %>%
dplyr::filter(created_at > "2020-10-25" & created_at <="2020-12-01")
Now I would like to categorize the Twitter users by their party, in order to do some party-specific sentiment analysis. The API call returns a tibble with around 90 variables, starting with user_id:
user_id | status_id | created_at | screen_name | text | description | ...x_i |
---|---|---|---|---|---|---|
I want to create a new column in the dataset named party_id that identifies each user's party affiliation, so that every tweet by a given user gets that user's party. It should look something like this:
user_id | status_id | created_at | screen_name | text | description | party_id |
---|---|---|---|---|---|---|
1234346 | 683901040 | 2020-11-23 | larsen_mc | gg.. | Danish MP.. | Conservatives |
I looked at the dplyr package, but I can't quite get my head around how to assign the same value to different rows that do not share an identifier. If, e.g., all the Conservative MPs shared the same status_id, it would be a somewhat easier task using inner_join, but every user has its own unique identifier in this case (of course).
Here is some example data:
structure(list(user_id = c("2373406198", "4360080437", "3512158337",
"746909257", "36910691", "58550919", "279986859", "1225930531",
"26263965", "2222188479"), status_id = c("1354094283230474241",
"1354707826317393922", "1354391556900483072", "1347169543853117444",
"1354866447735005185", "1332633849659088897", "1355522537669734401",
"1355554489361686530", "1329028442105458688", "1330791375449829376"
), created_at = structure(c(1611676209, 1611822489, 1611747085,
1610025223, 1611860307, 1606559643, 1612016732, 1612024349, 1605700047,
1606120363), tzone = "UTC", class = c("POSIXct", "POSIXt")),
screen_name = c("jacobmark_sf", "RuneLundEL", "kimvalentinDK",
"TommyPetersenDK", "JuulMona", "Blixt22", "JanEJoergensen",
"RasmusJarlov", "StemLAURITZEN", "olebirkolesen")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
Hope this makes sense.
Best, Gustav
Okay - I found a solution! After creating the identifier manually (a column called Parti_id in poldata), I used left_join() from the tidyverse (dplyr) to join on screen_name:
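library(dplyr)

# Keep only the join key and the manually created party column
poldata <- poldata %>%
  select(screen_name, Parti_id)

# Attach each user's party to every one of their tweets
FTtweets <- left_join(tmlpol, poldata, by = "screen_name")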
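
For anyone who wants to try this without the full dataset, here is a minimal, self-contained sketch of the same idea. It assumes a small hand-made lookup table with one row per user; the names tweets, party_lookup, tweets_party and unmatched are made up for illustration, and the party labels ("party_A", "party_B") are placeholders, not the MPs' actual parties.

```r
library(dplyr)
library(tibble)

# A few rows shaped like the example data (only the join key matters here)
tweets <- tibble(
  status_id   = c("1354094283230474241", "1354707826317393922", "1354391556900483072"),
  screen_name = c("jacobmark_sf", "RuneLundEL", "kimvalentinDK")
)

# Hypothetical lookup table: one row per user, party labels are placeholders
party_lookup <- tribble(
  ~screen_name,   ~party_id,
  "jacobmark_sf", "party_A",
  "RuneLundEL",   "party_B"
)

# left_join keeps every tweet and copies party_id onto all rows of the same user
tweets_party <- left_join(tweets, party_lookup, by = "screen_name")

# Users missing from the lookup get NA; list them so the table can be completed
unmatched <- tweets_party %>%
  filter(is.na(party_id)) %>%
  distinct(screen_name)
```

Because the join key is screen_name, the lookup only needs one row per user no matter how many tweets each user has. Checking unmatched after the join is a cheap way to catch misspelled screen names in the manual table.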