python twitter tweepy twitter-search

Why are some tweets in the Search API but not in the Streaming API, and vice versa?


I have a script that stores incoming tweets for a phrase (e.g. "python") in database table "A" using the Twitter Streaming API. Later, another script searches for the same phrase using the Twitter Search API and stores the results in table "B". My question is why some tweets are in "A" but not in "B", and vice versa.

I can think of one reason to have tweets in "B" and not in "A":

"A" only contains tweets posted after the Streaming API script started, while the Search API returns results from the last week. If the Streaming API script has been running for more than a week, there should not be any tweet in "B" that is not in "A".

I know two reasons to have some tweets in "A" and not in "B":

  1. The Search API only returns results from the last week, while the Streaming API returns everything.
  2. The Search API returns only a portion of the results, not all of them, as its focus is not on completeness.

I'd like to confirm whether I've got this right.
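To check these hypotheses against the data, the two tables can be compared by tweet id. The following is a minimal sketch, assuming both tables live in one SQLite database and each has a `tweet_id` column and an ISO-formatted `created_at` column. Those column names, the helper names, and the 7-day window constant are illustrative assumptions, not part of the original scripts.

```python
import sqlite3
from datetime import datetime, timedelta

# Assumed look-back of the Search API ("the last week", as stated in the question).
SEARCH_WINDOW = timedelta(days=7)


def make_db(rows_a, rows_b):
    """Build an in-memory database with hypothetical tables "A" and "B"."""
    conn = sqlite3.connect(":memory:")
    for table, rows in (("A", rows_a), ("B", rows_b)):
        conn.execute(
            f'CREATE TABLE "{table}" (tweet_id INTEGER PRIMARY KEY, created_at TEXT)'
        )
        conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', rows)
    return conn


def load_tweets(conn, table):
    """Map tweet_id -> created_at (as datetime) for one table."""
    rows = conn.execute(f'SELECT tweet_id, created_at FROM "{table}"')
    return {tweet_id: datetime.fromisoformat(ts) for tweet_id, ts in rows}


def compare(conn, stream_start, search_time):
    """Group the tweets missing from either table by the likely reason."""
    a = load_tweets(conn, "A")
    b = load_tweets(conn, "B")
    report = {
        "b_before_stream": [],  # in B only: posted before the stream started
        "b_unexplained": [],    # in B only: the stream was running but missed it
        "a_too_old": [],        # in A only: older than the search window
        "a_not_indexed": [],    # in A only: recent, yet search did not return it
    }
    for tid in sorted(b.keys() - a.keys()):
        key = "b_before_stream" if b[tid] < stream_start else "b_unexplained"
        report[key].append(tid)
    for tid in sorted(a.keys() - b.keys()):
        too_old = a[tid] < search_time - SEARCH_WINDOW
        report["a_too_old" if too_old else "a_not_indexed"].append(tid)
    return report


if __name__ == "__main__":
    conn = make_db(
        rows_a=[(2, "2015-01-16T00:00:00"), (3, "2015-01-17T00:00:00")],
        rows_b=[(2, "2015-01-16T00:00:00"), (4, "2015-01-14T00:00:00")],
    )
    print(compare(conn, datetime(2015, 1, 15), datetime(2015, 1, 20)))
```

Any ids that end up in `b_unexplained` are the interesting ones: they were posted while the stream was running, so the start-time explanation alone does not cover them.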


Solution

  • For "B" not in "A", you are correct. A strong indication of that comes from the Search API link you included:

    It allows queries against the indices of recent or popular Tweets...

    For "A" not in "B" you're correct as well but with minor mistakes.

    1. The Streaming API will not return everything; what it delivers is capped at about 1% of the total tweets. The 1% filtering is done internally by Twitter, and there has been no indication of how it is done. There was an announcement not long ago about fixing the 1% to make it a true 1%, but I can't find the link where I read it.
    2. With the Streaming API you are also affected, more commonly, by:
      • Public stream limit notices (sent when you reach the 1% cap)
      • Stall warnings (warning)

    A few others may apply, depending on your use case: https://dev.twitter.com/streaming/overview/messages-types
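To see whether limit notices or stall warnings explain tweets missing from "A", the non-tweet messages on the stream can be tallied. The following is a rough sketch that works on the raw lines of the stream rather than on any particular client library. It assumes limit notices look like `{"limit": {"track": N}}` with `N` a running total of undelivered tweets, that stall warnings look like `{"warning": {"code": ...}}`, and that blank lines are keep-alives. The function names are hypothetical, and the message shapes should be checked against the message-types page linked above.

```python
import json


def classify_message(raw):
    """Return (kind, detail) for one raw line read from the stream.

    The message shapes below are assumptions based on the message-types
    documentation, not guarantees.
    """
    raw = raw.strip()
    if not raw:
        return "keep-alive", None  # assumed: blank lines keep the connection open
    msg = json.loads(raw)
    if "limit" in msg:
        return "limit", msg["limit"].get("track")
    if "warning" in msg:
        return "warning", msg["warning"].get("code")
    if "id" in msg and "text" in msg:
        return "tweet", msg["id"]
    return "other", None


def summarize(lines):
    """Count delivered tweets and collect evidence of dropped ones."""
    summary = {"tweets": 0, "undelivered": 0, "warnings": []}
    for line in lines:
        kind, detail = classify_message(line)
        if kind == "tweet":
            summary["tweets"] += 1
        elif kind == "limit" and detail is not None:
            # Assumed to be a running total, so keep the largest value seen.
            summary["undelivered"] = max(summary["undelivered"], detail)
        elif kind == "warning":
            summary["warnings"].append(detail)
    return summary


if __name__ == "__main__":
    sample = [
        '{"id": 1, "text": "learning python"}',
        "",
        '{"limit": {"track": 12}}',
        '{"warning": {"code": "FALLING_BEHIND", "percent_full": 60}}',
        '{"limit": {"track": 30}}',
    ]
    print(summarize(sample))
```

A non-zero `undelivered` count means the stream matched more tweets than it delivered, which would account for tweets that the Search API found but table "A" never received.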