python reddit

Python and the Reddit API: my code doesn't return all results from Reddit's huge database. Why?


I am practicing extracting data from Reddit. I tried to get the 20 most relevant communities whose names contain the word "sport". There are hundreds of them, but my API request returned fewer than 20. Do you know why?

Here is the code:

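import requests

# `headers` is assumed to hold the OAuth bearer token and User-Agent, set up earlier
parameters = {"query": "sport", "limit": 20, "sort": "relevance"}

res = requests.get("https://oauth.reddit.com/api/search_reddit_names", headers=headers, params=parameters)
res.json()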

output:

{'names': ['sports',
  'sportsbook',
  'sportsarefun',
  'sportsbetting',
  'SportsFR',
  'sportscards',
  'SportsPorn',
  'SportingCP',
  'Sports_Women']}

Solution

  • As seen in the Reddit API Docs,

    GET /api/search_reddit_names - List subreddit names that begin with a query string.

    This difference between "begins with" and "contains" is probably the cause of your problem.
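To make the distinction concrete, here is a small offline sketch (no API call) that compares the two kinds of match on a few names copied from the outputs in this thread. The helper names begins_with and contains are made up for illustration, and the case-insensitive comparison is an assumption based on the names returned in the question (e.g. SportsFR), not something taken from the docs.

```python
# Hypothetical helpers, for illustration only (not part of any Reddit library).
def begins_with(name: str, query: str) -> bool:
    """True if `name` starts with `query`, ignoring case."""
    return name.lower().startswith(query.lower())


def contains(name: str, query: str) -> bool:
    """True if `query` appears anywhere in `name`, ignoring case."""
    return query.lower() in name.lower()


# Names copied from the outputs shown in this thread.
names = ["sports", "SportsFR", "SportingCP", "BroncoSport", "Dualsport", "GranTurismoSport"]

# What a "begins with" lookup would keep versus a "contains" lookup.
prefix_matches = [n for n in names if begins_with(n, "sport")]
substring_matches = [n for n in names if contains(n, "sport")]

print(prefix_matches)
print(substring_matches)
```

Only the first three names survive the prefix check, which matches the kind of result the question shows; all six pass the substring check.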

    From a quick glance at the API, I haven't found a suitable function for your needs.

    EDIT:

    You could use a different endpoint, one meant specifically for search:

    import requests
    
    # Set the URL endpoint for the API request
    url = "https://www.reddit.com/subreddits/search.json"
    
    # Set the parameters for the API request
    
    search_param = "sport"
    
    params = {
        "q": search_param,  # Search query
        "limit": 100,  # Maximum number of results to retrieve
        "type": "sr"   # Limit search to subreddits
    }
    
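    # Send the API request. Reddit asks clients to identify themselves with a
    # descriptive User-Agent; the value below is only a placeholder.
    headers = {"User-Agent": "subreddit-search-example/0.1"}
    response = requests.get(url, params=params, headers=headers, timeout=10)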
    
    # Check if the request was successful (status code 200)
    if response.status_code == 200:
        # Extract the JSON data from the response
        data = response.json()
    
        # Extract the list of subreddits from the JSON data
        subreddits = [item["data"]["display_name"] for item in data["data"]["children"]]
    
        # Print the list of subreddits
        print(f"Subreddits containing '{search_param}':")
        for subreddit in subreddits:
            print(subreddit)
    else:
        print("An error occurred while retrieving the data. Status code:", response.status_code)
    

    That outputs:

    Subreddits containing 'sport':
    sport
    soccer
    sports
    BroncoSport
    AskReddit
    nba
    sportsbetting
    formula1
    baseball
    leagueoflegends
    sportsbook
    teenagers
    Dualsport
    sportvids
    SportWagon
    nfl
    CFB
    OriginSport
    sportsarefun
    hockey
    granturismo
    weightlifting
    football
    unpopularopinion
    MMA
    GranTurismoSport
    ...
    

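    As you can see, many of the results don't contain 'sport' in their name (this endpoint also searches subreddit titles and descriptions), so you'll have to filter some more.

    E.g., inside the "for subreddit in subreddits:" loop, lowercasing both sides because "in" is case-sensitive and would otherwise skip names like BroncoSport:

    if search_param.lower() in subreddit.lower():

    or do the same check in the list comprehension beforehand.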
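As a rough sketch of the list-comprehension variant, the filter can be folded into the extraction step. The function name matching_subreddits is hypothetical, and the sample dictionary is a hand-written stand-in for response.json(), trimmed to the only fields the answer's code reads, so the snippet runs without hitting the API.

```python
def matching_subreddits(data: dict, search_param: str) -> list:
    """Return display names that contain `search_param`, ignoring case."""
    needle = search_param.lower()
    return [
        item["data"]["display_name"]
        for item in data["data"]["children"]
        if needle in item["data"]["display_name"].lower()
    ]


# Hand-written stand-in for response.json(); a real response carries many
# more fields per subreddit. Names are copied from the output above.
sample = {
    "data": {
        "children": [
            {"data": {"display_name": name}}
            for name in ["sport", "soccer", "BroncoSport", "AskReddit", "sportsbetting", "nba"]
        ]
    }
}

filtered = matching_subreddits(sample, "sport")
print(filtered)
```

With a live response, the call would be matching_subreddits(response.json(), search_param) in place of the original comprehension.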