youtube-api youtube-data-api youtube-livestreaming-api

commentThreads().list() returns fewer comments than the video's comment count


I'm trying to crawl the comments of a given videoId with the YouTube Data API, but the number of comments I get back is lower than the actual count shown on the video. Do you have any idea why? My code is below.

from googleapiclient.discovery import build
from typing import List

def get_comments(api, video_id: str, fields: str) -> List[List[str]]:
    comments = []
    # Fetch the first page of comment threads.
    response = api.commentThreads().list(part='snippet', fields=fields, videoId=video_id, maxResults=50).execute()

    while True:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append([comment['textOriginal'], comment['likeCount']])

        # Stop once there is no further page to request.
        if 'nextPageToken' not in response:
            break
        response = api.commentThreads().list(part='snippet', videoId=video_id, fields=fields, pageToken=response['nextPageToken'], maxResults=50).execute()

    return comments

api_key = "MY_API_KEY"
api_obj = build('youtube', 'v3', developerKey=api_key)

video_id = 'fgSvGLxanCo'
fields = 'items(snippet(totalReplyCount, topLevelComment(snippet(textOriginal, likeCount)))), nextPageToken'

comments = get_comments(api_obj, video_id, fields)
print(len(comments))  # prints 1945, but the video shows over 2,000 comments

Solution

  • There are two traps that are easy to fall into when listing the comments on a YouTube video:

    1. The comment count shown on YouTube covers all comments (unfiltered), replies included, but your algorithm only collects top-level comments. Have a look at CommentThreads: list.
    2. Even with replies added to the part parameter, commentThreads returns at most 5 replies per thread. If a comment has more than 5 replies, you have to use Comments: list (with the top-level comment's id as parentId) to list them all.

    An example Python script that processes all comments of a video is available here.
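
    The two points above can be sketched roughly as follows. This is a minimal sketch, not the linked script: the helper names `_paginate` and `get_all_comments` are made up for illustration, and `api` is assumed to be the object returned by `build('youtube', 'v3', developerKey=...)` as in the question. It requests `part='snippet,replies'`, and whenever `totalReplyCount` is larger than the number of replies embedded in the thread, it falls back to `comments().list` with `parentId`.

```python
from typing import Any, Dict, Iterator, List


def _paginate(list_method, **kwargs) -> Iterator[Dict[str, Any]]:
    """Yield every item of a paginated list call by following nextPageToken."""
    page_token = None
    while True:
        params = dict(kwargs)
        if page_token:
            params['pageToken'] = page_token
        response = list_method(**params).execute()
        yield from response.get('items', [])
        page_token = response.get('nextPageToken')
        if not page_token:
            break


def get_all_comments(api, video_id: str) -> List[List[Any]]:
    """Collect [text, like_count] for top-level comments and their replies."""
    comments = []
    threads = _paginate(
        api.commentThreads().list,
        part='snippet,replies',
        videoId=video_id,
        maxResults=100,
    )
    for thread in threads:
        top = thread['snippet']['topLevelComment']
        comments.append([top['snippet']['textOriginal'], top['snippet']['likeCount']])

        # Replies embedded in the thread may be only a subset of all replies.
        embedded = thread.get('replies', {}).get('comments', [])
        if thread['snippet']['totalReplyCount'] > len(embedded):
            # Some replies are missing, so list them all through comments().list.
            replies = _paginate(
                api.comments().list,
                part='snippet',
                parentId=top['id'],
                maxResults=100,
            )
        else:
            replies = embedded
        for reply in replies:
            comments.append([reply['snippet']['textOriginal'], reply['snippet']['likeCount']])
    return comments
```

    With the `api_obj` and `video_id` from the question, `len(get_all_comments(api_obj, video_id))` should come out closer to the count shown on the video, although it may still not match exactly. Each extra `comments().list` call also consumes API quota.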