I'm trying to crawl the comments of a given videoId with the YouTube API, but the number of crawled comments is lower than the actual number. Do you have any idea why? My code is below.
from googleapiclient.discovery import build
from typing import List


def get_comments(api, video_id: str, fields: str) -> List[List[str]]:
    comments = list()
    response = api.commentThreads().list(part='snippet', fields=fields, videoId=video_id, maxResults=50).execute()
    all_comment_crawled = True
    while all_comment_crawled:
        for item in response['items']:
            comment = item['snippet']['topLevelComment']['snippet']
            comments.append([comment['textOriginal'], comment['likeCount']])
        if 'nextPageToken' in response:
            response = api.commentThreads().list(part='snippet', videoId=video_id, fields=fields, pageToken=response['nextPageToken'], maxResults=50).execute()
        else:
            all_comment_crawled = False
    return comments


api_key = "MY_API_KEY"
api_obj = build('youtube', 'v3', developerKey=api_key)
video_id = 'fgSvGLxanCo'
fields = 'items(snippet(totalReplyCount, topLevelComment(snippet(textOriginal, likeCount)))), nextPageToken'
comments = get_comments(api_obj, video_id, fields)
print(len(comments))  # prints 1,945, but the actual count is over 2,000
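There is a trap (and a second one that is easy to fall into) when listing the comments on a YouTube video: each commentThreads item comes with at most 5 of its replies. Your loop only appends each thread's topLevelComment, so replies are never counted at all, which probably explains the gap.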
If a comment has more than 5 replies, you have to use Comments: list to list them all. An example Python script that handles all comments of a video is available here.
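As a rough sketch of that approach (this is not the linked script), the snippet below requests part='snippet,replies' from commentThreads and falls back to Comments: list with parentId whenever a thread reports more replies than were embedded. The helper names iter_pages and get_all_comments are made up for illustration, and api is assumed to be the object returned by build('youtube', 'v3', developerKey=...) as in the question.

```python
from typing import Any, Dict, Iterator, List


def iter_pages(resource: Any, **params: Any) -> Iterator[Dict[str, Any]]:
    """Yield every page of a list() call by following nextPageToken."""
    page_token = None
    while True:
        if page_token:
            params['pageToken'] = page_token
        response = resource.list(**params).execute()
        yield response
        page_token = response.get('nextPageToken')
        if not page_token:
            break


def get_all_comments(api: Any, video_id: str) -> List[List[Any]]:
    """Collect [text, likeCount] for top-level comments and their replies."""
    comments = []
    threads = iter_pages(api.commentThreads(), part='snippet,replies',
                         videoId=video_id, maxResults=100)
    for page in threads:
        for thread in page['items']:
            top = thread['snippet']['topLevelComment']
            comments.append([top['snippet']['textOriginal'],
                             top['snippet']['likeCount']])

            # Replies embedded in the thread; this may be only a subset.
            replies = thread.get('replies', {}).get('comments', [])
            if thread['snippet']['totalReplyCount'] > len(replies):
                # Some replies are missing: list them all via Comments: list.
                replies = [
                    reply
                    for reply_page in iter_pages(api.comments(), part='snippet',
                                                 parentId=top['id'], maxResults=100)
                    for reply in reply_page['items']
                ]
            for reply in replies:
                comments.append([reply['snippet']['textOriginal'],
                                 reply['snippet']['likeCount']])
    return comments
```

With the question's setup this would be called as get_all_comments(api_obj, video_id). Note that every Comments: list call costs additional quota, so the fallback is only used for threads whose embedded replies are incomplete.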