
Telegram Bot: How to Download Multiple Photos from the Same media_group_id in a Single Update?


This is a follow-up to a question I asked in a previous post.

This is the code I use to download photos sent to a Telegram bot; I expected it to fetch every photo that shares the same media_group_id.

async def download_photo(update: Update, context: ContextTypes.DEFAULT_TYPE):
    """Download the highest resolution photo sent in each message update."""
    photo = update.message.photo
    media_paths = []

    if photo:
        # The last item in the photo array is the highest resolution version
        file_id = photo[-1].file_id
        logger.info(f'Processing photo with file_id: {file_id}')

        try:
            file_info = await context.bot.get_file(file_id)
            download_url = file_info.file_path  # This is already the full path
            print("download path")
            print(download_url)
            # Create a unique filename for each photo, using the message_id and unique file_id
            file_name = f'photo_{update.message.message_id}_{photo[-1].file_unique_id}.jpg'

            # Log the full download URL
            full_download_url = f"{download_url}"
            logger.info(f'Downloading photo from URL: {full_download_url}')

            # Download the photo using the download URL
            response = requests.get(full_download_url)
            response.raise_for_status()  # Check if the download was successful
            with open(file_name, 'wb') as f:
                f.write(response.content)
            media_paths.append(file_name)
            logger.info(f'Downloaded photo successfully with file_id: {file_id}')
        except Exception as e:
            logger.error(f'Error downloading photo with file_id {file_id}: {e}')

    logger.info(f'All downloaded media paths: {media_paths}')
    return media_paths

Only 1 of the 2 images I sent was downloaded. The log is below:

INFO:__main__:Downloading photo from URL: https://api.telegram.org/file/bot12800363:AAF7LWpZz79LOVevA6OmFUAMA6SjVn2esns/photos/file_42.jpg
download path
https://api.telegram.org/file/bot1200363:AAF7LWpZz79LMA6SjVn2esns/photos/file_42.jpg
INFO:__main__:Downloaded photo successfully with file_id: AgACAgUAAxkBAAIBx2sdFpfpjbDNU4nclU_ji1AALGvDEbpJVhVl6sAQADAgADeQADNQQ
INFO:__main__:All downloaded media paths: ['photo_455_AQADxrwxG6SVYVZ-.jpg']
INFO:__main__:Tweet posted: 1827892852770193799
INFO:__main__:Deleted file: photo_455_AQADxrwxG6SVYVZ-.jpg

The URL won't work because I had to modify it to comply with SO guidelines. I am trying to download both of the images I sent (or all of them, in case there are more), but only 1 was downloaded. The code is not iterating through every photo that belongs to the media_group_id. Could you please advise how I can resolve this issue?


Solution

  • When a media group (album) is received, its items are in fact separate messages that are related to each other only by a shared media_group_id. The screenshots below illustrate this (I used Cloudflare Workers to host the bot and print the updates to the output).

    Sent two images as a single group to the bot: (screenshot: two images sent as a group to the bot)

    Two separate message updates are logged in the console: (screenshot: two different updates were logged)

    Each message has its own unique message_id, but they share the same media_group_id: (screenshot: two different message updates with the same media_group_id)

    As you can see, for the seemingly single message that contains two images, there are actually two separate messages with different message_ids received in two different updates.
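
    To make the grouping concrete, here is a minimal sketch that groups raw update payloads by media_group_id. The payloads are made-up samples trimmed to a few Bot API fields (update_id, message_id, media_group_id, photo), and group_photos is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical sample payloads: two album items plus one stand-alone photo.
updates = [
    {"update_id": 1, "message": {"message_id": 455, "media_group_id": "G1", "photo": [
        {"file_id": "a-small", "width": 90, "height": 60},
        {"file_id": "a-large", "width": 1280, "height": 853}]}},
    {"update_id": 2, "message": {"message_id": 456, "media_group_id": "G1", "photo": [
        {"file_id": "b-small", "width": 90, "height": 60},
        {"file_id": "b-large", "width": 1280, "height": 853}]}},
    {"update_id": 3, "message": {"message_id": 457, "photo": [
        {"file_id": "c-small", "width": 90, "height": 60},
        {"file_id": "c-large", "width": 800, "height": 600}]}},
]


def group_photos(updates):
    """Map each album (or lone photo) to the file_ids of its largest renditions."""
    groups = defaultdict(list)
    for update in updates:
        message = update.get("message") or {}
        sizes = message.get("photo")
        if not sizes:
            continue  # not a photo message
        # Album items share a media_group_id; a lone photo has none,
        # so fall back to a key built from its message_id.
        key = message.get("media_group_id") or f"single:{message['message_id']}"
        largest = max(sizes, key=lambda s: s["width"] * s["height"])
        groups[key].append(largest["file_id"])
    return dict(groups)


albums = group_photos(updates)
print(albums)
```

    The two album items end up under the same key even though they arrived as separate updates with different message_ids.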

    Solution:

    One straightforward approach is to store the media_group_id of the first update, then compare it against the media_group_id of each subsequent update. If it matches, queue that image for download as well; otherwise, proceed with the rest of your logic in a separate function.

    If you don't know the number of images in advance, using a timeout to decide when to start the download is a common approach:

    import asyncio
    import logging

    import requests
    from telegram import Update
    from telegram.ext import ContextTypes

    logger = logging.getLogger(__name__)

    # Photos waiting to be downloaded, keyed by media_group_id
    # (or by message_id for a photo that is not part of a group)
    pending = {}

    def download_all_photos(urls):
        """Download every (download_url, file_name, file_id) entry and return the saved paths."""
        media_paths = []

        for download_url, file_name, file_id in urls:
            try:
                logger.info(f'Downloading photo from URL: {download_url}')

                # Download the photo using the download URL
                response = requests.get(download_url, timeout=30)
                response.raise_for_status()  # Check if the download was successful
                with open(file_name, 'wb') as f:
                    f.write(response.content)
                media_paths.append(file_name)
                logger.info(f'Downloaded photo successfully with file_id: {file_id}')
            except Exception as e:
                logger.error(f'Error downloading photo with file_id {file_id}: {e}')

        logger.info(f'All downloaded media paths: {media_paths}')
        return media_paths

    async def download_photo(update: Update, context: ContextTypes.DEFAULT_TYPE):
        """Queue the highest resolution photo of each update and download the group together.

        Updates are handled one at a time by default, so register this handler
        with block=False; otherwise the sleep below delays the rest of the group.
        """
        photo = update.message.photo
        if not photo:
            return

        # The last item in the photo array is the highest resolution version
        file_id = photo[-1].file_id
        logger.info(f'Processing photo with file_id: {file_id}')

        file_info = await context.bot.get_file(file_id)
        download_url = file_info.file_path  # This is already the full path
        # Create a unique filename for each photo, using the message_id and unique file_id
        file_name = f'photo_{update.message.message_id}_{photo[-1].file_unique_id}.jpg'

        # Append the download URL, file name, and file ID to this group's list
        key = update.message.media_group_id or update.message.message_id
        urls = pending.setdefault(key, [])
        urls.append((download_url, file_name, file_id))
        count = len(urls)

        # Wait a short while so the remaining updates of the group can arrive
        await asyncio.sleep(2)

        # Only the handler that queued the most recent photo starts the download
        if pending.get(key) is urls and len(urls) == count:
            del pending[key]
            # requests is blocking, so run the downloads in a worker thread (Python 3.9+)
            await asyncio.to_thread(download_all_photos, urls)
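
    A variation on the timeout idea restarts the timer whenever another item of the same group arrives, so the group is flushed only after it has been quiet for a while. The sketch below is library-free and only illustrates the pattern: AlbumCollector, on_flush and the file names are hypothetical, and the 0.05 s delay exists just to keep the demo fast (something like 1 to 2 seconds would be a more plausible value for real albums).

```python
import asyncio


class AlbumCollector:
    """Buffer items per key and flush a key once it has been quiet for `delay` seconds."""

    def __init__(self, on_flush, delay=2.0):
        self.on_flush = on_flush  # async callable taking (key, items)
        self.delay = delay
        self._items = {}
        self._timers = {}

    def add(self, key, item):
        """Queue an item; must be called while an event loop is running."""
        self._items.setdefault(key, []).append(item)
        # Restart the quiet-period timer for this key
        timer = self._timers.get(key)
        if timer is not None:
            timer.cancel()
        self._timers[key] = asyncio.create_task(self._flush_later(key))

    async def _flush_later(self, key):
        await asyncio.sleep(self.delay)
        # No newer item cancelled this timer, so the group is considered complete
        self._timers.pop(key, None)
        items = self._items.pop(key, [])
        await self.on_flush(key, items)


async def demo():
    flushed = []

    async def on_flush(key, items):
        # A real bot would download the queued photos here
        flushed.append((key, items))

    collector = AlbumCollector(on_flush, delay=0.05)
    collector.add("G1", "photo_455.jpg")
    await asyncio.sleep(0.01)  # second album item arrives shortly after the first
    collector.add("G1", "photo_456.jpg")
    await asyncio.sleep(0.2)   # give the timer time to fire
    return flushed


result = asyncio.run(demo())
print(result)
```

    Because add returns immediately, a handler built this way would not need to sleep itself; it would call add with the media_group_id (or the message_id for a lone photo) as the key and return.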
    

    Another possible solution would be to note the media_group_id of the current update and look back through recent messages to find the rest of the group. However, bots currently cannot access message history (see this answer), so you would still have to save every message update yourself, which leads back to the previous method.