python-3.12 prompt-toolkit

prompt-toolkit: How can I properly escape HTML-formatted return results from an API?


The question: how can I escape these API results and still keep the formatting when printing with printft()?

I've included the code below since it shows the dilemma better. What I have is a very terrible little YouTube searcher + mpv player that actually handles playlist searching, unlike every other popular library right now. When the results come back, I use print_formatted_text(HTML(...)) to display them with basic coloring and such. Without proper escaping, I occasionally get this error:

not well formed (invalid token): line 1, column 111

when the results include some damn character that the API spits back. If I use printft(HTML(escape ... and escape my entire f-string, I lose the formatting, color, or whatever else I want to do with it, since, well, it's all escaped. I cannot do something like

printft(HTML ...)) + print(escape [results])) because of the + operator and comparing int and str, plus it feels like the worst possible solution to this anyway. I've included the len section of my search to demonstrate why I can't do an int and why I might just be a bigass dummy here and missing something obvious. Also, I feel like I have to use len for the comparison because the API has a limit and Google says I am going to burn in hellfire if I exceed it.

        for i, result in enumerate(results, start=1):
            if 'title' not in result:
                print_formatted_text(HTML("No results found"), style=style)
                continue
            i = 0 + i
            print_formatted_text(HTML(f"{i}. Title: <title>{result['title']}</title>"), style=style)
            if search_type == 'v':
                if 'videoId' not in result:
                    print_formatted_text(HTML("No results found"), style=style)
                    continue
                print_formatted_text(HTML(f"   Video ID: <video-id>{result['videoId']}</video-id>"), style=style)
                if 'description' not in result:
                    print_formatted_text(HTML("No description found"), style=style)
                    continue
                print_formatted_text(HTML(f"   <descriptiontext>Description: {(result['description'][:100])}...</descriptiontext>"), style=style)   
            else:
                if 'playlistId' not in result:
                    print_formatted_text(HTML("No results found"), style=style)
                    continue
                print_formatted_text(HTML(f"   <video-id>Playlist ID:</video-id> {result['playlistId']}"), style=style)
                if 'description' not in result:
                    print_formatted_text(HTML("No description found"), style=style)
                    continue
                print(f"  Description: {(result['description'][:100])}...")         
                print_formatted_text("━" * 50)

=========

try:
    if not query:
        print("Please provide a search query.")
        return
    
    results = []
    page_token = None
    while len(results) < max_results:
        request = youtube.search().list(
            part="snippet",
            maxResults=min(max_results - len(results), 50),
            q=query,
            type=search_type,
            pageToken=page_token
        )
        response = request.execute()

        for item in response['items']:
            if 'snippet' in item:
                snippet = item['snippet']
                if search_type == 'video':
                    results.append({
                        'title': snippet['title'],
                        'videoId': item['id']['videoId'],
                        'channelTitle': snippet['channelTitle'],
                        'description': snippet['description']
                    })
                elif search_type == 'playlist':
                    results.append({
                        'title': snippet['title'],
                        'playlistId': item['id']['playlistId'],
                        'channelTitle': snippet['channelTitle'],
                        'description': snippet['description']
                    })

python 3.12.8

prompt-toolkit Version: 3.0.48


Solution

The problem: HTML() parses its argument as XML, so a raw &, <, or > in the API data produces the "not well formed (invalid token)" error. Escaping the entire f-string avoids the error, but it also escapes your own tags, which is why the colors and other formatting disappear.

The solution: escape only the dynamic values you interpolate (title, video ID, description) and leave the surrounding tags alone. The string then stays well formed and the styling still applies. The revised loop below does that.
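To see where the error comes from, here is a minimal sketch that uses only the standard library. It assumes HTML() hands the string to an XML parser much like xml.dom.minidom does; parses_as_markup and the sample title are made up for illustration and are not part of prompt-toolkit.

```python
from html import escape
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parses_as_markup(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML once wrapped in a root tag."""
    try:
        minidom.parseString(f"<root>{fragment}</root>")
        return True
    except ExpatError:
        # Same family of error as "not well formed (invalid token)".
        return False


raw_title = "Tom & Jerry <full episode>"  # hypothetical value from the API

# Raw interpolation: the & and < in the data break the parse.
unescaped = f"1. Title: <title>{raw_title}</title>"

# Escaping only the value keeps the <title> tag as real markup.
escaped = f"1. Title: <title>{escape(raw_title)}</title>"

print(parses_as_markup(unescaped))
print(parses_as_markup(escaped))
```

The first string fails because of the data, not because of the tags, so only the data needs escaping.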

    from html import escape

    from prompt_toolkit import print_formatted_text
    from prompt_toolkit.formatted_text import HTML

    # `results`, `search_type`, and `style` come from the question's code.
    for i, result in enumerate(results, start=1):
        if 'title' not in result:
            print_formatted_text(HTML("No results found"), style=style)
            continue
        # Escape only the interpolated value; the <title> tag stays as markup.
        print_formatted_text(HTML(f"{i}. Title: <title>{escape(result['title'])}</title>"), style=style)

        if search_type == 'v':
            if 'videoId' not in result:
                print_formatted_text(HTML("No results found"), style=style)
                continue
            print_formatted_text(HTML(f"   Video ID: <video-id>{escape(result['videoId'])}</video-id>"), style=style)

            if 'description' not in result:
                print_formatted_text(HTML("No description found"), style=style)
                continue
            # Truncate the raw text first, then escape it, keeping the tags intact.
            print_formatted_text(HTML(f"   <descriptiontext>Description: {escape(result['description'][:100])}...</descriptiontext>"), style=style)
        else:
            if 'playlistId' not in result:
                print_formatted_text(HTML("No results found"), style=style)
                continue
            print_formatted_text(HTML(f"   <video-id>Playlist ID:</video-id> {escape(result['playlistId'])}"), style=style)

            if 'description' not in result:
                print_formatted_text(HTML("No description found"), style=style)
                continue
            # Plain print() does no markup parsing, so escaping here would show &amp; literally.
            print(f"  Description: {result['description'][:100]}...")

        print_formatted_text("━" * 50)
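If the same escape-then-wrap pattern keeps repeating, it may be worth pulling it into a small helper. The sketch below is one way to do that, not the only one; tagged and truncate are hypothetical names, and the sample result is made up.

```python
from html import escape


def tagged(tag: str, value: object) -> str:
    """Return <tag>value</tag> with only the value escaped."""
    return f"<{tag}>{escape(str(value))}</{tag}>"


def truncate(text: str, limit: int = 100) -> str:
    """Shorten the raw text; the caller escapes it afterwards."""
    return text if len(text) <= limit else text[:limit] + "..."


# Made-up data standing in for one API result.
result = {"title": "Q&A <live>", "description": "a & b " * 40}

title_markup = f"1. Title: {tagged('title', result['title'])}"
desc_markup = tagged("descriptiontext", "Description: " + truncate(result["description"]))

print(title_markup)  # 1. Title: <title>Q&amp;A &lt;live&gt;</title>
```

The resulting strings can be passed to HTML() as before. Truncating the raw text before escaping keeps the slice from cutting an entity such as &amp; in half. As far as I know, prompt-toolkit's HTML class also has a format() method that escapes its arguments, e.g. HTML("<title>{}</title>").format(result['title']), which may be enough on its own.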