I'm following "Hands-On Large Language Models" by Alammar.
One of the examples is:
from transformers import AutoModelForCausalLM, AutoTokenizer

colors_list = [
    '102;194;165', '252;141;98', '141;160;203',
    '231;138;195', '166;216;84', '255;217;47'
]

def show_tokens(sentence, tokenizer_name):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    token_ids = tokenizer(sentence).input_ids
    for idx, t in enumerate(token_ids):
        print(
            f'\x1b[0;30;48;2;{colors_list[idx % len(colors_list)]}m' +
            tokenizer.decode(t) +
            '\x1b[0m',
            end=' '
        )

text = """
English and CAPITALIZATION
🎵 鸟
show_tokens False None elif == >= else: two tabs:" " Three tabs: " "
12.0*50=600
"""
show_tokens(text, "bert-base-uncased")
Which is supposed to show up as:
However, in Visual Studio Code I see it as:
You’re running this inside a Jupyter Notebook, and that’s the key detail here.
The color formatting in your code uses ANSI escape codes, which are designed to work in standard terminals like your system shell (e.g., Terminal on macOS/Linux, or Command Prompt/PowerShell on Windows). A terminal that supports them reads each escape sequence and draws the colored background. A notebook does not hand its output to a terminal. The notebook front end renders the output itself, and how much of the ANSI syntax it handles varies by front end. When it doesn’t handle these sequences, it either shows them raw or ignores them. That’s why the output looks off or doesn’t render as expected.
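To make the escape sequence from the question concrete, here is a minimal sketch that builds one colored token by hand. The helper name ansi_bg is hypothetical and not from the book. It only assembles the same string that show_tokens prints, so printing its repr shows exactly what the notebook front end receives.

```python
def ansi_bg(text, rgb):
    """Wrap text in a 24-bit background color escape; rgb is an 'R;G;B' string.

    Hypothetical helper for illustration, mirroring the f-string in show_tokens.
    """
    # 0 = reset attributes, 30 = black foreground, 48;2;R;G;B = RGB background
    start = f'\x1b[0;30;48;2;{rgb}m'
    reset = '\x1b[0m'  # turn all attributes off again
    return start + text + reset

sample = ansi_bg('token', '102;194;165')
print(repr(sample))  # the raw characters, escape codes included
print(sample)        # colored only if the output target interprets the codes
```

If the second print shows a green background in a terminal but not in the notebook cell, the difference is in how the output is rendered, not in the tokenizer code.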
Try running the script in a regular terminal (outside of the notebook).
If you’re set on seeing it within a Jupyter-like environment, you can explore libraries like rich, or the IPython.display tools, which are more Jupyter-compatible for output formatting.
Or, if you’re just debugging tokenization, consider printing the token info as plain text, or rendering it as HTML via IPython.display.HTML.
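The HTML route from the last suggestion could look roughly like this. It is a sketch, not the book's code: tokens_to_html and show_tokens_html are hypothetical names, and it assumes IPython and transformers are installed in the notebook kernel. The palette is the one from the question, converted to CSS rgb() values.

```python
import html

# Same palette as the question, stored as "R;G;B" strings.
colors_list = [
    '102;194;165', '252;141;98', '141;160;203',
    '231;138;195', '166;216;84', '255;217;47'
]

def tokens_to_html(tokens):
    """Wrap each token string in a <span> with a cycling background color."""
    spans = []
    for idx, tok in enumerate(tokens):
        r, g, b = colors_list[idx % len(colors_list)].split(';')
        spans.append(
            f'<span style="background-color: rgb({r}, {g}, {b}); color: black">'
            f'{html.escape(tok)}</span>'  # escape so tokens like "<" display literally
        )
    return ' '.join(spans)

def show_tokens_html(sentence, tokenizer_name):
    """Notebook-only variant of show_tokens (hypothetical helper)."""
    # Imported lazily so tokens_to_html works without these packages installed.
    from IPython.display import HTML, display
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    token_ids = tokenizer(sentence).input_ids
    display(HTML(tokens_to_html([tokenizer.decode(t) for t in token_ids])))
```

In a notebook cell, calling show_tokens_html(text, "bert-base-uncased") would then take the place of the show_tokens call. Because the colors are plain CSS, they don't depend on the front end's ANSI support.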