I am reading a documentation about lexical analysis of python,which described the course how INDENT and DEDENT tokens are generated.I post the description here.
The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows.
Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it must be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero.
I've tried to understand the DEDENT section,but failed to,could somebody give a better explanation than the referenced?
As Python sometimes is easier than English, here is a rough translation of this description to Python. You can see real-world parser (written by myself) that works like this here.
import re
code = """
for i in range(10):
if i % 2 == 0:
print(i)
print("Next number")
print("That's all")
for i in range(10):
if i % 2 == 0:
print(i)
print("That's all again")
for i in range(10):
if i % 2 == 0:
print(i)
print("That's all")
"""
def get_indent(s) -> int:
m = re.match(r' *', s)
return len(m.group(0))
def add_token(token):
print(token)
INDENT="indent"
DEDENT="dedent"
indent_stack = [0]
# Before the first line of the file is read, a single zero is pushed on the stack
for line in code.splitlines():
print("processing line:", line)
indent = get_indent(line)
# At the beginning of each logical line, the line’s
# indentation level is compared to the top of the stack.
if indent > indent_stack[-1]:
# If it is larger, it is pushed on the stack,
# and one INDENT token is generated.
add_token(INDENT)
indent_stack.append(indent)
elif indent < indent_stack[-1]:
while indent < indent_stack[-1]:
# If it is smaller, ...
# all numbers on the stack that are larger are popped off,
# and for each number popped off a DEDENT token is generated.
add_token(DEDENT)
indent_stack.pop()
if indent != indent_stack[-1]:
# it must be one of the numbers occurring on the stack;
raise IndentationError
while indent_stack[-1]>0:
# At the end of the file, a DEDENT token is generated for each number
# remaining on the stack that is larger than zero.
add_token(DEDENT)
indent_stack.pop()
Here is the output:
processing line:
processing line: for i in range(10):
processing line: if i % 2 == 0:
indent
processing line: print(i)
indent
processing line: print("Next number")
dedent
processing line: print("That's all")
dedent
processing line:
processing line: for i in range(10):
processing line: if i % 2 == 0:
indent
processing line: print(i)
indent
processing line: print("That's all again")
dedent
dedent
processing line:
processing line: for i in range(10):
processing line: if i % 2 == 0:
indent
processing line: print(i)
indent
processing line: print("That's all")
dedent
dedent
File "<string>", line unknown
IndentationError