I'm writing a programming language in Python, but I have a problem with the lexer function.
Here is the complete code, which you can run as-is:
```python
import sys

inputError = """
(!) Error.
(!) Cannot get input.
(!) Please try again.
"""

lexerError = """
(!) Error.
(!) Lexer error. Cannot read input and separate tokens.
(!) Please try again.
"""

def __get_input():
    err = False
    ipt = None
    try:
        ipt = input("} ")
        err = False
    except:
        ipt = None
        err = True
    return err, ipt

def __lexer(ipt):
    err = False
    digits = "1234567890"
    NUM = "DIGIT"
    PLUS = "PLUS"
    MINUS = "MINUS"
    MULTP = "MULTP"
    DIV = "DIV"
    L_BRK = "L_BRK"
    R_BRK = "R_BRK"
    token = ""
    tokens = []
    ttypes = []
    try:
        for i in range(len(ipt)):
            char = ipt[i]
            if char==' ':
                tokens.append(token)
                token = ""
            else:
                token = token + ipt[i]
        tokens.append(token)
        for i in range(len(tokens)):
            token = tokens[i]
            for j in range(len(token)):
                if token[j] in digits:
                    ttypes[i] = DIGITS
                else:
                    ttypes[i] = ''
        err = False
    except:
        tokens = []
        ttypes = []
        err = True
    return err, tokens, ttypes

def __init():
    err, ipt = __get_input()
    if err==True:
        print(inputError)
        sys.exit()
    err, tokens, ttypes = __lexer(ipt)
    if err==True:
        print(lexerError)
        sys.exit()
    print(err, tokens, ttypes)

__init()
```
My problem is in the part of `__lexer` that assigns the token types. I expected the lexer function (the rest works perfectly) to read the user's input, split it into tokens (this part works), and then recognize the type of each token. I've defined all the token types, but I wanted to start with the integer type.
The `for` loop over `tokens` starts the routine that checks whether each token contains only valid digits, and then assigns the corresponding type to the current element of the list `ttypes`.
Stepping through the program with the GDB online debugger, I see that at the `ttypes[i]` assignments it jumps directly to the `except` block, and I don't know why. It got even more confusing when I replaced the lines `ttypes[i] = DIGITS` and `ttypes[i] = ''` with some `print()` calls, because then it worked as expected: if the token was made of digits it printed one message, otherwise it printed another.
```python
...
for i in range(len(tokens)):
    token = tokens[i]
    for j in range(len(token)):
        if token[j] in digits:
            print("It's an integer")
        else:
            print("It isn't an integer.")
...
```
That's what I tried.
I hope my problem is clear. Thank you very much!
This issue is related to how you handle the `ttypes` list.
You are assigning values to `ttypes[i]`, but `ttypes` is initialized as an empty list, and Python lists don't grow automatically: assigning to an index that doesn't exist yet raises an `IndexError`.
Here is a minimal example:
```python
NUM = "DIGIT"
ttypes = []
ttypes[0] = NUM  # This raises IndexError: list assignment index out of range
print(ttypes)    # Never reached
```
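For comparison, here is a small sketch of when index assignment does work: only once the index already exists, either because `append()` created it or because the list was preallocated to the right length. The token values are made up for illustration.

```python
NUM = "DIGIT"

ttypes = []
ttypes.append(NUM)     # append() grows the list: ttypes is now ['DIGIT']
ttypes[0] = "UNKNOWN"  # OK: index 0 exists now, so it can be overwritten

tokens = ["12", "+", "7"]     # hypothetical tokens, for illustration only
slots = [None] * len(tokens)  # preallocate one slot per token
slots[2] = NUM                # OK: index 2 exists in a list of length 3

print(ttypes)  # ['UNKNOWN']
print(slots)   # [None, None, 'DIGIT']
```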
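So when you try to assign `ttypes[i] = NUM` (I assume you meant `NUM` rather than `DIGITS`), it raises an `IndexError`, which is why the code jumps to the `except` block. Note that, as written, `DIGITS` is never defined, so the digit branch actually fails even earlier with a `NameError` (the right-hand side is evaluated before the index is used), while the `ttypes[i] = ''` branch fails with the `IndexError`. The bare `except:` hides both. Use the `append()` method instead.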
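To see which exception a bare `except:` is hiding, a small sketch (the helper `try_assign` is hypothetical, not from the question) reproduces the two failing assignments from the loop and reports the exception type instead of swallowing it:

```python
NUM = "DIGIT"

def try_assign(use_undefined_name):
    """Reproduce one failing assignment from the question's loop."""
    ttypes = []  # empty list, exactly as in __lexer
    try:
        if use_undefined_name:
            ttypes[0] = DIGITS  # DIGITS is never defined anywhere
        else:
            ttypes[0] = NUM     # NUM exists, but index 0 does not
    except Exception as e:      # named except instead of a bare except:
        return f"{type(e).__name__}: {e}"
    return "ok"

print(try_assign(True))   # NameError: name 'DIGITS' is not defined
print(try_assign(False))  # IndexError: list assignment index out of range
```

Catching `Exception as e` (or simply letting the error propagate while developing) shows the real cause immediately, which a debugger jump into `except:` does not.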
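Here's my suggestion. It checks every character of the token with `all()`, instead of overwriting the type once per character as your inner `for j` loop does.
```python
for token in tokens:
    # all() returns True for an empty string, so check the token is non-empty first
    if token and all(char in digits for char in token):
        ttypes.append(NUM)
    else:
        ttypes.append("UNKNOWN")  # You can add more logic here
```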
I hope this will help a little.