
I'm writing a programming language in Python, but I have a problem with the lexer function



Here is the complete, runnable code:

import sys

inputError = """

(!) Error.
(!) Cannot get input.
(!) Please try again.

"""
lexerError = """

(!) Error.
(!) Lexer error. Cannot read input and separate tokens.
(!) Please try again.

"""

def __get_input():
    err = False
    ipt = None
    
    try:
        ipt = input("} ")
        err = False
        
    except:
        ipt = None
        err = True
        
    return err, ipt

def __lexer(ipt):
    err = False
    
    digits  = "1234567890"
    
    NUM     = "DIGIT"
    PLUS    = "PLUS"
    MINUS   = "MINUS"
    MULTP   = "MULTP"
    DIV     = "DIV"
    L_BRK   = "L_BRK"
    R_BRK   = "R_BRK"
    
    token = ""
    tokens = []
    ttypes = []
    
    try:
        for i in range(len(ipt)):
            char = ipt[i]
            
            if char==' ':
                tokens.append(token)
                token = ""
            
            else:
                token = token + ipt[i]
        
        tokens.append(token)
        
        for i in range(len(tokens)):
            token = tokens[i]

            for j in range(len(token)):
                if token[j] in digits:
                    ttypes[i] = DIGITS
                
                else:
                    ttypes[i] = ''
        
        err = False
        
    except:
        tokens = []
        ttypes = []
        
        err = True
        
    
    return err, tokens, ttypes

def __init():
    err, ipt = __get_input()
    
    if err==True:
        print(inputError)
        sys.exit()
    
    err, tokens, ttypes = __lexer(ipt)
    
    if err==True:
        print(lexerError)
        sys.exit()
    
    print(err, tokens, ttypes)
    
__init()

My problem is at lines 66-67, counting from import sys (if token[j] in digits: and ttypes[i] = DIGITS). I expected the lexer function to read the user's input, split it into tokens (this part works), and then recognize the type of each token. I've defined all the token types, but I wanted to start with integers. The for loop at line 62 (for i in range(len(tokens)):) starts the routine that checks whether each token contains only valid digits, and it assigns the corresponding type to the current element of the ttypes list. Stepping through with the GDB online debugger, the program jumps straight to the except block at line 67, and I don't know why. What confuses me more is that when I replaced the lines ttypes[i] = DIGITS and ttypes[i] = '' with print() calls, it worked as expected: if the token was a number it printed one message, otherwise it printed another.

...
        for i in range(len(tokens)):
            token = tokens[i]

            for j in range(len(token)):
                if token[j] in digits:
                    print("It's an integer")
                
                else:
                    print("It isn't an integer.")
...

That's what I tried.

I hope my problem is clear. Thank you very much!
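The bare except: in __lexer hides which exception actually occurred. Below is a minimal sketch, assuming Python 3, that repeats the two assignments from the inner loop and reports the exception instead of swallowing it. The helper names classify_with_index_assignment and which_error are hypothetical and exist only for this illustration.

```python
# Reproduce the failing assignments from the question's __lexer and
# report the exception instead of hiding it behind a bare `except:`.
NUM = "DIGIT"
digits = "1234567890"


def classify_with_index_assignment(tokens):
    """Hypothetical helper: same logic as the question's inner loop."""
    ttypes = []  # empty list, exactly as in the question
    for i, token in enumerate(tokens):
        for char in token:
            if char in digits:
                ttypes[i] = DIGITS  # DIGITS is never defined, as in the question
            else:
                ttypes[i] = ''      # the list has no element i to assign to
    return ttypes


def which_error(tokens):
    """Hypothetical helper: return the exception's class name, or None."""
    try:
        classify_with_index_assignment(tokens)
    except Exception as e:  # keep the exception object so it can be inspected
        print(f"{type(e).__name__}: {e}")
        return type(e).__name__
    return None


which_error(["12"])   # first character is a digit
which_error(["abc"])  # first character is not a digit
```

With a digit as the first character, the right-hand side DIGITS is evaluated before the assignment, so the result is a NameError. With any other first character, it is an IndexError, because ttypes has no element 0. An empty token never enters the inner loop, so it raises nothing.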


Solution

  • This issue is related to the handling of the ttypes list. You are attempting to assign values to ttypes[i], but ttypes is initialized as an empty list. The problem arises because Python lists don't allow direct assignment to indices that don't exist.

    For example:

    ttypes = []
    ttypes[0] = "DIGIT"  # raises IndexError: list assignment index out of range
    print(ttypes)        # never reached
    

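    So when you try to assign ttypes[i] = NUM (I assume you meant NUM rather than DIGITS), Python raises an IndexError, which is why the code jumps to the except block. As written, ttypes[i] = DIGITS fails even earlier: the right-hand side is evaluated first, and because DIGITS is never defined, it raises a NameError. The bare except: hides both errors. Catching Exception as e and printing e would show which one occurred. Use the append() method instead of assigning by index.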

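    Here's my suggestion. It also uses all() to check every character of the token. In your inner loop, each character overwrites ttypes[i], so only the last character would decide the type.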

    for token in tokens:
        if all(char in digits for char in token):
            ttypes.append(NUM)
        else:
            ttypes.append("UNKNOWN") # You can add more logic
    

    I hope this helps.
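Putting the answer's fixes together, here is a sketch of a replacement for __lexer. It assumes whitespace-separated tokens and the same (err, tokens, ttypes) return shape as the question. The name lexer and the UNKNOWN type are illustrative placeholders, not part of the original code.

```python
# Sketch of a corrected lexer: split on whitespace, classify with append(),
# and report unexpected errors instead of hiding them.
NUM = "DIGIT"
UNKNOWN = "UNKNOWN"  # hypothetical placeholder type, as in the answer
digits = "1234567890"


def lexer(ipt):
    """Split ipt on whitespace and classify each token.

    Returns (err, tokens, ttypes), like the question's __lexer.
    """
    try:
        # str.split() with no argument splits on runs of whitespace,
        # so consecutive spaces never produce empty tokens.
        tokens = ipt.split()
        ttypes = []
        for token in tokens:
            # A token is a number only if every character is a digit.
            if all(char in digits for char in token):
                ttypes.append(NUM)
            else:
                ttypes.append(UNKNOWN)
    except Exception as e:  # e.g. ipt is None; show the error instead of hiding it
        print(f"Lexer error: {e!r}")
        return True, [], []
    return False, tokens, ttypes


print(lexer("12 + 345"))
```

This prints (False, ['12', '+', '345'], ['DIGIT', 'UNKNOWN', 'DIGIT']). Using split() instead of the question's character loop matters for the all() check. all() returns True for an empty string, and splitting on single spaces turns two consecutive spaces into an empty token that would then be classified as a number.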