
How do you 'catch' Email addresses inside a log file?


'''
This program is to read through any number of inputs (that is only: .txt files) the user passes through the sys.argv (through the terminal only). The file should only be a .txt file. Which means there is a conditional. Then the program should print those results to the terminal.
'''


import sys


def find_email(line1):
    '''
    We must find the 'email address pattern' that is: username@domainname.domain
    There is only one symbol to help the read file method catch and print those email addresses:
    The @ symbol and then the two ' ' spaces at each end of the email address. 
    when we call from main() we must read the log file containing emails. 
    '''
    line1 = [' ', '@', ' ']
    at_postion = line1.find('@')
    first_place = line1.find(' ', at_postion)
    second_place = line1.find(' ', at_postion)
    return line1[second_place: first_place + 1]


def main():
    '''
    Main for sys argv input and one wrong file conditional.
    '''
    result = []
    results = []
    sys.argv(input('Enter .txt file: '))
    with open('r', sys.argv) as input:
        for result in find_email():
            result.append(results)
        print(result)
        if sys.argv(input('Enter .txt file: ')) == type(str):
            find_email()
        else:
            print("Error, enter .txt files only")
            exit()
    print(results)

if __name__ == "__main__":
    main()

Note again: I am only supposed to use string methods to find these spam email addresses in the log file, reading the file and printing the addresses to the terminal. I am sure there are other errors I have not run into yet, so this is a work in progress. Also, this is not a school assignment.

a) Use the correct sys.argv values, given that a user can pass as many inputs as they want. Show error messages or usage in case of bad input.

b) Find all email addresses in a given log file, bearing in mind that I can pass a different file each time. DO NOT use any static file.

c) Refactor the code to make it modular.

I have run and rewritten the code a few times and I am still getting a variety of errors. I am certain that I am misunderstanding some critical concepts. Please point out what you think I am missing or not understanding.

Please feel free to point me to some resources that can help me understand the following: 1) code modularity, 2) read file method, 3) anything I got right, and 4) anything I forgot.


Solution

  • While you've said this isn't a school assignment, it reads like a programming exercise, and for that reason, I won't provide a complete answer. However, I will provide a couple of pointers to set you in the right direction.

    The first pointer is that you shouldn't be trying to mess with argv. argv is a list whose values are populated based on the arguments provided on the command-line. So, if your command-line looked like this: some-program.py arg1 arg2 arg3, then the following would be the contents of argv:

    argv[0] = "some-program.py"
    argv[1] = "arg1"
    argv[2] = "arg2"
    argv[3] = "arg3"
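
    Since argv only needs to be read, a minimal sketch of consuming it might look like the following. The helper name txt_paths and the message wording are hypothetical, and the sketch assumes that a case-insensitive ".txt" suffix is what counts as an acceptable file.

```python
import sys


def txt_paths(argv):
    """Return the command-line arguments that name .txt files.

    argv[0] is the script name, so only argv[1:] holds user input.
    """
    paths = []
    for arg in argv[1:]:
        if arg.lower().endswith(".txt"):
            paths.append(arg)
        else:
            # Report the bad argument but keep checking the rest.
            print("Error: %s is not a .txt file" % arg, file=sys.stderr)
    return paths


if __name__ == "__main__":
    if len(sys.argv) < 2:
        # No file names were given, so show how to call the script.
        print("Usage: %s FILE.txt [FILE.txt ...]" % sys.argv[0])
    else:
        print(txt_paths(sys.argv))
```

    Note that nothing here calls sys.argv or assigns to it; the list is simply sliced and inspected.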
    

    The second pointer I'll provide is that you have the arguments flipped for the open call. The file name should be the first argument, and I generally will specify the mode parameter explicitly, like so: open("some_file.txt", mode = "r")

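    You also shouldn't use input as a variable name, as you're trying to do in with open('r', sys.argv) as input. input is not a reserved word, so Python allows it, but it is a built-in function, and binding the name inside main() shadows the built-in: every call to input(...) in that function then fails.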

    You also don't have any logic to read things from the file. My suggestion would be to use the readline() method, like so:

    with open("some_file.txt", mode = "r") as the_file:
        while ( (line := the_file.readline() ) != "" ):
            fields = line.split()
            print("The new line is: %s" % line)
            print("The line's fields are: %s" % str(fields) )
    

    (Note that the walrus operator := only works with python >= 3.8)

    The above snippet opens a file called "some_file.txt" and creates a file handle for it: the_file. Then, each line of the file is read and stored in line. The first statement in the while loop body splits the line into fields based on whitespace**. The contents of line are then printed out with the first print statement, and the contents of the fields list are printed out with the second. When line == "", the file has been exhausted and the loop ends.

    **For instance, if line == "something@hotmail.com is an email", then fields would consist of the following:

    fields[0] = "something@hotmail.com"
    fields[1] = "is"
    fields[2] = "an"
    fields[3] = "email"
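
    If the walrus operator is not available, the same loop can be written by iterating over the file object directly, which yields one line per iteration. The sketch below is an illustration only: split_lines is a hypothetical name, and io.StringIO stands in for a real file so the snippet runs without anything on disk.

```python
import io


def split_lines(lines):
    """Yield the whitespace-separated fields of each line."""
    for line in lines:  # a file object yields one line per iteration
        yield line.split()


# io.StringIO behaves like a file opened with open(path, mode="r").
sample = io.StringIO("something@hotmail.com is an email\nno address here\n")
for fields in split_lines(sample):
    print(fields)
```

    Because split() with no arguments splits on any run of whitespace, the trailing newline never ends up in a field.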
    

    As for picking out the email addresses, if you weren't restricted to string methods only, I'd honestly prefer regular expressions for capturing things like this. However, going by the docstring in the find_email function, there are three markers that are consistent across all emails:

    1. One @
    2. At least one . after the @
    3. One ' ' on each side of the address

    At a high level, here's how I would approach this:

    1. Split the line on whitespace, which takes care of the ' ' on each side
    2. Use the find method to check whether @ and . exist in each field
    3. If they both exist, check whether a . appears after the @

    If all the checks pass, then you can print out the field.
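
    The checks can be sketched with str.find alone. This is a rough illustration rather than a full address validator: looks_like_email is a hypothetical name, and the sketch assumes the line has already been broken up with split(), so the surrounding spaces are gone by the time a token is examined.

```python
def looks_like_email(token):
    """Rough check using only string methods; not a full validator."""
    at = token.find("@")
    if at <= 0:
        # No '@' at all, or nothing in front of it.
        return False
    if token.find("@", at + 1) != -1:
        # More than one '@' in the token.
        return False
    dot = token.find(".", at + 1)  # first '.' after the '@'
    # Require at least one character between '@' and '.',
    # and at least one character after the '.'.
    return dot > at + 1 and dot < len(token) - 1


for token in "mail from something@hotmail.com was rejected".split():
    if looks_like_email(token):
        print(token)
```

    A token with trailing punctuation, such as an address followed by a comma, would still pass this check, so real log files may need an extra strip() step.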

    Regarding code modularity, the find_email function is a good example of that. If you're writing a snippet of code that could be reused in another context or executed multiple times, wrap it in a function that takes its input as a parameter and returns its result. Not only does this improve the readability of your code, but it also makes debugging tremendously easier.
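
    One possible layout, offered as a sketch of the structure only: each function does one job and hands its result to the next. The names find_emails and emails_in_file are hypothetical, and the "@" in token test is a deliberately crude placeholder for whatever check you end up writing.

```python
import sys


def find_emails(lines):
    """Collect tokens containing '@' from an iterable of text lines."""
    found = []
    for line in lines:
        for token in line.split():
            if "@" in token:  # placeholder for a stricter check
                found.append(token)
    return found


def emails_in_file(path):
    """Open one file and run find_emails over its lines."""
    with open(path, mode="r") as the_file:
        return find_emails(the_file)


def main(argv):
    """Process every path given on the command line."""
    for path in argv[1:]:
        for email in emails_in_file(path):
            print(email)


if __name__ == "__main__":
    main(sys.argv)
```

    Because find_emails accepts any iterable of strings, it can be tried out on a plain list of lines without touching a file or the command line.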

    Hopefully the above items are of some use. Feel free to leave a comment if you have a question.