pythonparsingmodulestructureorganization

Coding Structure in Python


I'm having difficulty understanding how to incorporate a general structure into my scripts.

Currently I'm writing code that will parse out various sections of a data file into separate lists (a reader as I understand it), then code that will allow me to make user-determined alterations to said lists, and finally a code for writing the new lists to a file. Ultimately the goal is to have code that will allow me to add/remove particular data values and produce new data files with the determined changes.

Up until now, writing the code itself has not been particularly difficult to read edit and write the file, but I have no idea how the various functions should be written so as to be "neat and organized". Because I'm learning the language alongside actually doing this little project, I'm unfamiliar with many of what I refer to as I guess standard code structuring.

Generally speaking for a project like this should I be writing it all in one script made up of separate classes/functions? Is it better to write separate modules that import return values from one into another? Is it typical that a single module would have more than one class? From a sort of birds eye view approach to structuring the code what would be the best way to do this? Currently there is no actual structure to my code and everything is literally written outside of any main() classes and functions and it simply just runs given a particular input file but I'm fairly certain this is likely not the proper way to write code.


Solution

  • One thing that has helped me is creating different functions for manageable chunks of logic. You mentioned all of your code is written without any structure.

    The great thing is that you've already mentioned the logical separations. Try to have functions that have a single job based on that logic. For example:

    -- "parse out various sections of a data file into separate lists"

    def parse_data_file(full_file_path):
        raw_data = read_file(full_file_path) # delegate this out to another function if you're feeling like you need more separation
        data_lists = generate_sub_lists(raw_data) # delegate
        return data_lists
    

    -- "will allow me to make user-determined alterations to said lists"

    def edit_lists(user_commands_kv_args, data_lists):
        for type_of_command, command in user_commands_kv_args:
            if type_of_command == 'Delete':
                run_command_on_lists(command='Delete', data_lists)
    

    -- "a code for writing the new lists to a file"

    def write_lists_to_file(data_lists, output_file_full_path):
        with open(output_file_full_path, 'w+') as f:
            for ls in data_lists:
                f.writelines(ls)
    

    At least now you have some separation based on functionality. Later on you can even create separate classes based on the type of action. For example:

    class DataParser():
        def __init__(full_file_path):
            self.file_path = full_file_path
    
        def read_from_sql():
            pass
    
        def read_from_mongo():
            pass
    
        def read_from_txt():
            pass
        
    

    The advantage of that is that if your file format or source changes you only need to adjust your class interface for parsing that data instead of all over your main entrypoint code.