
Python execution log


I'd like to create a log for a Python script execution. For example:

import pandas as pd
data = pd.read_excel('example.xlsx')
data.head()

How can I create a log for this script in order to know who ran it, when it was executed, and when it finished? And, supposing for example that I take a sample of the df, how can I set a seed that I can share with another person, so that they can run the script and get the same result?


Solution

  • You could use the logging module from Python's standard library. A few extra lines of configuration make it record the information you need (the time of execution and the user running the script) and specify the file where the log messages are stored.

    As for recording who ran the script, it depends on how you want to tell users apart. If the script is meant to run on a server, you might identify users by their IP addresses. Another option is the getpass module, which returns the login name of the current user and which I use in the example below.

    Finally, when sampling from data, you can pass an integer seed as the random_state parameter so that the sample always contains the same rows, as long as the data itself doesn't change.
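
    A quick way to convince yourself of the seeding behaviour is sketched below. The small DataFrame and the seed value 1 are made up for illustration; any integer works, as long as everyone uses the same one on the same data.

    ```python
    import pandas as pd

    # Hypothetical stand-in for the data read from the Excel file.
    data = pd.DataFrame({"id": range(10), "value": list("abcdefghij")})

    SEED = 1  # share this number together with the script

    # Two independent calls with the same seed pick exactly the same rows.
    first = data.sample(frac=0.5, random_state=SEED)
    second = data.sample(frac=0.5, random_state=SEED)
    print(first.equals(second))  # True
    ```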

    Here's a modified version of your script with the previously mentioned changes:

    # == Necessary Imports =========================================================
    import logging
    import pandas as pd
    import getpass
    
    
    # == Script Configuration ======================================================
    # Set a seed to enable reproducibility
    SEED = 1
    
    # Get the username of the person who is running the script.
    USERNAME = getpass.getuser()
    
    # Set a format to the logs.
    LOG_FORMAT = '[%(levelname)s | ' + USERNAME + ' | %(asctime)s] - %(message)s'
    
    # Name of the file to store the logs.
    LOG_FILENAME = 'script_execution.log'
    
    # Minimum level at which messages are logged. The logging module defines
    # the following levels, in increasing order of severity:
    # 1. DEBUG: detailed information, useful only when diagnosing a problem.
    # 2. INFO: message that confirms that everything is working as it should.
    # 3. WARNING: message with information that requires user attention.
    # 4. ERROR: an error has occurred and script is unable to perform some function.
    # 5. CRITICAL: serious error occurred and script may stop running properly.
    LOG_LEVEL = logging.INFO
    # When you set the level, all messages from a higher level of severity are also
    # logged. For example, when you set the log level to `INFO`, all `WARNING`,
    # `ERROR` and `CRITICAL` messages are also logged, but `DEBUG` messages are not.
    
    
    # == Set up logging ============================================================
    logging.basicConfig(
        level=LOG_LEVEL,
        format=LOG_FORMAT,
        # `force` requires Python 3.8+; it replaces any existing root handlers.
        force=True,
        datefmt="%Y-%m-%d %H:%M:%S",
        handlers=[logging.FileHandler(LOG_FILENAME, "a", "utf-8"),
                  logging.StreamHandler()]
    )
    
    
    # == Script Start ==============================================================
    # Log the script execution start
    logging.info('Script started execution!')
    
    # Read data from the Excel file
    data = pd.read_excel('example.xlsx')
    
    # Retrieve a sample with 50% of the rows from `data`.
    # When a `random_state` is set, `pd.DataFrame.sample` will always return
    # the same dataframe, given that `data` doesn't change.
    sample_data = data.sample(frac=0.5, random_state=SEED)
    
    # Other stuff
    # ...
    
    # Log when the script finishes execution
    logging.info('Script finished execution!')
    
    
    

    Running the above code prints the following messages to the console:

    [INFO | erikingwersen | 2023-02-13 23:17:14] - Script started execution!
    [INFO | erikingwersen | 2023-02-13 23:17:14] - Script finished execution!
    

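    It also creates or appends to a file named 'script_execution.log' containing the same messages that are printed to the console. Because the file name is a relative path, the file is created in the current working directory, which is the script's directory only if you run the script from there.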
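
If the script can fail halfway, the "finished" message in the version above is never written. One possible refinement, sketched here under assumptions, wraps the work in try/finally so that the end of the run is always logged, and also writes the seed to the log so it can be shared along with the results. The names `run_logged`, `do_work` and `demo.log` are hypothetical; `do_work` stands in for the pandas code.

```python
import getpass
import logging
import time


def run_logged(do_work, seed, log_filename="demo.log"):
    """Run do_work(seed), logging user, seed, start, end and duration."""
    logger = logging.getLogger("script_run")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_filename, mode="a", encoding="utf-8")
    handler.setFormatter(logging.Formatter(
        "[%(levelname)s | %(asctime)s] - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    ))
    logger.addHandler(handler)

    start = time.perf_counter()
    logger.info("Script started by %s with seed %s", getpass.getuser(), seed)
    try:
        return do_work(seed)
    except Exception:
        # logger.exception records the message plus the full traceback.
        logger.exception("Script failed")
        raise
    finally:
        # Runs on success and on failure alike.
        elapsed = time.perf_counter() - start
        logger.info("Script finished after %.2f seconds", elapsed)
        logger.removeHandler(handler)
        handler.close()
```

Calling something like `run_logged(my_analysis, seed=1)`, where `my_analysis` is your own function taking the seed, then leaves a start line, an end line with the elapsed time, and a traceback if anything went wrong.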