python pandas jupyter-notebook parquet watchdog

Why is my Python file watcher not writing the data from Parquet files to a data frame?


I have written a file watcher in Python that watches a specific folder on my laptop. Whenever a new Parquet file is created in that folder, the watcher picks it up, reads it with pandas, and builds a data frame from it.

Issue: Everything works except the last step, where the data should end up in the data frame.

Here is the code I have written:

# Imports and declarations

import os
import sys
import time
import pathlib
import pandas as pd
import pyarrow.parquet as pq

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler, PatternMatchingEventHandler
# Eventhandler class

class Handler(FileSystemEventHandler):
    
    def on_created(self, event):
        
        # Import Data

        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10) # To allow time to complete file write to disk
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        dataset.head()

# Code to run for Python Interpreter

if __name__ == "__main__":
    
    path = r"D:\Folder1\Folder2\Folder3" # Path to watch
    
    observer = Observer()
    event_handler = Handler()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    
    try:
        while(True):
            pass
            
    except KeyboardInterrupt:
        observer.stop()
        observer.join()

The expected output is the first five rows of the data frame. However, nothing is displayed and no error is raised either.

Some Useful Information

From this, the natural conclusion would be that the data frame isn't being created at all. But that is what baffles me, because yesterday I successfully read this same Parquet file with a simpler script (below), where I passed the file path as a raw string.

# Simpler script without a file watcher

filepath = r"D:\Folder1\Folder2\Folder3\filename.parquet"

dataset = pd.read_parquet(filepath, engine='pyarrow')
dataset = dataset.reset_index(drop=True) # Resets index of dataframe and replaces with integers
dataset.head()

[Screenshot: output in Jupyter Notebook]

Is the filepath the issue then? I am very happy to provide any other information you may need.

Edit: I have added a screenshot of the output from the code without the file watcher.


Solution

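  • dataset.head() only returns a DataFrame; it does not print anything. Jupyter displays the value of the last expression in a cell automatically, but inside a method such as on_created the return value is simply discarded. Unlike dataset.info(), which prints its summary itself, you have to print dataset.head() explicitly: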

    class Handler(FileSystemEventHandler):
        
        def on_created(self, event):
            
            # Import Data
    
            filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
            time.sleep(10) # To allow time to complete file write to disk
            dataset = pd.read_parquet(filepath, engine='pyarrow')
            dataset = dataset.reset_index(drop=True)
            print(dataset.head())  # <- HERE
    

    Otherwise, your code works for me.
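The difference between returning and printing can be checked without a watcher. The following is a minimal sketch, assuming a small hypothetical in-memory DataFrame in place of the Parquet data; the helper names (show_nothing, show_rows, captured_output) are made up for illustration. It captures stdout to show that a bare head() call inside a function writes nothing, while print(df.head()) and df.info() do.

```python
import contextlib
import io

import pandas as pd

# Hypothetical stand-in for the data read from the Parquet file.
dataset = pd.DataFrame({"id": [1, 2, 3], "value": [10.0, 20.0, 30.0]})


def show_nothing(df):
    # head() returns a DataFrame, but the result is discarded here,
    # so nothing reaches stdout.
    df.head()


def show_rows(df):
    # print() writes the rows to stdout regardless of where it is called.
    print(df.head())


def captured_output(func, df):
    """Run func(df) and return everything it wrote to stdout."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(df)
    return buffer.getvalue()


silent = captured_output(show_nothing, dataset)
printed = captured_output(show_rows, dataset)
info_text = captured_output(lambda df: df.info(), dataset)
```

Here silent is an empty string, whereas printed and info_text both contain text. That matches the behaviour seen in the watcher: on_created is an ordinary method, so nothing is displayed unless the code prints it.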

    Note: prefer pathlib.Path over PureWindowsPath.
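To illustrate the note, here is a small sketch comparing the two approaches. The path is the one from the question; is_parquet is a hypothetical helper added for illustration, not part of the original answer. Path picks the right flavour for the operating system it runs on, and pd.read_parquet accepts path objects, so the as_posix() conversion should not be needed.

```python
from pathlib import Path, PureWindowsPath

# Path in the same form as in the question.
src_path = r"D:\Folder1\Folder2\Folder3\filename.parquet"

# Original approach: force Windows parsing, then convert to a
# forward-slash string.
posix_style = PureWindowsPath(src_path).as_posix()

# Suggested approach: let pathlib choose the flavour for the current OS.
# The resulting object can be passed to pd.read_parquet as-is.
native_path = Path(src_path)


def is_parquet(path_str):
    """Return True if the path ends in .parquet (case-insensitive).

    Hypothetical helper: on_created also fires for other new files,
    so a check like this could be used to skip them.
    """
    return Path(path_str).suffix.lower() == ".parquet"
```

Inside on_created, this would amount to something like filepath = Path(event.src_path), followed by pd.read_parquet(filepath, engine='pyarrow').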