I have written a file watcher in Python that watches a specific folder on my laptop. Whenever a new parquet file is created in it, the watcher picks up the file, reads the data with Pandas, and constructs a data frame from it.
Issue: It does all of this perfectly except for the last step, where it has to load the data into the data frame.
Here is the code I have written:
# Imports and declarations
import os
import sys
import time
import pathlib
import pandas as pd
import pyarrow.parquet as pq
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler, PatternMatchingEventHandler
# Eventhandler class
class Handler(FileSystemEventHandler):
    def on_created(self, event):
        # Import Data
        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10) # To allow time to complete file write to disk
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        dataset.head()
# Code to run for Python Interpreter
if __name__ == "__main__":
    path = r"D:\Folder1\Folder2\Folder3" # Path to watch
    observer = Observer()
    event_handler = Handler()
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        while(True):
            pass
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
The expected output is the first five rows of the data frame. However, nothing is shown, and I get no error either.
Some Useful Information
I have been running this code in Jupyter Notebook.
I have also run it in Spyder to see whether a data frame appears in its Variable Explorer, but it did not.
From this, the natural conclusion would be that the data frame isn't getting created at all. But this is what baffles me, because yesterday I successfully read this same parquet file with somewhat less sophisticated code (below), where I passed the file path as a raw string.
# Less Sophisticated Code
filepath = r"D:\Folder1\Folder2\Folder3\filename.parquet"
dataset = pd.read_parquet(filepath, engine='pyarrow')
dataset = dataset.reset_index(drop=True) # Resets index of dataframe and replaces with integers
dataset.head()
Is the filepath the issue then? I am very happy to provide any other information you may need.
Edit: I have added a screenshot of the output from the code that did not have a file watcher.
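If you don't print dataset.head(), nothing is displayed. head() only returns a data frame, and inside a method that return value is simply discarded; a notebook auto-displays only the last expression of a cell. dataset.info(), by contrast, prints its output itself. So print the result explicitly: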
class Handler(FileSystemEventHandler):
    def on_created(self, event):
        # Import Data
        filepath = pathlib.PureWindowsPath(event.src_path).as_posix()
        time.sleep(10) # To allow time to complete file write to disk
        dataset = pd.read_parquet(filepath, engine='pyarrow')
        dataset = dataset.reset_index(drop=True)
        print(dataset.head()) # <- HERE
Otherwise, your code works for me.
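To see the difference without a file watcher, here is a minimal sketch. The sample data frame and the two function names are made up for illustration; they stand in for the handler body and the parquet data. It captures stdout to show that calling head() inside a function displays nothing, while print(head()) does.

```python
import contextlib
import io

import pandas as pd


def handler_without_print(df):
    # head() returns a new DataFrame; the result is discarded here,
    # so nothing is written to stdout.
    df.head()


def handler_with_print(df):
    # print() writes the text representation to stdout explicitly.
    print(df.head())


# Hypothetical stand-in for the data read from the parquet file.
sample = pd.DataFrame({"a": range(10), "b": list("abcdefghij")})

silent = io.StringIO()
with contextlib.redirect_stdout(silent):
    handler_without_print(sample)

printed = io.StringIO()
with contextlib.redirect_stdout(printed):
    handler_with_print(sample)

print(repr(silent.getvalue()))  # empty string: nothing was displayed
print(printed.getvalue())       # header line plus the first five rows
```

The same applies to on_created: it is called by the observer thread, not evaluated as the last expression of a notebook cell, so its output has to be printed explicitly.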
Note: prefer Path over PureWindowsPath.
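As a rough sketch of what that could look like: pd.read_parquet accepts path objects, so there is no need to convert to a POSIX string first. The helper name parquet_path_or_none is hypothetical (it is not part of watchdog or pandas); it also skips files that are not parquet, which the original handler does not do.

```python
from pathlib import Path


def parquet_path_or_none(src_path):
    """Return a Path for src_path if it names a .parquet file, else None.

    Hypothetical helper for illustration only.
    """
    path = Path(src_path)
    return path if path.suffix.lower() == ".parquet" else None


# Intended use inside the handler (assumed, mirrors the code above):
#
#     def on_created(self, event):
#         filepath = parquet_path_or_none(event.src_path)
#         if filepath is None:
#             return
#         dataset = pd.read_parquet(filepath, engine='pyarrow')
#         print(dataset.head())

print(parquet_path_or_none("data/file.parquet"))
print(parquet_path_or_none("data/file.csv"))
```

Path picks the right flavour for the platform the code runs on, whereas PureWindowsPath only manipulates the string and never touches the file system.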