python pandas numpy flask waitress

Segmentation fault using np.cov while serving a flask app via waitress


I want to perform a simple covariance calculation inside a more complex Flask app. Below is a minimal example with random data, without Flask, of the calculation that causes the problem in the Flask/waitress setup. On its own, outside Flask, this example works fine.

import pandas as pd
import numpy as np

#import faulthandler
#faulthandler.enable()

n = 547
t = 50
data = np.random.random(size=(n*t,2))*10**-6

times = pd.date_range(start="20210501",end="20210502",periods=t)
index = pd.MultiIndex.from_product([times,range(n)],names=["time","id"])
df = pd.DataFrame(data,index=index, columns=["lon","lat"])

for time, group in df.groupby("time"):
    print(group)
    # If either column is constant, use a zero covariance matrix instead of np.cov
    if group['lon'].max() == group['lon'].min():
        cov = np.zeros((2, 2))
    elif group['lat'].max() == group['lat'].min():
        cov = np.zeros((2, 2))
    else:
        cov = np.cov(group['lat'].astype(float),
                     group['lon'].astype(float))

print(cov)
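Since the calculation is called as a function from the endpoint, here is a minimal sketch of the same loop packaged that way. The helper name `covariance_per_time` is hypothetical, and it assumes the DataFrame has the same `time`/`id` MultiIndex and `lon`/`lat` columns as above. It returns one 2x2 matrix per time step instead of keeping only the last one. As a side note, `np.cov` already yields zeros for a constant column (it does not divide by the variance), so the guard is kept only to mirror the original logic.

```python
import numpy as np
import pandas as pd


def covariance_per_time(df: pd.DataFrame) -> dict:
    """Hypothetical helper: map each time step to the 2x2 covariance of (lat, lon)."""
    result = {}
    for time, group in df.groupby(level="time"):
        lat = group["lat"].to_numpy(dtype=float)
        lon = group["lon"].to_numpy(dtype=float)
        if np.ptp(lat) == 0 or np.ptp(lon) == 0:
            # Same guard as the question: a constant column gives a zero matrix
            result[time] = np.zeros((2, 2))
        else:
            # Row 0 is lat, row 1 is lon, matching np.cov(lat, lon) in the question
            result[time] = np.cov(lat, lon)
    return result
```

An endpoint could then call `covariance_per_time(df)` and serialize the matrices (e.g. with `.tolist()`) before returning them.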

A segmentation fault occurs when the calculation is performed (as a function call) from a Flask app endpoint served via waitress. It is not caused directly by np.cov, as that call returns fine. The data is quite small (approx. (500, 2) per group). But the segmentation fault does occur within the loop: sometimes in the 150th iteration, sometimes earlier, sometimes never. I checked memory usage after np.cov and it is always around 100 MB, with around 16 GB available.
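Because the crash happens somewhere in the loop but its exact location is unclear, the standard-library `faulthandler` module (already commented out in the example above) can print a Python traceback when the process receives a fatal signal such as SIGSEGV. A minimal sketch, assuming it runs once at startup in the module that waitress imports; the log file name is hypothetical:

```python
import faulthandler

# Hypothetical log path. faulthandler writes through the file descriptor,
# so the file object must stay open for as long as the handler is enabled.
crash_log = open("faulthandler.log", "a")

# On a fatal signal (e.g. SIGSEGV), dump the traceback of every thread.
# waitress serves requests from worker threads, so all_threads=True is useful.
faulthandler.enable(file=crash_log, all_threads=True)
```

Without changing code, starting the process with `python -X faulthandler` or setting the environment variable `PYTHONFAULTHANDLER=1` has a similar effect, with the traceback going to stderr.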

Does this make sense to anybody? Or is it bad practice to use pandas/NumPy within a Flask app? Thanks in advance!

Versions: flask 1.1.2, waitress 1.4.3, numpy 1.19.1, pandas 1.1.3


Solution

  • Updating all packages solved the issue
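Since updating the packages fixed the crash, it may help to record exactly which versions are installed before and after running something like `pip install --upgrade flask waitress numpy pandas`. A small sketch using the standard-library `importlib.metadata` (Python 3.8+); the helper name `installed_versions` is hypothetical:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Hypothetical helper: map each distribution name to its installed version, or None."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Package is not installed in this environment
            found[name] = None
    return found


print(installed_versions(["flask", "waitress", "numpy", "pandas"]))
```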