python signal-processing fft spectrum

Mismatch between the SciPy periodogram and the AstroPy Lomb-Scargle periodogram at low frequencies


I am trying to compute the periodogram of my data using both SciPy's periodogram and AstroPy's Lomb-Scargle periodogram. The two match everywhere except at frequencies near the minimum frequency, as shown in my plots. The data are the results of numerical simulations.

Based on observational data, I expect a strong signal near 0. Hence, the SciPy periodogram results look more physically plausible than the Lomb-Scargle periodogram.

I haven't figured out why they differ or how to make them agree. Any insight is deeply appreciated.

Below is the code to reproduce my plots.

From the standard SciPy periodogram: [image]

From the Lomb-Scargle periodogram: [image]

from astropy.timeseries import LombScargle
import numpy as np
import pandas as pd
from scipy import signal
import requests 
import matplotlib.pyplot as plt



def plot_periodogram(x, y, N_freq, min_freq, max_freq, height_threshold, periodogram_type):
    """Plot a peak-normalized periodogram of y(x) and mark peaks above height_threshold."""

    fig, ax = plt.subplots(figsize=(12, 8))

    if periodogram_type == 'periodogram':
        dx = np.mean(np.diff(x))  # Assume x is uniformly sampled
        fs = 1 / dx

        freq, power_periodogram = signal.periodogram(y, fs, scaling="spectrum", nfft=N_freq,
                                                     return_onesided=True, detrend='constant')
        power_max = power_periodogram[~np.isnan(power_periodogram)].max()

        plt.plot(freq, power_periodogram / power_max, linestyle="solid", color="black", linewidth=2)

        filename = "PowerSpectrum"

    else:
        # Lomb-Scargle is evaluated on an explicit frequency grid
        freq = np.linspace(min_freq, max_freq, N_freq)
        ls = LombScargle(x, y, normalization='psd', nterms=1)
        power_periodogram = ls.power(freq)

        power_max = power_periodogram[~np.isnan(power_periodogram)].max()

        # Peak heights corresponding to 1% and 5% false alarm probabilities
        false_alarm_probabilities = [0.01, 0.05]
        periodogram_peak_height = ls.false_alarm_level(false_alarm_probabilities,
                                                       minimum_frequency=min_freq,
                                                       maximum_frequency=max_freq,
                                                       method='bootstrap')

        filename = "PowerSpectrum_LombScargle"
        plt.plot(freq, power_periodogram / power_max, linestyle="solid", color="black", linewidth=2)
        plt.axhline(y=periodogram_peak_height[0] / power_max, color='black', linestyle='--')
        plt.axhline(y=periodogram_peak_height[1] / power_max, color='black', linestyle='-')

    # Mark every peak of the normalized spectrum that exceeds the threshold
    peaks_index, properties = signal.find_peaks(power_periodogram / power_max, height=height_threshold)
    peak_values = properties['peak_heights']
    peak_power_freq = freq[peaks_index]

    for i in range(len(peak_power_freq)):
        plt.axvline(x=peak_power_freq[i], color='red', linestyle='--')
        ax.text(peak_power_freq[i] + 0.05, 0.95, str(round(peak_power_freq[i], 2)),
                color='red', ha='left', va='top', rotation=0,
                transform=ax.get_xaxis_transform())

    fig.patch.set_alpha(1)
    plt.ylabel('Spectral Power', fontsize=20)
    plt.xlabel('Spatial Frequency', fontsize=20)
    plt.grid(True)
    plt.xlim(left=min_freq, right=max_freq)

    plt.xticks(fontsize=20)
    plt.yticks(fontsize=20)
    plt.savefig(filename, bbox_inches='tight')
    plt.show()







# URL of the CSV file on Pastebin
url = 'https://pastebin.com/raw/uFi8WPvJ'

# Fetch the raw data from the URL
response = requests.get(url)

# Stop with an error if the download failed, instead of reading a missing file later
response.raise_for_status()

# Save the downloaded text to a CSV file
with open('data.csv', 'w') as f:
    f.write(response.text)
        
df = pd.read_csv('data.csv', sep=',', comment='%', names=['x', 'Bphi', 'r', 'theta'])
x = df['x'].values
y = df['Bphi'].values

# https://stackoverflow.com/questions/37540782/delete-nan-and-corresponding-elements-in-two-same-length-array
indices = np.logical_not(np.logical_or(np.isnan(x), np.isnan(y)))
x = x[indices]
y = y[indices]

y = y - np.mean(y)

N_freq = 10000

min_freq = 0.001
max_freq = 4.0
height_threshold = 0.7

plot_periodogram(x,y,N_freq,min_freq,max_freq,height_threshold,"periodogram")
plot_periodogram(x,y,N_freq,min_freq,max_freq,height_threshold,"ls")

   

Solution

  • I was experimenting with the code and realized that I had not even checked what the data looks like:

    [image: raw data]

    The data has not only an offset but also a trend. A trend puts most of its power into the lowest frequencies, which is where the two periodograms disagree, so it needs to be removed before any kind of frequency transformation.

    I therefore used the following:

    df = pd.read_csv('data.csv', sep=',', comment='%', names=['x', 'Bphi', 'r', 'theta'])
    df.dropna(inplace=True)
    x = df['x'].values
    # signal.detrend removes a linear least-squares fit by default (type='linear')
    y = signal.detrend(df['Bphi'].values)
    

    The results are not identical, but very similar.

    SciPy approach: [image]

    AstroPy approach: [image]

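    I recommend reading the documentation of both functions closely, because their defaults differ (for example in normalization and in how the mean is handled). Once the approach is refined, you can use np.allclose() on the two spectra, evaluated on the same frequency grid, to check whether they agree to within an acceptable tolerance.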