I have a TimeSeries shown below. We can clearly see that there is a cyclical behavior
in the data, where rise and fall happens every 12 months and so on, irrespective of if trend is rising/falling. I am struggling to find how to extract this frequency
of 12 from the data using scipy.fft
python library. Mostly every time repetition happens around 12 months, but it can be around 11 months as well or 13 months, but 12 is the most common frequency. I am using using Fourier transformation, but can't figure out how to extract this most common frequency (12) using some python library?
Thanks
Here is the data:
11130.578853063385,
6509.723592808,
5693.928982796129,
3415.099921325464,
-9299.291673374173,
-3388.284173885658,
-5577.9316298032745,
-3509.583232678111,
2285.99065857961,
3844.3061166856014,
-7383.526882168155,
-4622.792125020905,
2813.586128745183,
-1501.9405075290495,
8911.954971658348,
7800.444458471794,
-1190.7952377053866,
4768.791467877524,
2173.593871988719,
-2420.04786197912,
-2304.842777539399,
-3562.1788525161837,
-8766.208378658988,
-7655.936603573945,
-5890.9298543619125,
-9628.764012284291,
12124.740839506767,
12391.257220312522,
7512.253051850619,
12921.032383220418,
10063.270097922614,
-1350.2599176773563,
-6887.434936788105,
-11116.26528794868,
-10196.871058658888,
-10874.172006461778,
-15014.190086779208,
-17837.744786550902,
15235.434771455053,
17183.25815161994,
16835.95193044929,
21063.986176551374,
17987.99577807288,
-270.6290142721815,
-11239.957979217992,
-18724.854251002133,
-11752.820614075652,
-14332.597031388648,
-24609.22398631297,
-26155.98046224267,
18192.356438131075,
22165.14150786262,
26758.419290443217,
29284.65841543005,
25762.928479462924,
865.3393849464444,
-15121.264730579132,
-26306.45361387036,
-13494.286360139175,
-18089.58324839494,
-34738.184049794625,
-34718.87495981627,
21145.112760626133,
27322.030709198487,
37252.78168890166,
37846.98231395838,
33206.62103950547,
2092.870600583023,
-18537.521405900694,
-33955.48182565038,
-15445.551953312479,
-22284.152196532646,
-45880.94206326016,
-44229.92788481257,
24988.038646046363,
32958.71017047145,
49117.93320642349,
47304.760779654374,
40776.01828187993,
3403.573579093782,
-22402.79273128273,
-42361.96378730598,
-17190.060741565456,
-27378.2527904574,
-59155.49212555031,
-60122.10588005664,
26272.133100405994,
44887.192435244986,
69002.74742137044,
59037.928523261784,
42122.51604012313,
6075.663868325184,
-20631.710295791454,
-48088.66531781877,
-23396.29341809641,
-40847.479839729145,
-68317.87342502769,
-73679.4424942532,
28302.69374713241,
57321.16868946109,
83820.10748232565,
68399.66173487401,
44989.374076533895,
8830.088516704302,
-18149.500187183363,
-52028.5021898363,
-31013.963236266634,
-53956.5249205745,
-77250.59604604884,
-86642.45203443282,
30541.62328593645,
69812.47143595785,
98233.7834300242,
77385.915451272,
48189.69475295938,
11504.22579592029,
-15251.799652343976,
-55879.292898282,
-38956.992207762654,
-67210.9936142441,
-86636.69916492153,
-99845.12467446178,
32751.253099701484,
82656.01928819218,
113259.2399845611,
86532.20966362985,
51019.20889397171,
14289.09297146163,
-11777.371574335935,
-59627.30976102835,
-47170.18721199697,
-81027.36407627042,
-96178.09587995053,
-113526.93736260894,
34817.23859755824,
95927.57143777516,
128782.84687524068,
95920.65382048927,
53226.62965224956,
17272.000877533148,
-7716.869736424804,
-63110.06727848651,
-55696.68126167806,
-95538.60898488457,
-105325.08525283691,
-127600.17956244369,
36734.97589442811,
109601.51109750797,
144205.71977383518,
105517.48123365057,
54793.814706888734,
20380.77940730315,
-3119.1108027357986,
-66153.73274186133,
-64702.85998743505,
-110650.72884973585
Original time series wasn't provided, so I made some up. EDIT: newly-supplied data provided at the end of the post
Take the magnitude of the DFT using numpy.fft.fft or, for real data as here, numpy.fft.rfft, and calculate its amplitude. (If your data isn't equally spaced in time you can interpolate first.)
You can find peaks in the frequency trace, but note that there can be a lot of energy in low-frequency components, so you may need to exclude these.
Note that the frequency interval is 2.pi/tmax in rad/time_unit, or 1/tmax in cycles/time_unit, so this limits the accuracy you will get.
import numpy as np
import matplotlib.pyplot as plt
T = 150
N = 10000 # For accurate frequency, N and number of cycles must be large
t = np.linspace( 0.0, T, N, endpoint=False )
signal = 75000 * ( t / 150 ) ** 1.5 * ( 1 + 0.8 * np.sin( 2 * np.pi * t / 12 ) + 0.4 * np.sin( 2 * np.pi * t / 4 ) )
dw = 2 * np.pi / T # Frequency interval in rad/time_unit
df = 1 / T # Frequency interval in cycles/time_unit
dt = T / N
FT = np.fft.rfft( signal ) # Use rfft for a real signal (note: only about half the components)
A = np.abs( FT ) / N # Normalised magnitude of Fourier components
w = np.linspace( 0.0, N * dw, N, endpoint=False ) # Frequency values in rad/time_unit
low = 4 # Number of low-frequency components to exclude
m = low + np.argmax( A[low:] ) # Index of dominant frequency (excluded low frequencies)
omega = m * dw # Dominant frequency in rad/s
f = omega / ( 2 * np.pi ) # Dominant frequency in Hz
#f = np.fft.rfftfreq( N, dt )[m] # Alternative, direct from numpy.fft
print( 'Index:', m )
print( 'Frequency interval in rad/time_unit (dw):', dw )
print( 'Dominant frequency in rad/time_unit (w) :', omega )
print( 'Frequency interval in cycles/time_unit (df):', df )
print( 'Dominant frequency in cycles/time_unit (f) :', f )
plt.title( "Time Domain" )
plt.plot( t, signal, 'b-' )
plt.xlabel( 't' )
plt.show()
plt.title( "Frequency Domain" )
plt.plot( w[:len(A)] / ( 2 * np.pi ), A, 'r-' )
plt.xlim( 0.0, 0.6 )
plt.xlabel( 'f' )
plt.show()
Output:
Index: 12
Frequency interval in rad/time_unit (dw): 0.041887902047863905
Dominant frequency in rad/time_unit (w) : 0.5026548245743668
Frequency interval in cycles/time_unit (df): 0.006666666666666667
Dominant frequency in cycles/time_unit (f) : 0.07999999999999999
A time period of 12 corresponds to an f-frequency of 1/12=0.083. Accuracy is limited by df.
EDIT: Supplied Data
Assuming the data is put, as-is, in a file "signal.dat", change the first part of the code to
import numpy as np
import matplotlib.pyplot as plt
signal = np.loadtxt( 'signal.dat', delimiter=',', usecols=0, unpack=True )
N = len(signal)
T = N
t = np.linspace( 0.0, T, N, endpoint=False )
dw = 2 * np.pi / T # Frequency interval in rad/time_unit
df = 1 / T # Frequency interval in cycles/time_unit
dt = T / N # Time interval (will be 1 month)
Then dt = 1 (month), whilst N and T are (numerically) the number of months in the record.
The output gives f=0.083 as the maximum frequency, corresponding to a period of 1/f=12 months.
Index: 13
Frequency interval in rad/time_unit (dw): 0.04027682889217683
Dominant frequency in rad/time_unit (w) : 0.5235987755982988
Frequency interval in cycles/time_unit (df): 0.00641025641025641
Dominant frequency in cycles/time_unit (f) : 0.08333333333333333