I have this code that aims to fit the data from this file: DriveFilelink
The function read_numeric_data is used to extract the information in a structured manner. However, I'm having trouble with the Gaussian fit, as the discrepancies in the Gaussian fit image show. The issue appears to stem from the constraints imposed on certain parameters: the means of all the Gaussians and three of the four amplitudes are fixed. These constraints are nevertheless necessary, since they are based on available information.
Does anyone know how I can fix this problem in order to get a better fit?
The code is the following:
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import find_peaks
from scipy.optimize import curve_fit
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Function to read numeric data from a file
def read_numeric_data(file_path):
    with open(file_path, 'r', encoding='latin1') as f:
        data = []
        live_time = None
        real_time = None
        for line in f:
            if 'LIVE_TIME' in line:
                live_time = line.strip()
            elif 'REAL_TIME' in line:
                real_time = line.strip()
            elif all(char.isdigit() or char in ".-+e" for char in line.strip()):
                row = [float(x) for x in line.split()]
                data.extend(row)
    return np.array(data), live_time, real_time
file_path = '/content/drive/MyDrive/ProjetoXRF_Manuel/April2024/20240402_In.mca'
data, _, _ = read_numeric_data(file_path)

# Energy calibration: energy = a + b * channel
a = -0.0188026396003431
b = 0.039549044037714
Data = data
# Function to model the spectrum as a sum of multiple Gaussians
def multi_gaussian(x, *params):
    eps = 1e-10
    y = np.zeros_like(x)
    amplitude_relations = [1, 0.5538, 0.1673, 0.1673 * 0.5185]
    meanvalues = [24.210, 24.002, 27.276, 27.238]
    amplitude = params[0]
    for i in range(4):
        sigma = params[i * 3 + 2]  # Fitted standard deviation for this Gaussian
        y += amplitude * amplitude_relations[i] * np.exp(-(x - meanvalues[i])**2 / (2 * sigma**2 + eps))
    return y
sigma = []
area = []

# Function to plot the convolved energy spectrum
def plot_convolved_spectrum(Data, a, b, i, ax=None):
    maxim = np.max(Data)
    workingarray = np.squeeze(Data)

    # Find the peaks
    peaks, _ = find_peaks(workingarray / maxim, height=0.1)
    peak_values = workingarray[peaks] / maxim
    peak_indices = peaks

    # Compute the energy values corresponding to the peaks
    energy_spectrum = a + b * peak_indices

    # Prepare the data for the convolution
    data = workingarray[:885] / maxim
    data_y = data / data.max()
    data_x = a + b * np.linspace(0, 885, num=len(data_y))

    # Fit multiple Gaussians to the energy spectrum
    peak_indices2, _ = find_peaks(data_y, height=0.1)
    peak_amplitudes = [1, 0.5538, 0.1673, 0.1673 * 0.5185]
    peak_means = [24.210, 24.002, 27.276, 27.238]
    peak_sigmas = [0.1] * 4
    params_init = list(zip(peak_amplitudes, peak_means, peak_sigmas))
    params_init = np.concatenate(params_init)

    # Fit the curve
    params, params_cov = curve_fit(multi_gaussian, data_x, data_y, p0=params_init)

    # Get a high-resolution interpolation of the fit
    x_fine = np.linspace(data_x.min(), data_x.max(), num=20000)
    y_fit = multi_gaussian(x_fine, *params)

    # Plot the data
    ax.scatter(data_x, data_y, color='black', marker='o', s=20, label='Data')
    y_data_error = np.sqrt(workingarray[:885])
    ax.plot(data_x, data_y + y_data_error / maxim, color='black', linestyle='--')
    ax.plot(data_x, data_y - y_data_error / maxim, color='black', linestyle='--')

    # Plot the fit
    ax.plot(x_fine, y_fit, label="Fit", linewidth=1.5)

    # Extract the standard deviations
    sigmas_array = params[2::3]

    # Compute the average sigma
    sigma.append(np.mean(sigmas_array))

    # Plot configuration
    ax.set_xlabel('Energy (keV)')
    ax.set_ylabel('Normalized Data')
    ax.legend()
    ax.set_title('Convolved Energy Spectrum')

    # Print information
    print("Standard deviations:", sigmas_array)

fig, ax = plt.subplots()
plot_convolved_spectrum(Data, a, b, 1, ax=ax)
ax.set_xlim(22, 28)
plt.show()
Note: I've experimented with setting bounds and refining initial guesses to improve the Gaussian fit. However, working with bounds isn't equivalent to fixing parameters. My aim is to keep just two degrees of freedom: the amplitude of the first Gaussian and the standard deviation shared by all Gaussians. While exploring these adjustments, I'm trying to strike a balance between constraint and flexibility to achieve the desired fit accuracy.
The parameters of the Gaussian function are defined as follows: A represents the amplitude, x_0 the mean value, and sigma the standard deviation. In the context of analyzing energy spectra, I have prior knowledge about the mean values and the relationships governing the amplitudes. Consequently, I want to constrain certain parameters (such as A and the mean) based on this known information. Specifically, I intend to fit only the first amplitude and then use the known relationships, such as A2 = 0.1673 * A1 for the second peak, and then fit the corresponding standard deviation.
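To make the target concrete, here is a minimal sketch of the constrained model I have in mind, in which only A1 and a single shared sigma are free (data_x and data_y as in plot_convolved_spectrum above; the means and amplitude ratios are baked into the function):

def constrained_gaussians(x, A1, sigma):
    # Known amplitude ratios and mean values are hard-coded, not fitted
    amplitude_relations = [1, 0.5538, 0.1673, 0.1673 * 0.5185]
    meanvalues = [24.210, 24.002, 27.276, 27.238]
    y = np.zeros_like(x)
    for rel, mu in zip(amplitude_relations, meanvalues):
        y += A1 * rel * np.exp(-(x - mu)**2 / (2 * sigma**2))
    return y

# Only two free parameters: the first amplitude and the shared sigma
params, params_cov = curve_fit(constrained_gaussians, data_x, data_y, p0=[1.0, 0.1])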
Why use a sum of Gaussians?
The apparent singularity of the first peak in the plot might suggest a single Gaussian. However, this is not the case: the peak actually consists of two Gaussians summed together. This complexity arises because the energy emission at that level comprises two distinct values, 24.002 and 24.210 keV, and due to the resolution of the experiment we cannot distinguish them just by looking at the plot.
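As a quick standalone illustration of this effect (the sigma of 0.15 keV here is just an assumed value for demonstration, not a fitted one):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(23, 25.5, 500)
sigma_demo = 0.15  # assumed resolution, for illustration only
g1 = np.exp(-(x - 24.210)**2 / (2 * sigma_demo**2))           # relative amplitude 1
g2 = 0.5538 * np.exp(-(x - 24.002)**2 / (2 * sigma_demo**2))  # relative amplitude 0.5538
plt.plot(x, g1, '--', label='Gaussian at 24.210 keV')
plt.plot(x, g2, '--', label='Gaussian at 24.002 keV')
plt.plot(x, g1 + g2, label='Sum: looks like a single peak')
plt.xlabel('Energy (keV)')
plt.legend()
plt.show()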
For all these reasons, I am trying to fix some parameters.
If someone has an issue with the data, here it is:
Here is an MCVE showing how to regress a variable number of peaks from the significant part of your signal.
import itertools
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy import signal, stats, optimize
We load your data and post-process it to get physical series:
data = pd.read_csv("20240402_In.mca", encoding="latin-1", names=["y"]).loc[12:2058]
data["y"] = pd.to_numeric(data["y"])
data["y"] /= data["y"].max()
data["x"] = -0.0188026396003431 + 0.039549044037714 * data.index
data = data.loc[(20. <= data["x"]) & (data["x"] <= 30.), :]
We create the model, which accepts a variable number of peaks (driven by the find_peaks solution):
def peak(x, A, s, x0):
    law = stats.norm(loc=x0, scale=s)
    return A * law.pdf(x) / law.pdf(x0)

def model(x, *parameters):
    n = len(parameters) // 3
    y = np.zeros_like(x)
    for i in range(n):
        y += peak(x, *parameters[(i * 3):((i + 1) * 3)])
    return y
We identify peaks:
indices, bases = signal.find_peaks(data.y, prominence=0.0125)
#(array([ 67, 118, 195, 210]),
# {'prominences': array([0.01987281, 0.99920509, 0.18918919, 0.03338633]),
# 'left_bases': array([ 1, 1, 179, 205]),
# 'right_bases': array([ 76, 160, 219, 219])})
This is the key point: from the identification, we create our educated guess:
x0s = data.x.values[indices]
As = bases["prominences"]
ss = (data.x.values[bases["right_bases"]] - data.x.values[bases["left_bases"]]) / 8.
p0 = list(itertools.chain(*zip(As, ss, x0s)))
#[0.01987281399046105,
# 0.37077228785356864,
# 22.682348638047493,
# 0.9992050874403816,
# 0.7860372502495658,
# 24.699349883970907,
# 0.1891891891891892,
# 0.19774522018856988,
# 27.744626274874886,
# 0.033386327503974564,
# 0.06921082706599968,
# 28.337861935440593]
Now we regress the model with respect to your dataset:
popt, pcov = optimize.curve_fit(model, data.x, data.y, p0=p0)
#array([1.03735804e-02, 9.61270732e-01, 2.29030214e+01, 9.92651381e-01,
# 1.59694755e-01, 2.46561911e+01, 1.85332645e-01, 1.21882422e-01,
# 2.77807838e+01, 3.67805911e-02, 1.21890416e-01, 2.83583849e+01])
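As a side note, a rough one-sigma uncertainty for each fitted parameter can be read from the diagonal of the covariance matrix:

perr = np.sqrt(np.diag(pcov))  # approximate one-sigma uncertainties on popt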
We can now estimate the model:
yhat = model(data.x, *popt)
Graphically, it leads to a fit that is quite satisfactory for an unconstrained regression, but it certainly does not fulfill the constraints you want to enforce.
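If you do want to enforce your constraints within this framework, one option (a sketch under the assumptions stated in your post: known means, known amplitude ratios, and a single shared width) is to wrap the peak model so that only A1 and s remain free:

RATIOS = [1., 0.5538, 0.1673, 0.1673 * 0.5185]  # amplitude relations from your post
MEANS = [24.210, 24.002, 27.276, 27.238]        # known mean values from your post

def constrained_model(x, A1, s):
    # Only the first amplitude and the shared width are fitted
    y = np.zeros_like(x)
    for ratio, x0 in zip(RATIOS, MEANS):
        y += peak(x, A1 * ratio, s, x0)
    return y

popt2, pcov2 = optimize.curve_fit(constrained_model, data.x, data.y, p0=[1., 0.2])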
Could you publish your data ready for coding as arrays in your post?