I have two questions that I put in one help request. so I hope it will not make this overcrowded.
I spend quite some time figuring this out but not successful so far. I am trying to plot only points from a series of data that are close to each other so not the transitions (see below plot). Probably I need an if condition that says if x(2)-x(1)<0.005 plot and if not do not plot. Later I want to do a linear regression on these points (thats why I want to exclude transitions). Can you please help me how to do the plotting with this condition and do linear regression.
This is my code:
# which value you want to use or plot reading from log data
desired_field1= "x y Box"
desired_value1 = "x1 [um]"
desired_field2= "x y Box"
desired_value2 = "x3 [um]"
desired_field3= "LM Position"
desired_value3 = "Z [um]"
# extracting desired data from logging
data=pd.read_excel(r"test2.xlsx", sheet_name='Sheet1')
data = data[(data['_time'] < '2024-05-21T09:49:37.6089875Z') & (data['_time'] > '2024-05-21T09:43:31.7141954Z')] #selecting desired time interval
data_measurement1 = data.loc[data['_measurement'] == desired_field1]
data_field1 = data_measurement1.loc[data['_field'] == desired_value1]
data_measurement2 = data.loc[data['_measurement'] == desired_field2]
data_field2 = data_measurement2.loc[data['_field'] == desired_value2]
data_measurement3 = data.loc[data['_measurement'] == desired_field3]
data_field3 = data_measurement3.loc[data['_field'] == desired_value3]
values1 = list(data_field1['_value']) #values we are interested in
values2 = list(data_field2['_value'])
values3 = list(data_field3['_value'])
#....
mean_xs = [(g + h) / 2 for g, h in zip(values1, values2)]
LM_mean = [50-x for x in mean_xs]
#start plotting
data_field1['_time'] = pd.to_datetime(data_field1['_time'].str.split().str[-1])
data_field2['_time'] = pd.to_datetime(data_field2['_time'].str.split().str[-1])
data_field3['_time'] = pd.to_datetime(data_field3['_time'].str.split().str[-1])
plt.plot(data_field1['_time'], values1, '-', label = desired_value1)
plt.plot(data_field2['_time'], values2, '-', label = desired_value2 )
plt.plot(data_field3['_time'], values3, '-', label = desired_value3)
plt.xlabel('time [D hh:mm]')
plt.ylabel(' x [um] MCS')
plt.legend(loc='best')
plt.gca().yaxis.grid(True)
plt.figure()
plt.plot(LM_mean, values3, 'o')
example data:
9988 2024-05-21T09:46:00.1164445Z 1294.005333
9989 2024-05-21T09:46:01.1115275Z 1294.005333
9990 2024-05-21T09:46:02.1254956Z 1294.005667
9991 2024-05-21T09:46:03.1191685Z 1294.005667
9992 2024-05-21T09:46:04.1325494Z 1294.005333
9993 2024-05-21T09:46:05.1268794Z 1294.005333
9994 2024-05-21T09:46:06.1409297Z 1294.005333
9995 2024-05-21T09:46:07.1346292Z 1294.005000
9996 2024-05-21T09:46:08.1488069Z 1294.005333
9997 2024-05-21T09:46:09.1417524Z 1294.005333
9998 2024-05-21T09:46:10.1563002Z 1294.005333
9999 2024-05-21T09:46:11.1692835Z 1294.005333
10000 2024-05-21T09:46:12.1642492Z 1332.747333
10001 2024-05-21T09:46:13.1977216Z 1344.011333
10002 2024-05-21T09:46:14.1926256Z 1344.012000
10003 2024-05-21T09:46:15.2062685Z 1344.011667
10004 2024-05-21T09:46:16.2200463Z 1344.011667
10005 2024-05-21T09:46:17.2339343Z 1344.012000
10006 2024-05-21T09:46:18.2479639Z 1344.012000
10007 2024-05-21T09:46:19.2405515Z 1344.012000
10008 2024-05-21T09:46:20.2556817Z 1344.012000
I have tried searching for this but was not successful
Update on how I solved this, note I have two time series and wanted to plot them against each other for same timestep:
idxSteps = np.array(np.where(np.abs(x.diff()) > 0.05))
idxSteps2 = np.array(np.where(np.abs(y.diff()) > 0.05))
previousStep = 0 # init the previous step
xIdx = list()
xMid = list() # init list for x
yMid = list() # ==/== for y
i=0
for currentStep in idxSteps[0]:# for every step detected
if currentStep != previousStep:
if currentStep in idxSteps2[0]:# and if they are not the same as previousStep
dummyX = x[previousStep:currentStep] # get the data between these two steps
dummyY = y[previousStep:currentStep] # ==/==
xIdx.append((previousStep+currentStep)//2)
xMid.append(dummyX.mean())
yMid.append(dummyY.mean()) # append the statistical mode (most recurring value)
previousStep = currentStep
assign previousStep as currentStep for the next loop
Sorry for taking too long, I hope I can assist you further with the following adjusted answer.
Here's the code as a dump, I will explain the code step by step in the section after the code:
%matplotlib notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.dates as mdates
df = pd.read_excel("Data.xlsx")
df["Datetime"] = pd.to_datetime(df["Datetime"])
idxSteps = np.array(np.where(np.abs(df["Data"].diff()) > 20)) # search for drops as the abs of the difference
previousStep = 0 # init the previous step
xIdx = list()
xMid = list() # init list for x
yMid = list() # ==/== for y
for currentStep in [idxSteps[0], len(df.index)]: # for every step detected
if not isinstance(currentStep, int):
currentStep= currentStep[0]
if currentStep != previousStep: # and if they are not the same as previousStep
dummyX = df["Datetime"].loc[previousStep:currentStep] # get the data between these two steps
dummyY = df["Data"].loc[previousStep:currentStep] # ==/==
xIdx.append((previousStep+currentStep)//2)
xMid.append(dummyX.mean()) # append to the average of the date to the x list
yMid.append(dummyY.mode()) # append the statistical mode (most recurring value)
previousStep = currentStep # assign previousStep as currentStep for the next loop
plt.figure()
plt.plot(df["Datetime"], df["Data"])
plt.scatter(xMid, yMid, marker = "x", color = "red")
fit = np.polyfit(xIdx, yMid, 1) # fit
plt.plot(df["Datetime"], np.polyval(fit, df["Datetime"].index), "k--") # plot fit
locator = mdates.AutoDateLocator(minticks = 3, maxticks = 7)
formatter = mdates.ConciseDateFormatter(locator)
plt.gca().xaxis.set_major_locator(locator)
plt.gca().xaxis.set_major_formatter(formatter)
plt.grid()
To start off, I saved the data you included in the post in an excel file under the following structure:
I can read this with the pd.read_excel()
function, keeping in mind that the datetime column needs to be converted with pd.to_datetime()
Then we basically do the following:
for currentStep in [idxSteps[0], len(df.index)]
previousStep = currentStep
and then the loop start againSome important information:
min()
. Alternatively, at the end, go with max()
Here's how the end result looks like:
A note to you: please read and understand my code. Don't just copy paste and hope for the best. Check for the intermediate data that I am generating with the code.
This code so far only works for the section of data that you posted. It might not work on others, which is why you understanding how this solution was thought out is important! So you can come up with your own solution!