I am working on a Python script that loads several CSV files containing timestamps and ping data and plots them. Each CSV file records the ping values for a different address, and the X-axis should show the timestamps in HH:MM:SS format.
The challenge is that I only want a limited number of timestamps on the X-axis (e.g. 10-12), chosen based on the number of data points in the CSV files. I also want the X-axis ticks to be labeled with the timestamps that actually correspond to the plotted ping values.
Problem: The plot shows the data, but the timestamps on the X-axis are wrong and too few ticks appear. Only the first timestamp is displayed, and only 8 ticks are generated on the X-axis.
In addition, the tick positions do not seem to line up with the timestamps in the data, which makes the plot hard to read.
Goal: The X-axis should correctly display timestamps in HH:MM:SS format for all addresses from the CSV files.
There should be a limited number of timestamps (approx. 10-12) on the X-axis, based on the number of data points in the CSV files.
Note that the tick information for the plot is stored in x_labels and x_positions. For 99 data records, 11 subdivisions are correctly computed and stored, but they are still displayed incorrectly.
Example:
x_positions: [0.0, 2.55, 5.1, 7.65, 10.216666666666667, 12.766666666666667, 15.316666666666666, 17.866666666666667, 20.416666666666668, 22.983333333333334, 25.533333333333335]
x_labels: ['17:24:43', '17:27:16', '17:29:49', '17:32:22', '17:34:56', '17:37:29', '17:40:02', '17:42:35', '17:45:08', '17:47:42', '17:50:15']
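For reference, these values follow from the step rule in the script: with 99 records, dif = 99 // 10 = 9, so range(0, 99, 9) selects 11 rows. Below is a minimal sketch of that arithmetic. It assumes synthetic samples exactly 17 seconds apart (the spacing seen in the test data); the real log has a little jitter, so some of its positions differ by a second.

```python
from datetime import datetime, timedelta

# Assumption: 99 synthetic samples exactly 17 s apart, starting at 17:24:43.
start = datetime.strptime("17:24:43", "%H:%M:%S")
times = [start + timedelta(seconds=17 * i) for i in range(99)]

# Same step rule as in the script: about 10 intervals, and at least 1.
dif = max(len(times) // 10, 1)

x_positions = []  # minutes elapsed since the first sample
x_labels = []     # HH:MM:SS text for each tick
for i in range(0, len(times), dif):
    x_positions.append((times[i] - times[0]).total_seconds() / 60)
    x_labels.append(times[i].strftime("%H:%M:%S"))

print(len(x_positions))  # 11
print(x_labels[:3])      # ['17:24:43', '17:27:16', '17:29:49']
```

So the tick data itself is consistent with the timestamps; the mismatch only appears once the ticks are drawn.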
This is the picture I get. It should have 11 tick marks on the X-axis, and all of them should be labeled.
Here is some test data that I store in the CSV:
Time,Ping (ms)
17:24:43,0.1
17:25:00,0.2
17:25:17,0.23
17:25:34,0.12
17:25:51,0.23
17:26:08,0.123
17:26:25,0.321
17:26:42,0.231
Here is my code:
import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime, timedelta

# Load the data from the CSV files
def load_data(folder):
    data = {}
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith(".csv"):
                address = file.replace('_', '.').replace('.csv', '')
                file_path = os.path.join(root, file)
                df = pd.read_csv(file_path)
                df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
                df['Ping (ms)'] = df['Ping (ms)'].apply(lambda x: 0 if x == 0 else x)
                data[address] = df
    return data

# Create the plot
def plot_data(data):
    plt.figure(figsize=(14, 8))
    colors = generate_colors(len(data))
    # Determine the number of data points for a single address
    df = next(iter(data.values()))  # Pick the first DataFrame
    total_data_points = len(df)
    # Compute the dif value (step between ticks)
    dif = total_data_points // 10
    if dif < 1:
        dif = 1
    # Collect the timestamps for the X-axis
    x_labels = []
    x_positions = []
    for i in range(0, len(df), dif):
        time = df['Time'].iloc[i]
        x_labels.append(time.strftime('%H:%M:%S'))
        x_positions.append((time - min(df['Time'])).total_seconds() / 60)
    # Plot the ping data for each address
    for idx, (address, df) in enumerate(data.items()):
        df['Time_diff'] = (df['Time'] - min(df['Time'])).dt.total_seconds() / 60
        mask_timeout = df['Ping (ms)'] == 0
        mask_normal = ~mask_timeout
        plt.plot(df['Time_diff'][mask_normal], df['Ping (ms)'][mask_normal], label=address, color=colors[idx % len(colors)])
        plt.plot(df['Time_diff'][mask_timeout], df['Ping (ms)'][mask_timeout], color='r', lw=2)
    # Adjust the X-axis
    plt.xticks(x_positions, x_labels, rotation=45, ha='right')
    plt.xlabel('Time')
    plt.ylabel('Ping (ms)')
    plt.title('Ping Times for Different Addresses')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

def generate_colors(n):
    colors = []
    for i in range(n):
        hue = i / n
        colors.append(plt.cm.hsv(hue))
    return colors

# Main function
def main():
    data_folder = input("Please enter the path to the folder containing the CSV files: ")
    if not os.path.exists(data_folder):
        print(f"The folder {data_folder} does not exist.")
        return
    data = load_data(data_folder)
    plot_data(data)

if __name__ == "__main__":
    main()
As far as I can tell, every time you add a new plot, a new axis is added for both 'x' and 'y', and I'm not sure whether you can control which axis ends up on top. The workaround I can think of is to set the tick params for the 'x' axis every time you add a new plot:
for idx, (address, df) in enumerate(data.items()):
    df['Time_diff'] = (df['Time'] - min(df['Time'])).dt.total_seconds() / 60
    mask_timeout = df['Ping (ms)'] == 0
    mask_normal = ~mask_timeout
    plt.plot(df['Time_diff'][mask_normal], df['Ping (ms)'][mask_normal], label=address, color=colors[idx % len(colors)])
    plt.tick_params(axis='x', which='both', labelbottom=False)
    plt.plot(df['Time_diff'][mask_timeout], df['Ping (ms)'][mask_timeout], color='r', lw=2)
    plt.tick_params(axis='x', which='both', labelbottom=False)
Then set it back to True (in my example, for the minor ticks) right after you set your xticks:
plt.xticks(x_positions, x_labels, rotation=45, ha='right')
plt.tick_params(axis='x', which='minor', labelbottom=True)
I believe the same is true for your 'y' axis (your graph shows that Google pings better than your local devices).
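An alternative that avoids depending on which axis ends up on top is to create one Axes object explicitly and put both the data and the ticks on it. The following is only a minimal sketch under some assumptions, not your script: the helper name plot_with_fixed_ticks, the two addresses, and the ping values are made up for illustration, while x_positions and x_labels are the example values from the question.

```python
import matplotlib.pyplot as plt


def plot_with_fixed_ticks(series, x_positions, x_labels):
    """Draw every series on one Axes and label the given tick positions.

    `series` maps an address to a (minutes, pings) pair of equal-length lists.
    Hypothetical helper for illustration; it is not part of the original script.
    """
    fig, ax = plt.subplots(figsize=(14, 8))
    for address, (minutes, pings) in series.items():
        ax.plot(minutes, pings, label=address)

    # Set the tick locations and their text on the same Axes that holds the data.
    ax.set_xticks(x_positions)
    ax.set_xticklabels(x_labels, rotation=45, ha="right")

    ax.set_xlabel("Time")
    ax.set_ylabel("Ping (ms)")
    ax.legend()
    ax.grid(True)
    fig.tight_layout()
    return fig, ax


# Tick data copied from the question's example output.
x_positions = [0.0, 2.55, 5.1, 7.65, 10.216666666666667, 12.766666666666667,
               15.316666666666666, 17.866666666666667, 20.416666666666668,
               22.983333333333334, 25.533333333333335]
x_labels = ['17:24:43', '17:27:16', '17:29:49', '17:32:22', '17:34:56', '17:37:29',
            '17:40:02', '17:42:35', '17:45:08', '17:47:42', '17:50:15']

# Made-up ping values for two made-up addresses, sampled at the tick positions.
series = {
    "8.8.8.8": (x_positions, [12, 11, 13, 12, 14, 12, 11, 13, 12, 12, 11]),
    "192.168.0.1": (x_positions, [1, 2, 1, 1, 3, 1, 2, 1, 1, 2, 1]),
}

fig, ax = plot_with_fixed_ticks(series, x_positions, x_labels)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Because every call goes through the same ax, there is no ambiguity about which axis the tick labels belong to.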