python csv matplotlib plot

Problems plotting timestamps on the x-axis with Matplotlib


I am working on a Python script that loads several CSV files containing timestamps and ping data and then displays them on a plot. The X-axis is supposed to display the timestamps in HH:MM:SS format, with the timestamps coming from multiple CSV files that record different ping values for different addresses.

The challenge is that I only want to display a limited number of timestamps on the X-axis, e.g. 10-12 timestamps, based on the number of data points in the CSV files. I also want to ensure that the X-axis is correctly labeled with the appropriate timestamps and associated ping values.

Problem: The plot shows the data, but the timestamps on the X-axis are not correct and too few ticks appear. Only the first timestamp is displayed and only 8 ticks are generated on the X-axis.

In addition, the X-axis ticks do not seem to match the timestamps from the data correctly, which affects the readability of the plot.

Goal: The X-axis should correctly display timestamps in the format HH:MM:SS for all addresses from the CSV files.

I would like to have a limited number of timestamps (approx. 10-12) on the X-axis based on the data points in the CSV files.

It is important to mention that the tick information for the plot is stored in x_labels and x_positions. For 99 data records, 11 tick positions and labels are correctly computed and stored, but they are still displayed incorrectly.

Example:

x_positions: [0.0, 2.55, 5.1, 7.65, 10.216666666666667, 12.766666666666667, 15.316666666666666, 17.866666666666667, 20.416666666666668, 22.983333333333334, 25.533333333333335]
x_labels: ['17:24:43', '17:27:16', '17:29:49', '17:32:22', '17:34:56', '17:37:29', '17:40:02', '17:42:35', '17:45:08', '17:47:42', '17:50:15']
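For reference, the arithmetic behind these values can be reproduced in isolation. The sketch below uses the same stride logic as my script, but on synthetic data: 99 timestamps exactly 17 seconds apart is an assumption made for illustration (the real data is slightly irregular), and the names `times`, `stride` and `start` are hypothetical.

```python
import pandas as pd

# Synthetic stand-in for one CSV file (assumption): 99 records, 17 seconds apart.
times = pd.Series(
    pd.date_range('1900-01-01 17:24:43', periods=99, freq=pd.Timedelta(seconds=17))
)

# Same rule as in the script: label roughly every tenth of the data.
stride = max(len(times) // 10, 1)  # 99 // 10 = 9
start = times.min()

x_positions = []  # minutes since the first timestamp
x_labels = []     # HH:MM:SS strings for the tick labels
for i in range(0, len(times), stride):
    t = times.iloc[i]
    x_positions.append((t - start).total_seconds() / 60)
    x_labels.append(t.strftime('%H:%M:%S'))

print(len(x_positions))  # 11 ticks for 99 records
print(x_labels[:3])      # ['17:24:43', '17:27:16', '17:29:49']
```

With a stride of 9, the indices 0, 9, ..., 90 give 11 entries, which matches the 11 subdivisions mentioned above.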

This is the picture I get, but it should have 11 dividing lines on the X-axis, and all of them should be labeled: [image]

Here is some test data that I store in the CSV:

Time,Ping (ms)
17:24:43,0.1
17:25:00,0.2
17:25:17,0.23
17:25:34,0.12
17:25:51,0.23
17:26:08,0.123
17:26:25,0.321
17:26:42,0.231

Here is my code:

import os
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime, timedelta

# Function to load the data from the CSV files
def load_data(folder):
    data = {}
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file.endswith(".csv"):
                address = file.replace('_', '.').replace('.csv', '')
                file_path = os.path.join(root, file)
                df = pd.read_csv(file_path)
                df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
                df['Ping (ms)'] = df['Ping (ms)'].apply(lambda x: 0 if x == 0 else x)
                data[address] = df
    return data

# Function to create the plot
def plot_data(data):
    plt.figure(figsize=(14, 8))
    colors = generate_colors(len(data))

    # Determine the number of data points for a single address
    df = next(iter(data.values()))  # Select the first DataFrame
    total_data_points = len(df)

    # Compute the dif value (step between labeled data points)
    dif = total_data_points // 10
    if dif < 1:
        dif = 1

    # Collect all timestamps for the X-axis
    x_labels = []
    x_positions = []
    for i in range(0, len(df), dif):
        time = df['Time'].iloc[i]
        x_labels.append(time.strftime('%H:%M:%S'))
        x_positions.append((time - min(df['Time'])).total_seconds() / 60)

    # Plot the ping data for each address
    for idx, (address, df) in enumerate(data.items()):
        df['Time_diff'] = (df['Time'] - min(df['Time'])).dt.total_seconds() / 60
        mask_timeout = df['Ping (ms)'] == 0
        mask_normal = ~mask_timeout

        plt.plot(df['Time_diff'][mask_normal], df['Ping (ms)'][mask_normal], label=address, color=colors[idx % len(colors)])
        plt.plot(df['Time_diff'][mask_timeout], df['Ping (ms)'][mask_timeout], color='r', lw=2)

    # Adjust the X-axis
    plt.xticks(x_positions, x_labels, rotation=45, ha='right')

    plt.xlabel('Time')
    plt.ylabel('Ping (ms)')
    plt.title('Ping Times for Different Addresses')
    plt.legend()
    plt.grid(True)
    plt.tight_layout()
    plt.show()

def generate_colors(n):
    colors = []
    for i in range(n):
        hue = i / n
        colors.append(plt.cm.hsv(hue))
    return colors

# Main function
def main():
    data_folder = input("Please enter the path to the folder containing the CSV files: ")
    if not os.path.exists(data_folder):
        print(f"The folder {data_folder} does not exist.")
        return

    data = load_data(data_folder)
    plot_data(data)

if __name__ == "__main__":
    main()

Solution

  • It seems that every time you add a new plot, a new axis is added for both 'x' and 'y', and I'm unsure whether you can control which axis ends up on top. The workaround I can think of is to set the tick params for the 'x' axis every time you add a new plot:

    for idx, (address, df) in enumerate(data.items()):
        df['Time_diff'] = (df['Time'] - min(df['Time'])).dt.total_seconds() / 60
        mask_timeout = df['Ping (ms)'] == 0
        mask_normal = ~mask_timeout
    
        plt.plot(df['Time_diff'][mask_normal], df['Ping (ms)'][mask_normal], label=address, color=colors[idx % len(colors)])
        plt.tick_params(axis='x', which='both', labelbottom=False)
        plt.plot(df['Time_diff'][mask_timeout], df['Ping (ms)'][mask_timeout], color='r', lw=2)
        plt.tick_params(axis='x', which='both', labelbottom=False)
    

    Then set it back to True (in my example, for the minor ticks) right after you set your xticks:

    plt.xticks(x_positions, x_labels, rotation=45, ha='right')
    plt.tick_params(axis='x', which='minor', labelbottom=True)
    

    I believe the same is true for your 'y' axis as well (your graph shows Google with better ping times than your local devices).
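As a separate check, not part of the workaround above, it may help to confirm which ticks and labels Matplotlib actually holds after plotting. The sketch below is a different technique from the one in this answer: it uses one explicit Axes object (`fig, ax = plt.subplots()`) instead of the `plt.*` calls, which removes any doubt about which axis the ticks are applied to. The two series and their names are made up for illustration, and only the first four positions and labels from the question are used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# First four tick positions and labels copied from the question.
x_positions = [0.0, 2.55, 5.1, 7.65]
x_labels = ['17:24:43', '17:27:16', '17:29:49', '17:32:22']

fig, ax = plt.subplots(figsize=(14, 8))

# Two hypothetical series standing in for two addresses (made-up values).
ax.plot(x_positions, [0.1, 0.2, 0.23, 0.12], label='address A')
ax.plot(x_positions, [0.3, 0.25, 0.4, 0.35], label='address B')

# Set the tick positions first, then the matching labels, on the same Axes.
ax.set_xticks(x_positions)
ax.set_xticklabels(x_labels, rotation=45, ha='right')

ax.set_xlabel('Time')
ax.set_ylabel('Ping (ms)')
ax.legend()
fig.tight_layout()
fig.canvas.draw()  # force a render so the tick labels are up to date

# Read back what the Axes holds, to compare against the intended values.
shown_positions = list(ax.get_xticks())
shown_labels = [t.get_text() for t in ax.get_xticklabels()]
print(shown_positions)
print(shown_labels)
```

If the printed positions and labels match x_positions and x_labels but the figure still looks wrong, the problem is more likely in the data being plotted (for example, each address using its own time origin) than in the tick settings.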