python scientific-computing

How do I plot data from multiple CSVs, each with a different number of columns?


The files have no header, but I need to select, say, columns 5, 9, 13, 17, etc. and plot them against column 2 (time). How can this be done when headers are present as well? Edit: Each file contains data for one day. The time format is GPS time, given as the year, day of year, and seconds since midnight. How can I plot, say, 1-30 January 2019? Here is the code I tried:


    import numpy as np
    import glob,os
    import matplotlib.pyplot as plt

    files = glob.glob('*.s4')
    #print(files)
    for file in files:
        f=np.loadtxt(file,skiprows=3)
        #print(file[0:9].upper())
        for i in range (5,50,4):
            t=f[:,2]/3600.;s4=f[:,i]
            pos= np.where (t)[0]
            pos1=np.where(s4[pos]<0.15)[0];s4[pos1]='nan'
            plt.scatter(t,s4)
            #print(len(s4))
            plt.xticks(np.arange(0, 26, 2)) 
            #plt.title(str(i))
    plt.show()

The problem is that this code only plots one day at a time.
Here is a sample of the data:


19 001    45 11  1 0.07 214.9 37.5  8 0.08 314.5 34.2 10 0.14 102.6 14.3 11 0.07 241.2 49.6 14 0.07 152.0 50.0 18 0.05 212.7 68.0 22 0.08 226.1 33.7 27 0.06 346.0 22.0 31 0.04  63.5 47.7 32 0.06 144.3 30.4 138 0.09 282.0 17.8
19 001   105 11  1 0.05 214.9 37.9  8 0.07 314.9 33.8 10 0.24 102.2 14.1 11 0.07 241.7 49.9 14 0.06 151.9 49.6 18 0.06 213.0 68.4 22 0.12 225.7 34.0 27 0.06 346.2 21.7 31 0.04  64.1 47.9 32 0.06 144.2 30.0 138 0.09 282.0 17.8
19 001   165 11  1 0.06 214.9 38.4  8 0.11 315.3 33.5 10 0.12 101.8 13.9 11 0.06 242.3 50.1 14 0.06 151.8 49.1 18 0.05 213.4 68.9 22 0.07 225.2 34.2 27 0.11 346.5 21.3 31 0.04  64.8 48.2 32 0.10 144.0 29.6 138 0.09 282.0 17.8
19 001   225 11  1 0.06 214.9 38.8  8 0.06 315.8 33.2 10 0.10 101.4 13.7 11 0.06 242.8 50.4 14 0.05 151.7 48.6 18 0.04 213.7 69.4 22 0.06 224.8 34.4 27 0.08 346.8 20.9 31 0.05  65.5 48.4 32 0.09 143.9 29.2 138 0.09 282.0 17.8
19 001   285 11  1 0.06 215.0 39.2  8 0.11 316.2 32.9 10 0.14 100.9 13.6 11 0.05 243.4 50.6 14 0.06 151.6 48.2 18 0.06 214.1 69.8 22 0.08 224.4 34.7 27 0.07 347.0 20.5 31 0.06  66.1 48.6 32 0.09 143.7 28.8 138 0.09 282.0 17.8
19 001   345 11  1 0.06 215.0 39.7  8 0.08 316.6 32.5 10 0.10 100.5 13.4 11 0.04 244.0 50.9 14 0.06 151.5 47.7 18 0.04 214.6 70.3 22 0.07 223.9 34.9 27 0.08 347.3 20.2 31 0.07  66.8 48.9 32 0.08 143.6 28.4 138 0.09 282.0 17.8
19 001   405 11  1 0.06 215.1 40.1  8 0.07 317.0 32.2 10 0.13 100.1 13.2 11 0.05 244.6 51.1 14 0.08 151.4 47.3 18 0.05 215.0 70.8 22 0.07 223.5 35.1 27 0.12 347.5 19.8 31 0.08  67.5 49.1 32 0.12 143.4 28.0 138 0.09 282.0 17.8
19 001   465 11  1 0.06 215.1 40.5  8 0.12 317.4 31.9 10 0.10  99.7 13.0 11 0.08 245.2 51.4 14 0.05 151.3 46.8 18 0.06 215.5 71.2 22 0.06 223.0 35.4 27 0.12 347.8 19.4 31 0.03  68.2 49.3 32 0.18 143.3 27.7 138 0.09 282.0 17.8
19 001   525 11  1 0.09 215.2 40.9  8 0.12 317.9 31.5 10 0.11  99.3 12.8 11 0.04 245.8 51.6 14 0.15 151.2 46.4 18 0.06 216.0 71.7 22 0.06 222.6 35.6 27 0.08 348.0 19.1 31 0.05  68.9 49.5 32 0.08 143.1 27.3 138 0.09 282.0 17.8
19 001   585 11  1 0.07 215.2 41.4  8 0.09 318.3 31.2 10 0.12  98.9 12.6 11 0.04 246.5 51.8 14 0.06 151.1 45.9 18 0.05 216.5 72.2 22 0.06 222.1 35.8 27 0.08 348.3 18.7 31 0.07  69.6 49.7 32 0.11 143.0 26.9 138 0.09 282.0 17.8


Solution

  • Assuming whitespace separates the columns, you can load each file into a list of lists:

    data = []
    with open(datafile, 'r') as file:  # datafile: path to one of your files
        for line in file:
            # split() with no arguments splits on any run of whitespace
            data.append(line.split())
    

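    Taking part of your example: to compare the value in column 2 with the value in column 5 (list indices 1 and 4, since indexing starts at 0), you could do the following. Note that the fields are still strings at this point, so convert them with float() before doing arithmetic or plotting:

    for line in data:
        if line[1] == line[4]:
            print("it's a match!")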
    

    If you have a header you want to ignore, just skip the first line when you open the file:

    with open(datafile, 'r') as file:
        # read and discard the header line
        header = file.readline()
        ...
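
Putting the pieces above together for the multi-day case, here is a minimal sketch rather than a definitive implementation. It reads each file line by line (np.loadtxt expects every row to have the same number of columns, so plain splitting is safer for ragged rows), builds a timestamp from the year, day of year, and seconds since midnight, and keeps only rows inside a date range. The function names parse_line, load_range, and main are made up for illustration. It assumes the two-digit year means 20xx, that the values to plot sit at 0-based fields 5, 9, 13, ... as in the question's range(5, 50, 4), and that values below 0.15 should be dropped as in the question's mask.

```python
import glob
from datetime import datetime, timedelta


def parse_line(line, first_col=5, step=4, threshold=0.15):
    """Return (timestamp, values) for one row, or None for a non-data row.

    Assumed layout (0-based): field 0 = two-digit year, field 1 = day of
    year, field 2 = seconds since midnight; the values to plot sit at
    fields 5, 9, 13, ... however many groups the row contains.
    """
    fields = line.split()
    try:
        year, doy, sec = int(fields[0]), int(fields[1]), float(fields[2])
        values = [float(x) for x in fields[first_col::step]]
    except (IndexError, ValueError):
        return None  # header, blank, or malformed line
    # Assumption: two-digit year means 20xx; day of year 1 is 1 January.
    stamp = datetime(2000 + year, 1, 1) + timedelta(days=doy - 1, seconds=sec)
    # Keep only values at or above the threshold, as the question's mask intends.
    return stamp, [v for v in values if v >= threshold]


def load_range(paths, start, end):
    """Collect (timestamp, value) pairs with start <= timestamp < end."""
    times, values = [], []
    for path in sorted(paths):
        with open(path) as handle:
            for line in handle:
                parsed = parse_line(line)
                if parsed is None:
                    continue
                stamp, row = parsed
                if start <= stamp < end:
                    # Repeat the timestamp once per value kept from this row.
                    times.extend([stamp] * len(row))
                    values.extend(row)
    return times, values


def main():
    # Imported here so the parsing functions work without matplotlib installed.
    import matplotlib.pyplot as plt

    # 1-30 January 2019 inclusive, so the exclusive end is 31 January.
    times, values = load_range(glob.glob("*.s4"),
                               datetime(2019, 1, 1), datetime(2019, 1, 31))
    plt.scatter(times, values, s=4)
    plt.gcf().autofmt_xdate()  # tilt the date labels so they do not overlap
    plt.ylabel("s4")
    plt.show()
```

Calling main() from the directory holding the .s4 files should draw all selected days on one time axis. Because header and blank lines fail the numeric conversion and are skipped, the same code should cope with files that do or do not start with header lines, provided no header line happens to begin with three numbers.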