I'm relatively new to TensorFlow, and therefore I'm struggling with the data preparation.
I have a folder with about 500 .txt files. Each of these files contains the data and a label of the data. (The data represents MFCCs, which are audio features that get generated for each "frame" of a .wav audio file.)
Each of these files looks like this:
1
1.013302233064514191e+01
-1.913611804400369110e+01
1.067932213100989847e+00
1.308777013246182364e+01
-3.591032944037165109e+00
1.294307486784356698e+01
5.628056691023937574e+00
5.311223121033092909e+00
1.069261850699697014e+01
4.398722698218969995e+00
5.045254154360372389e+00
7.757820364628694954e+00
-2.666228281486863416e+00
9.236707894117541784e+00
-1.727334954006132151e+01
5.166050472560470119e+00
6.421742650353079007e+00
2.550240091606466031e+00
9.871269941885440602e+00
7.594591526898561984e-01
-2.877228968309437196e+00
5.592507658015017924e-01
8.828475996369435919e+00
2.946838169848354561e+00
8.420693074096489150e-01
7.032494888004835687e+00
...
The first line of each file holds the label of the data (in this case 1). The rest of the file holds 13 numbers per frame, representing the 13 MFCCs of that frame, with every value on its own line.
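To make the layout concrete, here is a minimal sketch that reads one such file into its label and a list of frames. It assumes the label is an integer and that every frame has exactly 13 values; `parse_mfcc_file` is a hypothetical helper name, not part of TensorFlow or any library.

```python
def parse_mfcc_file(path, n_mfcc=13):
    """Return (label, frames) for one text file in the format shown above.

    Hypothetical helper: frames is a list with one inner list of
    n_mfcc floats per frame.
    """
    with open(path, 'r') as f:
        # Drop the trailing newline of every line and skip empty lines
        lines = [line.strip() for line in f if line.strip()]
    label = int(lines[0])                    # first line: the label
    values = [float(v) for v in lines[1:]]   # remaining lines: MFCC values
    if len(values) % n_mfcc != 0:
        raise ValueError('%s: %d values is not a multiple of %d'
                         % (path, len(values), n_mfcc))
    # Group every n_mfcc consecutive values into one frame
    frames = [values[i:i + n_mfcc] for i in range(0, len(values), n_mfcc)]
    return label, frames
```

With TensorFlow installed, `tf.constant(frames)` would turn the result into a float32 tensor of shape (num_frames, 13).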
So my question is: what's an easy way of getting the content of all these files into tensors so TensorFlow can use them?
Thanks!
I'm not sure if this is the optimal way of doing it, but it can be done with the steps below:
1. Read each Text File and append its data to a List.
2. Replace the '\n' in each element with ',', because our goal is to create a CSV out of it.
3. Write the List to a CSV File.
4. Convert the CSV File to a Tensorflow Dataset using tf.data.experimental.make_csv_dataset. Please refer to this Tutorial on how to convert a CSV File to a Tensorflow Dataset.

The code which performs the first three steps is given below:
import os
import pandas as pd

# The Folder where all the Text Files are present
Path_Of_Text_Files = '/home/mothukuru/Jupyter_Notebooks/Stack_Overflow/Text_Files'
# Only keep the .txt files, so a Final_Data.csv from an earlier run is not read back in
List_of_Files = [f for f in os.listdir(Path_Of_Text_Files) if f.endswith('.txt')]
List_Of_Elements = []
# Iterate through each Text File and append its data to a List
for EachFile in List_of_Files:
    with open(os.path.join(Path_Of_Text_Files, EachFile), 'r') as FileObj:
        List_Of_Elements.append(FileObj.readlines())

Rows = []
for EachElement in List_Of_Elements:
    # Remove the '\n' at the end of each line and skip empty lines
    Lines = [sub.strip() for sub in EachElement if sub.strip()]
    Label, Values = Lines[0], Lines[1:]
    # Every 13 consecutive values are the MFCCs of one frame, so write
    # one comma-separated row (Label, F1 ... F13) per frame
    for j in range(0, len(Values) - 12, 13):
        Rows.append(','.join([Label] + Values[j:j + 13]) + '\n')

Column_Names = ['Label', 'F1', 'F2', 'F3', 'F4', 'F5', 'F6', 'F7',
                'F8', 'F9', 'F10', 'F11', 'F12', 'F13']
# Write the header and all rows to the CSV File in one go
Path_Of_Final_CSV = os.path.join(Path_Of_Text_Files, 'Final_Data.csv')
with open(Path_Of_Final_CSV, 'w') as FileObj:
    FileObj.write(','.join(Column_Names) + '\n')
    FileObj.writelines(Rows)

Data = pd.read_csv(Path_Of_Final_CSV, index_col=False)
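Step 4 (turning the CSV File into a Tensorflow Dataset) is not covered by the code above. A minimal sketch of it follows, assuming TensorFlow 2.x and a CSV whose header is Label, F1 ... F13 with one frame per row; `make_mfcc_dataset` is a hypothetical helper name. It relies on `tf.data.experimental.make_csv_dataset`, which yields a dict of column tensors plus the label column, so the 13 MFCC columns are stacked into a single tensor for a model.

```python
import tensorflow as tf


def make_mfcc_dataset(csv_path, batch_size=32):
    """Hypothetical helper: stream the CSV as (mfccs, label) batches."""
    dataset = tf.data.experimental.make_csv_dataset(
        csv_path,
        batch_size=batch_size,
        label_name='Label',  # assumed name of the label column
        num_epochs=1,        # the default (None) repeats the data forever
        shuffle=True,
    )

    # make_csv_dataset yields ({column_name: tensor}, label); stack the
    # 13 MFCC columns into one (batch, 13) float32 tensor
    def pack(features, label):
        mfccs = tf.stack(
            [tf.cast(features['F%d' % i], tf.float32) for i in range(1, 14)],
            axis=1)
        return mfccs, label

    return dataset.map(pack)
```

For example, `for mfccs, labels in make_mfcc_dataset(Path_Of_Final_CSV).take(1): print(mfccs.shape)` shows the shape of one batch, (batch_size, 13) when the file has at least batch_size rows. Note that one row per frame means the frame order inside a file is not preserved once the rows are shuffled.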
To check that our data is fine, print(Data.head()) will output the data below: