I am trying to convert the COCOMO81 dataset to .arff.
Before converting to .arff, I am trying to convert it to .csv.
I am following this LINK to do this.
I got the dataset from the PROMISE repository. I copied the entire page into Notepad and saved it as cocomo81.txt, and now I am trying to convert that cocomo81.txt file to .csv using Python. (I intend to convert the .csv file to .arff later using Weka.)
However, when I run
import pandas as pd
read_file = pd.read_csv(r"cocomo81.txt")
I get THIS ParserError.
To fix this, I followed this solution and modified my command to
read_file = pd.read_csv(r"cocomo81.txt",on_bad_lines='warn')
I got a bunch of warnings - you can see what it looks like here
and then I ran
read_file.to_csv(r'.\cocomo81csv.csv',index=None)
But it seems that the fix for ParserError didn't work in my case because my cocomo81csv.csv file looks like THIS in Excel.
Can someone please help me understand where I am going wrong, and how I can use datasets from the PROMISE repository in .arff format?
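It looks like a CSV file with comments in the first lines. The comment lines are marked by % characters, but also by @ (these look like ARFF header declarations such as @relation, @attribute, and @data, which suggests the file may already be in ARFF format), and the actual CSV data starts at line 230.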
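Rather than counting header lines by hand, the data start can be located programmatically. The following is a minimal sketch, assuming the header follows the usual ARFF layout (one `@attribute <name> <type>` line per column, then a `@data` line). The function name `locate_arff_data` and the sample text are hypothetical and made up for illustration; they are not taken from the real COCOMO81 file.

```python
def locate_arff_data(lines):
    """Return (attribute_names, index_of_first_data_line) for ARFF-style lines.

    Assumes simple, unquoted attribute names without spaces.
    """
    names = []
    for i, line in enumerate(lines):
        stripped = line.strip()
        lower = stripped.lower()  # ARFF keywords are case-insensitive
        if lower.startswith("@attribute"):
            # "@attribute rely numeric" -> "rely"
            parts = stripped.split()
            if len(parts) >= 2:
                names.append(parts[1])
        elif lower.startswith("@data"):
            # Data rows begin on the line after "@data"
            return names, i + 1
    raise ValueError("no @data line found")


# Made-up sample with the same shape as the header described above
sample = """% comment
% more
@relation cocomo81
@attribute rely numeric
@attribute data numeric
@data
0.88,1.16
1.15,0.94
""".splitlines()

names, start = locate_arff_data(sample)
print(names, start)
```

The returned `start` is the number of lines before the first data row, so it could be passed as `skiprows=start` to `pd.read_csv`, with `names=names` supplying the column names.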
You should skip those first rows and set the column names manually. Try something like this:
import pandas as pd

# set column names manually
col_names = ["rely", "data", "cplx", "time", "stor", "virt", "turn", "acap", "aexp", "pcap", "vexp", "lexp", "modp", "tool", "sced", "loc", "actual"]
filename = "cocomo81.arff.txt"
# read csv data, skipping the 229 comment/header lines before the data
df = pd.read_csv(filename, skiprows=229, sep=',', decimal='.', header=None, names=col_names)
print(df)
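With the DataFrame loaded, `df.to_csv("cocomo81.csv", index=False)` should produce the .csv file the question asks for. If pandas is not available, the same skip-the-header idea can be sketched with the standard library alone. This is only an illustration: `rows_to_csv`, the three-column layout, and the demo values are hypothetical, not the real 17-column dataset. The width check mirrors what the original ParserError was complaining about, namely rows with an unexpected number of fields.

```python
import csv
import io


def rows_to_csv(data_lines, col_names, out):
    """Write a header row plus the data rows to `out`.

    Raises ValueError if a non-blank row does not have len(col_names) fields.
    Returns the number of data rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(col_names)
    written = 0
    for line_no, row in enumerate(csv.reader(data_lines), start=1):
        if not row:  # blank line: nothing to write
            continue
        if len(row) != len(col_names):
            raise ValueError(
                f"row {line_no}: expected {len(col_names)} fields, got {len(row)}"
            )
        writer.writerow(row)
        written += 1
    return written


# Made-up demo data; in practice data_lines would be the file's lines after @data
demo_cols = ["rely", "data", "loc"]
demo_lines = ["0.88,1.16,113", "", "1.15,0.94,293"]
buf = io.StringIO()
count = rows_to_csv(demo_lines, demo_cols, buf)
print(buf.getvalue())
```

To write a real file, `out` could be a handle opened with `open("cocomo81.csv", "w", newline="")` instead of the in-memory buffer.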