I have some profiling results from a Python profiler, shown below:
Filename: main.py
Line #    Mem usage    Increment   Line Contents
================================================
    30    121.8 MiB    121.8 MiB   @profile(stream=f)
    31                             def parse_data(data):
    32    121.8 MiB      0.0 MiB       Y=data["price"].values
    33    121.8 MiB      0.0 MiB       Y=np.log(Y)
    34    121.8 MiB      0.0 MiB       features=data.columns
    35    121.8 MiB      0.0 MiB       X1=list(set(features)-set(["price"]))
    36    126.3 MiB      4.5 MiB       X=data[X1].values
    37    126.3 MiB      0.0 MiB       ss=StandardScaler()
    38    124.6 MiB      0.0 MiB       X=ss.fit_transform(X)
    39    124.6 MiB      0.0 MiB       return X,Y
Filename: main.py
Line #    Mem usage    Increment   Line Contents
================================================
    41    127.1 MiB    127.1 MiB   @profile(stream=f)
    42                             def linearRegressionfit(Xt,Yt,Xts,Yts):
    43    127.1 MiB      0.0 MiB       lr=LinearRegression()
    44    131.2 MiB      4.1 MiB       model=lr.fit(Xt,Yt)
    45    132.0 MiB      0.8 MiB       predict=lr.predict(Xts)
    46
Now I need to extract these results for plotting and other purposes, but the raw text is not very handy. The table shows line-by-line profiling results. How can I get a pandas DataFrame, or some other tabular form, from which I can pull any row or column of this table?
P.S. I have looked at regex and parsimonious, but I could not get either to work for my case.
This is just a small parsing exercise. With the standard split() and a few minor adjustments, you can get a fairly clean data frame in a few lines of code.
txt = '''
Filename: main.py
Line # Mem usage Increment Line Contents
================================================
30 121.8 MiB 121.8 MiB @profile(stream=f)
31 def parse_data(data):
32 121.8 MiB 0.0 MiB Y=data["price"].values
33 121.8 MiB 0.0 MiB Y=np.log(Y)
34 121.8 MiB 0.0 MiB features=data.columns
35 121.8 MiB 0.0 MiB X1=list(set(features)-set(["price"]))
36 126.3 MiB 4.5 MiB X=data[X1].values
37 126.3 MiB 0.0 MiB ss=StandardScaler()
38 124.6 MiB 0.0 MiB X=ss.fit_transform(X)
39 124.6 MiB 0.0 MiB return X,Y
Filename: main.py
Line # Mem usage Increment Line Contents
================================================
41 127.1 MiB 127.1 MiB @profile(stream=f)
42 def linearRegressionfit(Xt,Yt,Xts,Yts):
43 127.1 MiB 0.0 MiB lr=LinearRegression()
44 131.2 MiB 4.1 MiB model=lr.fit(Xt,Yt)
45 132.0 MiB 0.8 MiB predict=lr.predict(Xts)
'''
import pandas as pd

lines = []
for line in txt.split('\n'):
    # Skip the file name, the column header, the separator and blank lines
    if line.startswith('Filename'): continue
    if line.startswith('Line'): continue
    if line.startswith('='): continue
    if line.strip() == '': continue
    data = line.split()
    if len(data) > 4 and data[2] == 'MiB' and data[4] == 'MiB':
        # Profiled line: number, "<value> MiB" twice, then the whole statement
        mem, inc, code = ' '.join(data[1:3]), ' '.join(data[3:5]), ' '.join(data[5:])
    else:
        # Lines without memory figures (e.g. the def line): only number and code
        mem, inc, code = '', '', ' '.join(data[1:])
    lines.append([data[0], mem, inc, code])

df = pd.DataFrame(lines, columns=['Line #', 'Mem usage', 'Increment', 'Line Contents'])
print(df)
Line # Mem usage Increment Line Contents
0 30 121.8 MiB 121.8 MiB @profile(stream=f)
1 31 def parse_data(data):
2 32 121.8 MiB 0.0 MiB Y=data["price"].values
3 33 121.8 MiB 0.0 MiB Y=np.log(Y)
4 34 121.8 MiB 0.0 MiB features=data.columns
5 35 121.8 MiB 0.0 MiB X1=list(set(features)-set(["price"]))
6 36 126.3 MiB 4.5 MiB X=data[X1].values
7 37 126.3 MiB 0.0 MiB ss=StandardScaler()
8 38 124.6 MiB 0.0 MiB X=ss.fit_transform(X)
9 39 124.6 MiB 0.0 MiB return X,Y
10 41 127.1 MiB 127.1 MiB @profile(stream=f)
11 42 def linearRegressionfit(Xt,Yt,Xts,Yts):
12 43 127.1 MiB 0.0 MiB lr=LinearRegression()
13 44 131.2 MiB 4.1 MiB model=lr.fit(Xt,Yt)
14 45 132.0 MiB 0.8 MiB predict=lr.predict(Xts)
You can then split the data frame at every row where 'Line Contents' starts with '@profile'.
For example:
split_idx = df[df['Line Contents'].str.startswith('@profile')].index
dataframes = []
for i, idx in enumerate(split_idx):
    try:
        # Slice from this @profile row up to (not including) the next one
        dataframes.append(df.iloc[idx:split_idx[i+1]])
    except IndexError:
        # Last block: there is no next @profile row, so take the rest
        dataframes.append(df.iloc[idx:])
print(dataframes[0])
print('======')
print(dataframes[1])
Line # Mem usage Increment Line Contents
0 30 121.8 MiB 121.8 MiB @profile(stream=f)
1 31 def parse_data(data):
2 32 121.8 MiB 0.0 MiB Y=data["price"].values
3 33 121.8 MiB 0.0 MiB Y=np.log(Y)
4 34 121.8 MiB 0.0 MiB features=data.columns
5 35 121.8 MiB 0.0 MiB X1=list(set(features)-set(["price"]))
6 36 126.3 MiB 4.5 MiB X=data[X1].values
7 37 126.3 MiB 0.0 MiB ss=StandardScaler()
8 38 124.6 MiB 0.0 MiB X=ss.fit_transform(X)
9 39 124.6 MiB 0.0 MiB return X,Y
======
Line # Mem usage Increment Line Contents
10 41 127.1 MiB 127.1 MiB @profile(stream=f)
11 42 def linearRegressionfit(Xt,Yt,Xts,Yts):
12 43 127.1 MiB 0.0 MiB lr=LinearRegression()
13 44 131.2 MiB 4.1 MiB model=lr.fit(Xt,Yt)
14 45 132.0 MiB 0.8 MiB predict=lr.predict(Xts)
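Since the goal is plotting, it may also help to have the memory figures as numbers rather than strings like '121.8 MiB'. Below is a rough sketch of the same parsing done with the standard re module instead of split(). The names FULL_ROW, BARE_ROW and parse_profile, and the block / mem_mib / inc_mib keys, are made up for this example, and it assumes every value is reported in MiB and every section starts with a "Filename:" line, as in the output above.

```python
import re

# Assumed row layout: line number, "<float> MiB" twice, then the source code.
FULL_ROW = re.compile(
    r"^\s*(?P<line>\d+)\s+(?P<mem>-?\d+(?:\.\d+)?) MiB\s+"
    r"(?P<inc>-?\d+(?:\.\d+)?) MiB\s+(?P<code>.*)$"
)
# Fallback for rows without memory figures (e.g. the def line).
BARE_ROW = re.compile(r"^\s*(?P<line>\d+)\s*(?P<code>.*)$")


def parse_profile(text):
    """Return one dict per source line; memory values are floats or None."""
    records = []
    block = 0  # incremented at each "Filename:" header
    for raw in text.splitlines():
        if raw.startswith("Filename"):
            block += 1
            continue
        full = FULL_ROW.match(raw)
        bare = BARE_ROW.match(raw)
        if full:
            records.append({
                "block": block,
                "line": int(full["line"]),
                "mem_mib": float(full["mem"]),
                "inc_mib": float(full["inc"]),
                "code": full["code"],
            })
        elif bare:
            # Column headers, separators and blank lines match neither pattern
            records.append({
                "block": block,
                "line": int(bare["line"]),
                "mem_mib": None,
                "inc_mib": None,
                "code": bare["code"],
            })
    return records


sample = """\
Filename: main.py
Line # Mem usage Increment Line Contents
================================================
30 121.8 MiB 121.8 MiB @profile(stream=f)
31 def parse_data(data):
36 126.3 MiB 4.5 MiB X=data[X1].values
Filename: main.py
Line # Mem usage Increment Line Contents
================================================
44 131.2 MiB 4.1 MiB model=lr.fit(Xt,Yt)
"""

records = parse_profile(sample)
# Highest memory reading across all profiled lines
peak = max(r["mem_mib"] for r in records if r["mem_mib"] is not None)
print(peak)
```

Passing records to pd.DataFrame(records) should give a frame whose mem_mib and inc_mib columns are numeric and can be plotted against line, and grouping on block is another way to separate the profiled functions.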