I need to extract seq_00034 from a file path like:
file = "/home/user/workspace/data/seq_00034.pkl"
I know two ways to achieve it.
Method A:
import os
seq_name = os.path.basename(file).split(".")[0]
Method B:
seq_name = file.split("/")[-1].split(".")[0]
Which is safer/faster (taking the cost of import os into account)?
Is there a more elegant way to extract seq_name from a given path?
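On the "safer" part of the question, the variants are not strictly equivalent. The sketch below shows where they diverge: file names with more than one dot, and Windows-style separators. The helper names `stem_split`, `stem_os` and `stem_pathlib` and the two extra example paths are illustrative assumptions, not part of the original question.

```python
import os
from pathlib import PurePosixPath, PureWindowsPath

def stem_split(path):
    # Method B: plain string splitting, assumes "/" is the separator
    return path.split("/")[-1].split(".")[0]

def stem_os(path):
    # Method A: os.path.basename follows the separator rules of the current OS
    return os.path.basename(path).split(".")[0]

def stem_pathlib(path):
    # pathlib's .stem removes only the last suffix
    return PurePosixPath(path).stem

simple = "/home/user/workspace/data/seq_00034.pkl"
dotted = "/home/user/workspace/data/seq_00034.tar.gz"   # hypothetical path
windows = r"C:\data\seq_00034.pkl"                      # hypothetical path

print(stem_split(simple), stem_os(simple), stem_pathlib(simple))
print(stem_split(dotted), stem_pathlib(dotted))   # seq_00034 seq_00034.tar
print(stem_split(windows))                        # C:\data\seq_00034
print(PureWindowsPath(windows).stem)              # seq_00034
```

For the path in the question all three agree. Splitting on "." cuts at the first dot, while `.stem` cuts at the last one, so which is "right" depends on how the files are named.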
It turns out that splitting twice (i.e. Method B) is faster than os.path + split.
Both are significantly faster than using pathlib.
Speed test:
import os
import pathlib
import time

given_path = "/home/home/user/workspace/data/task_2022_02_xx_xx_xx_xx.pkl"

# Method B: split twice
time1 = time.time()
for _ in range(10000):
    seq_name = given_path.split("/")[-1].split(".")[0]
print(time.time()-time1, 'time of split')

# pathlib
time2 = time.time()
for _ in range(10000):
    seq_name = pathlib.Path(given_path).stem
print(time.time()-time2, 'time of pathlib')

# Method A: os.path + split
time3 = time.time()
for _ in range(10000):
    seq_name = os.path.basename(given_path).split(".")[0]
print(time.time()-time3, 'time of os.path')
Result (on my PC), in seconds for 10,000 iterations:
0.00339508056640625 time of split
0.0355381965637207 time of pathlib
0.005405426025390625 time of os.path
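The same comparison can be sketched with the standard `timeit` module, which repeats each measurement and makes it easy to take the least noisy run. This is an alternative way to run the test, not the code that produced the numbers above; the `candidates` and `timings` names are assumptions made for the example.

```python
import os
import pathlib
import timeit

given_path = "/home/home/user/workspace/data/task_2022_02_xx_xx_xx_xx.pkl"

# One zero-argument callable per approach
candidates = {
    "split": lambda: given_path.split("/")[-1].split(".")[0],
    "pathlib": lambda: pathlib.Path(given_path).stem,
    "os.path": lambda: os.path.basename(given_path).split(".")[0],
}

timings = {}
for name, func in candidates.items():
    # repeat() returns one total per run of 10000 calls; keep the minimum
    timings[name] = min(timeit.repeat(func, number=10000, repeat=5))
    print(f"{timings[name]:.6f} s  time of {name}")
```

Absolute numbers will differ between machines and Python versions, so only the relative ordering is worth comparing.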
If we take the time spent on importing into account, splitting twice (i.e. Method B) is still the fastest (assuming the code is only called once):
time1 = time.time()
seq_name = given_path.split("/")[-1].split(".")[0]
print(time.time()-time1, 'time of split')
time2 = time.time()
import pathlib
seq_name = pathlib.Path(given_path).stem
print(time.time()-time2, 'time of pathlib')
time3 = time.time()
import os
seq_name = os.path.basename(given_path).split(".")[0]
print(time.time()-time3, 'time of os.path')
Speed test result (in seconds):
0.000001430511474609375 time of split
0.003416776657104492 time of pathlib
0.0000030994415283203125 time of os.path
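One caveat on measuring import cost: Python caches imported modules in `sys.modules`, so an `import` statement is only expensive the first time it runs in a process, and `os` is usually already loaded during interpreter startup. A rough way to isolate the cost is to time each import in a fresh interpreter. The sketch below does that with a hypothetical helper `time_import`; it assumes Python 3.7+ for `capture_output`.

```python
import subprocess
import sys

def time_import(module_name):
    # Hypothetical helper: time a single import inside a new interpreter,
    # so earlier imports in this process cannot hide the cost
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module_name}\n"
        "print(time.perf_counter() - start)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, check=True,
    )
    return float(result.stdout.strip())

for name in ("os", "pathlib"):
    print(f"{time_import(name):.6f} s  import {name}")
```

If the code runs inside a larger program that already imports `os` or `pathlib` elsewhere, the import cost is effectively zero and only the per-call time matters.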