pythonline-profiler

Python line profiler results inconsistent


Why is it that python code runs at drastically different speeds despite being the same code run twice?

Code

I was doing some profiling for a short Python code:

import urllib3

@profile
def download(url, file_path):
    http = urllib3.PoolManager()
    r = http.request("GET", url)
    print("FINISHED GET!")
    print("WRITING TO "+file_path)
    with open(file_path, "wb") as f:
        f.write(r.data)
    r.release_conn()

url = "http://interactivepaper.todayonline.com/jrsrc/260516/260516.pdf"

download(url, "")

Testing

I am using line_profiler with the command kernprof -l -v test.py. I tested this code multiple times and all the results are inconsistent.

Test 1:

FINISHED GET!
WRITING TO 
Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 44.653 s
File: test.py
Function: download at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def download(url, file_path):
     5         1          273    273.0      0.0      http = urllib3.PoolManager()
     6         1     44652667 44652667.0    100.0      r = http.request("GET", url)
     7         1           37     37.0      0.0      print("FINISHED GET!")
     8         1            4      4.0      0.0      print("WRITING TO "+file_path)
     9         1           29     29.0      0.0      with open(file_path, "wb") as f:
    10                                                   f.write(r.data)
    11                                               r.release_conn()
(There was an IO Error from here onwards as I used an empty string)

Test 2 (I edited the code):

FINISHED GET!
WRITING TO 
Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 44.6693 s
File: test.py
Function: download at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def download(url, file_path):
     5         1          186    186.0      0.0      http = urllib3.PoolManager()
     6         1     44669082 44669082.0    100.0      r = http.request("GET", url)
     7         1           42     42.0      0.0      print("FINISHED GET!")
     8         1            4      4.0      0.0      print("WRITING TO "+file_path)

Test 3:

FINISHED GET!
WRITING TO 
Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 4.53504 s
File: test.py
Function: download at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def download(url, file_path):
     5         1          262    262.0      0.0      http = urllib3.PoolManager()
     6         1      4534736 4534736.0    100.0      r = http.request("GET", url)
     7         1           37     37.0      0.0      print("FINISHED GET!")
     8         1            4      4.0      0.0      print("WRITING TO "+file_path)

This is the part that I found confusing. A process that took 44s to run at first now takes 4s to run. I also noted that whenever I edited the file, it would take a long time to run again. Here are three more tests proving my point:

First test after edit:

Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 49.7018 s
File: test.py
Function: download at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def download(url, file_path):
     5         1          187    187.0      0.0      http = urllib3.PoolManager()
     6         1     49701585 49701585.0    100.0      r = http.request("GET", url)

Second test after edit:

Timer unit: 1e-06 s

Total time: 9.10985 s
File: test.py
Function: download at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def download(url, file_path):
     5         1          185    185.0      0.0      http = urllib3.PoolManager()
     6         1      9109665 9109665.0    100.0      r = http.request("GET", url)

Third test after edit (its similar to second test):

Wrote profile results to test.py.lprof
Timer unit: 1e-06 s

Total time: 12.9593 s
File: test.py
Function: download at line 3

Line #      Hits         Time  Per Hit   % Time  Line Contents
==============================================================
     3                                           @profile
     4                                           def download(url, file_path):
     5         1          189    189.0      0.0      http = urllib3.PoolManager()
     6         1     12959072 12959072.0    100.0      r = http.request("GET", url)

Solution

  • The major difference is in the following line of code:

    r = http.request("GET", url)
    

    In this line you are trying to access remote web server.

    The following reasons might result with different access time to the Web Server:

    1) Caching

    2) Network load

    3) Remote server load