python selenium browsermob-proxy

Selenium + Browsermob Proxy, cannot capture bodies of some HTTPS responses


I need to capture the body of a response sent to my Selenium browser through BrowserMob Proxy. BrowserMob captures the response body for only some requests (not the ones I need), and I can't figure out why.

I am using Selenium with Firefox via Python. I have installed the BrowserMob certificates in Firefox.

Here is the code I'm using:

import json
import time

from browsermobproxy import Server
from selenium import webdriver

# Start the BrowserMob Proxy server and create a proxy instance.
server = Server(path="./bmp/bin/browsermob-proxy", options={'port': 8090})
server.start()
time.sleep(1)
proxy = server.create_proxy()
time.sleep(1)

# Route Firefox traffic through the proxy.
profile = webdriver.FirefoxProfile()
selenium_proxy = proxy.selenium_proxy()
profile.set_proxy(selenium_proxy)
driver = webdriver.Firefox(firefox_profile=profile)

# Record a HAR, including headers and response bodies.
proxy.new_har("name", options={'captureHeaders': True, 'captureContent': True})
driver.get("http://www.example.com")

# some clicking and key typing here...

# Give pending requests time to finish, then dump the HAR.
time.sleep(20)
print(json.dumps(proxy.har))

# Close the browser before shutting down the proxy it depends on.
driver.quit()
server.stop()

For the response I'm trying to inspect, I get this result:

"response": {
  "status": 200,
  "statusText": "OK",
  "httpVersion": "HTTP/1.1",
  "cookies": [],
  "headers": [
    {
      "name": "Date",
      "value": "Wed, 08 May 2019 19:25:35 GMT"
    },
    {
      "name": "Content-Type",
      "value": "application/json;charset=UTF-8"
    }
    // more headers...
  ],
  "content": {
    "size": 8888,
    "mimeType": "application/json;charset=UTF-8",
    "comment": ""
    // NO TEXT FIELD HERE
  },
  "redirectURL": "",
  "headersSize": 814,
  "bodySize": 8888,
  "comment": ""
}

Some other entries in the HAR file (ones I'm not interested in) do have a "text" field in their response.content.
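To see which entries are affected, a small helper along these lines could list every HAR entry whose response content has no "text" field. This is only a sketch: the function name entries_missing_body is made up for illustration, and it assumes the dictionary returned by proxy.har follows the usual HAR layout (log, entries, request.url, response.content).

```python
def entries_missing_body(har):
    """Return (url, mime_type, size) for entries whose response has no "text"."""
    missing = []
    for entry in har["log"]["entries"]:
        content = entry["response"].get("content", {})
        if "text" not in content:
            missing.append((
                entry["request"]["url"],
                content.get("mimeType", ""),
                content.get("size", 0),
            ))
    return missing
```

With the proxy from the code above, entries_missing_body(proxy.har) would show whether the missing bodies share a MIME type or host.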

EDIT: I solved this by using mitmproxy instead of BrowserMob. It includes a scripting mechanism that lets you filter traffic and run arbitrary Python code on it.
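For reference, a mitmproxy script of the kind described might look roughly like the following. It is a minimal sketch, not the script actually used: the file name capture_json.py, the captured list, and the choice to filter on a JSON content type are all assumptions made for illustration. It relies on mitmproxy calling a module-level response(flow) hook for each completed response.

```python
# Hypothetical file name: capture_json.py
# Run with: mitmdump -s capture_json.py

captured = []  # (url, body text) pairs collected while the proxy runs


def response(flow):
    """Hook called by mitmproxy once a full response has been read."""
    content_type = flow.response.headers.get("content-type", "")
    if "application/json" in content_type:
        # strict=False avoids an exception if the body cannot be decoded cleanly.
        body = flow.response.get_text(strict=False)
        captured.append((flow.request.pretty_url, body))
        print(flow.request.pretty_url, len(body or ""))
```

The browser then needs to be pointed at the mitmproxy port and to trust the mitmproxy certificate, much as with BrowserMob.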


Solution

  • You need to add "captureBinaryContent" to the options passed to new_har:

    proxy.new_har("", options={'captureHeaders': True, 'captureContent': True, 'captureBinaryContent': True})
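Once the bodies show up, they may not all be plain text. The sketch below assumes that bodies captured as binary are stored base64-encoded and flagged with "encoding": "base64" in response.content, as the HAR format allows; check an actual entry to confirm. The helper name response_body is made up for illustration.

```python
import base64


def response_body(entry):
    """Return the response body of one HAR entry as bytes, or None if absent."""
    content = entry["response"].get("content", {})
    text = content.get("text")
    if text is None:
        return None
    if content.get("encoding") == "base64":
        # Binary (or binary-classified) bodies are stored base64-encoded.
        return base64.b64decode(text)
    return text.encode("utf-8")
```

For a JSON response like the one in the question, json.loads(response_body(entry)) should then give the parsed payload, where entry is one item of proxy.har["log"]["entries"].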