python-3.xselenium-webdriveramazon-ec2selenium-chromedriverubuntu-14.04

Python script on ubuntu - OSError: [Errno 12] Cannot allocate memory


I am running a script on AWS (Ubuntu) EC2 instance. It's a web scraper that uses selenium/chromedriver and headless chrome to scrape some webpages. I've had this script running previously with no problems, but today I'm getting an error. Here's the script:

options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")

options.binary_location='/usr/bin/chromium-browser'
driver = webdriver.Chrome(chrome_options=options)


#Set base url (SAN FRANCISCO)
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
      
events = []
    
for i in range(1,90):
    #cycle through pages in range
    driver.get(base_url + str(i))
    pageURL = base_url + str(i)
    print(pageURL)

When I run this script from ubuntu, I get this error:

  Traceback (most recent call last):
  File "BandsInTown_Scraper_SF.py", line 91, in <module>
    driver = webdriver.Chrome(chrome_options=options)
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
    self.service.start()
  File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
    stdin=PIPE)
  File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
    restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory

I confirmed that I'm running the same version of Chromedriver/Chromium Browser:

ChromeDriver 79.0.3945.130 (e22de67c28798d98833a7137c0e22876237fc40a-refs/branch-heads/3945@{#1047})


Chromium 79.0.3945.130 Built on Ubuntu , running on Ubuntu 18.04

For what it's worth, I have this running on a mac, and I do have multiple web scraping scripts like this one running on the same EC2 instance (only 2 scripts so far, so not that much).

Update

I'm now getting these errors as well when trying to run this script on ubuntu:

    Traceback (most recent call last):
      File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
        (self.host, self.port), self.timeout, **extra_kw)
      File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
        for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
      File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
        for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
    socket.gaierror: [Errno -3] Temporary failure in name resolution


     During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
    self._validate_conn(conn)
  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 852, in _validate_conn
    conn.connect()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 284, in connect
    conn = self._new_conn()
  File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
        timeout=timeout
    ^[[B  File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
    ^[[B^[[A^[[A    _stacktrace=sys.exc_info()[2])
      File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
        raise MaxRetryError(_pool, url, error or ResponseError(cause))
    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "BandsInTown_Scraper_SF.py", line 39, in <module>
        res = requests.get(url)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 75, in get
        return request('get', url, params=params, **kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 60, in request
        return session.request(method=method, url=url, **kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
        resp = self.send(prep, **send_kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
        r = adapter.send(request, **kwargs)
      File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
        raise ConnectionError(e, request=request)
    requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))

Finally, here's my currently monthly AWS usage, which doesn't show any memory quota being exceed.

enter image description here


Solution

  • This error message...

        restore_signals, start_new_session, preexec_fn)
    OSError: [Errno 12] Cannot allocate memory
    

    ...implies that the operating system was unable to allocate memory to initiate/spawn a new session.

    Additionally, this error message...

    urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
    

    ...implies that your program have successfully iterated till Page 5 and while on Page 6 you see this error.


    I don't see any issues in your code block as such. I have taken your code, made some minor adjustments and here is the execution result:


    Deep dive

    This error is coming from subprocess.py:

    self.pid = _posixsubprocess.fork_exec(
        args, executable_list,
        close_fds, tuple(sorted(map(int, fds_to_keep))),
        cwd, env_list,
        p2cread, p2cwrite, c2pread, c2pwrite,
        errread, errwrite,
        errpipe_read, errpipe_write,
        restore_signals, start_new_session, preexec_fn)
    

    However, as per the discussion in OSError: [Errno 12] Cannot allocate memory this error OSError: [Errno 12] Cannot allocate memory is related to RAM / SWAP.


    Swap Space

    Swap Space is the memory space in the system hard drive that has been designated as a place for the to temporarily store data which it can no longer hold with in the RAM. This gives you the ability to increase the amount of data your program can keep in its working . The swap space on the hard drive will be used primarily when there is no longer sufficient space in RAM to hold in-use application data. However, the information written to I/O will be significantly slower than information kept in RAM, but the operating system will prefer to keep running application data in memory and use swap space for the older data. Deploying swap space as a fall back for when your system’s RAM is depleted is a safety measure against out-of-memory issues on systems with non-SSD storage available.


    System Check

    To check if the system already has some swap space available, you need to execute the following command:

    $ sudo swapon --show
    

    If you don’t get any output, that means your system does not have swap space available currently. You can also verify that there is no active swap using the free utility as follows:

    $ free -h
    

    If there is no active swap in the system you will see an output as:

    Output
                   total        used       free        shared      buff/cache  available
    Mem:           488M         36M        104M        652K        348M        426M
    Swap:            0B          0B          0B
    

    Creating Swap File

    In these cases you need to allocate space for swap to use as a separate partition devoted to the task and you can create a swap file that resides on an existing partition. To create a 1 Gigabyte file you need to execute the following command:

    $ sudo fallocate -l 1G /swapfile
    

    You can verify that the correct amount of space was reserved by executing the following command:

    $ ls -lh /swapfile
    
    #Output
    $ -rw-r--r-- 1 root root 1.0G Mar 08 10:30 /swapfile
    

    This confirms the swap file has been created with the correct amount of space set aside.


    Enabling the Swap Space

    Once the correct size file is available we need to actually turn this into swap space. Now you need to lock down the permissions of the file so that only the users with specific privileges can read the contents. This prevents unintended users from being able to access the file, which would have significant security implications. So you need to follow the steps below:


    Conclusion

    Once the Swap Space has been set up successfully the underlying operating system will begin to use it as necessary.