I am running a script on AWS (Ubuntu) EC2 instance. It's a web scraper that uses selenium/chromedriver and headless chrome to scrape some webpages. I've had this script running previously with no problems, but today I'm getting an error. Here's the script:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")
options.binary_location='/usr/bin/chromium-browser'
driver = webdriver.Chrome(chrome_options=options)
#Set base url (SAN FRANCISCO)
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
events = []
for i in range(1,90):
#cycle through pages in range
driver.get(base_url + str(i))
pageURL = base_url + str(i)
print(pageURL)
When I run this script from ubuntu, I get this error:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 91, in <module>
driver = webdriver.Chrome(chrome_options=options)
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
I confirmed that I'm running the same version of Chromedriver/Chromium Browser:
ChromeDriver 79.0.3945.130 (e22de67c28798d98833a7137c0e22876237fc40a-refs/branch-heads/3945@{#1047})
Chromium 79.0.3945.130 Built on Ubuntu , running on Ubuntu 18.04
For what it's worth, I have this running on a mac, and I do have multiple web scraping scripts like this one running on the same EC2 instance (only 2 scripts so far, so not that much).
I'm now getting these errors as well when trying to run this script on ubuntu:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 852, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 284, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
^[[B File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
^[[B^[[A^[[A _stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 39, in <module>
res = requests.get(url)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
Finally, here's my currently monthly AWS usage, which doesn't show any memory quota being exceed.
This error message...
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
...implies that the operating system was unable to allocate memory to initiate/spawn a new session.
Additionally, this error message...
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
...implies that your program have successfully iterated till Page 5 and while on Page 6 you see this error.
I don't see any issues in your code block as such. I have taken your code, made some minor adjustments and here is the execution result:
Code Block:
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
for i in range(1,10):
#cycle through pages in range
driver.get(base_url + str(i))
pageURL = base_url + str(i)
print(pageURL)
Console Output:
https://www.bandsintown.com/en/c/san-francisco-ca?page=1
https://www.bandsintown.com/en/c/san-francisco-ca?page=2
https://www.bandsintown.com/en/c/san-francisco-ca?page=3
https://www.bandsintown.com/en/c/san-francisco-ca?page=4
https://www.bandsintown.com/en/c/san-francisco-ca?page=5
https://www.bandsintown.com/en/c/san-francisco-ca?page=6
https://www.bandsintown.com/en/c/san-francisco-ca?page=7
https://www.bandsintown.com/en/c/san-francisco-ca?page=8
https://www.bandsintown.com/en/c/san-francisco-ca?page=9
This error is coming from subprocess.py
:
self.pid = _posixsubprocess.fork_exec(
args, executable_list,
close_fds, tuple(sorted(map(int, fds_to_keep))),
cwd, env_list,
p2cread, p2cwrite, c2pread, c2pwrite,
errread, errwrite,
errpipe_read, errpipe_write,
restore_signals, start_new_session, preexec_fn)
However, as per the discussion in OSError: [Errno 12] Cannot allocate memory this error OSError: [Errno 12] Cannot allocate memory
is related to RAM / SWAP.
Swap Space is the memory space in the system hard drive that has been designated as a place for the os to temporarily store data which it can no longer hold with in the RAM. This gives you the ability to increase the amount of data your program can keep in its working memory. The swap space on the hard drive will be used primarily when there is no longer sufficient space in RAM to hold in-use application data. However, the information written to I/O will be significantly slower than information kept in RAM, but the operating system will prefer to keep running application data in memory and use swap space for the older data. Deploying swap space as a fall back for when your system’s RAM is depleted is a safety measure against out-of-memory issues on systems with non-SSD storage available.
To check if the system already has some swap space available, you need to execute the following command:
$ sudo swapon --show
If you don’t get any output, that means your system does not have swap space available currently. You can also verify that there is no active swap using the free utility as follows:
$ free -h
If there is no active swap in the system you will see an output as:
Output
total used free shared buff/cache available
Mem: 488M 36M 104M 652K 348M 426M
Swap: 0B 0B 0B
In these cases you need to allocate space for swap to use as a separate partition devoted to the task and you can create a swap file that resides on an existing partition. To create a 1 Gigabyte file you need to execute the following command:
$ sudo fallocate -l 1G /swapfile
You can verify that the correct amount of space was reserved by executing the following command:
$ ls -lh /swapfile
#Output
$ -rw-r--r-- 1 root root 1.0G Mar 08 10:30 /swapfile
This confirms the swap file has been created with the correct amount of space set aside.
Once the correct size file is available we need to actually turn this into swap space. Now you need to lock down the permissions of the file so that only the users with specific privileges can read the contents. This prevents unintended users from being able to access the file, which would have significant security implications. So you need to follow the steps below:
Make the file only accessible to specific user e.g. root by executing the following command:
$ sudo chmod 600 /swapfile
Verify the permissions change by executing the following command:
$ ls -lh /swapfile
#Output
-rw------- 1 root root 1.0G Apr 25 11:14 /swapfile
This confirms only the root user has the read and write flags enabled.
Now you need to mark the file as swap space by executing the following command:
$ sudo mkswap /swapfile
#Sample Output
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=6e965805-2ab9-450f-aed6-577e74089dbf
Next you need to enable the swap file, allowing the system to start utilizing it executing the following command:
$ sudo swapon /swapfile
You can verify that the swap is available by executing the following command:
$ sudo swapon --show
#Sample Output
NAME TYPE SIZE USED PRIO
/swapfile file 1024M 0B -1
Finally check the output of the free
utility again to validate the settings by executing the following command:
$ free -h
#Sample Output
total used free shared buff/cache available
Mem: 488M 37M 96M 652K 354M 425M
Swap: 1.0G 0B 1.0G
Once the Swap Space has been set up successfully the underlying operating system will begin to use it as necessary.