The script below is supposed to send the data to the URL once the Google Compute Engine instance (using the Container-Optimized OS image) has started and the dockerized app is running. Unfortunately, it reports that the post failed, yet the data is still received once the app is working.
The output is:
('Error', ConnectionError(MaxRetryError("HTTPConnectionPool(host='34.7.8.8', port=12345): Max retries exceeded with url: /didi.json (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 111] Connection refused',))",),))
Does this come from GCE?
Here is the Python code:
    for i in range(0, 100):
        while True:
            try:
                response = requests.post('http://%s:12345/didi.json' % ip_of_instance, data=data)
            except requests.exceptions.RequestException as err:
                print("Error", err)
                time.sleep(2)
                continue
            break
Edit: here are the parameters of the POST request:
    data = {
        'url': 'www.website.com',
        'project': 'webCrawl',
        'spider': 'indexer',
        'setting': 'ELASTICSEARCH_SERVERS=92.xx.xx.xx',
        'protocol': 'https',
        'scraper': 'light'
    }
From what I can see, you are using a while True loop. When the maximum number of retries is exceeded you get an error, because the server is banning you. That ban does not last forever, though, and once it is lifted the data starts getting through again because the while loop is still running.
If my theory is not right, you can take a look at this other thread.
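Either way, it may help to cap the number of attempts so the loop cannot spin forever and so you can see how many tries it took. Below is a minimal sketch of that idea, not a definitive fix. The helper name post_with_retries, its parameters (send, max_attempts, delay, retry_on, sleep) and their defaults are my own assumptions and are not part of your script. The request itself is passed in as a callable so the helper does not depend on any particular HTTP library.

```python
import time


def post_with_retries(send, max_attempts=30, delay=2.0,
                      retry_on=(Exception,), sleep=time.sleep):
    """Call send() until it succeeds or max_attempts is reached.

    send      -- zero-argument callable that performs the request (hypothetical)
    retry_on  -- tuple of exception types that should trigger a retry
    sleep     -- injectable sleep function, handy for testing
    """
    last_err = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send()
        except retry_on as err:
            last_err = err
            print("Attempt %d failed: %s" % (attempt, err))
            # Only wait if another attempt is still coming.
            if attempt < max_attempts:
                sleep(delay)
    # Every attempt failed: surface the last error instead of looping forever.
    raise RuntimeError("gave up after %d attempts" % max_attempts) from last_err
```

With requests, the call could look something like post_with_retries(lambda: requests.post(url, data=data, timeout=5), retry_on=(requests.exceptions.RequestException,)), where url is the address you already build from ip_of_instance. If the helper raises after all attempts, the app never became reachable in that window.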