python scrapy imperva

Bypassing Imperva bot detection with Scrapy. Any way possible?


I'm trying to scrape several links containing information about events. I am rotating my paid proxies and the user agents generated by the UserAgent library. Imperva, which requires a US IP, is so sensitive that it blocks even my regular browser when I use a free US proxy!

I asked this question in a scraping-related Discord channel. Someone contacted me and said it is possible to bypass Imperva, but he can't tell me how because he doesn't want me as a competitor in the ticket scraping market :(

In addition to user agents and proxies, I tried to imitate the headers of a successful browser request, but it didn't work. All I get are 405s and 403s. I am trying to scrape the event section, but I couldn't even get a 200 response for any of the 27 links I have (I added some below).

How do you think Imperva could be bypassed with Scrapy or Requests? It's also okay to recommend an academic resource that I can study to develop my Scrapy skills.


Some of the links I'm trying to scrape

https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=

My spider code consists of a class that imports my proxies from a file and the spider itself. I pass the proxy as a meta value, as described in the Scrapy documentation, and I use download delays:

import scrapy
from scrapy import Request
from random_user_agent.user_agent import UserAgent
import random
import pandas as pd


class ProxyFunctions:
    (...)


class AlexSpider(scrapy.Spider):

    name = 'alex'
    s = ProxyFunctions()
    s.prox_list_fixer()  # fixes the txt file that holds the proxies and writes a new txt file
    proxies = s.imp_proxies()

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.root = "https://partnercarrier.com"
        self.start_url = "https://partnercarrier.com/PA/"
        #self.initial_links = self.imp_links()  # to be used once all links have been added from the file
        user_agent_rotator = UserAgent(software_names=['chrome'], operating_systems=['windows', 'linux'])
        self.user_agents = user_agent_rotator.get_user_agents()
        #self.root_link = "https://www.google.com"
        self.UA_rand = random.choice(self.user_agents)['user_agent']  # User Agent set
        #self.UA_LIST = open("/home/draco/docs/scraping/scrapyyy/thomas/USER_AGENTS.txt","r") #manual UA importation from  text

    # picks a random proxy from the proxy list in the file
    def imp_randp(self, path="/home/draco/docs/scraping/scrapyyy/thomas/proxies.txt"):
        with open(path) as PROXIES:
            lines = PROXIES.readlines()
            return random.choice(lines).strip()

    # reads the links from the file
    def imp_links(self, path="/home/draco/docs/scraping/Selenium/inputs.csv"):
        x = pd.read_csv(path)
        return list(x['Url'])

    def start_requests(self):
        print("INITIAL REQUEST")
        links = self.imp_links()

        for link in links:
            print(f"---INFO: Requesting page=> {link}")
            proxy = random.choice(self.proxies)
            #print("---INFO: Using proxy => ", proxy)
            # Browser-like header set. NOTE: it is built here but not passed
            # to the Request below, which only sends a User-Agent header.
            h = {
                'User-Agent': random.choice(self.user_agents)['user_agent'],
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7',
                'Cache-Control': 'max-age=0',
                'Connection': 'keep-alive',
                'Host': link.split("/")[2],
                'Sec-Fetch-Dest': 'document',
                'Upgrade-Insecure-Requests': '1',
                'Sec-Fetch-Mode': 'navigate',
                'sec-ch-ua-platform': '"Linux"',
                'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            }
            # NOTE: no method is given, so Scrapy sends this string as the body
            # of a GET request, and the same string is sent for every link.
            b = 'groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode='
            yield Request(
                url=link,
                callback=self.parse_gen,
                headers={"user-agent": random.choice(self.user_agents)['user_agent']},
                meta={"proxy": proxy},
                body=b,
                dont_filter=True,
            )

    def parse_gen(self, response):
        print("---INFO: General parser opened. PARSER1")

My terminal output:

draco@draco:~/docs/scraping/scrapyyy/upwork$ scrapy crawl alex
https://umasstix.evenue.net
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: upwork)
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.10 (default, Nov 26 2021, 20:14:08) - [GCC 9.3.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m  14 Dec 2021), cryptography 36.0.1, Platform Linux-5.13.0-35-generic-x86_64-with-glibc2.29
2022-03-20 20:23:01 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-03-20 20:23:01 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'BOT_NAME': 'upwork',
 'CONCURRENT_REQUESTS_PER_DOMAIN': 14,
 'HTTPCACHE_ENABLED': True,
 'NEWSPIDER_MODULE': 'upwork.spiders',
 'SPIDER_MODULES': ['upwork.spiders']}
2022-03-20 20:23:01 [scrapy.extensions.telnet] INFO: Telnet Password: 7f185fdb1347847f
2022-03-20 20:23:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats',
 'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-20 20:23:05 [scrapy.core.engine] INFO: Spider opened
2022-03-20 20:23:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-20 20:23:05 [scrapy.extensions.httpcache] DEBUG: Using filesystem cache storage in /home/draco/docs/scraping/scrapyyy/upwork/.scrapy/httpcache
2022-03-20 20:23:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
INITIAL REQUEST
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
---INFO: General parser opened. PARSER1
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Closing spider (finished)
2022-03-20 20:23:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 15189,
 'downloader/request_count': 27,
 'downloader/request_method_count/GET': 27,
 'downloader/response_bytes': 304575,
 'downloader/response_count': 27,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/403': 16,
 'downloader/response_status_count/405': 10,
 'elapsed_time_seconds': 0.444587,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2022, 3, 20, 17, 23, 6, 67887),
 'httpcache/hit': 27,
 'httperror/response_ignored_count': 26,
 'httperror/response_ignored_status_count/403': 16,
 'httperror/response_ignored_status_count/405': 10,
 'log_count/DEBUG': 28,
 'log_count/INFO': 36,
 'memusage/max': 126562304,
 'memusage/startup': 126562304,
 'response_received_count': 27,
 'scheduler/dequeued': 27,
 'scheduler/dequeued/memory': 27,
 'scheduler/enqueued': 27,
 'scheduler/enqueued/memory': 27,
 'start_time': datetime.datetime(2022, 3, 20, 17, 23, 5, 623300)}
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Spider closed (finished)
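
One detail in the log above is worth noting: HTTPCACHE_ENABLED is True and every response is flagged ['cached'], so this run replayed stored responses rather than sending the requests through the proxies. The following is a minimal sketch, not a fix for the blocking itself, of settings that would turn the cache off and let 403/405 responses reach the callback. The names CUSTOM_SETTINGS, BLOCK_CODES and describe_response are hypothetical; only the setting keys and the 'cached' response flag come from Scrapy. Treating 403 and 405 as "blocked" is an assumption based on the log.

```python
# Hypothetical override; in a spider this dict would be assigned to the
# `custom_settings` class attribute.
CUSTOM_SETTINGS = {
    # Stop serving stored responses so each run really hits the network.
    "HTTPCACHE_ENABLED": False,
    # Let these statuses through to the callback instead of being dropped
    # by HttpErrorMiddleware ("HTTP status code is not handled or not allowed").
    "HTTPERROR_ALLOWED_CODES": [403, 405],
}

# Assumption: these are the statuses seen in the log when a request is refused.
BLOCK_CODES = {403, 405}


def describe_response(status, flags):
    """Summarise one response from its status code and its Scrapy flags.

    In a spider callback this would be called as
    describe_response(response.status, response.flags).
    """
    source = "cache" if "cached" in flags else "network"
    verdict = "blocked" if status in BLOCK_CODES else "ok"
    return f"{status} from {source}: {verdict}"


if __name__ == "__main__":
    print(describe_response(405, ["cached"]))
    print(describe_response(200, []))
```

With the cache off, a change of proxy or headers shows up in the next run instead of being masked by an earlier stored 403 or 405.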

Solution

  • I bypassed Imperva by using a real Chrome browser, automated with a browser extension, together with a USA mobile proxy. Imperva checks the following:

    1. IP address (most important)
    2. Screen resolution, window sizing parameters, document sizing parameters (important)
    3. User agent (less important)
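
The three points above can be sketched in code as follows. This is only an illustration: it swaps the browser extension mentioned in the answer for Selenium driving a real Chrome, which is a different technique that may itself be detected, and it is not claimed to get past Imperva. The helper names, the default window size and the example proxy address are assumptions. Only the Chrome switches (--proxy-server, --window-size, --user-agent) and the Selenium calls are real.

```python
def build_chrome_args(proxy, width, height, user_agent=None):
    """Return Chrome command-line switches for a proxy and a window size."""
    args = [
        f"--proxy-server={proxy}",          # 1. IP address, via the proxy
        f"--window-size={width},{height}",  # 2. window sizing parameters
    ]
    if user_agent:
        args.append(f"--user-agent={user_agent}")  # 3. user agent (optional)
    return args


def open_page(url, proxy, width=1366, height=768):
    """Load `url` in a real Chrome window and return the page source."""
    # Imported here so that build_chrome_args works without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(proxy, width, height):
        options.add_argument(arg)

    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        driver.quit()


if __name__ == "__main__":
    # Placeholder proxy address; a proxy that needs a username and password
    # would need a different mechanism than this switch.
    print(build_chrome_args("http://127.0.0.1:8080", 1366, 768))
```

Leaving the user agent unset by default keeps Chrome's own value, which matches the answer's ranking of the user agent as the least important check.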