pythonsql-serverweb-scrapinggrequests

Python SQL not inserting data for the whole list of URLs


I am using the grequest module to use multiple URLs. However, my issue here is that I'm only getting data inserted into my database from 1 of the URLs and not each one in the List.

Question:

How can I solve this?

import urllib.request as urllib
import socket
import pyodbc
from datetime import datetime
import ssl
import OpenSSL
import requests
import grequests
#
# Timestamp for undersøgelse
timestamp = datetime.now().strftime('%d-%m-%Y %H:%M:%S')

List = open("C:\\Users\\Farzad\\Desktop\\hosts.txt").read().splitlines()

rs = (grequests.get(url) for url in List)

requests = grequests.map(rs)

for response in requests:
    print(response.status_code)

#SQL Connection til local database
con = pyodbc.connect('Driver={SQL Server Native Client 11.0};'
                      'Server=DESKTOP-THV2IDL;'
                      'Database=host;'
                      'Trusted_Connection=yes;')

cursor = con.cursor()

#Insert responseheader data/sikkerhedskontrollers der bliver brugt
cursor.execute('INSERT INTO host.dbo.domain (ip, Host, HSTS, HPKP, XContentTypeOptions, XFrameOptions, ContentSecurityPolicy, XXssProtection, Server, Timestamp) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)', 
        (response.url, 1, response.headers.get('Strict-Transport-Security'), response.headers.get('Public-Key-Pins'), response.headers.get('X-Content-Type-Options'), response.headers.get('X-Frame-Options'), response.headers.get('Content-Security-Policy'), response.headers.get('X-XSS-Protection'), response.headers.get('Server'), timestamp))

con.commit()

Solution

  • In the following lines you are iterating over responses:

    for response in requests:
       print(response.status_code)
    

    Then here you execute your SQL code:

    cursor.execute('INSERT INTO host.dbo.domain (ip, Host, HSTS, HPKP, XContentTypeOptions, XFrameOptions, ContentSecurityPolicy, XXssProtection, Server, Timestamp) VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)', 
        (response.url, 1, response.headers.get('Strict-Transport-Security'), response.headers.get('Public-Key-Pins'), response.headers.get('X-Content-Type-Options'), response.headers.get('X-Frame-Options'), response.headers.get('Content-Security-Policy'), response.headers.get('X-XSS-Protection'), response.headers.get('Server'), timestamp))
    

    You use variable response and it's attributes/methods to pass to your SQL code. But this response variable is just the last request's result (last one written when you were iterating earlier).

    If you want to do something with your database for each response then you should execute your SQL code in for loop.