pythonweb-scrapingtreelxmllxml.html

Compare string result from path & requests


I am scraping the HTML code from the URL defined, mainly focussing on the tag, to extract the results of it. Then, compare if string "example" exists in the script, if yes, print something and flag =1.

I am not able to compare the results extracted from the HTML.fromstring

Able to scrape the HTML content and view the full successfully, wanted to proceed further but not able to (compare strings)

import requests
from lxml import html

page = requests.get("http://econpy.pythonanywhere.com/ex/001.html")
tree = html.fromstring(page.text) #was page.content

# To get all the content in <script> of the webpage
scripts = tree.xpath('//script/text()')

# To get line of script that contains the string "location" (text)
keyword = tree.xpath('//script/text()[contains(., "location")]')

# To get the element ID of the script that contains the string "location"
keywordElement = tree.xpath('//script[contains(., "location")]')

print('\n<SCRIPT> is :\n', scripts)

# To print the Element ID
print('\n\KEYWORD script is discovered @ ',keywordElement)

# To print the line of script that contain "location" in text form
print('Supporting lines... \n\n',keyword)

# ******************************************************
# code below is where the string comparison comes in
# to compare the "keyword" and display output to user
# ******************************************************

string = "location"

if string in keyword:
    print('\nDANGER: Keyword detected in URL entered')
    Flag = "Detected" # For DB usage
else:
    print('\nSAFE: Keyword does not exist in URL entered')
    Flag = "Safe" # For DB usage


# END OF PROGRAM

Actual result: able to retrieve all the necessary information including its element and content

Expected result: To print the DANGER / SAFE word to user and define the variable "Flag" which will then stored into database.


Solution

  • keyword is a list.

    You need to index the list to get the string after which you will be able to search for the specific string

    "location" in keyword[0] #gives True