I am trying to extract links from HTML in Clojure with Enlive. Can I get a list of all the links on a page, and can I iterate over them? In Python with BeautifulSoup I would do it like this:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html_doc, 'html.parser')
print(soup.prettify())
# <html>
# <head>
# <title>
# The Dormouse's story
# </title>
# </head>
# <body>
# <p class="title">
# <b>
# The Dormouse's story
# </b>
# </p>
# <p class="story">
# Once upon a time there were three little sisters; and their names were
# <a class="sister" href="http://example.com/elsie" id="link1">
# Elsie
# </a>
# ,
# <a class="sister" href="http://example.com/lacie" id="link2">
# Lacie
# </a>
# and
# <a class="sister" href="http://example.com/tillie" id="link3">
# Tillie
# </a>
# ; and they lived at the bottom of a well.
# </p>
# <p class="story">
# ...
# </p>
# </body>
# </html>
links = soup.find_all('a')
or
links = soup('a')
How can I do this in Clojure with Enlive?
That is quite simple:
(require '[net.cgrand.enlive-html :as enlive])
(let [data     (enlive/html-resource (java.net.URL. "https://www.stackoverflow.com"))
      all-refs (enlive/select data [:a])]
  (first all-refs))
;;=> {:tag :a, :attrs {:href "https://stackoverflow.com", :class "-logo js-gps-track", :data-gps-track "top_nav.click({is_current:true, location:1, destination:8})"}, :content ("\n " {:tag :span, :attrs {:class "-img"}, :content ("Stack Overflow")} "\n ")}
The all-refs collection contains every link on the page, each represented as an Enlive node: a map with :tag, :attrs, and :content keys.
(let [data     (enlive/html-resource (java.net.URL. "https://www.stackoverflow.com"))
      all-refs (enlive/select data [:a])]
  (map #(-> % :attrs :href) all-refs))
This expression, for example, collects the href value of every link (nil for any anchor that has no href attribute).
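
As for iterating: the result of select is an ordinary Clojure sequence, so doseq, map, filter and the like all apply. Below is a minimal sketch, assuming Enlive is on the classpath. The names sample-html, link-hrefs and nodes are hypothetical, chosen for illustration, and the markup is parsed from a string instead of a URL so that it runs without network access.

```clojure
(require '[net.cgrand.enlive-html :as enlive])

;; Hypothetical sample markup (not from the question), used to avoid a network call.
(def sample-html
  (str "<p><a href=\"http://example.com/elsie\">Elsie</a>"
       "<a name=\"anchor-only\">no href</a>"
       "<a href=\"http://example.com/lacie\">Lacie</a></p>"))

(defn link-hrefs
  "Returns a vector of href values for every <a> node that has an href attribute."
  [nodes]
  ;; The nested vector [:a (attr? :href)] matches nodes that satisfy both steps.
  (mapv #(get-in % [:attrs :href])
        (enlive/select nodes [[:a (enlive/attr? :href)]])))

;; html-snippet parses an HTML string into Enlive nodes.
(def nodes (enlive/html-snippet sample-html))

;; Iterate over the hrefs for side effects, here printing each one.
(doseq [href (link-hrefs nodes)]
  (println href))
```

Filtering in the selector with attr? keeps nil values out of the result, which the plain [:a] selector would otherwise produce for anchors without an href.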