clojure hiccup

Is there a parser from HTML to hiccup structures?


I'm looking for a function that does the reverse of what Clojure's hiccup does, so that

   <html></html>

turns into

   [:html]

etc.


Following up on the answer by @kotarak, this now works for me:

(use 'net.cgrand.enlive-html)
(import 'java.io.StringReader)

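;; Convert an enlive node ({:tag .. :attrs .. :content ..}) into a hiccup vector.
;; Strings are text nodes and pass through unchanged.
(defn enlive->hiccup
  [el]
  (if-not (string? el)
    (->> (map enlive->hiccup (:content el))
         (concat [(:tag el) (:attrs el)])
         (keep identity)   ; drops :attrs when it is nil
         vec)
    el))

;; Parse an HTML string with enlive and return the first top-level node.
(defn html->enlive
  [html]
  (first (html-resource (StringReader. html))))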

(defn html->hiccup [html]
  (-> html
      html->enlive
      enlive->hiccup))

=> (html->hiccup "<html><body id='foo'>hello</body></html>")
[:html [:body {:id "foo"} "hello"]]

Solution

  • You could use html-resource from enlive to get a structure like this:

    {:tag :html :attrs {} :content []}
    

    Then traverse this and turn it into a hiccup structure.

    (defn html->hiccup
       [html]
       (if-not (string? html)
         (->> (map html->hiccup (:content html))
           (concat [(:tag html) (:attrs html)])
           (keep identity)
           vec)
         html))
    

    Here is a usage example:

    user=>  (html->hiccup {:tag     :p
                           :content ["Hello" {:tag     :a
                                              :attrs   {:href "/foo"}
                                              :content ["World"]}
                                     "!"]})
    [:p "Hello" [:a {:href "/foo"} "World"] "!"]
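
One detail that may matter in practice: `(keep identity)` only drops a nil `:attrs`, so a node with an empty attribute map, such as the `{:tag :html :attrs {} :content []}` structure shown earlier, would come out as `[:html {}]` rather than `[:html]`. The following is a minimal sketch of a variant that uses `not-empty` to treat an empty map like a missing one. The name `node->hiccup` is hypothetical and only used here to avoid clashing with `html->hiccup`; the sketch assumes every non-string node is an element map with a `:tag`.

```clojure
;; Hypothetical variant of html->hiccup from the answer above.
;; Assumes each non-string node is a map with :tag, optional :attrs, optional :content.
(defn node->hiccup
  [node]
  (if (string? node)
    node                                          ; text nodes pass through unchanged
    (->> (map node->hiccup (:content node))       ; convert children recursively
         (concat [(:tag node)
                  (not-empty (:attrs node))])     ; nil for both nil and {} attrs
         (keep identity)                          ; drop the nil attrs slot
         vec)))

(node->hiccup {:tag :html :attrs {} :content []})
;; => [:html]

(node->hiccup {:tag     :p
               :content ["Hello" {:tag     :a
                                  :attrs   {:href "/foo"}
                                  :content ["World"]}
                         "!"]})
;; => [:p "Hello" [:a {:href "/foo"} "World"] "!"]
```

Whether dropping `{}` is desirable depends on how the result is used; hiccup accepts both forms, so this is purely a matter of keeping the output tidy.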