
Multiple regular expressions in Clojure


What is the best way to combine multiple regular expressions within a Clojure function? I believe the function would start out like this:

(defn foo [x]
  (re-seq #"some means to combine multiple regex" x))

but I am not sure whether this will work, or how efficient such a function would be. As an example of combining regexes, consider a function that searches for both domain names and IP addresses. For domain names I'd use a regex such as:

(re-seq #"\b([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b" x)

and for IP addresses:

(re-seq #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b" x)
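For comparison, the most direct approach is to run each pattern on its own and concatenate the results. The sketch below is only an illustration; the names `domain-re`, `ip-re` and `find-separately` are hypothetical, not from the question. It shows the two costs behind the efficiency concern: the input is scanned once per pattern, and the relative order of the matches is lost.

```clojure
;; The two patterns from the question (the var names are hypothetical).
(def domain-re #"\b([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b")
(def ip-re #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")

(defn find-separately
  "Hypothetical helper: runs each pattern over x on its own and
  concatenates the results, so x is scanned once per pattern."
  [x]
  (concat
   ;; domain-re has capturing groups, so re-seq yields vectors
   ;; [full-match group1 group2]; keep only the full match.
   (map first (re-seq domain-re x))
   ;; ip-re uses only non-capturing groups, so re-seq yields plain strings.
   (re-seq ip-re x)))
```

Calling `(find-separately "10.0.0.1 and example.com")` finds both items, but the IP address comes back second even though it appears first in the input.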

Solution

  • Regexes already support alternation with the | operator:

    user=> (re-seq #"\d+" "123 foo 345 bar")
    ("123" "345")
    user=> (re-seq #"[a-zA-Z]+" "123 foo 345 bar")
    ("foo" "bar")
    user=> (re-seq #"\d+|[a-zA-Z]+" "123 foo 345 bar")
    ("123" "foo" "345" "bar")
    

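    You can union regex patterns programmatically, if desired, by interposing the | operator between them. Wrapping each pattern in a non-capturing group (?:...) does this without adding any capture groups of its own.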

    (defn union-re-patterns
      "Joins patterns with |, wrapping each one in a non-capturing group."
      [& patterns]
      (re-pattern (apply str (interpose "|" (map #(str "(?:" % ")") patterns)))))
    
    user=> (union-re-patterns #"\d+" #"[a-zA-Z]+")
    #"(?:\d+)|(?:[a-zA-Z]+)"
    user=> (re-seq (union-re-patterns #"\d+" #"[a-zA-Z]+") "123 foo 345 bar")
    ("123" "foo" "345" "bar")
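Applied to the question's domain and IPv4 patterns, the union gives a single pass over the input with matches returned in input order. This is a minimal sketch: `union-re-patterns` is repeated so the block stands alone, and `domain-or-ip-re` and `find-domains-and-ips` are hypothetical names. Because the domain pattern contains capturing groups, the combined pattern does too, so `re-seq` returns vectors and `(map first ...)` is needed to keep only the full match.

```clojure
(defn union-re-patterns
  "Joins patterns with |, wrapping each one in a non-capturing group."
  [& patterns]
  (re-pattern (apply str (interpose "|" (map #(str "(?:" % ")") patterns)))))

;; The two patterns from the question (the var names are hypothetical).
(def domain-re #"\b([a-zA-Z0-9]([a-zA-Z0-9\-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,6}\b")
(def ip-re #"\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b")

;; Java regex alternation tries alternatives left to right at each start
;; position, so listing ip-re first means it wins if both could match there.
(def domain-or-ip-re (union-re-patterns ip-re domain-re))

(defn find-domains-and-ips
  "Hypothetical helper: returns every IPv4 address or domain name in x,
  in the order they appear, using a single regex scan."
  [x]
  ;; The combined pattern has capturing groups (from domain-re), so each
  ;; re-seq result is a vector whose first element is the full match.
  (map first (re-seq domain-or-ip-re x)))
```

Here `(find-domains-and-ips "10.0.0.1 and example.com")` keeps the IP address first, matching its position in the string.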