
Unable to resolve symbol in a predicate in Cascalog


I have this query:

(?<- (hfs-textline data-out :sinkmode :replace)
        [?item1 ?item2]
        ((hfs-textline data-in) ?line)
        (data-line? ?line)
        (filter-out-data (#(vector (s/split % #",")) ?line) :> ?item1 ?item2)
        )

(defn data-line? [^String row]
  (and (not= -1 (.indexOf row ","))
       (not (.endsWith row ","))
       (not (.startsWith row ","))))

(defn filter-out-data [data]
  (<- [?item1 ?item2]
      (data :#> 9 {4 ?item1
                  8 ?item2})))

The query reads a CSV file line by line and keeps the lines that meet the valid-data conditions (data-line?); this part works. Then it is supposed to split each line on commas and pass the resulting vector to the filter-out-data function, which in turn returns two items extracted from that vector. When I execute the query I get the following error: Unable to resolve symbol: ?line in this context.
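Evaluated as ordinary Clojure at the REPL (outside Cascalog, with a made-up sample line), the anonymous function used in the query returns a nested vector. That happens because vector wraps the vector that s/split already returns:

```clojure
(require '[clojure.string :as s])

;; Made-up sample row, for illustration only
(def sample-line "a,b,c")

;; s/split alone already returns a vector of fields
(s/split sample-line #",")                ;=> ["a" "b" "c"]

;; The expression from the query wraps that vector in a second vector
(#(vector (s/split % #",")) sample-line)  ;=> [["a" "b" "c"]]
```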

I have been trying out different ways of passing the result of split (I would like this to be flexible, as the number of fields produced by the split will vary). I am just starting with Clojure and Cascalog, and I would be grateful if you could point me in the right direction. Thanks!


Solution

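  • The function filter-out-data generates a subquery (it is built with <-), but you are trying to use it as a predicate, and that is not going to work: a subquery is a source of tuples, not a function Cascalog can call on each ?line. The nested expression is a second problem: Cascalog only treats ?-variables as logic variables when they appear directly as predicate arguments, so inside (#(vector ...) ?line) the symbol ?line is evaluated as ordinary Clojure, which is where "Unable to resolve symbol: ?line" comes from.

    I recommend moving all the logic in the expression (#(vector (s/split % #",")) ?line) into a regular function, which you can still call filter-out-data: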

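    (require '[clojure.string :as s])

    ;; Plain function: split the line and keep fields 4 and 8 (0-based),
    ;; matching {4 ?item1 8 ?item2} in the question
    (defn filter-out-data [data]
      (let [[_ _ _ _ item1 _ _ _ item2] (s/split data #",")]
        [item1 item2]))
    
    ;; The two elements of the returned vector are bound to ?item1 and ?item2
    (?<- (hfs-textline data-out :sinkmode :replace)
        [?item1 ?item2]
        ((hfs-textline data-in) ?line)
        (data-line? ?line)
        (filter-out-data ?line :> ?item1 ?item2))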
    

    You can simplify the code even further by using a CSV library such as clojure.data.csv, which also handles quoted fields that contain commas.
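Because the corrected filter-out-data is a plain function, it can be checked at the REPL without Cascalog or Hadoop before being wired into the query. Below is a minimal sketch with a made-up nine-field row. pick-fields is a hypothetical helper (not part of Cascalog) showing one way to keep the field indices flexible when the number of fields varies; missing indices come back as nil instead of throwing.

```clojure
(require '[clojure.string :as s])

;; Same as the corrected answer: keep fields 4 and 8 (0-based)
(defn filter-out-data [data]
  (let [[_ _ _ _ item1 _ _ _ item2] (s/split data #",")]
    [item1 item2]))

;; Hypothetical helper: pick any 0-based indices from a CSV line.
;; s/split returns a vector, so get returns nil for out-of-range indices.
(defn pick-fields [line idxs]
  (let [fields (s/split line #",")]
    (mapv #(get fields %) idxs)))

;; Made-up sample row, for illustration only
(def row "f0,f1,f2,f3,f4,f5,f6,f7,f8")

(filter-out-data row)       ;=> ["f4" "f8"]
(pick-fields row [0 4 12])  ;=> ["f0" "f4" nil]
```

With this helper, filter-out-data could equally be written as (pick-fields data [4 8]).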