I have a directory of JSON files that I want to process using Cascalog. My current solution requires me to remove all newline characters from the JSON files with a bash script. I am looking for a better solution because I sync these files using rsync.
My question is: can I read the contents of a file in Cascalog and return the whole file as one tuple? At present, lfs-textline returns one tuple per line of the file, which is why I have to remove the newline characters. Ideally I want a sequence containing one tuple per file.
(defn textline-parsed [dir]
  (let [source (lfs-textline dir)]
    (<- [?line]
        (source ?line))))
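Use hfs-wholefile from cascalog.more-taps to do this. It emits one tuple per file: the file name and the entire, unsplit file contents as a Hadoop BytesWritable.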
(:require [cascalog.more-taps :as taps])
(defn- byte-writable-to-str
  "Convert a Hadoop BytesWritable to a UTF-8 string. Only the first
  (.getLength bw) bytes of the backing array hold data."
  [bw]
  [(String. (.getBytes bw) 0 (.getLength bw) "UTF-8")])
Then use it in a query (path is the directory containing your JSON files):
(??<- [?str]
      ((taps/hfs-wholefile path) ?filename ?file-content)
      (byte-writable-to-str ?file-content :> ?str))
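A note on decoding the file contents: BytesWritable.getBytes returns the backing array, which can be longer than getLength, and mapping char over the bytes fails on non-ASCII text because UTF-8 encodes those characters as negative byte values, which (char b) rejects. The standalone sketch below runs without Hadoop: a plain byte array padded with zeros stands in for the BytesWritable buffer, and bytes->utf8-str is a hypothetical helper name, not part of Cascalog.

```clojure
;; Standalone sketch, no Hadoop needed. `padded` simulates the backing
;; array of a BytesWritable, which may be longer than the real data.
(defn bytes->utf8-str
  "Decode the first n bytes of buf as UTF-8, ignoring trailing padding.
  Hypothetical helper for illustration only."
  [^bytes buf n]
  (String. buf 0 (int n) "UTF-8"))

(def payload (.getBytes "{\"name\": \"Zoë\"}" "UTF-8"))

;; Four extra zero bytes after the real data, as in a reused buffer.
(def padded (java.util.Arrays/copyOf payload (+ (alength payload) 4)))

;; Decoding only the valid bytes with an explicit charset.
(bytes->utf8-str padded (alength payload))
;; => "{\"name\": \"Zoë\"}"

;; Decoding the whole buffer keeps the padding as NUL characters.
(.endsWith (String. ^bytes padded "UTF-8") "\u0000")
;; => true

;; Mapping char over the bytes: "ë" is encoded as negative bytes,
;; so (char b) throws IllegalArgumentException.
(try
  (apply str (map char padded))
  (catch IllegalArgumentException e :char-out-of-range))
;; => :char-out-of-range
```

The trailing NUL characters matter here because a JSON parser will typically reject them as garbage after the closing brace.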