scala, gzip, scalding

Uncompress and read a gzip file in Scala


In Scala, how does one uncompress the text contained in file.gz so that it can be processed? I would be happy either with having the contents of the file stored in a variable or with saving it as a local file so that the program can read it in afterwards.
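Both outcomes mentioned here can be reached with the JDK and the Scala standard library alone. The sketch below assumes UTF-8 text, and the names `GzipHelpers`, `readGzipToString` and `gunzipToFile` are hypothetical. It wraps a `FileInputStream` in a `GZIPInputStream` and either decodes everything into a `String` or copies the decompressed bytes to a local file:

```scala
import java.io.{File, FileInputStream}
import java.nio.file.{Files, StandardCopyOption}
import java.util.zip.GZIPInputStream
import scala.io.Source

// Hypothetical helpers for illustration; the names are not from any library.
object GzipHelpers {
  // Decompress the whole file into memory and return it as one String.
  // Assumes UTF-8 and a file small enough to fit in memory.
  def readGzipToString(gz: File): String = {
    val source = Source.fromInputStream(
      new GZIPInputStream(new FileInputStream(gz)), "UTF-8")
    try source.mkString
    finally source.close() // also closes the underlying streams
  }

  // Decompress to a plain file on local disk so it can be read later.
  // Returns the number of decompressed bytes written.
  def gunzipToFile(gz: File, out: File): Long = {
    val in = new GZIPInputStream(new FileInputStream(gz))
    try Files.copy(in, out.toPath, StandardCopyOption.REPLACE_EXISTING)
    finally in.close()
  }
}
```

Reading into a `String` is the simplest option for small files; copying to disk avoids holding large logs in memory.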

Specifically, I am using Scalding to process compressed log data, but Scalding does not define a way to read them in FileSource.scala.


Solution

  • Here's my version:

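    import java.io.{BufferedReader, FileInputStream, InputStreamReader}
    import java.util.zip.GZIPInputStream
    
    // Iterates over the lines of a reader. It reads one line ahead because
    // readLine() returning null is the reliable end-of-stream signal;
    // reader.ready only says whether a read would block, not whether data remains.
    class BufferedReaderIterator(reader: BufferedReader) extends Iterator[String] {
      private var nextLine: String = readOrClose()
    
      // Read the next line and close the reader once the end is reached.
      private def readOrClose(): String = {
        val line = reader.readLine()
        if (line == null) reader.close()
        line
      }
    
      override def hasNext: Boolean = nextLine != null
    
      override def next(): String = {
        if (nextLine == null) throw new NoSuchElementException("end of file")
        val line = nextLine
        nextLine = readOrClose()
        line
      }
    }
    
    object GzFileIterator {
      // Opens a gzip-compressed text file and decodes it with the given charset.
      def apply(file: java.io.File, encoding: String): Iterator[String] =
        new BufferedReaderIterator(
          new BufferedReader(
            new InputStreamReader(
              new GZIPInputStream(new FileInputStream(file)), encoding)))
    }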
    

    Then do:

    val iterator = GzFileIterator(new java.io.File("test.txt.gz"), "UTF-8")
    iterator.foreach(println)
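If a hand-written iterator is not needed, `scala.io.Source` already provides a line iterator via `getLines()`. A minimal sketch, assuming UTF-8 by default and a hypothetical `GzLines.withGzLines` helper that guarantees the file is closed once the caller is done with the lines:

```scala
import java.io.{File, FileInputStream}
import java.util.zip.GZIPInputStream
import scala.io.Source

object GzLines {
  // Hypothetical helper: open a .gz file, pass its lines to `f`, and
  // always close the underlying stream, even if `f` throws.
  def withGzLines[A](file: File, encoding: String = "UTF-8")(f: Iterator[String] => A): A = {
    val source = Source.fromInputStream(
      new GZIPInputStream(new FileInputStream(file)), encoding)
    try f(source.getLines())
    finally source.close()
  }
}
```

Usage: `GzLines.withGzLines(new java.io.File("test.txt.gz"))(_.foreach(println))`. Because the iterator is lazy and the stream is closed when `f` returns, `f` should consume the lines (or build a strict result such as `_.toList`) rather than return the iterator itself.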