mocking cascading scalding

How to mock a TextLine for Scalding using the type-safe API?


I am trying to mock a TextLine for a Scalding job, but the offset appears to be getting mixed in with the line, whether I express the offset explicitly or implicitly.

Here is my job:

package changed

import com.twitter.scalding._
import com.twitter.scalding.typed.TDsl._

class MyJob(args: Args) extends Job(args) { 
  val mySource = TextLine(args("input"))
  val myPipe : TypedPipe[String] = mySource
    .read
    .debug
    .toTypedPipe[String]('line)
    .debug
    .write(TypedTsv[String](args("output")))
}

Note that I log the tuples before and after converting to the type-safe API.

Here is my test:

package changed

import com.twitter.scalding.{JobTest, TextLine, TypedTsv, TupleConversions}
import org.scalatest.FunSpec

class MyTest extends FunSpec with TupleConversions {
  val Input = "/tmp/testInput"
  val Output = "/tmp/testOutput"
  val Data1 = List((0 -> "line0", 1 -> "line1", 2 -> "line2"))
  val Data2 = List((0, "line0", 1, "line1", 2, "line2"))
  val Data3 = List(("line0", "line1", "line2"))

  JobTest("sandcrawler.MyJob")
    .arg("test", "")
    .arg("app.conf.path", "app.conf")
    .arg("output", Output)
    .arg("input", Input)
    .arg("debug", "true")
    .source(TextLine(Input), Data1)
    .sink[String](TypedTsv[String](Output)) {
      outputBuffer =>
      it("should return a 3-element list.") {
        assert(outputBuffer.size === 3)
      }
    }
    .run
    .finish
}

If I get the input from the constant List Data1, as shown above, the tuples output by the two calls to debug are (respectively):

['(0,line0)', '(1,line1)']
['(1,line1)']

If I get the input from Data2, the debug outputs are:

['0', 'line0']
['line0']

If I get the input from Data3, the debug outputs are:

['line0', 'line1']
['line1']

All runs fail the test with the same error message:

[info] MyTest:
[info] - should return a 3-element list. *** FAILED ***
[info]   1 did not equal 3 (MyTest.scala:23)

In other words, only a single tuple is written.

How should I represent/access my mock data?


Solution

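  • In this specific case, the problem is the extra set of parentheses around the elements of Data1. They make the List contain a single Tuple3 holding three pairs, rather than three separate (offset, line) pairs, so the mocked source yields only one tuple. If you instead write: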

    val Data1 = List(0 -> "line0", 1 -> "line1", 2 -> "line2")
    

    you should get the expected output:

    ['0', 'line0']
    ['line0']
    ['1', 'line1']
    ['line1']
    ['2', 'line2']
    ['line2']
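
The difference between the two list shapes can be checked without Scalding at all. The following is a minimal plain-Scala sketch; the object name MockDataShapes and the value names wrapped and flat are made up for illustration and do not appear in the original job or test.

```scala
object MockDataShapes {
  // Extra parentheses: the List has ONE element, a Tuple3 whose three
  // fields are themselves (Int, String) pairs.
  val wrapped: List[((Int, String), (Int, String), (Int, String))] =
    List((0 -> "line0", 1 -> "line1", 2 -> "line2"))

  // No extra parentheses: the List has THREE elements, one
  // (offset, line) pair per mocked input line.
  val flat: List[(Int, String)] =
    List(0 -> "line0", 1 -> "line1", 2 -> "line2")

  def main(args: Array[String]): Unit = {
    println(wrapped.size)              // prints 1
    println(flat.size)                 // prints 3
    println(wrapped.head.productArity) // prints 3
  }
}
```

Annotating the expected element type, as in flat above, is one way to have the compiler reject the wrapped form before the test ever runs.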