scala hadoop mapreduce cascading scalding

Write to multiple outputs by key with Scalding on Hadoop in one MapReduce job


How can you write to multiple outputs, depending on the key, using Scalding (or Cascading) in a single MapReduce job? I could of course use .filter once for each possible key, but that is a horrible hack that fires up many jobs.


Solution

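  • Scalding has TemplatedTsv (from version 0.9.0rc16 onward), which is the counterpart of Cascading's TemplateTap. It builds each tuple's output path from the given fields, so a single write covers every key.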

    Tsv(args("input"), ('COUNTRY, 'GDP))
    .read
    .write(TemplatedTsv(args("output"), "%s", 'COUNTRY))
    // In Hadoop mode, this creates one directory per country under the "output" path.
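
To make the directory layout concrete, here is a minimal plain-Scala sketch of how the "%s" template could expand into one output directory per key. It does not call Scalding or Cascading. The helper `outputDir`, the object name, and the sample rows are hypothetical, and it assumes the sink path is simply the base path, a slash, and the template formatted with the path-field values.

```scala
// Sketch only: mimics the assumed path expansion, not the real tap.
object TemplatePathSketch {
  // Hypothetical helper: base path + "/" + template filled with the key values.
  def outputDir(basePath: String, template: String, keys: Any*): String =
    basePath + "/" + template.format(keys: _*)

  def main(args: Array[String]): Unit = {
    // Made-up (COUNTRY, GDP) rows, for illustration only.
    val rows = Seq(("France", 1.0), ("Japan", 2.0), ("France", 3.0))

    // Group the rows by the directory their COUNTRY value maps to.
    val byDir = rows.groupBy { case (country, _) => outputDir("output", "%s", country) }

    byDir.foreach { case (dir, rs) => println(s"$dir receives ${rs.size} row(s)") }
  }
}
```

Under that assumption, a template with more placeholders, such as "%s-%s" with two path fields, would give one directory per combination of values.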