I am grouping by a custom type in my Scalding job:
typedPipe
.map(someMapper)
.groupBy(_.nonPrimitiveField)
.sum
.write(sink)
In my output, the keys show up as their toString output, which is not useful. How can I make Scalding use a custom serializer for these keys?
My current workaround is to call toTypedPipe and explicitly call my serialization function in the mappers, but this seems wasteful.
The sink is a TypedTsv[(Key, Value)], where Key is the type of the field that I would like to serialize differently.
Well, Tsv is a text format, so, at the end of the day, everything becomes a string. The simplest way would be to just override .toString on your Key type, or wrap it in another object with .toString overridden. Or just replace it with a String as a final step (I think that's what you are already doing anyway). I am not sure what you mean when you say it is "wasteful". It does not add an extra step to the flow, if that's your concern, and the conversion to a string has to happen in any case, so that cost is fixed.
typedPipe
.map(someMapper)
.groupBy(x => beautifulString(x.nonPrimitiveField))
.sum
.write(sink)
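If you would rather keep a typed key than switch to a String, the wrapper idea could look roughly like the sketch below. Everything named here (RawKey, PrettyKey, WrapperDemo, sumByKey, the region and id fields, and the "region-id" format) is made up for illustration and is not from your job. It uses plain Scala collections in place of a pipe so that it runs without Scalding on the classpath. The Ordering is included on the assumption that you group by the wrapper in a TypedPipe, which needs one for the key type.

```scala
// Hypothetical key type that we cannot modify; names are illustrative only.
final class RawKey(val region: String, val id: Int)

// Wrapper with value equality (it is a case class) and a readable toString.
// A text sink would then print "region-id" instead of the default rendering.
final case class PrettyKey(region: String, id: Int) {
  override def toString: String = s"$region-$id"
}

object PrettyKey {
  def from(k: RawKey): PrettyKey = PrettyKey(k.region, k.id)

  // Assumption: the wrapper is used as a grouping key, so it needs an Ordering.
  implicit val ordering: Ordering[PrettyKey] =
    Ordering.by((k: PrettyKey) => (k.region, k.id))
}

object WrapperDemo {
  // Plain-collections stand-in for .groupBy(...).sum on a pipe.
  def sumByKey(records: List[(RawKey, Long)]): List[(PrettyKey, Long)] =
    records
      .groupBy { case (k, _) => PrettyKey.from(k) }
      .map { case (k, kvs) => k -> kvs.map(_._2).sum }
      .toList
      .sortBy(_._1)

  def main(args: Array[String]): Unit = {
    val records = List(
      (new RawKey("eu", 1), 2L),
      (new RawKey("eu", 1), 3L),
      (new RawKey("us", 7), 5L)
    )
    // One tab-separated line per key, with the key rendered via toString.
    sumByKey(records).foreach { case (k, v) => println(s"$k\t$v") }
  }
}
```

In the real job, the equivalent would be grouping by PrettyKey.from(x.nonPrimitiveField) and declaring the sink with the wrapper as its key type. Whether that is nicer than grouping by a String directly is mostly a matter of taste, since the string conversion happens once per output record either way.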