hadoop apache-pig udf databags

"Flattening" a databag in Pig


Suppose I have a bunch of databags generated by a Pig UDF, each holding several tuples of Strings. How can I pull all of the Strings out of the databags and simply make each String its own "row" of data?

databags = FOREACH data GENERATE pigUdfThatMakesDataBags(data::someText);
strings = FOREACH databags { ??? };


Solution

  • databags = FOREACH data GENERATE pigUdfThatMakesDataBags(data::someText);
    datatuples = FOREACH databags GENERATE FLATTEN($0);      -- Bag to Tuples: one row per tuple in the bag
    strings = FOREACH datatuples GENERATE FLATTEN(TOBAG(*)); -- Tuples to Tokens: one row per field of each tuple
    DUMP strings;
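
If every tuple in the bag holds exactly one String, a single FLATTEN should already be enough, and the TOBAG step is only needed for tuples with several fields. Here is a minimal sketch of that simpler case. It assumes a hypothetical input file 'input.txt' with one line of text per record, and it uses the built-in TOKENIZE function as a stand-in for the UDF from the question, since TOKENIZE also returns a bag of one-field tuples.

```pig
-- Assumption: 'input.txt' is a hypothetical file with one line of text per record.
data = LOAD 'input.txt' AS (someText:chararray);

-- TOKENIZE stands in for pigUdfThatMakesDataBags: it returns a bag of one-field tuples.
databags = FOREACH data GENERATE TOKENIZE(someText) AS words;

-- FLATTEN un-nests the bag, producing one output row per tuple it contains.
strings = FOREACH databags GENERATE FLATTEN(words) AS word;

DUMP strings;
```

Each word of the input then comes out as its own single-field row. Swapping TOKENIZE for the real UDF should behave the same way as long as that UDF also emits one-field tuples.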