syntax apache-pig delimiter

Issue with Comma as a Delimiter in Pig Latin for a free-text column


I am loading a file with PigStorage. The file has a column, Newvalue, which is free text and can contain commas. When I specify a comma as the delimiter, the commas inside Newvalue are treated as field separators, which causes problems. I am using the following code.

inpt = load '/home/cd36630/CRM/1monthSample.txt' USING PigStorage(',') 
            AS (BusCom:chararray,Operation:chararray,OperationDate:chararray,
                ISA:chararray,User:chararray,Field:chararray,Oldvalue:chararray,
                Newvalue:chararray,RecordId:chararray);

Any help is appreciated.


Solution

  • If the input is in CSV form, with fields that contain commas enclosed in double quotes, then you can use CSVLoader (from piggybank) to load it. This may fix your issue.

    If this doesn't work then you can load each line into a single chararray and then write a UDF that splits the whole line in a way that respects the commas in Newvalue. For example:

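    register 'myudfs.py' using jython as myudfs ;
    -- The default PigStorage splits on tabs, so a line without tabs arrives as one field.
    A = LOAD '/home/cd36630/CRM/1monthSample.txt' AS (total:chararray) ;
    -- Call the UDF through the alias given in the register statement (myudfs).
    B = FOREACH A GENERATE myudfs.prepare_input(total) ;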
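
    The answer does not show the contents of myudfs.py, so the following is only a minimal sketch of what prepare_input could look like. It assumes that Newvalue is the only column that can contain commas and that every line has the nine columns from the question's schema. It takes the first seven fields from the left and RecordId from the right, and treats whatever remains in between as Newvalue. The output schema string, the LEADING_FIELDS constant and the fallback decorator are illustrative assumptions, not part of the original answer.

```python
# myudfs.py: hypothetical sketch. Only the name prepare_input comes from the answer.

try:
    outputSchema  # expected to be provided by Pig when registered "using jython"
except NameError:
    # Fallback so the file can also be imported and tested outside Pig.
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# BusCom .. Oldvalue: the seven columns before Newvalue, assumed comma-free.
LEADING_FIELDS = 7


@outputSchema("rec:tuple(BusCom:chararray,Operation:chararray,"
              "OperationDate:chararray,ISA:chararray,User:chararray,"
              "Field:chararray,Oldvalue:chararray,Newvalue:chararray,"
              "RecordId:chararray)")
def prepare_input(total):
    """Split one raw line into nine fields, keeping commas inside Newvalue."""
    if total is None:
        return None
    # Peel the first seven fields off the left; the rest is "Newvalue,RecordId".
    parts = total.split(',', LEADING_FIELDS)
    if len(parts) <= LEADING_FIELDS:
        return None  # too few fields: malformed line
    # RecordId follows the last comma; everything before it is Newvalue.
    tail = parts[LEADING_FIELDS].rsplit(',', 1)
    if len(tail) != 2:
        return None  # no separate RecordId present
    return tuple(parts[:LEADING_FIELDS] + tail)
```

    Because this sketch returns a single tuple, the Pig side would probably wrap the call as FLATTEN(myudfs.prepare_input(total)) to get nine separate columns. Malformed lines come back as null and can be filtered out afterwards. If other columns can also contain commas, this left/right split is not enough and a quoted CSV format is the safer route.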