I am loading a file with PigStorage. The file has a column, Newvalue, which is free text and can contain commas. When I specify a comma as the delimiter, the commas inside Newvalue are treated as field separators and the columns come out wrong. I am using the following code.
inpt = load '/home/cd36630/CRM/1monthSample.txt' USING PigStorage(',')
AS (BusCom:chararray,Operation:chararray,OperationDate:chararray,
ISA:chararray,User:chararray,Field:chararray,Oldvalue:chararray,
Newvalue:chararray,RecordId:chararray);
Any help is appreciated.
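If the input is in CSV form, you can use CSVLoader (from Piggybank) to load it. It treats commas inside double-quoted fields as data rather than delimiters, so this may fix your issue if Newvalue is quoted in the file.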
If this doesn't work, you can load each line into a single chararray and then write a UDF that splits the whole line in a way that respects the commas in Newvalue. For example:
register 'myudfs.py' using jython as myudfs ;
A = LOAD '/home/cd36630/CRM/1monthSample.txt' AS (total:chararray) ;
B = FOREACH A GENERATE myudfs.prepare_input(total) ;
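
The answer does not show what myudfs.py contains, so here is a minimal sketch of what prepare_input could look like. It assumes that Newvalue is the only column that can contain commas and that every line has the nine columns from the question. Under those assumptions the first seven fields can be split off from the left and RecordId from the right, and whatever remains in the middle is Newvalue. The constant name and the fallback decorator are illustrative, not part of Pig.

```python
# myudfs.py (hypothetical contents; only the file name and the function
# name prepare_input come from the answer above)

# Pig's Jython engine normally makes the outputSchema decorator available.
# This fallback is an assumption added so the file also runs outside Pig.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator

# Number of columns before Newvalue: BusCom .. Oldvalue.
LEADING_FIELDS = 7


@outputSchema("t:(BusCom:chararray,Operation:chararray,OperationDate:chararray,"
              "ISA:chararray,User:chararray,Field:chararray,Oldvalue:chararray,"
              "Newvalue:chararray,RecordId:chararray)")
def prepare_input(total):
    """Split one raw line into 9 fields, keeping commas inside Newvalue."""
    if total is None:
        return None
    # Split off the 7 leading fields; the 8th element is the unsplit remainder.
    head = total.split(',', LEADING_FIELDS)
    if len(head) <= LEADING_FIELDS:
        return None  # too few fields, treat the line as malformed
    # Split RecordId off the right; the rest of the remainder is Newvalue.
    tail = head[LEADING_FIELDS].rsplit(',', 1)
    if len(tail) < 2:
        return None  # no separator left for RecordId
    return tuple(head[:LEADING_FIELDS] + tail)
```

Because the UDF returns a single tuple, wrapping the call in FLATTEN, as in FLATTEN(myudfs.prepare_input(total)), should turn the nine fields into top-level columns. Lines that do not match the expected shape come back as null, so they can be filtered out afterwards. If another column can also contain commas, this split is ambiguous and the file needs proper quoting instead.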