I have the following data in a .csv file:
Needed_values,TEMP,Desc
,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria
I saw in a Stack Overflow question that the header can be removed using FILTER. However, when I load this file in my Pig script and use FILTER to remove the CSV header, all the rows with NULL values under Needed_values are removed as well!
LOAD_DATA = LOAD 'DATA.csv' Using PigStorage(',') as
(
NEEDED_VALUES:chararray,
TEMP:chararray,
DESC:chararray
);
FILTER_HEADER = FILTER LOAD_DATA BY NEEDED_VALUES != 'Needed_values';
ACTUAL OUTPUT:
(3,022.30,India)
(1,027.1,Austria)
(2,785.52,Nigeria)
I'm expecting the output to include everything except the header row (Needed_values,TEMP,Desc):
,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria
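The empty fields are loaded as nulls, and in Pig comparing a null with != evaluates to null rather than true. FILTER keeps only the rows for which the condition is true, so the rows with a null NEEDED_VALUES do not pass the filter condition. Change the filter to: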
FILTER_HEADER = FILTER LOAD_DATA BY NEEDED_VALUES != 'Needed_values' OR NEEDED_VALUES IS NULL;
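As an alternative sketch, the header could also be detected through a column that has no empty fields, so the null comparison never comes up. This assumes, as in the sample data, that TEMP is populated in every data row and that no data row contains the literal text TEMP in that column. The field name DESCRIPTION is a renaming introduced here only for illustration, because DESC appears in Pig's list of reserved keywords.

```pig
-- Load the CSV with the same layout as in the question.
-- DESCRIPTION is a hypothetical rename of the third field (DESC is a reserved keyword in Pig).
LOAD_DATA = LOAD 'DATA.csv' USING PigStorage(',') AS
(
    NEEDED_VALUES:chararray,
    TEMP:chararray,
    DESCRIPTION:chararray
);

-- Assumption: TEMP is never empty in the data rows, so this comparison
-- is never null and only the header row (where TEMP is 'TEMP') is dropped.
FILTER_HEADER = FILTER LOAD_DATA BY TEMP != 'TEMP';

DUMP FILTER_HEADER;
```

If any column used for the comparison can be empty, the IS NULL check shown above is still needed.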