
Why am I losing the NULL values when using FILTER to remove the CSV header in Pig?


I have the following data in a .csv file:

Needed_values,TEMP,Desc
,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria

In another Stack Overflow question I saw a way to remove the header using FILTER. However, when I load this file in my Pig script and use FILTER to drop the CSV header, every row whose Needed_values field is empty (NULL) is removed as well!

LOAD_DATA = LOAD 'DATA.csv' USING PigStorage(',') AS
(
NEEDED_VALUES:chararray,
TEMP:chararray,
DESC:chararray
);

FILTER_HEADER = FILTER LOAD_DATA BY NEEDED_VALUES != 'Needed_values';

ACTUAL OUTPUT:
(3,022.30,India)
(1,027.1,Austria)
(2,785.52,Nigeria)

I expected the output to include every row except the header row (Needed_values,TEMP,Desc):

,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria


Solution

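  • Rows whose NEEDED_VALUES field is empty are loaded as null, and comparing null with any value (NEEDED_VALUES != 'Needed_values') evaluates to null rather than true. FILTER keeps only the rows for which the condition is true, so those rows are dropped. Add an explicit null check to the filter: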

    FILTER_HEADER = FILTER LOAD_DATA BY NEEDED_VALUES != 'Needed_values' OR NEEDED_VALUES IS NULL;
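To see why the extra IS NULL test matters, here is a small Python model of the two filter conditions. It is not Pig itself, only a sketch under two assumptions: PigStorage turns an empty field into null (consistent with the actual output in the question), and Pig combines null with OR using SQL-style three-valued logic. The helper names `load`, `ne`, `is_null`, `or_` and `pig_filter` are hypothetical.

```python
# Python model of Pig's null handling in FILTER (an illustration, not Pig).
from typing import Optional

CSV_TEXT = """Needed_values,TEMP,Desc
,022.3,NewYork
3,022.30,India
,027.0,Australia
,027.00,Russia
1,027.1,Austria
,027.10,Norway
,036.2,Hungary
,036.20,Lithunia
2,785.52,Nigeria"""

Field = Optional[str]
Truth = Optional[bool]  # None stands for a null boolean result


def load(text: str) -> list:
    """Split each line on ',' and map empty fields to None (assumed PigStorage behaviour)."""
    return [tuple(f if f != "" else None for f in line.split(","))
            for line in text.splitlines()]


def ne(a: Field, b: Field) -> Truth:
    """A comparison with a null operand yields null, not True or False."""
    if a is None or b is None:
        return None
    return a != b


def is_null(a: Field) -> bool:
    """IS NULL always returns a real boolean, even for null input."""
    return a is None


def or_(x: Truth, y: Truth) -> Truth:
    """Three-valued OR (assumed): True wins, otherwise null if either side is null."""
    if x is True or y is True:
        return True
    if x is None or y is None:
        return None
    return False


def pig_filter(rows: list, cond) -> list:
    """Keep a row only when the condition is exactly True; null does not pass."""
    return [r for r in rows if cond(r) is True]


rows = load(CSV_TEXT)

# Original condition: NEEDED_VALUES != 'Needed_values'
original = pig_filter(rows, lambda r: ne(r[0], "Needed_values"))

# Fixed condition: NEEDED_VALUES != 'Needed_values' OR NEEDED_VALUES IS NULL
fixed = pig_filter(rows, lambda r: or_(ne(r[0], "Needed_values"), is_null(r[0])))

print(len(original), [r[0] for r in original])  # 3 ['3', '1', '2']
print(len(fixed))                               # 9
```

In this model the original condition keeps only the India, Austria and Nigeria rows, because every empty field makes the comparison null. The fixed condition evaluates to true for those null rows through the IS NULL branch, while the header row still fails both branches and is removed.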