I have this set of data:
dump data;
This is a sample output: (this dataset is almost a million rows long).
("0",60,0,1,"Fri")
("1",47,0,1,"Mon")
("1",23,1,0,"Tue")
("1",60,0,0,"Sat")
("1",50,1,1,"Fri")
I want to replace the values: Sat, Fri, Mon to numbers of week, I know how to use REPLACE for change just 1 value at a time, but I have to repeat it multiple times in order to change all days of the week:
data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,REPLACE($4, 'Mon', '1');
Is there any way to do this in only one statement?
saph_top, was the closer one to answer my question, nonetheless after testing it was resolving blank output, I'm going to complement his answer:
'Mon' is not the same as "Mon", therefore when I was using: CASE WHEN $4 == 'Mon' THEN '1' It wasn´t replacing anything, resulting in blank result in: data_day_of_week.
To solve this, I just add " " (double quotes to the condition):
data_day_of_week = FOREACH data GENERATE
CASE
WHEN $4 == '"Mon"' THEN '1'
WHEN $4 == '"Tue"' THEN '2'
...
WHEN $4 == '"Sun"' THEN '7'
END AS day_number;
After that in order to rebuild the data I add the following to GENERATE Clause:
data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,
CASE
WHEN $4 == '"Mon"' THEN '1'
WHEN $4 == '"Tue"' THEN '2'
...
WHEN $4 == '"Sun"' THEN '7'
END AS day_number;
And the output was complete now: dump data_day_of_week;
("0",60,0,1,5)
("1",47,0,1,1)
("1",23,1,0,2)
("1",60,0,0,6)
("1",50,1,1,5)