I was asked this question in a Hadoop interview. I have table data like the following.
I bought a new bike. On the 1st day I travelled 20 km. On the 2nd day the meter read 50 (day 1 + day 2). On the 3rd day the meter read 60 (day 1 + day 2 + day 3).
Day Distance
1 20
2 50
3 60
Now the question is: I want the output to look like this.
Day Distance
1 20
2 30
3 10
That is, I want the distance travelled on each individual day (1st, 2nd and 3rd), not the cumulative reading.
Answer can be in Hive/Pig/MapReduce.
Thanks
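This is a running-totals problem in reverse: each day's distance is that day's reading minus the previous day's reading. You can solve it with this Hive query, assuming the table is called mytable with columns d (day) and dst (cumulative distance):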
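with b as (
  -- seed row: the meter reads 0 on "day 0"
  select 0 as d, 0 as dst
  union all
  select d, dst from mytable
)
-- pair each day with the previous day and subtract the two readings
select a.d, a.dst - b.dst as new_dst
from mytable a, b
where a.d - b.d = 1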
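
Since the question also allows MapReduce, here is a minimal Python sketch of the same differencing logic, roughly what a single reducer would do after the rows have been sorted by day. The function name `daily_distances` and the list-of-pairs input are assumptions made for illustration; they do not come from the question.

```python
def daily_distances(readings):
    """Turn cumulative meter readings into per-day distances.

    `readings` is a list of (day, cumulative_km) pairs. The name and
    input shape are hypothetical, chosen only for this sketch.
    """
    result = []
    previous = 0  # assumption: the meter of a new bike starts at 0
    for day, cumulative in sorted(readings):  # process days in order
        result.append((day, cumulative - previous))
        previous = cumulative
    return result


print(daily_distances([(1, 20), (2, 50), (3, 60)]))
# prints [(1, 20), (2, 30), (3, 10)]
```

Starting `previous` at 0 plays the same role as the extra "day 0" row in the Hive query: it lets the first day be handled like every other day.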