I want to export hive query result to single local file with pipe delimiter.
Hive query contains order by clause.
I have tried below solutions.
Solution1:
hive -e 'insert overwrite local directory '/problem1/solution' fields terminated by '|' select * from table_name order by rec_date'
This solution is creating multiple files. After merging files, it loosing data order.
Solution2:
beeline -u 'jdbc:hive2://server_ip:10000/db_name' --silent --outputformat=dsv --delimiterForDSV='|' -e 'select * from table_name order by rec_date' > /problem1/solution
This solution is creating single file but it has empty 2 lines at top and 2 lines at bottom.
I am removing empty lines using sed command. It takes very long time.
Is there any other efficient way to achieve this?
Try these settings for executing ORDER BY on single reducer:
set hive.optimize.sampling.orderby=false; --disable parallel ORDER BY
Or try to set the number of reducers manually:
set mapred.reduce.tasks=1;