shell hadoop copy hdfs distcp

Copy files from one HDFS folder to another HDFS location, filtering by modified date, using a shell script


I have one year of data in an HDFS location and I want to copy the data for the last 6 months into another HDFS location. Is it possible to copy only the last 6 months of data directly with an hdfs command, or do we need to write a shell script for this?

I have tried hdfs commands for this, but they didn't work.

I then tried the shell script below. It worked as far as creating TempFile.txt, but then threw an error:

$ sh scriptnew.sh
scriptnew.sh: line 8: syntax error: unexpected end of file

and the script did not execute any further.

Below is the shell script I used.

#!/bin/bash
hdfs dfs -ls /hive/warehouse/data.db/all_history/ |awk 'BEGIN{ SIXMON=60*60*24*180; "date +%s" | getline NOW } { cmd="date -d'\''"$6" "$7"'\'' +%s"; cmd | getline WHEN; DIFF=NOW-SIXMON; if(WHEN > DIFF){print $8}}' >> TempFile.txt
cat TempFile.txt |while read line
do
    echo $i
    hdfs dfs -cp -p $line /user/can_anns/all_history_copy/;
done

What might be the error, and how can I resolve it?


Solution

  • To copy the last 6 months of files from one HDFS location to another, we can use the script below.

    The script should be run from a local path on your Linux machine.

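    #!/bin/bash
    hdfs dfs -ls /hive/warehouse/data.db/all_history/ | awk 'BEGIN{ SIXMON=60*60*24*180; "date +%s" | getline NOW; DIFF=NOW-SIXMON } NF>=8 { cmd="date -d'\''"$6" "$7"'\'' +%s"; cmd | getline WHEN; close(cmd); if(WHEN > DIFF){print $8}}' > TempFile.txt
    # NF>=8 skips the "Found N items" header; close(cmd) releases each date pipe.
    # TempFile.txt now lists the paths modified within the last 180 days.
    # Copy each one; -p preserves timestamps, ownership and permissions.
    while read -r line
    do
       echo "$line"
       hdfs dfs -cp -p "$line" /user/can_anns/all_history_copy/
    done < TempFile.txt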
    

    Line 2: the hdfs dfs -ls output is piped through awk, which writes the paths of files modified within the last 180 days to TempFile.txt. The loop then reads TempFile.txt and copies each listed path to the destination.
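
The age check can also be done without running date once per file. The date and time columns of hdfs dfs -ls output have the form YYYY-MM-DD HH:MM, which sorts chronologically as plain text, so a single cutoff string can be compared directly in awk. The following is only a minimal sketch of that idea: filter_recent is a hypothetical helper name, the sample listing is made up for illustration, and GNU date is assumed (as in the script above).

```shell
#!/bin/bash
# Sketch: select paths from `hdfs dfs -ls`-style output modified on or after a cutoff.
# filter_recent is a hypothetical helper, not part of the original script.
filter_recent() {
    # $1 is the cutoff as "YYYY-MM-DD HH:MM"; listing lines arrive on stdin.
    # NF >= 8 skips the "Found N items" header line.
    awk -v cutoff="$1" 'NF >= 8 && ($6 " " $7) >= cutoff { print $8 }'
}

# Compute the cutoff once (GNU date syntax assumed).
cutoff=$(date -d '180 days ago' '+%Y-%m-%d %H:%M')

# Made-up sample listing; real input would come from hdfs dfs -ls.
sample='Found 2 items
-rw-r--r--   3 hive hive   1024 2000-01-01 10:00 /data/old_file
-rw-r--r--   3 hive hive   2048 2999-01-01 10:00 /data/new_file'

printf '%s\n' "$sample" | filter_recent "$cutoff"
```

With the sample listing, only /data/new_file is printed. In the real script, the printf line would be replaced by the hdfs dfs -ls command piped into filter_recent, with the output redirected to TempFile.txt.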

    If you write the script on Windows and copy it to a Linux machine, it may fail with a syntax error because of carriage-return line endings; this is the likely cause of the "unexpected end of file" error in the question. To avoid it, after copying the script to a local path on the Linux machine, run the command below:

        sed -i 's/\r//' FileName.sh

    Then run the script:

        sh FileName.sh
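
To check whether a script actually contains carriage returns before and after the fix, something like the following can be used. It is a self-contained sketch, not part of the original answer: demo_crlf.sh is a throwaway file created only for the demonstration, and GNU sed is assumed for -i and \r.

```shell
#!/bin/bash
# Sketch: detect and strip Windows line endings from a script.
# demo_crlf.sh is a throwaway file used only for illustration.
tmpdir=$(mktemp -d)
script="$tmpdir/demo_crlf.sh"

# Write a small loop with CRLF line endings, as a Windows editor might save it.
printf 'for i in 1 2\r\ndo\r\n  echo "$i"\r\ndone\r\n' > "$script"

# Count the lines that end in a carriage return.
cr=$(printf '\r')
before=$(grep -c "$cr\$" "$script")

# Remove the trailing carriage return from every line (GNU sed assumed).
sed -i 's/\r$//' "$script"

after=$(grep -c "$cr\$" "$script")
echo "CR-terminated lines before: $before, after: $after"

# The cleaned script now parses and runs.
result=$(bash "$script")
echo "$result"
rm -rf "$tmpdir"
```

A non-zero count from the grep step on your own script indicates that the sed command is needed.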