
Partial and duplicate records during Sqoop import


A Sqoop import produces duplicate or partial records when we use the following setting

Verified the source record count: say, 1000 records

Verified the imported record count: say, 1923 records


Solution

  • When --split-by is used and the split column is not an integer column,

    Sqoop uses TextSplitter, which logs the following warnings:

    WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records
    
    WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
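
The mechanism behind the first warning can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not Sqoop's actual TextSplitter code: it assumes split boundaries are interpolated by character code point and each mapper fetches a range of the form `col >= lo AND col < hi`. The function names (`split_points`, `fetch`, `simulate`) and the sample rows are hypothetical, and `str.lower` stands in for a case-insensitive database collation.

```python
# Simplified model of range-based splitting on a text column.
# This is NOT Sqoop's TextSplitter; all names and data here are hypothetical.

def split_points(lo, hi, num_splits):
    """Boundaries from lo to hi; interior ones are interpolated
    by the code point of the first character."""
    start, end = ord(lo[0]), ord(hi[0])
    interior = [chr(start + i * (end - start) // num_splits)
                for i in range(1, num_splits)]
    return [lo] + interior + [hi]


def fetch(rows, points, key):
    """Rows returned by the range queries when the database compares with `key`."""
    fetched = []
    last = len(points) - 2
    for i in range(len(points) - 1):
        lo, hi = key(points[i]), key(points[i + 1])
        for row in rows:
            k = key(row)
            # Each range is [lo, hi); only the final range includes hi.
            if lo <= k and (k < hi or (i == last and k <= hi)):
                fetched.append(row)
    return fetched


def simulate(rows, key, num_splits=4):
    """Take min/max in the database's order, then run every range query."""
    lo, hi = min(rows, key=key), max(rows, key=key)
    return fetch(rows, split_points(lo, hi, num_splits), key)


rows = ["Alice", "Bob", "carol", "Dave", "erin", "Zed", "zoe"]
case_sensitive = simulate(rows, key=lambda s: s)   # database sorts by code point
case_insensitive = simulate(rows, key=str.lower)   # database ignores case
print(len(rows), len(case_sensitive), len(case_insensitive))  # prints: 7 7 12
```

In this model the boundaries are ordered by code point, but a case-insensitive comparison sees them out of order, so two ranges overlap and five rows are fetched twice. That is the same kind of inflation as 1000 source records becoming 1923 imported records. Following the second warning and pointing --split-by at an integral column avoids the problem; running the import with a single mapper (-m 1) should also avoid it, since no split ranges are generated.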