Sqoop import produces duplicate or partial records when the following settings are used:
--query - Custom query
--split-by - Non-integer column (char)
--num-mappers - More than 2

Verified the source data count: say, 1000 records.
Verified the imported data count: say, 1923 records.
When --split-by is used on a non-integer field, Sqoop uses TextSplitter, which logs the following warnings:
WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records
WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
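
The effect described in the warning can be reproduced with a small simulation. This is a simplified sketch, not Sqoop's actual TextSplitter code: the helpers `make_ranges` and `import_rows` are hypothetical, keys are single characters, and the "database" is just a Python list filtered with either a case-sensitive or a case-insensitive comparison. It assumes range boundaries are computed from character code points while each mapper's range condition is evaluated by the database's own collation.

```python
def make_ranges(lo, hi, num_splits):
    """Cut [lo, hi] into num_splits ranges by code point (single characters only)."""
    step = (ord(hi) - ord(lo)) / num_splits
    points = [chr(round(ord(lo) + i * step)) for i in range(num_splits)] + [hi]
    return list(zip(points[:-1], points[1:]))


def import_rows(rows, ranges, key):
    """Simulate one mapper per range; `key` models how the database compares text."""
    imported = []
    last = len(ranges) - 1
    for i, (lo, hi) in enumerate(ranges):
        for row in rows:
            k = key(row)
            below_upper = k < key(hi) or (i == last and k == key(hi))
            if key(lo) <= k and below_upper:
                imported.append(row)
    return imported


rows = ["B", "c", "h", "M", "p", "u", "X"]      # 7 source records
ranges = make_ranges("A", "z", 3)               # [('A', 'T'), ('T', 'g'), ('g', 'z')]

case_sensitive = import_rows(rows, ranges, key=lambda s: s)
case_insensitive = import_rows(rows, ranges, key=str.lower)

print(len(rows), len(case_sensitive), len(case_insensitive))  # 7 7 10
```

With the case-sensitive comparison every record lands in exactly one range. With the case-insensitive comparison the ranges `'A'..'T'` and `'g'..'z'` overlap and `'T'..'g'` matches nothing, so `h`, `M` and `p` are imported twice. Other data can just as easily fall into no range at all, which gives a partial import.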

To avoid the problem:
Use --split-by on the rank field.
Keep the --split-by field in ascending order in the query.